Welcome to the Era of Small AI Models

2024-05-14 10:42:02

Microsoft recently announced the latest version of its lightweight artificial intelligence model, the Phi-3 Mini, the first of three small models planned for release. The Phi-3 Mini has only 3.8 billion parameters and was trained on a far smaller dataset than colossal models like GPT-4. It is now available to users of Azure, Hugging Face, and Ollama. Microsoft also plans to release two other models in the family: Phi-3 Small (7 billion parameters) and Phi-3 Medium (14 billion parameters).
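Since the model is already listed on Hugging Face, trying it out takes only a few lines. Below is a minimal sketch using the transformers library; the model ID microsoft/Phi-3-mini-4k-instruct matches the public listing, while the prompt and generation settings are illustrative assumptions, not Microsoft's recommended configuration.

    # Minimal sketch: querying Phi-3 Mini through Hugging Face transformers.
    # The model ID matches the public listing; the prompt and generation
    # settings are illustrative assumptions, not official recommendations.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    messages = [{"role": "user", "content": "Explain what a small language model is in two sentences."}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

For laptop use without Python, the same model can also be pulled through Ollama with a single command (ollama run phi3 at the time of writing).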

Previously, Microsoft launched the Phi-2 model last December, which performs on par with larger models such as Llama 2. Microsoft claims that Phi-3 outperforms its predecessor and can deliver responses close to those of models ten times its size. Eric Boyd, Vice President of Microsoft's Azure AI Platform, emphasized that Phi-3 Mini can compete in performance with large language models (LLMs) such as GPT-3.5, with the advantage of a much smaller footprint.

Compared with large artificial intelligence models, small AI models typically have lower operating costs and perform well on personal devices such as phones and laptops. According to a report from “The Information” earlier this year, Microsoft is forming a team focused on the development of lightweight artificial intelligence models. Alongside the Phi model series, Microsoft has also developed an AI model called Orca-Math, specifically designed to solve mathematical problems.

Meanwhile, Microsoft’s competitors are also working on small models, most of which target simpler tasks such as document summarization or coding assistance. Google’s Gemma 2B and 7B models are well suited to simple chatbots and language tasks, Anthropic’s Claude 3 Haiku can read and quickly summarize research papers containing charts, and Meta’s recent Llama 3 8B model can serve chatbot and coding-assistance use cases.

Boyd also revealed that Phi-3 was trained with a “curriculum” approach, inspired by how children learn from simple books and bedtime stories that use easy words and sentence structures to introduce bigger topics. Because there were not enough suitable children’s books to draw on, the team compiled a list of more than 3,000 words and had a large language model write child-level readings from it to teach Phi. Boyd added that Phi-3 builds on what previous iterations learned: Phi-1 focused on programming, Phi-2 began to learn reasoning, and Phi-3 performs better at both. Although the Phi-3 family has some general knowledge, it cannot match GPT-4 or other large language models across a broader range of applications.
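Microsoft has not published the exact pipeline, but the idea Boyd describes is easy to sketch: constrain a large model to a small vocabulary, collect the simple stories it produces, and use them as training text for the small model. In the sketch below, the word list, the prompt wording, and call_llm() are all hypothetical placeholders rather than Microsoft’s actual method.

    # Rough sketch of the "curriculum" data idea described above: prompt a
    # large LLM to write simple stories drawn from a fixed vocabulary, then
    # keep the outputs as training text for a small model. VOCABULARY, the
    # prompt wording, and call_llm() are hypothetical placeholders.
    import random

    VOCABULARY = ["sun", "tree", "river", "friend", "happy", "run"]  # stand-in for the ~3,000-word list

    def build_prompt(words: list[str], n_required: int = 3) -> str:
        picked = random.sample(words, n_required)
        return ("Write a very short children's story in simple language. "
                f"It must include the words: {', '.join(picked)}.")

    def call_llm(prompt: str) -> str:
        """Placeholder for a call to a large language model API."""
        raise NotImplementedError("plug in an LLM client here")

    def generate_corpus(n_stories: int = 1000) -> list[str]:
        # Each generated story becomes one training document for the small model.
        return [call_llm(build_prompt(VOCABULARY)) for _ in range(n_stories)]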

Boyd also noted that for many companies, small models like Phi-3 are simply a better fit for their needs: many organizations work with limited in-house datasets and do not need massive computational power to process them, so models that run on less compute hold a clear cost advantage.

This realization has spurred the pursuit of small, efficient models. For companies seeking the best return on their AI investment, such models have demonstrated strong performance at a fraction of the cost.

In other news, we have noticed some notable developments. The German government is migrating tens of thousands of systems from Windows to Linux, hoping to avoid repeating a problem from twenty years ago. The recent mass layoffs at Google have triggered collective protests from veteran employees, who argue that leadership lacks effective management skills and that an incompetent layer of middle management keeps growing. Meanwhile, the large corporation whose system bugs led to the wrongful imprisonment of about a hundred people has suffered another blow: its cloud-migration project failed despite an investment of up to 280 million yuan. Finally, following Llama 3, Meta has released an “Android”-style operating system for the MR field, which has attracted considerable attention in the industry.