Although it has since been taken down, the WizardLM-2 model appears to have reached industry-leading performance, on par with GPT-4. Just last Friday, the tech world witnessed another notable event: Meta launched its open-source large model, Llama 3, which quickly became a hot topic thanks to its remarkable performance. Before the buzz could subside, Microsoft, not to be outdone, released its latest open-source model, WizardLM-2.
A dramatic scene unfolded soon after. Within hours of its release, Microsoft hastily pulled the WizardLM-2 model. The reason was an almost comical oversight: Microsoft had forgotten to run a crucial toxicity test before releasing the model. The model was reportedly released last Monday and comes in three sizes, 8x22B, 70B, and 7B, covering needs from large-scale deployments to lightweight use. Among them, the flagship 8x22B version boasts an impressive 141 billion parameters, making it one of the most powerful models in the open-source community.
This unexpected removal left many netizens and developers perplexed. In response, Microsoft's developers posted a statement in community forums explaining in detail the reasons for the takedown and expressing their apologies. They admitted that there had been an oversight in the release process: the critical step of toxicity testing was never completed. To reassure the community that the issue would be resolved quickly, the development team promised to complete all necessary testing before the model goes live again.
According to insiders, the development team supporting WizardLM-2 is based in Beijing. To dispel any doubts about the model’s removal, they clarified, “The model was removed due to the omission of a critical testing step, and not as an intentional evasion of any review process.”
Toxicity in large language models refers to the risk that a model produces harmful or inappropriate content. The discovery of such "toxic" output can not only damage a model's standing but also draw widespread attention at a time when AI technology is under heightened global scrutiny, and could even trigger government investigations. Understandably, no company wants such a disaster on its hands.
Consequently, all files related to the WizardLM-2 model have been removed from the GitHub and Hugging Face platforms; attempts to access those pages currently return a 404 error. Interestingly, although the model has been withdrawn, a number of people had already downloaded the model weights, released under the Apache 2.0 license, before the repository was taken down. Some attentive users also backed up the model and shared download links on the Hacker News forum.
Even before the takedown, some users had begun running preliminary evaluations of the WizardLM-2 model on other benchmarks.
WizardLM-2 is an advanced, instruction-following AI model whose lineage traces back to Meta's Llama. Its flagship variant adopts a Mixture of Experts (MoE) architecture, and the family is trained with a fully automated, comprehensive training system. This enables it to excel at multilingual dialogue and complex reasoning, and in particular at generating precise, context-aware responses across a wide array of fields, including writing, coding, and mathematics.
The latest version of WizardLM-2 is built on Mistral AI's Mixtral 8x22B model and has been further fine-tuned on synthetic data. The family spans three scales, 8x22B, 70B, and 7B, all showing remarkable performance. The WizardLM-2 8x22B is the most capable of the family, trailing only GPT-4-1106-preview; the 70B version achieves top-tier performance among models of comparable size; and the 7B version is the fastest, delivering performance comparable to leading models nearly ten times its parameter scale.
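For readers who grabbed the weights before the takedown, here is a minimal sketch of how one might load and query the 7B variant with the Hugging Face transformers library. The model path is a placeholder, since the official Microsoft repositories currently return 404, and the Vicuna-style prompt format is an assumption based on earlier WizardLM releases.

```python
# Minimal sketch: loading a locally saved or mirrored copy of WizardLM-2 7B
# with Hugging Face transformers. "path/to/WizardLM-2-7B" is a placeholder;
# the official repo has been removed, so point this at your own copy.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/WizardLM-2-7B"  # placeholder: local download or community mirror

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",    # spread layers across available GPUs/CPU
    torch_dtype="auto",   # use the dtype stored in the checkpoint
)

# Vicuna-style prompt format (assumed, based on earlier WizardLM releases)
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: Explain mixture-of-experts in one sentence. ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```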
Microsoft has also indicated that, as human-generated data becomes increasingly insufficient for training large language models, AI-created data and AI-supervised models are becoming an essential path toward more powerful AI. With this in mind, WizardLM-2 relies on a fully automated synthetic training system to enhance its performance.
In tests on the comprehensive evaluation platform MT-Bench, WizardLM-2 demonstrated strong performance, rivaling the market's leading proprietary large models, whether for enhancing conversational AI or for supporting complex decision-making in enterprise environments. Comparison results on the platform showed that WizardLM-2, from the high-end 8x22B down to the smaller 70B and 7B models, competed strongly against models such as GPT-4-1106-preview, Command R Plus, Mistral Large, Qwen 1.5, and Starling-LM 7B.
In recent years, Microsoft's pace of development has accelerated rapidly, drawing industry-wide attention. Hugging Face CEO Clément Delangue expressed disappointment over the deletion of the WizardLM open-source model and stressed its significant impact on the platform, adding that Hugging Face is working with Microsoft on a solution to keep meeting the needs of the broader community. Through this incident, Microsoft's commitment to responsible AI practices has come under scrutiny.
Microsoft was once criticized for lackluster product releases and a lack of innovation, even seen as emblematic of talent drain and stagnation, but under Satya Nadella's leadership the company has undergone a massive transformation. Over the past decade, Microsoft's share price has risen by more than 1000%, and in January of this year the company's market value briefly reached an astonishing 3 trillion US dollars, surpassing the total GDP of France. Microsoft's revival has been built on sustained, deep investment in artificial intelligence.
Microsoft has integrated artificial intelligence into the Azure cloud computing platform, the Office productivity suite, and the Bing search engine. The key to this strategic shift lies in the company's investment in OpenAI, a partnership that began taking shape in 2017, when OpenAI, then a much-hyped startup, was spending a quarter of its total expenditure on cloud computing, drawing it closer to Microsoft.
By 2019, Microsoft had become OpenAI's "exclusive" cloud computing provider and, through a new billion-dollar investment, its preferred partner for commercialization. Microsoft quickly integrated OpenAI's large language models (LLMs) into its Azure cloud services, which many businesses and developers now use for a wide range of applications, from chatbots to content generation, translation, and even personalized marketing.
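As a rough illustration of what that integration looks like in practice, below is a minimal sketch of calling a chat model through the Azure OpenAI service with the official openai Python SDK. The endpoint, deployment name, and API version shown are placeholders that would come from your own Azure resource.

```python
# Minimal sketch: calling an Azure-hosted OpenAI chat model with the openai SDK.
# The endpoint, deployment name, and API version are placeholders for values
# taken from your own Azure OpenAI resource.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource-name.openai.azure.com",  # placeholder endpoint
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # example version; check what your resource supports
)

response = client.chat.completions.create(
    model="your-gpt-deployment",  # the deployment name created in Azure, not the model family name
    messages=[
        {"role": "system", "content": "You are a customer-support chatbot."},
        {"role": "user", "content": "Summarize my last three orders."},
    ],
)
print(response.choices[0].message.content)
```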
It is reported that in the second quarter of this year, the number of Azure OpenAI users grew by 50% compared with the previous 12 months. Nadella announced that more than 53,000 customers now use these services, including more than half of the Fortune 500. Clearly, OpenAI has played a vital role in the revival of Microsoft's business empire.
Microsoft has revived itself with OpenAI's help, but maintaining its leadership and staying competitive on its own strength, especially in such a rapidly changing industry, will undoubtedly be a huge challenge.