NIO’s Artificial Intelligence Configuration and Exploration of AI Large Model Application Scenarios

2024-05-14 10:41:34

As the battle of new energy vehicles enters the “intelligent” second half, major automakers are vying to enter the emerging field of artificial intelligence (AI) large models. NIO has recently launched its self-developed NOMI GPT multimodal large model, aiming to lead China’s high-end pure electric vehicle market. How has NIO positioned itself in the field of artificial intelligence, and how are AI large models being implemented across various scenarios? Facing the challenges of research and development, how will NIO go deep into each market segment? At the “Artificial Intelligence X Financial Technology Innovation Conference,” Pan Pengju, expert in digital product algorithms and deputy director at NIO, addressed these questions and discussed the hard problems in developing AI large model architectures.

This sharing session focused on NIO’s business and its layout of artificial intelligence large model applications, as well as NIO’s practical experience from the perspectives of AI algorithms and large model applications. NIO’s business encompasses products, services, community, and digitalization, with smart electric vehicles as its core product; its significant investment in autonomous driving in particular highlights NIO’s core competitiveness.

NIO is committed to creating a user-centered ecosystem by providing high-quality services and innovative solutions to build a lively community. As of March 2024, NIO has already established numerous battery swap stations and charging stations, offering convenient energy replenishment for users. To support the entire system, digitalization and intelligence remain the core driving forces.

In artificial intelligence applications, we often talk about the “three essentials”: data, algorithms, and computing power. However, beyond these, we must also focus on “scenarios,” which are key to determining whether AI can truly serve the company. Choosing the wrong scenarios could have a massive negative impact on the business.

Engineering is one of the key elements in applying artificial intelligence. Even if an AI algorithm performs well, it cannot be practically deployed if its response time fails to meet business needs. For example, in app personalized recommendations, the demands on response time are extreme: users may not tolerate a 500 millisecond, or even a 200 millisecond, wait, and every increment of delay can lead to user churn. Response time, as a crucial aspect of user experience, therefore directly affects user satisfaction.

At the algorithm level, there are two paradigms: traditional small models and large models. In NIO’s AI applications, we adopt a dual-drive strategy of both large and small models: we place more focus on developing large models while also attending to the practicality of small models. At the AI platform level, the “AI Training Framework” and “AI Inference Engine” are critical links in large model applications, requiring sufficient computing power and a robust engineering architecture for support. Many companies find that computing power becomes a bottleneck when pushing forward with AI applications. Within the entire AI application architecture, “computing power” and how to optimize its “engineering” deployment are core problems that must be addressed.

Whether it is scenario, data, algorithm, computing power, or engineering, these “five elements” of artificial intelligence ultimately serve optimization around users and services. At NIO, from the product perspective, we are dedicated to developing autonomous driving (AD) and intelligent cockpits; in terms of user engagement channels, we focus on the efficiency of NIO Houses, service centers, and delivery centers.

Regarding the deployment of large model architectures, our conception is consistent with mainstream industry practice. Our overall architecture is based on open-source large models, combined with the company’s data to develop vertical large models, and comprises four layers: infrastructure, model, development, and application. The infrastructure layer provides basic tools and resources, the model layer offers foundational large models, the development layer supplies a rich toolkit, and the application layer builds applications on proprietary data, covering dimensions such as autonomous driving, user service, and community.

In terms of implementation, the core remains the dual-drive of small and large models. We maintain the use of small models and enhance the intelligent experience with large models in specific scenarios. In autonomous driving, we have attempted to use historical traffic information to generate realistic simulation environments and have achieved preliminary results. We have also focused on developing the NOMI robot, combining large model technology with previous R&D achievements to comprehensively improve the intelligent experience and to interact with users in intelligent-cockpit customer service scenarios.

NIO’s artificial intelligence applications are divided into two major segments: enterprise-facing (to B) and consumer-facing (to C). Essentially, both use artificial intelligence to empower business but differ at various levels, some providing assistance, some fully replacing human labor, and others developing into independent robotic solutions.

Artificial intelligence (AI) is gradually transforming various industries. AI systems start by performing simple tasks and evolve to handle more complex work. This evolution is reflected in AI design across different scenarios, and design concepts also differ between the business (to B) and consumer (to C) markets. Below, we use several cases to illustrate AI’s practical utility.

Case study one: Time-shifted Charging

Take an energy solution as an example of a challenge the company faces. In providing battery swap services, swap stations need to charge batteries in advance. Charging costs vary by time of day, and electricity is relatively expensive during peak demand periods, so charging at peak times directly raises overall electricity expenses. Imagine a scenario with over two thousand swap stations, each with ten batteries: more than twenty thousand batteries in total need charging. Even if every battery charged only one kilowatt-hour per day, the daily cost would exceed twenty thousand yuan, and in reality the cost is far higher. How to use algorithms to minimize daily charging costs thus becomes the problem we need to address.

We launched the “Time-shifted Charging” project. “Time-shifted” means stations charge when electricity rates are lower and provide battery swaps to users when rates are higher. The project’s first step is to predict user battery swap demand: if no users are coming to swap batteries, a station can reduce its charging to save costs. Accurately anticipating when users will need swaps makes it possible to decide when and how much to charge. This process involves many constraints and must be done without compromising user experience.

This strategy shares a similar logic with intelligent driving:

  • Firstly, perceive user demand through time series prediction.
  • Secondly, use operations research optimization algorithms to calculate the amount of rechargeable energy to maximize profits.
  • Lastly, execute the decisions: distribute the charging commands according to the chosen strategy.
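As an illustration of the steps above, here is a minimal sketch (not NIO's actual system) of the optimization step: given an already-forecast daily demand and hypothetical hourly prices, a greedy scheduler fills the cheapest hours first. For simplicity it assumes all charging happens before the swaps it serves, e.g. overnight:

```python
def schedule_charging(prices, total_demand_kwh, capacity_per_hour):
    """Greedy time-shifted charging: fill the cheapest hours first until
    the forecast demand for the coming day is covered.

    prices            -- hypothetical electricity price per kWh for each hour
    total_demand_kwh  -- forecast swap demand (from the prediction step)
    capacity_per_hour -- max kWh a station can charge in one hour
    """
    plan = {h: 0.0 for h in range(len(prices))}
    remaining = total_demand_kwh
    # Visit hours in ascending price order and charge as much as allowed.
    for h in sorted(range(len(prices)), key=lambda h: prices[h]):
        if remaining <= 0:
            break
        amount = min(capacity_per_hour, remaining)
        plan[h] = amount
        remaining -= amount
    cost = sum(plan[h] * prices[h] for h in plan)
    return plan, cost
```

The real problem adds many constraints this sketch ignores (batteries must be ready before each swap, grid limits, per-station prices), which is why the talk frames it as an operations research optimization rather than a simple greedy pass.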

In order to avoid potential issues throughout the entire business process, it is also necessary to achieve closed-loop management of the process through strategic design. We have tested many time series prediction algorithms and compared their effects. Overall, deep learning algorithms have better predictive capabilities than traditional algorithms or tree models, and the effects of different deep learning models are fairly similar.
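When comparing the time series prediction algorithms mentioned above, a simple baseline and a shared error metric are useful reference points. The sketch below uses simple exponential smoothing as such a baseline and mean absolute error (MAE) for comparison; it is illustrative only and does not represent NIO's models:

```python
def exp_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing:
    each observation updates a running level with weight alpha."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def mae(forecasts, actuals):
    """Mean absolute error, a common metric for comparing demand
    forecasters (deep models, tree models, baselines) on equal footing."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)
```

Any candidate model, deep or traditional, can then be scored with the same `mae` call over held-out swap-demand data.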

After determining demand prediction, how we make decisions becomes key. We use different optimization algorithms to calculate the maximum profit per unit of charging and continuously optimize the model based on this. Our work focuses on reducing charging during peak periods, thus lowering costs.

The industry is currently exploring whether AI in the future needs to be combined with energy. Through this case, we can see that energy has a substantial impact on AI and various business aspects. For our company, this algorithmic strategy could potentially save tens of millions in costs and lead to significant gains.

With the continuous advancement of artificial intelligence (AI) technology, we are seeing a strong trend in development: AI applications are gradually moving towards the field of energy saving. In doing so, AI in the future will not only provide intelligent services but will also be more economical and efficient in energy consumption.

Intelligent O&M Case Study

In real life, as opposed to intelligent marketing, intelligent operation and maintenance (O&M) is one of the challenges we often face. For example, with the frequent use of charging guns, they inevitably age. This can lead to users experiencing issues such as disconnections or inability to charge during subsequent use. To address these problems, continuous monitoring of each device and component in battery swap stations and charging piles is required to promptly identify any charging guns that may have performance degradation, and thus carry out timely maintenance or replacement to improve user experience.

Traditional monitoring methods usually include the collection of environmental data such as temperature and humidity around the equipment to estimate its performance and thus determine if a charging gun’s charging function is abnormal. We analyze the temperature rise coefficient of the charging gun through physical models and physical signals such as current, voltage, and temperature, to perform fault diagnosis. Although initial physical models offer some accuracy, the distinction between normal and abnormal signals is not very clear.

Taking into account the above problems, we have iterated and upgraded our monitoring model, adopting a fault detection method combining the Conceptor-AI algorithm with physical models. This strategy, which combines human experience and machine intelligence, effectively reduces the number of false alarms, cutting the false alarm rate to 20% and improving the accuracy by 10 percentage points.

Personalized Recommendation Application Case

The third case resembles problems many internet companies encounter in their apps. Besides providing car sales information, our app also encompasses diverse content such as automotive news, product sales, and charging map services. The recommendations in different sections of the app are diverse, with each type of content having different business goals: the news section, for instance, aims for click-through rate, while the objective of the products section is gross merchandise volume (GMV).

Having numerous distinct scenarios within one app, each with specific targets, poses a challenge to the algorithm’s implementation. In the end, we did just two things. First, we abstracted the system architecture of personalized recommendation into a system applicable to different business needs; notably, we combined search and recommendation rather than building a standalone recommendation system. Second, we optimized the algorithm’s objective system around the business context, attempting to answer a core question: can problems across all business scenarios be solved with only a small amount of simple data and one or two core algorithms? This is the solution we are actively exploring.

After achieving data commonality, on the algorithm side we adopted one of the more promising methods, Mixture of Experts (MoE). This method uses different expert networks in different business environments to learn scenario-specific weights and, in conjunction with the application layer, outputs the corresponding business indicators around the final business objective, aiming to minimize overall maintenance workload.
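A toy version of the MoE idea: a gate scores each expert from the input features, and the output is the gate-weighted sum of expert outputs. Linear stubs stand in here for the per-scenario expert networks (news CTR, product GMV, and so on); the real system's experts and gating features are of course far richer:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoE:
    """Tiny mixture-of-experts sketch.

    experts      -- callables mapping a feature list to a score
    gate_weights -- one linear weight vector per expert for the gate
    """

    def __init__(self, experts, gate_weights):
        self.experts = experts
        self.gate_weights = gate_weights

    def __call__(self, features):
        # Gate: linear score per expert, normalized with softmax.
        gate_logits = [sum(w * f for w, f in zip(ws, features))
                       for ws in self.gate_weights]
        gate = softmax(gate_logits)
        # Output: gate-weighted combination of expert outputs.
        return sum(g * e(features) for g, e in zip(gate, self.experts))
```

Because every business scenario shares this one model shape, maintenance work concentrates on the gate and experts instead of a separate pipeline per section of the app.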

So why did we choose this approach? An important reason is that traditional scenario-by-scenario development relies too heavily on business team members’ deep understanding of the domain, such as hand-engineering features for specific business scenarios. As the number of projects increased, we found that simplified data usage combined with more complex models could improve recommendation systems more effectively, accelerating iteration and enhancing output. This reflects our innovative thinking and practice in app personalized recommendation.

In the field of large-model applications, NIO’s exploration can be divided into four major sections: Knowledge Insight, Content Generation, Copilot (Intelligent Assistant), and Agent (Digital Proxy). Among these sections, what I particularly want to emphasize is the Knowledge Insight phase.

Knowledge Insight includes many aspects. The rich data generated from user interactions in the past has not been fully utilized—for example, the data generated at various touchpoints during voice interactions. Traditional data mining methods often rely on designing numerous tags for classification work, but this approach is inefficient. Although we have achieved some results in structured data mining, there is still a need for stronger efforts in unstructured data mining. For this reason, we have invested significant energy into the analysis of unstructured data.

In terms of the large model itself, we have primarily utilized its two core capabilities: outstanding comprehension and powerful generation. We made extensive attempts in both areas, enhancing efficiency through the generative capacity of AIGC. Indeed, from the perspective of content generation, the model’s hallucinations are not entirely bad; in creative settings they can even play a positive role, since merely safe, moderate content fails to spark inspiration. In comprehension tasks, however, we still need ways to prevent the misunderstandings hallucinations cause.

In different business scenarios, large models need to address a variety of issues. For Copilot and Agent, we focused primarily on iterating and upgrading intelligent customer service. In Copilot’s application, we tried numerous innovations, especially in the field of user service, utilizing Copilot to distribute and retrieve a wealth of knowledge. While this has led to some internal work efficiency improvements, the results did not meet our expectations. In the case of Agent, we completely reinvented the intelligent customer service system using the large language model paradigm. Although some effectiveness has been seen, we still face many challenges.

Case Sharing: In a classic case of large model application, we tried to tag the content quality on the knowledge insight level based on the large model. Previously, tagging content quality was a simple task, typically based on the quantity of images, the quantity of text, and the diversity of themes in the content. Nowadays, the requirements go further, considering not only the richness of topics but also qualitative standards such as the aesthetic quality of images.

In today’s era of rapid technological advancements, demands for content quality improvement are constantly rising. This necessitates that we enrich content with more semantic information, where traditional methods require extensive human labor to annotate data to identify quality dimensions in images. However, with modern large models, we have entered a new working paradigm. Here, we share two key points:

Firstly, large models have achieved remarkable results in efficiency improvement. We only need to express our needs to the large model, such as “Help me summarize the number of topics in the content, assess image quality,” and it can perform initial understanding and assist us.

Secondly, a brand-new annotation process based on the large model has emerged. Before this, whether applying large model algorithms or traditional algorithms, the process involved producing samples according to business needs. But with the large model, the collaboration mode has changed. Business personnel can now directly write Prompts (instructions sent to the large model) and adjust if the results are not as expected. This cooperative mode brings two major advantages:

Advantage one: it greatly increases the business department’s trust in algorithms. Previously, business departments might be skeptical of algorithmic decisions, seeing algorithms as black boxes they had no way to tune. Now the power to adjust the logic is handed directly to the business departments; algorithms have gone from opaque to transparent, and the business departments’ control over them is strengthened.

Advantage two: internal work efficiency is significantly improved. With the prompts provided by the business team, our task is to fine-tune the large model on the new sample set and deploy it online. Algorithm engineers can concentrate on necessary algorithm adjustments, while the business department can deeply understand and optimize the algorithm’s actual performance. Whether in improving accuracy and recall or in shortening the development cycle, large models have made significant progress in annotation.
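The business-written-prompt workflow can be sketched as a thin annotation loop. The prompt template, the JSON label schema, and the `call_model` callable below are hypothetical stand-ins for whatever prompt and LLM client a team actually uses:

```python
import json

# Hypothetical template a business team might write and iterate on directly.
PROMPT_TEMPLATE = """You are a content-quality annotator.
For the post below, return JSON with keys:
"topic_count" (int) and "image_quality" ("low"|"medium"|"high").

Post: {post}"""

def annotate(post, call_model):
    """Build the prompt from the business team's template, call the model
    (any LLM client can be injected as call_model), and parse the
    structured answer. Malformed replies return None so they can be
    routed to human review instead of polluting the sample set."""
    raw = call_model(PROMPT_TEMPLATE.format(post=post))
    try:
        label = json.loads(raw)
        return {"topic_count": int(label["topic_count"]),
                "image_quality": label["image_quality"]}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
```

Because the template lives with the business team, they adjust wording when labels look wrong, while the algorithm team only touches fine-tuning and deployment, which is exactly the division of labor described above.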

We shared this example because many business scenarios are highly similar to annotation work. For example, the core task of a smart customer service system is to understand the customer’s intent; if we treat that intent as a label, smart customer service is essentially doing annotation. The annotation paradigm based on large language models therefore has the potential to apply across many business scenarios, and in the future we will work to productize this process.

In the field of creative material generation, despite numerous attempts, there still exists a gap between rapid development and slow deployment in practical applications, which brings a series of inherent problems. When talking about the application value of large artificial intelligence models in the consumer market (to C), it needs to be considered from a long-term perspective. In the short term, the value of large models may not be as huge as people imagine.

For large AI models and AI-generated content (AIGC) to be truly promoted in the consumer market, four major challenges must be faced:

  • The first is regulatory issues. When AIGC is facing consumers and is applied on a large scale, it must go through regulatory filings, which is a time-consuming process, and related licenses are very rare, which constitutes a difficult challenge that must be faced.
  • The second is the “hallucination” problem that large models often exhibit. Sometimes it can be helpful, but at other times it must be solved. How large model technology and business applications can coexist, and how to avoid the problems hallucinations cause, remain important subjects for deep discussion.
  • The third is the challenge of computing power. Due to the scarcity of computing power resources, it is a huge challenge for each company to independently develop large models. Not only are hardware resources limited, but data acquisition is also very difficult. Even with supervised fine-tuning (SFT), a lot of time and effort is needed.
  • Finally, performance issues are also a major difficulty encountered in the application of current AI large models. If the performance of the large model is insufficient or the response time is too long, its use scenarios will be severely limited. Although there are various solutions in the industry, they themselves have many flaws and need to be optimized and improved for the overall framework, and the performance requirements in different application scenarios also vary, so continuous tuning is needed.