Generative AI Drives Data Fabric into Mainstream Data Management

2024-09-30 17:27:20

In the era of generative artificial intelligence (AI), data has become a vital resource for driving the digital transformation of businesses. Against this backdrop, businesses face numerous data management challenges: growing data volumes, increasingly diverse data types, widespread data silos, complex governance processes, and time-consuming data acquisition. These challenges often undermine AI projects, causing delays or higher costs. Research shows that 60% of enterprise business data is considered valuable, yet only 56% of it is analyzed and used, and 18% of businesses report that insufficient data quality hinders their adoption of generative AI.

To address these challenges, enterprises are increasingly turning to Data Fabric. Data Fabric connects data distributed across different systems through a unified logical access layer for analysis and management, and gives users convenient views of that data. Users can query across systems, so diverse, heterogeneous data reaches data consumers quickly and data silos are broken down.
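The idea of a unified logical access layer can be illustrated with a minimal Python sketch. It is not Denodo's implementation or API: the `LogicalView` class, the table names, and the sample rows are all hypothetical, and two in-memory SQLite databases stand in for separate enterprise systems. The point is only that the view holds connections to the sources and queries them on demand, instead of copying their data into a central store.

```python
import sqlite3

# Two independent "systems", each with its own database (illustrative data only).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 120.0), (1, 80.0), (2, 50.0)]
)


class LogicalView:
    """Hypothetical logical view: it knows how to reach the data, but stores none."""

    def __init__(self, customers_conn, invoices_conn):
        self.customers_conn = customers_conn
        self.invoices_conn = invoices_conn

    def revenue_by_customer(self):
        # Each source is queried at request time; nothing is copied beforehand.
        names = dict(self.customers_conn.execute("SELECT id, name FROM customers"))
        totals = self.invoices_conn.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        )
        # Combine the two partial results inside the logical layer.
        return {names[customer_id]: total for customer_id, total in totals}


view = LogicalView(crm, billing)
result = view.revenue_by_customer()
print(result)
```

A consumer of `view` sees one combined answer and does not need to know that customers and invoices live in different systems.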

Denodo, one of the leading companies in the Data Fabric field, has focused on logical data fabric for 25 years since its founding in 1999. Its core product, the Denodo Platform, uses logical data fabric technology to give enterprises trustworthy, usable, and comprehensible data services that improve decision-making efficiency. The logical data fabric that Denodo advocates is broad in scope, covering data acquisition, processing, discovery, management, access, and intelligent querying, among other capabilities. Notably, the Denodo Platform treats data virtualization as a core capability, a technology in which Denodo’s founder, Angel Viña, specializes.

Angel Viña, heralded as the “Father of Data Virtualization” and the “Father of Logical Data Fabric”, focused his university research on real-time data management. Forty years ago, while working on a project on the predictive maintenance of nuclear reactor vessels, he ran into the problem of fragmented and dispersed data. The traditional approach of building data warehouses was too time-consuming and could not meet the requirements of real-time data management. He therefore conceived of a virtual layer that connects different data sources in real time. This virtual layer recorded key information about the data rather than the data itself, so the data did not need to be physically copied or transferred. This greatly simplified workflows and cut the project duration from four weeks to just one week.

The data virtualization concept pioneered by Angel Viña laid a solid foundation for the development of logical data fabric. A particularly significant milestone came 25 years ago, when the technology was born with the founding of Denodo Technologies. Today, as data management demands greater timeliness and cost-effectiveness, the importance of this technology remains pronounced.

Traditional data management technologies such as physical databases, data warehouses, and data lakes have always emphasized storing data in physical systems and aggregating data from various systems into a centralized system for user access. However, storing, transforming, and updating that data is complex and costly. Logical Data Fabric takes a different approach: it builds a logical abstraction layer that encapsulates the data and hides the complexity of distributed data environments. With this technology, useful information from many sources can be woven together and provided to users in a compliant and secure way, balancing timeliness, security, and cost-effectiveness.

Data management technology is evolving ever more visibly toward AI. Angel Viña pointed out in an interview that a core trend in data management today is the automation of the data management process. From data collection to migration to visualization, traditional data management relies heavily on manpower, money, and material resources. The essence of applying AI to data management is to turn manual operations into automated processes, which removes manual steps, accelerates data collection, and shortens the cycle from data acquisition to interaction.

With continuous advancements in AI technology, Denodo began integrating AI into its technologies and products four years ago. For example, the Denodo Platform 8.0, released a year ago, was equipped with AI-driven intelligent query acceleration and automated security for cloud data integration, significantly enhancing platform functionality and user experience. AI plays a significant role in speeding up queries and analysis on the Denodo Platform. For instance, across tens of thousands or even millions of query requests, AI can learn from past executions and automate optimization decisions.

Angel Viña once stated: “We often say that in the past people searched for data, but in the future, data will proactively find people, and we do not need to worry about where the data is stored.” In the Denodo backend, after receiving a query request, the system automatically rewrites it into an executable, optimized form. The rewriting process applies optimization strategies, and AI-based automation is built into these mechanisms: AI learns from past executions and generates suitable optimization strategies for different executors (that is, different companies’ backend systems).
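To make the idea of learning optimization strategies from past executions more concrete, here is a deliberately simple Python sketch. It is an assumption-laden illustration, not a description of Denodo's optimizer: the `StrategyLearner` class, the strategy names, and the timings are all hypothetical, and averaging past runtimes stands in for whatever learning method a real system uses.

```python
from statistics import mean


class StrategyLearner:
    """Hypothetical helper: picks an execution strategy per backend from past timings."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        # Maps (backend, strategy) to the list of observed runtimes in seconds.
        self.history = {}

    def record(self, backend, strategy, seconds):
        self.history.setdefault((backend, strategy), []).append(seconds)

    def choose(self, backend):
        # Try any strategy that has never been observed on this backend first.
        for strategy in self.strategies:
            if (backend, strategy) not in self.history:
                return strategy
        # Otherwise pick the strategy with the lowest average observed runtime.
        return min(self.strategies, key=lambda s: mean(self.history[(backend, s)]))


learner = StrategyLearner(["push_down_aggregation", "fetch_then_aggregate"])
learner.record("warehouse", "push_down_aggregation", 0.4)
learner.record("warehouse", "fetch_then_aggregate", 2.1)
learner.record("legacy_api", "push_down_aggregation", 3.0)
learner.record("legacy_api", "fetch_then_aggregate", 1.2)

warehouse_choice = learner.choose("warehouse")
legacy_choice = learner.choose("legacy_api")
print(warehouse_choice, legacy_choice)
```

In this toy setup the same logical query would be rewritten differently for each backend: the aggregation is pushed down to the warehouse, while for the slower legacy source the rows are fetched and aggregated in the logical layer.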

With the rise of generative AI technologies represented by large models, Denodo has gradually integrated this technology into its Denodo 8.0 and Denodo 9.0 products since last year. These products now offer features such as natural language querying and user suggestions, lowering the barrier to entry for data management and significantly improving user experience.

Natural Language Queries Lead the New Era of Data Management

The arrival of Denodo 8.0 and Denodo 9.0 marks a transformation in the field of data management. In these two versions, natural language querying stands out as a new feature, and it is especially popular with users who have no technical background. In the past, users had to request information through SQL. Now, with the help of AI, even people who know nothing about SQL can query the Denodo Platform in Chinese, English, or other natural languages, which greatly speeds up data retrieval and use. This innovation makes natural language data management a reality and lowers the threshold for data management. For example, for a question like “Who was the highest profit client in 2023?”, users no longer need to rely on professional programmers to write SQL statements: the AI model understands the request, generates the SQL automatically, runs it immediately, and returns the answer.
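The flow from a natural language question to an answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `translate_to_sql` is a hypothetical stand-in for a language model and handles a single canned question, the `orders` table and its rows are invented, and SQLite replaces the data platform. It does not reflect Denodo's actual interface.

```python
import sqlite3

# Illustrative data: profit per client order, by year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (client TEXT, year INTEGER, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Acme", 2023, 500.0),
        ("Globex", 2023, 900.0),
        ("Acme", 2022, 2000.0),
        ("Globex", 2023, 150.0),
    ],
)


def translate_to_sql(question: str) -> str:
    """Hypothetical stand-in for a language model; knows one canned question."""
    canned = {
        "who was the highest profit client in 2023?": (
            "SELECT client, SUM(profit) AS total FROM orders "
            "WHERE year = 2023 GROUP BY client ORDER BY total DESC LIMIT 1"
        )
    }
    key = question.strip().lower()
    if key not in canned:
        raise ValueError("question not understood by this toy translator")
    return canned[key]


def ask(question: str):
    # Step 1: turn the question into SQL. Step 2: execute it and return the row.
    sql = translate_to_sql(question)
    return conn.execute(sql).fetchone()


answer = ask("Who was the highest profit client in 2023?")
print(answer)
```

In a real deployment the lookup table would be replaced by a model that is given the schema and the question, and the generated SQL would typically be validated before execution.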

Smart Data Recommendations and Suggestions

Supported by large AI models, the Denodo Platform can offer helpful recommendations and suggestions while data is being used. For instance, when a company is using a specific dataset, Denodo can point out other companies that are using the same data and advise on whether those uses are reasonable and lawful. In the medical field, drug developers may need to analyze tens of thousands of drug molecules when developing new drugs. If related research is already available on the market, the Denodo Platform can provide pertinent suggestions that help researchers shorten the drug development cycle.

Providing Data Support for Large Model Development and Application

As a data management vendor, Denodo plays a vital role in the development and practical application of large models. The effectiveness of large models depends on large volumes of high-quality data, and the scale, quality, security, and diversity of that data are key factors in their progress. Currently, large models are often trained on publicly available internet data, and enriching them with enterprise data across every dimension is both crucial and challenging. Here, Denodo can quickly gather and integrate data from multiple sources and systems and ensure its reliability, supplying data for training large models, reducing the risk of model hallucinations, and improving the accuracy and relevance of generated content.

As large models move into the application phase, more and more companies are embracing the technology. Their approaches vary, including training a generic large model on specific data, or fine-tuning a generic model to build an industry-specific large model for a particular field. The key lies in how to combine a company’s own professional data effectively with a generic large model.

In today’s data-driven business environment, integrating large models into enterprise applications is becoming increasingly important. However, combining enterprise data with general-purpose large models is not easy. Because large models are mostly built on publicly available internet data, enterprise applications must also bring in the company’s own data, which requires a dedicated data management platform to act as an intermediate layer over the data sources. For example, the Denodo Platform can use such an intermediate layer to combine a company’s private information, such as financial and business data, with the external knowledge of a generic large model.

Many enterprises place great emphasis on the security and privacy of their data: they want to protect sensitive information and still use the strong capabilities of large models. In this situation, an intermediary layer that links the two can balance both needs. Denodo helps businesses bridge the gap between internal data and large models through data fabric (data virtualization) and its intermediary layer architecture, building an efficient and trustworthy data bridge. This lets large models better understand the business meaning of enterprise data while easing concerns about security and privacy.
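One way to picture such an intermediary layer is as a step that filters enterprise records before they are handed to a model. The following Python sketch is illustrative only and rests on assumptions: the field names, the masking policy in `SENSITIVE_FIELDS`, and the prompt layout are hypothetical, and no model is actually called.

```python
# Assumed policy: these fields must never leave the enterprise boundary.
SENSITIVE_FIELDS = {"account_number", "salary"}


def mask_record(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive values replaced by a placeholder."""
    return {
        key: ("***" if key in sensitive else value) for key, value in record.items()
    }


def build_prompt(question, records):
    """Assemble model input from masked records plus the user's question."""
    lines = [
        ", ".join(f"{key}={value}" for key, value in mask_record(record).items())
        for record in records
    ]
    context = "\n".join(lines)
    return f"Context:\n{context}\n\nQuestion: {question}"


records = [
    {"client": "Acme", "revenue": 1200, "account_number": "DE89-3704"},
    {"client": "Globex", "revenue": 950, "account_number": "FR76-3000"},
]
prompt = build_prompt("Which client has the highest revenue?", records)
print(prompt)
```

The model still receives the business context it needs (clients and revenue), while the account numbers are masked before the text leaves the intermediary layer.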

Over time, industry-specific large models will become the main route by which industries achieve digital transformation, and industry knowledge (know-how) will form their core competitive advantage. This knowledge often takes time to accumulate, especially in industries such as healthcare and energy that demand extensive professional expertise. Working with data management companies that have rich industry experience is therefore one effective way to acquire industry knowledge quickly.

Over the past 25 years, Denodo has served a range of fields including finance, insurance, manufacturing, high technology, retail, education, healthcare, and energy. This experience enables Denodo to connect diverse data sources and combine generic data with industry-specific data to train large models, creating value deep within the industrial chain. By using logical data fabric to deliver data in business language and at business speed, Denodo unlocks the potential value of data, brings new-quality productivity to enterprises, and offers strong support for the high-quality development of China’s digital economy.