What is DataOps?
Introduction
Data has emerged as an important asset that lies at the heart of every organization. Data touches all significant initiatives, including digital transformation and the adoption of analytics, machine learning, and AI. Organizations that are able to tame, manage, and unlock their data assets stand to benefit in myriad ways, including improved decision-making and operational efficiency, fraud prediction and prevention, and better risk management and control. In addition, as many companies have found, data products and services can often lead to new or additional revenue.
As companies come to depend on data to power essential products and services, they begin investing in tools and processes to manage the operations behind them. In this post we describe these tools, as well as the community of practitioners using them. One sign of the growing maturity of these tools and practices is that a community of engineers and developers is beginning to coalesce around the term DataOps ("data operations").
Based on our conversations with some members of this nascent community, a few key activities are associated with DataOps: automation, monitoring, and incident response. In brief, DataOps comprises the tools and processes for monitoring and automating the tasks and software that keep an organization's data products and services running. DataOps tools and processes allow organizations to deliver data products and services quickly, reliably, and efficiently.
The need for DataOps
More than a decade after the rise of big data management systems, the amount of data that companies need to collect, manage, and unlock keeps growing. Not only has data volume continued to grow, but the number of data sources has also exploded. The emergence of cloud computing, SaaS, mobile computing, and sensors has made operational tasks pertaining to data assets much more challenging. The types of data companies collect have also expanded. Machine learning tools have made it possible for companies to unlock unstructured data by incorporating new techniques from computer vision, language models, and speech technologies.
Companies are under increasing pressure to use data and machine learning to modernize their operations and decision-making in order to gain a competitive advantage in the market. This means adopting tools that expand the pool of workers who use data on a regular basis beyond developers, engineers, and data scientists. Frontline workers, analysts, managers, and executives all need to incorporate data into their decision-making and operations. To raise the productivity of workers who use data, companies will need to adopt tools such as feature stores and data catalogs that facilitate collaboration, discovery, and reuse.
Not only do more workers and services depend on data; these new users also expect a certain level of reliability and freshness (near-real-time updates in some scenarios) from their data assets. As more people come to rely on data, companies need to adopt technologies and processes that ensure all critical data pipelines and infrastructure are actively monitored and managed. Failures are inevitable in a world of complex systems. The best companies have tools and processes in place that minimize their mean time to recovery when failures occur.
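To make this concrete, here is a minimal sketch of the kind of freshness check a DataOps team might automate. The `last_updated` and `notify_oncall` functions are placeholders for a real catalog lookup and paging integration, and the 15-minute SLA is purely illustrative.

```python
# A minimal sketch of a data freshness check against an SLA.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)  # assumed near-real-time requirement

def last_updated(table: str) -> datetime:
    # Placeholder: in practice this would query warehouse metadata
    # (e.g., information_schema or a data catalog API).
    return datetime.now(timezone.utc) - timedelta(minutes=7)

def notify_oncall(message: str) -> None:
    # Placeholder: route to a pager or chat webhook in a real setup.
    print(f"ALERT: {message}")

def check_freshness(table: str) -> bool:
    lag = datetime.now(timezone.utc) - last_updated(table)
    if lag > FRESHNESS_SLA:
        notify_oncall(f"{table} is stale by {lag - FRESHNESS_SLA}")
        return False
    return True

if __name__ == "__main__":
    check_freshness("orders_daily")
```

Checks like this are typically scheduled by an orchestrator and feed the incident-response process described above, so that staleness is caught before downstream users notice.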
All these challenges are occurring at a time when there is increasing awareness around issues related to data privacy and security on the part of regulators and users. Landmark privacy regulations in many jurisdictions have forced companies to improve their tools not only for data security and privacy, but also for data retention and governance. Data teams are also increasingly under pressure to account for important items that fall under the umbrella of Responsible AI (besides security and privacy, this includes things like fairness and transparency).
The DataOps Landscape
In this section we organize the ecosystem of DataOps tools. First, we distinguish the two major areas that make up the logical architecture of data products and services. The data plane moves data between systems, transforms data, trains machine learning models, and creates the artifacts used in data products and services. The control plane monitors the data plane and initiates a response to an event or a failure.
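As a rough illustration of this split, the sketch below wraps a data-plane transformation in a control-plane runner that observes outcomes, retries, and escalates on repeated failure. The function names, retry policy, and records are illustrative assumptions, not a reference implementation.

```python
# A minimal sketch of the data-plane / control-plane separation.
import logging
import time

logging.basicConfig(level=logging.INFO)

def load_and_transform(batch: list[dict]) -> list[dict]:
    """Data plane: transform records (stand-in for a real pipeline step)."""
    return [{**row, "amount_usd": row["amount"] * 1.0} for row in batch]

def run_with_control_plane(step, batch, max_retries: int = 3):
    """Control plane: monitor the step, retry on failure, escalate if needed."""
    for attempt in range(1, max_retries + 1):
        try:
            start = time.monotonic()
            result = step(batch)
            logging.info("step ok: %d rows in %.3fs", len(result), time.monotonic() - start)
            return result
        except Exception as exc:
            logging.warning("attempt %d failed: %s", attempt, exc)
    logging.error("step failed after %d attempts; paging on-call", max_retries)
    return None

if __name__ == "__main__":
    run_with_control_plane(load_and_transform, [{"amount": 10}, {"amount": 25}])
```

The point of the separation is that the control plane knows nothing about what the data means; it only observes whether data-plane work succeeded, how long it took, and what to do when it did not.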
While DataOps resides within the control plane, not all control plane elements are DataOps. For example, the Infrastructure Ops elements in figure 2 include ITOps and other operational areas that are not focused on data or analytics. The data plane includes data from operational systems such as CRM, ERP, and customer-facing websites; these source systems supply data to data warehouses, data lakes, and lakehouses. DataOps includes a metadata stack that creates a map of organizational data flow, tracks the flow of data, and enforces access controls and data quality.
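The sketch below illustrates the metadata idea in miniature: a dataset carries lineage and ownership information, and a simple quality rule gates publication downstream. The dataclass, the rule, and the dataset names are assumptions for illustration only.

```python
# A minimal sketch of lineage-aware publication gated by a quality rule.
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    name: str
    upstream: list = field(default_factory=list)  # lineage: where the data came from
    owners: list = field(default_factory=list)    # who to contact when checks fail

def no_null_ids(rows: list) -> bool:
    """Quality rule: every row must carry a non-empty 'id'."""
    return all(row.get("id") for row in rows)

def publish(rows: list, meta: DatasetMetadata) -> None:
    if not no_null_ids(rows):
        raise ValueError(f"{meta.name} failed quality check; notify {meta.owners}")
    print(f"published {meta.name} (derived from {meta.upstream})")

meta = DatasetMetadata("customers_clean", upstream=["crm.contacts"], owners=["data-team"])
publish([{"id": "c-1"}, {"id": "c-2"}], meta)
```

In practice this logic lives in catalog and data quality tools rather than hand-rolled code, but the shape is the same: metadata about where data came from, who owns it, and what must be true before anyone downstream is allowed to consume it.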
Another layer consists of Development tools used to manage complex processes that may involve multiple experiments and iterations, as well as collaboration that cuts across teams and units. MLOps and DevOps for data systems reside in this layer. MLOps covers model management and operations across the entire model lifecycle, from data preparation through training and inference. DevOps is a set of practices that combines software development and IT operations, aiming to shorten the development life cycle and provide continuous delivery with high software quality.
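A small example of an MLOps-style control is a promotion gate: a newly trained model replaces the production model only if it clears a metric threshold. The in-memory registry and AUC threshold below are illustrative stand-ins for a real model registry.

```python
# A minimal sketch of a model promotion gate, assuming a toy in-memory registry.
registry = {"production": {"version": 3, "auc": 0.81}}

def promote_if_better(candidate_version: int, candidate_auc: float, min_gain: float = 0.005):
    current = registry["production"]
    if candidate_auc >= current["auc"] + min_gain:
        registry["production"] = {"version": candidate_version, "auc": candidate_auc}
        print(f"promoted v{candidate_version} (AUC {candidate_auc:.3f})")
    else:
        print(f"kept v{current['version']}; candidate did not clear the gate")

promote_if_better(4, 0.83)
```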
The Data Product & Service layer comprises tools used to track business KPIs and machine learning products and services. MLOps here is a new set of tools and processes for monitoring and managing models in production to identify data drift, declining model accuracy, and adversarial attacks. A related set of tools and processes, which we refer to as "BusinessOps", focuses on key business metrics such as revenue, cost, and other KPIs. Examples of BusinessOps capabilities include anomaly detection, forecasting, and root cause analysis.
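As a toy example of the BusinessOps idea, the sketch below flags a daily revenue figure that sits far outside its recent window using a rolling z-score. The window size, threshold, and sample values are illustrative; production systems typically use far more sophisticated detectors.

```python
# A minimal sketch of metric anomaly detection via a rolling z-score.
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float,
                 window: int = 14, z_threshold: float = 3.0) -> bool:
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

revenue = [102.0, 98.5, 101.2, 99.8, 100.4, 103.1, 97.9, 101.5]
print(is_anomalous(revenue, 100.9))  # within the recent range -> False
print(is_anomalous(revenue, 260.0))  # sudden spike -> True
```

The same pattern applies on the model side: replace the revenue series with a prediction distribution or an accuracy metric, and the detector becomes a crude drift monitor.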
Our final diagram displays a partial list of companies providing solutions in the areas listed above. It describes a physical architecture and lists a representative sample of companies in each major category. We previously described the logical architecture of data products and services as being composed of a data plane and a control plane. We note that some companies provide elements that reside in both planes. For example, some ETL companies provide tools for moving, loading, and transforming data (ELT) and also supply accompanying tools for ELT monitoring and observability.
Closing Thoughts
In this post we described a new set of tools and processes aimed at helping companies manage and control their data assets and infrastructure. DataOps encompasses the tools and processes used to automate and monitor everything that supports an organization's data products and services. We aren't alone in using this term: startups and data engineers are beginning to coalesce around DataOps.
We attempted to offer a detailed formal structure, while also highlighting a group of companies and projects in this exciting new area. Any "Ops", including DataOps, is a prime area for engineers, entrepreneurs, and investors. What makes DataOps particularly exciting is that data is used everywhere, and the opportunities to help organizations with DataOps cut across domains, users, and systems.
Assaf Araki is an investment manager at Intel Capital. His contributions to this post are his personal opinion and do not represent the opinion of the Intel Corporation. Intel Capital is an investor in Anodot, Verta.ai, Immuta, and Hypersonix. #IamIntel
Ben Lorica is co-chair of the Ray Summit, chair of the NLP Summit, and principal at Gradient Flow.