London is home to many tech leaders and entrepreneurs who influence the future of Technology. Among these notable individuals Rajnish is recognized for his achievements in Data, Cloud and Data products. Rajnish Singh is Cloud and Data Architect having working exposure on Data mesh and Data Product concepts. This Article will share more details about Data Mesh & Data Product concepts and why you may need a Data Mesh Architecture in your organization.
What Is Data Mesh?
The term data mesh was coined by Zhamak Dehghani in 2019 and is based on four fundamental principles Domain ownership, Data as a product, Self-serve data infrastructure platform and Federated governance that bundle well known concepts:
The domain ownership principle mandates the domain teams to take responsibility for their data. According to this principle, analytical data should be composed around domains, like the team boundaries aligning with the system’s bounded context. Following the domain-driven distributed architecture, analytical and operational data ownership is moved to the domain teams, away from the central data team.
The data as a product principle projects a product thinking philosophy onto analytical data. This principle means that there are consumers for the data beyond the domain. The domain team is responsible for satisfying the needs of other domains by providing high-quality data. Basically, domain data should be treated as any other public API.
The idea behind the self-serve data infrastructure platform is to adopt platform thinking to data infrastructure. A dedicated data platform team provides domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains. With its platform, the data platform team enables domain teams to seamlessly consume and create data products.
The federated governance principle achieves interoperability of all data products through standardization, which is promoted through the whole data mesh by the governance group. The main goal of federated governance is to create a data ecosystem with adherence to the organizational rules and industry regulations.
Why Enterprises May Need a Data Mesh
Many organizations have invested in a central data lake and a data team with the expectation to drive their business based on data. However, after a few initial quick wins, they notice that the central data team often becomes a bottleneck. The team cannot handle all the analytical questions of management and product owners quickly enough. This is a massive problem because making timely data-driven decisions is crucial to stay competitive.
The data team wants to answer all those questions quickly. In practice, however, they struggle because they need to spend too much time fixing broken data pipelines after operational database changes. In their little time remaining, the data team has to discover and understand the necessary domain data. For every question, they need to learn domain knowledge to give meaningful insights. Getting the required domain expertise is a daunting task.
On the other hand, organizations have also invested in domain-driven design, autonomous domain teams (also known as stream-aligned teams or product teams) and a decentralized microservice architecture. These domain teams own and know their domain, including the information needs of the business. They design, build, and run their web applications and APIs on their own. Despite knowing the domain and the relevant information needs, the domain teams must reach out to the overloaded central data team to get the necessary data-driven insights.
With the eventual growth of the organization, the situation of the domain teams and the central data team becomes worse. A way out of this is to shift the responsibility for data from the central data team to the domain teams. This is the core idea behind the data mesh concept: Domain-oriented decentralization for analytical data. A data mesh architecture enables domain teams to perform cross-domain data analysis on their own and interconnects data, like APIs in a microservice architecture.
How To Design a Data Mesh?
A data mesh architecture is a decentralized approach that enables domain teams to perform cross-domain data analysis on their own. At its core is the domain with its responsible team and its operational and analytical data. The domain team ingests operational data and builds analytical data models as data products to perform their own analysis. It may also choose to publish data products with data contracts to serve other domains’ data needs.
The domain team agrees with others on global policies, such as interoperability, security, and documentation standards in a federated governance group, so that domain teams know how to discover, understand and use data products available in the data mesh. The self-serve domain-agnostic data platform, provided by the data platform team, enables domain teams to easily build their own data products and do their own analysis effectively. An enabling team guides domain teams on how to model analytical data, use the data platform, and build and maintain interoperable data products.
What is Data Product
A data product is a logical unit that contains all components to process and store domain data for analytical or data-intensive use cases and makes them available to other teams via output ports. You can think of a module or microservice, but for analytical data.
Data products connect to sources, such as operational systems or other data products and perform data transformation. Data products serve data sets in one or many output ports. Output ports are typically structured data sets, as defined by a data contract. Some examples:
- A Big Query dataset with multiple related tables
- Parquet files in an AWS S3 bucket
- Delta files in Azure Data Lake Storage Gen2
- Messages in a Kafka topic
A data product is owned by a domain team. The team is responsible for the operations of the data product during its entire lifecycle. The team needs to continuously monitor and ensure data quality, availability, and costs.
Conclusion
Data mesh is primarily an organizational construct and fits right into the principles of team topologies . It shifts the responsibilities for data toward domain teams which are supported by a data platform team and a data enabling team. Representatives of all teams come together in a federated governance group to define the common standards.
Data mesh offers new perspectives for members of the central data team as their analytical and data engineering skills remain highly necessary. Implementation of data mesh Architecture helps in scalability of a distributed data architecture, transformed data management efficiency, superior data governance and quality, and an empowering shift in data ownership to individual domain teams.