A Weekend of Enlightenment: Rethinking Data Architecture - DataMesh
Unlocking Data Architecture's Full Potential by Decentralizing Ownership and Governance
Hello Tech Explorers! | Khoi Nguyen, Data Engineer | Exploring Tech through Personal Insights & Stories
When building a data lake, a commonly used pattern is a layered data architecture (dividing data into layered zones), similar to the Medallion Architecture. Over time, however, as your business grows, this pattern runs into several issues.
The Pain Points:
Entangled Dependencies: Consider a layered architecture comprising raw, standardized, and insight (business) layers. As multiple projects are built on top of these tiers, the interdependencies become opaque and the system becomes cumbersome to change. A single discrepancy in an upper-tier table turns into a daunting exercise in identifying every downstream impact.
Ambiguous Data Contracts: Unclear data agreements between layers/tables hinder seamless upgrades of critical tables, introducing unnecessary friction. For example, a mismatch in data schema or an update event can cause significant issues.
Blurred Lines of Responsibility: The lack of clear ownership for each data table among teams obscures specific responsibilities, leading to accountability gaps.
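To make the "downstream impact" problem concrete, here is a minimal sketch of tracing which tables are affected when one table breaks. The table names and the DOWNSTREAM mapping are hypothetical; in practice the lineage would come from whatever catalog or orchestration tool you use.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it.
DOWNSTREAM = {
    "raw.orders": ["std.orders"],
    "std.orders": ["insight.daily_revenue", "insight.customer_ltv"],
    "raw.customers": ["std.customers"],
    "std.customers": ["insight.customer_ltv"],
    "insight.daily_revenue": [],
    "insight.customer_ltv": [],
}


def downstream_impact(table, lineage):
    """Return every table reachable from `table`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a table can be reached through several paths
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


print(downstream_impact("raw.orders", DOWNSTREAM))
```

With only six tables this is easy to read by eye. The pain point appears when the mapping has hundreds of entries and nobody maintains it explicitly.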
So what is data mesh? Why is it so important?
Data mesh is a decentralized approach to data management that treats data as a product and distributes ownership across various business domains.
Key Principles of Data Mesh
Domain Ownership: Data management responsibility is distributed among various business domains, allowing teams with the most relevant expertise to govern their own data.
Data as a Product: Each domain treats its data as a product, ensuring it is discoverable, accessible, and usable by other teams within the organization.
Self-Serve Data Platform: A platform is established that allows domain teams to build, deploy, and maintain their own data products without heavy reliance on centralized IT.
Data Contracts: Each domain establishes data contracts that define the expectations and responsibilities regarding data sharing and usage among domains. These contracts help ensure clarity in data ownership, quality standards, and compliance requirements, fostering trust and collaboration between teams.
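As an illustration of the data contract principle, the sketch below models a contract as a small Python object and checks one record against its schema. The DataContract class, its fields, and the "orders" example are assumptions made for illustration, not a standard contract format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: who owns the product and what it promises."""
    product: str
    owner_domain: str
    schema: dict              # column name -> expected Python type
    max_staleness_hours: int  # freshness promise made to consumers


def schema_violations(contract, record):
    """Return readable schema problems for one record (empty list if none)."""
    problems = []
    for column, expected in contract.schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            problems.append(f"{column}: expected {expected.__name__}, got {actual}")
    for column in record:
        if column not in contract.schema:
            problems.append(f"unexpected column: {column}")
    return problems


orders_contract = DataContract(
    product="orders",
    owner_domain="sales",
    schema={"order_id": str, "amount": float, "currency": str},
    max_staleness_hours=24,
)

good = {"order_id": "A-1", "amount": 19.99, "currency": "USD"}
bad = {"order_id": "A-2", "amount": "19.99", "region": "EU"}

print(schema_violations(orders_contract, good))
print(schema_violations(orders_contract, bad))
```

The point of the sketch is that the contract names an owning domain and makes the expectations checkable, so a schema mismatch is caught at the boundary between domains rather than discovered downstream.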
Imagine a data lake infrastructure where multiple subsidiary companies coexist, leveraging shared resources such as data governance frameworks and data quality tools. Each subsidiary operates within a specific domain, producing data products tailored to their expertise. A crucial aspect of this setup is that each company is responsible for ensuring the quality of its data outputs, which are then shared with other subsidiaries under predefined data contracts.
As the data lake owner, you can make overseeing this interconnected ecosystem more manageable by focusing governance on three primary aspects:
Product Metadata: Maintaining up-to-date information about each data product.
Data Contracts: Ensuring adherence to agreed-upon schema and freshness standards across all data exchanges.
Inter-Domain Data Movement: Monitoring the flow of data between different domains.
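The three governance aspects above can be sketched with a small in-memory catalog. Everything here is hypothetical (the DataProduct class, the product names, the domains, and the staleness thresholds); a real platform would read this metadata from its own catalog.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    """Hypothetical product metadata kept by the data lake owner."""
    name: str
    domain: str
    last_updated: datetime
    max_staleness: timedelta                    # freshness term of the contract
    inputs: list = field(default_factory=list)  # names of upstream products


def stale_products(catalog, now):
    """Names of products that have exceeded their freshness promise."""
    return [p.name for p in catalog.values()
            if now - p.last_updated > p.max_staleness]


def inter_domain_flows(catalog):
    """(source domain, target domain, source product, target product) tuples."""
    flows = []
    for product in catalog.values():
        for upstream_name in product.inputs:
            upstream = catalog[upstream_name]
            if upstream.domain != product.domain:
                flows.append((upstream.domain, product.domain,
                              upstream.name, product.name))
    return flows


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
catalog = {
    "orders": DataProduct("orders", "sales",
                          now - timedelta(hours=2), timedelta(hours=24)),
    "shipments": DataProduct("shipments", "logistics",
                             now - timedelta(hours=30), timedelta(hours=24),
                             inputs=["orders"]),
    "revenue_report": DataProduct("revenue_report", "finance",
                                  now - timedelta(hours=1), timedelta(hours=6),
                                  inputs=["orders", "shipments"]),
}

print(stale_products(catalog, now))
print(inter_domain_flows(catalog))
```

Keeping the owner's view this narrow (metadata, contract terms, and cross-domain edges) is what lets each domain manage the internals of its own products.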
Conclusion:
Traditional layered data architecture can lead to entangled dependencies, ambiguous data contracts, and blurred lines of responsibility as organizations grow. Data mesh addresses these problems by decentralizing ownership, treating data as a product, and establishing clear data contracts. By adopting a data mesh architecture, organizations can foster a culture of collaboration, improve data quality, and streamline governance, ultimately unlocking the full potential of their data lake infrastructure. As data becomes an increasingly critical asset, embracing data mesh can be a strategic differentiator for forward-thinking organizations.






