Update: Data Lineage is now generally available on AWS and Azure.

We are excited to announce that data lineage for Unity Catalog, the unified governance solution for all data and AI assets on the lakehouse, is now available in preview.

This blog will discuss the importance of data lineage, some of the common use cases, our vision for better data transparency and data understanding with data lineage, and a sneak peek into some of the data provenance and governance features we’re building.

What is data lineage and why is it important?

Data lineage describes the transformations and refinements of data from source to insight. Lineage includes capturing all the relevant metadata and events associated with the data in its lifecycle, including the source of the data set, what other data sets were used to create it, who created it and when, what transformations were performed, what other data sets leverage it, and many other events and attributes. With a data lineage solution, data teams get an end-to-end view of how data is transformed and how it flows across their data estate.

As more and more organizations embrace a data-driven culture and set up processes and tools to democratize and scale data and AI, data lineage is becoming an essential pillar of a pragmatic data management and governance strategy.

To understand the importance of data lineage, we have highlighted below some of the common use cases we have heard from our customers.

Impact analysis

Data goes through multiple updates or revisions over its lifecycle, and understanding the potential impact of any data change on downstream consumers is important from a risk management standpoint. With data lineage, data teams can see all the downstream consumers (applications, dashboards, machine learning models, data sets, and so on) impacted by data changes, understand the severity of the impact, and notify the relevant stakeholders. Lineage also helps IT teams proactively communicate data migrations to the appropriate teams, ensuring business continuity.

Organizations deal with an influx of data from multiple sources, and building a better understanding of the context around data is paramount to ensure the trustworthiness of the data. Data lineage is a powerful tool that enables data leaders to drive better transparency and understanding of data in their organizations. Data lineage also empowers data consumers such as data scientists, data engineers, and data analysts to be context-aware as they perform analyses, resulting in better quality outcomes. Finally, data stewards can see which data sets are no longer accessed or have become obsolete, so they can retire unnecessary data and ensure data quality for end business users.

You can have all the checks and balances in place, but something will eventually break. Data lineage helps data teams perform a root cause analysis of any errors in their data pipelines, applications, dashboards, machine learning models, and so on. This significantly reduces debugging time, saving days or, in many cases, months of manual effort.

Many compliance regulations, such as the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), Health Insurance Portability and Accountability Act (HIPAA), Basel Committee on Banking Supervision (BCBS) 239, and Sarbanes-Oxley Act (SOX), require organizations to have a clear understanding and visibility of data flow. As a result, data traceability becomes a key requirement for their data architecture to meet legal regulations. Data lineage helps organizations be compliant and audit-ready, alleviating the operational overhead of manually creating the trails of data flows for audit reporting purposes.

Effortless transparency and proactive control with data lineage

The lakehouse provides a pragmatic data management architecture that substantially simplifies enterprise data infrastructure and accelerates innovation by unifying your data warehousing and AI use cases on a single platform. We believe data lineage is a key enabler of better data transparency and data understanding in your lakehouse, surfacing the relationships between data, jobs, and consumers, and helping organizations move toward proactive data management practices.
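The impact analysis and root cause analysis use cases described above both come down to walking a lineage graph: downstream to find affected consumers, upstream to find possible sources of an error. The following is a minimal sketch of that idea in plain Python, not the Unity Catalog API. The `LINEAGE` mapping, the asset names in it, and the helper functions `downstream_consumers` and `upstream_sources` are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key is an upstream asset and each value
# lists the assets that read from it directly. All names are illustrative.
LINEAGE = {
    "raw.orders": ["clean.orders"],
    "clean.orders": ["analytics.daily_revenue", "ml.churn_features"],
    "analytics.daily_revenue": ["dashboard.revenue"],
    "ml.churn_features": ["model.churn"],
}


def downstream_consumers(lineage, changed_asset):
    """Return every asset reachable from changed_asset, in breadth-first order."""
    seen = {changed_asset}
    queue = deque([changed_asset])
    impacted = []
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:  # guard against revisiting shared nodes
                seen.add(consumer)
                impacted.append(consumer)
                queue.append(consumer)
    return impacted


def upstream_sources(lineage, broken_asset):
    """Return every asset that broken_asset depends on, nearest first."""
    # Invert the edges so the same traversal can walk toward the sources.
    reverse = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream_consumers(reverse, broken_asset)


if __name__ == "__main__":
    # Impact analysis: what is affected if clean.orders changes?
    print(downstream_consumers(LINEAGE, "clean.orders"))
    # Root cause analysis: what feeds a dashboard that looks wrong?
    print(upstream_sources(LINEAGE, "dashboard.revenue"))
```

A real lineage system would also record the metadata the post lists (who created a data set, when, and through which transformation); here the graph keeps only the edges so the traversal stays easy to follow.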