Azure Purview Data Lineage with Databricks


I am using Azure Purview for data governance and data lineage. We use Databricks in our data architecture, but Purview does not appear to have native support for capturing data lineage from Databricks.

I found the following link, which describes how to create custom processes in Azure Purview:

Databricks notebooks lineage in Azure Purview

Can someone let me know whether there are any more recent methods of capturing Databricks data lineage in Azure Purview?


There is 1 solution below


Data integration and ETL tools can push lineage into Microsoft Purview at execution time. Tools such as Data Factory, Data Share, Synapse, and Azure Databricks belong to this category of data processing systems. These systems reference datasets from different databases and storage solutions as sources and use them to create target datasets. The data processing systems currently integrated with Microsoft Purview for lineage are listed in a table in the documentation referenced below.


Reference: https://learn.microsoft.com/en-us/azure/purview/catalog-lineage-user-guide#data-processing-systems
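To make the source-to-target model concrete, here is a minimal sketch of how a custom lineage process might be described as an Apache Atlas style `Process` entity, which is the entity format Purview's catalog is built on. The helper names (`dataset_ref`, `build_process_entity`), the notebook name, and every qualified name are hypothetical placeholders, and the dataset type names are assumptions you should check against your own catalog. The sketch only builds and prints the payload; it does not call any Purview endpoint.

```python
import json


def dataset_ref(type_name: str, qualified_name: str) -> dict:
    """Point at an existing catalog asset by its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(name: str, qualified_name: str,
                         inputs: list, outputs: list) -> dict:
    """Describe one processing step that links source datasets to targets."""
    return {
        "typeName": "Process",  # Atlas base type that carries inputs/outputs
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
            "inputs": inputs,
            "outputs": outputs,
        },
    }


# All values below are made up for illustration.
source = dataset_ref(
    "azure_datalake_gen2_path",  # assumed type name; verify in your catalog
    "https://examplelake.dfs.core.windows.net/raw/sales/orders",
)
target = dataset_ref(
    "azure_datalake_gen2_path",
    "https://examplelake.dfs.core.windows.net/curated/sales/orders_clean",
)

payload = {
    "entities": [
        build_process_entity(
            name="clean_orders_notebook",
            qualified_name="databricks://example-workspace/notebooks/clean_orders",
            inputs=[source],
            outputs=[target],
        )
    ]
}

print(json.dumps(payload, indent=2))
```

A payload of this shape is what you would submit to the catalog's entity API to register a custom process; the lineage graph is then drawn from the `inputs` and `outputs` references.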


EDIT (July 2022): Since this question was answered, the Microsoft Purview team has released an open-source solution accelerator that extracts lineage from Databricks and ingests it into Microsoft Purview: "A connector to ingest Azure Databricks lineage into Microsoft Purview" (github.com)

This solution accelerator, together with the OpenLineage project, provides a connector that transfers lineage metadata from Spark operations in Azure Databricks to Microsoft Purview, so that you can view a table-level lineage graph. It supports Delta, Azure SQL, Azure Data Lake Storage Gen2, and more.
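As a rough illustration of what "table-level lineage from Spark operations" means, the sketch below takes a simplified OpenLineage-style run event and derives source-to-target edges from its `inputs` and `outputs`. This is not the accelerator's actual code. The function `table_level_edges`, the sample event, and all namespaces, names, and the run ID are hypothetical, and real events carry more fields than shown here.

```python
from typing import Dict, List, Tuple


def full_name(dataset: Dict[str, str]) -> str:
    """Join an OpenLineage dataset's namespace and name into one identifier."""
    return dataset["namespace"].rstrip("/") + "/" + dataset["name"].lstrip("/")


def table_level_edges(event: dict) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for a completed run, else an empty list."""
    if event.get("eventType") != "COMPLETE":
        return []  # only finished runs are treated as lineage here
    return [
        (full_name(src), full_name(dst))
        for src in event.get("inputs", [])
        for dst in event.get("outputs", [])
    ]


# Hypothetical event, trimmed to the fields this sketch reads.
sample_event = {
    "eventType": "COMPLETE",
    "eventTime": "2022-07-01T12:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
    "job": {"namespace": "example-workspace", "name": "clean_orders"},
    "inputs": [
        {"namespace": "abfss://raw@examplelake.dfs.core.windows.net",
         "name": "/sales/orders"},
    ],
    "outputs": [
        {"namespace": "abfss://curated@examplelake.dfs.core.windows.net",
         "name": "/sales/orders_clean"},
    ],
}

edges = table_level_edges(sample_event)
for source, target in edges:
    print(source, "->", target)
```

Each edge corresponds to one arrow in a table-level lineage graph; column-level detail is not modeled in this sketch.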