Databricks

Databricks combines data warehouses and data lakes into an AI-driven Databricks Lakehouse platform. Databricks Delta Lake is an open source data format and a transactional data management system on the Databricks platform. A Databricks notebook is a web-based interface to a document that contains runnable code, visualizations, and narrative text.
You can use the Databricks catalog source to extract metadata from both Databricks Delta Lake and Databricks Notebooks source systems. You can run connection-aware scans on Databricks sources.
You can use a SQL warehouse or an all-purpose cluster to extract metadata.
You can extract metadata from Databricks Unity Catalog. Additionally, you can retrieve lineage captured by Databricks Unity Catalog.
Note: Databricks Unity Catalog retains lineage data for 90 days.
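The 90-day retention window determines which lineage events a scan can still retrieve. The following is a minimal Python sketch of that date arithmetic. It assumes the window is counted back from the current time. The function names are hypothetical and are not part of any Databricks or catalog API.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in the note above.
LINEAGE_RETENTION_DAYS = 90


def retention_cutoff(now: datetime) -> datetime:
    """Return the oldest timestamp that is still inside the retention window."""
    return now - timedelta(days=LINEAGE_RETENTION_DAYS)


def is_retained(event_time: datetime, now: datetime) -> bool:
    """Return True if a lineage event is recent enough to still be retained."""
    return event_time >= retention_cutoff(now)


# Illustrative dates only.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(retention_cutoff(now).date())  # 2024-04-01
print(is_retained(datetime(2024, 3, 15, tzinfo=timezone.utc), now))  # False
```

Under this assumption, lineage that was captured more than 90 days before a scan runs is no longer available to that scan.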
Note: To improve wildcard lineage at the directory or file level for Databricks assets, assign connections, and then run the Databricks catalog source again. These wildcards can refer to files in Amazon S3 and Microsoft Azure Data Lake Storage Gen2.
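To illustrate what a wildcard reference at the directory or file level means, the following Python sketch resolves one wildcard path against a list of file paths. It uses shell-style matching from the Python standard library as a stand-in, because the wildcard syntax that the catalog source uses is not described here. The bucket name, paths, and the files_matching helper are hypothetical.

```python
from fnmatch import fnmatchcase


def files_matching(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths that match a shell-style wildcard pattern."""
    # fnmatchcase compares case-sensitively on every operating system.
    # Note that "*" in this matcher also matches "/" characters.
    return [path for path in paths if fnmatchcase(path, pattern)]


# Hypothetical Amazon S3 paths.
paths = [
    "s3://example-bucket/raw/2024/orders.parquet",
    "s3://example-bucket/raw/2025/orders.parquet",
    "s3://example-bucket/curated/orders.parquet",
]

# A wildcard at the directory level: any subdirectory of raw/.
pattern = "s3://example-bucket/raw/*/orders.parquet"

for path in files_matching(pattern, paths):
    print(path)
```

In this sketch, one wildcard reference stands for two concrete files. Lineage can link a wildcard reference to the concrete files only when the catalog can resolve the storage location that the wildcard points to.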

Objects extracted

You can extract the following objects from a Databricks Notebooks source system:
You can extract the following objects from a Databricks Delta Lake source system:
You can extract the following complex data type columns along with their nested fields from Databricks Delta Lake source systems:

Data profiling for Databricks objects

Configure data profiling to run profiles on the metadata extracted from Databricks Delta Lake source systems. You can run profiles on Databricks Delta tables created in all-purpose clusters or SQL warehouses. You can also run profiles on Databricks Unity Catalog objects.
You can run profiles on the following Databricks Delta Lake objects:
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the following data types for Databricks Delta Lake objects:
Sampling type
Determine the sample rows on which you want to run the data profiling task. You can choose one of the following sampling types for a Databricks catalog source:

Data lineage

The following lineage data is available for Databricks assets:
You can extract lineage information from the following source systems:
You can extract lineage information from the following technologies, if available in Unity Catalog:
For more information about data lineage, see the Working With Assets help.