View data lineage

Data lineage is a visual representation of the flow of data across the systems in your organization. Lineage depicts how the data flows from the system of its origin to the system of its destination.
Data lineage views are available for technical assets in the catalog source. You can view lineage at the catalog source, data set, or data element level.
The lineage at the catalog source level shows how data flows from one catalog source to another. The lineage at the data set and data element levels shows how other technical assets, such as files or tables, contribute to the selected asset.
If linking catalog sources is available for your catalog source, you can use Metadata Command Center to generate data lineage either based on rules or automatically with CLAIRE. You choose the source and target catalog sources and the objects to link, and then generate the lineage.
To determine whether linking catalog sources is available for your catalog source, navigate to the Configuration tab of the Link Catalog Sources page. The catalog source must appear in the list of source and target catalog sources.
For information about linking catalog sources, see Link catalog sources.

View lineage at the catalog source level

The catalog source level shows how data flows from one catalog source to another. The lineage at this level aggregates data from the data set and data element levels.
To view data lineage at the catalog source level, open a technical asset, click the Lineage tab, and then verify that the level is set to Catalog Source Level.
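The way coarser levels aggregate finer ones can be pictured as a roll-up of lineage links. The following is a minimal conceptual sketch, not the product's implementation or API: the Element tuple, the roll_up function, and all catalog source, data set, and element names are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical model: a data element is identified by its catalog source,
# its data set, and its own name. None of these names come from the product.
Element = namedtuple("Element", ["catalog_source", "data_set", "name"])

# Illustrative element-level lineage links (source element -> target element).
ELEMENT_EDGES = [
    (Element("source_cs", "orders", "city"), Element("target_cs", "orders_copy", "city")),
    (Element("source_cs", "orders", "id"), Element("target_cs", "orders_copy", "id")),
    (Element("source_cs", "customers", "city"), Element("target_cs", "orders_copy", "city")),
]


def roll_up(edges, level):
    """Aggregate element-level links to the 'data_set' or 'catalog_source' level."""
    if level == "data_set":
        key = lambda e: (e.catalog_source, e.data_set)
    elif level == "catalog_source":
        key = lambda e: e.catalog_source
    else:
        raise ValueError(f"unknown level: {level}")

    rolled = set()
    for src, dst in edges:
        s, d = key(src), key(dst)
        if s != d:  # skip links that collapse onto the same node at this level
            rolled.add((s, d))
    return sorted(rolled)


if __name__ == "__main__":
    print(roll_up(ELEMENT_EDGES, "data_set"))
    print(roll_up(ELEMENT_EDGES, "catalog_source"))
```

In this sketch, three element-level links collapse into two data set links and a single catalog source link, which mirrors how the catalog source level summarizes the more detailed levels.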

View lineage at the data set level

The data set level is a view that shows individual sets of data in the data flow.
To view lineage at the data set level, open a technical asset, click the Lineage tab, and then verify that the level is set to Data Set Level.
The following image shows how the redshift-job-hawk job instance uses data from the redshifthawk referenced data set to generate output to the target redshift and cust_new_table_redshift referenced data sets before connection assignment:
The data set level lineage diagram shows how the redshift-job-hawk job instance uses data from the redshifthawk referenced data set to generate output to the target redshift and cust_new_table_redshift referenced data sets.
The following image shows how the redshift-job-hawk job instance uses data from the redshifthawk actual data set to generate output to the target redshift and cust_new_table_redshift actual data sets after connection assignment:
The data set level lineage diagram shows how the redshift-job-hawk job instance uses data from the redshifthawk actual data set to generate output to the target redshift and cust_new_table_redshift actual data sets.
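The difference between the two diagrams can be thought of as swapping referenced data sets for actual ones wherever an assignment exists. The following is a conceptual sketch only, under the assumption that connection assignment behaves like a lookup from referenced to actual objects; the "ref:", "actual:", and "job:" prefixes, the ASSIGNMENT mapping, and the resolve function are hypothetical and do not represent the product's data model or API.

```python
# Lineage links expressed against referenced data sets, using the asset names
# from the example above. The prefixes are a hypothetical labeling scheme.
REFERENCED_EDGES = [
    ("ref:redshifthawk", "job:redshift-job-hawk"),
    ("job:redshift-job-hawk", "ref:redshift"),
    ("job:redshift-job-hawk", "ref:cust_new_table_redshift"),
]

# Hypothetical connection assignment: referenced data set -> actual data set.
ASSIGNMENT = {
    "ref:redshifthawk": "actual:redshifthawk",
    "ref:redshift": "actual:redshift",
    "ref:cust_new_table_redshift": "actual:cust_new_table_redshift",
}


def resolve(edges, assignment):
    """Replace each referenced node with its actual counterpart when one is assigned."""
    return [(assignment.get(src, src), assignment.get(dst, dst)) for src, dst in edges]


if __name__ == "__main__":
    for src, dst in resolve(REFERENCED_EDGES, ASSIGNMENT):
        print(f"{src} -> {dst}")
```

Nodes without an assignment stay as they are in this sketch, so the job instance is unchanged and only the data sets switch from referenced to actual.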

View lineage at the data element level

The lineage at the data set level and the data element level shows how technical assets such as files and commands contribute to the selected asset.
Data sets are technical assets that contain sets of data. Examples include files, databases, or temp files that hold the results of calculations. Data elements are objects upstream or downstream of a data set, and are accessible when you expand a data set to the data element level. For example, a table is a data set, and a column in a source object is a data element.

The data element level displays the details behind the data set level. At the data element level, you can see the input sources for expressions or commands, and the calculations or transformations applied to the data. To view data lineage at the data element level, open a technical asset, click the Lineage tab, and then verify that the level is set to Data Element Level.
The following image shows the lineage where the cust_city referenced data elements of the cust_new_table_redshift and redshift referenced data sets get processed data before connection assignment:
The data element level lineage diagram shows the lineage where the cust_city referenced data elements of the cust_new_table_redshift and redshift referenced data sets get data from the cust_city referenced data element of the redshifthawk_redshift referenced data set using the cust_city#1 and cust_city#2 calculations of the redshift-job-hawk job instance.
The source data from the cust_city referenced data element of the redshifthawk_redshift referenced data set is processed using the cust_city#1 and cust_city#2 calculations of the redshift-job-hawk job instance.
The following image shows the lineage where the cust_city actual data elements of the cust_new_table_redshift and redshift actual data sets get processed data after connection assignment:
The data element level lineage diagram shows the lineage where the cust_city actual data elements of the cust_new_table_redshift and redshift actual data sets get data from the cust_city actual data element of the redshifthawk_redshift actual data set using the cust_city#1 and cust_city#2 calculations of the redshift-job-hawk job instance.
The source data from the cust_city actual data element of the redshifthawk_redshift actual data set is processed using the cust_city#1 and cust_city#2 calculations of the redshift-job-hawk job instance.
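The element-level flow described above can be sketched as a small directed graph that is walked upstream from a target element. This is an illustrative model rather than the product's API: the "data_set.element" naming scheme and the upstream function are hypothetical, and the pairing of each calculation with a specific target is an assumption, because the text does not state which calculation feeds which data set.

```python
from collections import defaultdict, deque

# Element-level links (source -> target), using the asset names from the example.
EDGES = [
    ("redshifthawk_redshift.cust_city", "redshift-job-hawk.cust_city#1"),
    ("redshifthawk_redshift.cust_city", "redshift-job-hawk.cust_city#2"),
    # Assumed pairing of calculations to targets, for illustration only.
    ("redshift-job-hawk.cust_city#1", "cust_new_table_redshift.cust_city"),
    ("redshift-job-hawk.cust_city#2", "redshift.cust_city"),
]


def upstream(edges, node):
    """Return every node that contributes to `node`, nearest contributors first."""
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)

    seen, order = set(), []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    for contributor in upstream(EDGES, "redshift.cust_city"):
        print(contributor)
```

Walking upstream from a target cust_city element in this sketch first reaches a calculation of the redshift-job-hawk job instance and then the source cust_city element, which is the same path the diagram presents visually.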