Select source and target catalog sources and schemas to link and generate lineage.
Generate automated lineage with CLAIRE or define rules to use name-based matching or construct an inclusion rule with expressions. Save and run the configuration to start a lineage generation job.
Step 1. Register general information
Provide general information about the configuration on the Registration tab.
1. In Metadata Command Center, go to the Configure page.
2. Select the Lineage tab and then select the Link Catalog Sources (Preview) tab.
3. Click the Add icon.
The Link Catalog Sources page appears.
The following image shows the Registration tab of the Link Catalog Sources page:
4. In the General Information area, enter a name and an optional description for the configuration.
5. Click Next.
The Configuration tab appears.
Step 2. Configure source and target catalog sources
Select source and target catalog sources and schemas on the Configuration tab.
1. In the Source Catalog Source area of the Configuration tab, select a source catalog source from which you want to link and generate lineage.
The following image shows the Configuration tab of the Link Catalog Sources page:
The Select Catalog Source dialog box appears.
The following image shows the Select Catalog Source dialog box:
2. Choose a source catalog source. The overview and related assets of the catalog source appear in the preview pane.
You can filter the list based on the catalog source type and name.
The following image shows a selected source catalog source on the Select Catalog Source dialog box:
3. Click Select to select the source catalog source.
4. Select a schema of the source catalog source.
The following image shows a selected schema of the source catalog source on the Select Schema dialog box:
5. Click Select to select the schema.
6. In the Target Catalog Source area, select a target catalog source and schema to which you want to link and generate lineage.
The following image shows selected source and target catalog sources and schemas on the Configuration tab:
7. Click Next.
The Linking Method tab appears.
Step 3. Perform rule-based or automated linking, save, and run the configuration
Generate automated lineage with CLAIRE or define rules to use name-based matching or construct an inclusion rule with expressions on the Linking Method tab.
1. On the Linking Method tab, choose to either generate automated lineage with CLAIRE or define rules to generate catalog source links between assets of the source and target catalog sources.
The following image shows the Linking Method tab of the Link Catalog Sources page:
2. To refresh catalog source links whenever the source or target catalog source job is run, click Refresh Lineage.
3. Choose the linking method.
- Rule-based Linking. Define rules to use name-based matching or construct an inclusion rule with expressions.
- Automated Linking. Generate lineage automatically with CLAIRE.
4. If you choose the Automated Linking option, you can either automatically accept CLAIRE-generated lineage recommendations or manually accept them.
The following image shows the Linking Method tab with the Automated Linking option selected:
The following table describes the properties that you can enter for automated linking:
| Property | Description |
| --- | --- |
| Enable auto-acceptance | Select to automatically accept CLAIRE-generated lineage recommendations. If disabled, you must manually accept the lineage recommendations. |
| Confidence Score Threshold for Auto-Acceptance | If you enable auto-acceptance, specify a threshold limit based on which the CLAIRE-generated lineage recommendations are automatically accepted. Specify a percentage from 80 to 100. If the confidence score of the catalog source links generated between a source and target asset is higher than the configured threshold limit, the recommended links are automatically accepted. Default is 95%. |
Stakeholders of the source and target catalog sources can reject the auto-accepted and manually accepted catalog source links generated by CLAIRE in Data Governance and Catalog.
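The auto-acceptance behavior described above can be sketched in Python. This is an illustration only, not product code: the `Recommendation` class, the `is_auto_accepted` function, and the sample asset names and scores are hypothetical. The 80 to 100 range, the 95% default, and the strict "higher than" comparison come from the property description.

```python
from dataclasses import dataclass

# Hypothetical constants that mirror the documented range and default.
MIN_THRESHOLD = 80
MAX_THRESHOLD = 100
DEFAULT_THRESHOLD = 95


@dataclass
class Recommendation:
    """A hypothetical stand-in for a CLAIRE-generated link recommendation."""
    source_asset: str
    target_asset: str
    confidence: float  # confidence score as a percentage


def is_auto_accepted(rec, auto_accept_enabled, threshold=DEFAULT_THRESHOLD):
    """Return True if the recommendation would be accepted without review."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be a percentage from 80 to 100")
    if not auto_accept_enabled:
        return False  # every recommendation needs manual acceptance
    # The description says "higher than" the threshold, so the test is strict.
    return rec.confidence > threshold


recs = [
    Recommendation("STG.CUSTOMER", "DW.CUSTOMER", 98.5),
    Recommendation("STG.ORDERS", "DW.ORDER_FACT", 95.0),
    Recommendation("STG.ACCOUNT", "DW.ACCT", 82.0),
]
accepted = [r.target_asset for r in recs if is_auto_accepted(r, True)]
print(accepted)  # ['DW.CUSTOMER']
```

In this sketch, a score equal to the threshold is not auto-accepted, which follows the wording of the property description.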
5. If you choose the Rule-based Linking option, choose the rule type.
- Name Matching. Ignores specified prefixes and suffixes of an asset name and matches the rest of the asset name to generate catalog source links.
- Expression. Constructs an inclusion rule using expressions. Use a combination of attributes, operators, functions, or comments to build an inclusion rule.
6. If you choose the Name Matching rule type, select the asset types to specify prefix and suffix strings to ignore.
The following image shows the Linking Method tab with the Name Matching rule type selected:
The following table describes the properties that you can enter for name matching:
| Property | Description |
| --- | --- |
| Source Data Set - Ignore Prefix | Specify the prefix of source data set names to ignore and match the rest of the source data set names with target data set names. |
| Source Data Set - Ignore Suffix | Specify the suffix of source data set names to ignore and match the rest of the source data set names with target data set names. |
| Target Data Set - Ignore Prefix | Specify the prefix of target data set names to ignore and match the rest of the target data set names with source data set names. |
| Target Data Set - Ignore Suffix | Specify the suffix of target data set names to ignore and match the rest of the target data set names with source data set names. |
| Source Data Element - Ignore Prefix | Specify the prefix of source data element names to ignore and match the rest of the source data element names with target data element names. |
| Source Data Element - Ignore Suffix | Specify the suffix of source data element names to ignore and match the rest of the source data element names with target data element names. |
| Target Data Element - Ignore Prefix | Specify the prefix of target data element names to ignore and match the rest of the target data element names with source data element names. |
| Target Data Element - Ignore Suffix | Specify the suffix of target data element names to ignore and match the rest of the target data element names with source data element names. |
Prefixes and suffixes that you specify can contain alphanumeric characters, underscore (_), and hyphen (-).
Examples:
- To match the source data set, "STG_CUSTOMER", with the target data set, "CUSTOMER", specify "STG_" in the Ignore Prefix field for the source data set.
- To match the target data set, "TMP_ACCOUNT_STG", with the source data set, "ACCOUNT", specify "TMP_" in the Ignore Prefix and "_STG" in the Ignore Suffix fields for the target data set.
- To match the source data element, "CUSTOMER_LND", with the target data element, "CUSTOMER", specify "_LND" in the Ignore Suffix field for the source data element.
- To match the target data element, "TMP_CUSTOMER_LND", with the source data element, "CUSTOMER", specify "TMP_" in the Ignore Prefix and "_LND" in the Ignore Suffix fields for the target data element.
Note: If you don't select an asset type, you can't enter a prefix or suffix. In such cases, the lineage generation job searches for and matches exact source and target asset names.
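The name-matching behavior can be approximated with a short Python sketch. The function names `is_valid_affix`, `strip_affixes`, and `names_match` are hypothetical, and the sketch assumes that the remaining names are compared exactly. The actual matching logic of the lineage generation job might differ.

```python
import re


def is_valid_affix(text):
    """Check the documented character set: alphanumeric, underscore, hyphen."""
    return re.fullmatch(r"[A-Za-z0-9_-]*", text) is not None


def strip_affixes(name, prefix="", suffix=""):
    """Drop the ignored prefix and suffix from a name, when present."""
    if prefix and name.startswith(prefix):
        name = name[len(prefix):]
    if suffix and name.endswith(suffix):
        name = name[:-len(suffix)]
    return name


def names_match(source, target, src_prefix="", src_suffix="",
                tgt_prefix="", tgt_suffix=""):
    """Compare what is left of both names after the affixes are ignored."""
    return (strip_affixes(source, src_prefix, src_suffix)
            == strip_affixes(target, tgt_prefix, tgt_suffix))


# The examples from the list above.
print(names_match("STG_CUSTOMER", "CUSTOMER", src_prefix="STG_"))   # True
print(names_match("ACCOUNT", "TMP_ACCOUNT_STG",
                  tgt_prefix="TMP_", tgt_suffix="_STG"))            # True
print(names_match("CUSTOMER_LND", "CUSTOMER", src_suffix="_LND"))   # True

# Without any prefix or suffix, only exact names match.
print(names_match("STG_CUSTOMER", "CUSTOMER"))                      # False
```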
7. If you choose the Expression rule type, construct an inclusion rule using expressions.
The following image shows the Linking Method tab with the Expression rule type selected:
You can use a combination of attributes, operators, functions, and comments to define an inclusion rule. You can type your expressions directly and view autocompleted suggestions as you type your expression in the editor. Expressions are created using a Spark SQL-based language. Expression values cannot exceed 5000 characters.
You can use the following components to construct an inclusion rule:
- Attributes: Attributes can be source data set, source data element, target data set, and target data element values that you obtain from the catalog. Values are case-sensitive.
Example: srcDataElement.name == 'email'
- Operators: Use operators to compare values of columns. For example, you can use an equality operator to check if the names of two columns are the same.
- Functions: Use functions to calculate values and manipulate data. For example, you can change a name to all uppercase or all lowercase with the upper or lower functions.
Other supported functions include, but are not limited to:
▪ replace
▪ regexp_replace
▪ regexp_match
▪ substring
▪ length
Example: replace('EMPLoyee', 'oyee', 'OYEE')
- Comments: Use comments to summarize the constructed inclusion rule.
Example: /* source data element is changed to lowercase */
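To illustrate what these components do to asset names, the following Python sketch uses standard string and regular-expression calls as analogues of the functions listed above. It is not the expression language itself, and the sample names `Email`, `email`, and `CUSTOMER_V2` are hypothetical.

```python
import re

# Attribute values are case-sensitive, so these two names differ as-is.
src_name = "Email"
tgt_name = "email"
exact_match = src_name == tgt_name
print(exact_match)        # False

# Python analogue of applying the lower function to both names first.
lowered_match = src_name.lower() == tgt_name.lower()
print(lowered_match)      # True

# Python analogue of replace('EMPLoyee', 'oyee', 'OYEE').
replaced = "EMPLoyee".replace("oyee", "OYEE")
print(replaced)           # EMPLOYEE

# Python analogue of a regexp_replace call that drops a trailing version tag.
cleaned = re.sub(r"_V[0-9]+$", "", "CUSTOMER_V2")
print(cleaned)            # CUSTOMER

# Python analogue of the length function.
print(len(cleaned))       # 8
```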
Example of a valid inclusion rule:
srcDataElement.name == tgtDataElement.name and srcDataSet.name == tgtDataSet.name
/* The source data element name must be the same as the target data element name, and the source data set name must be the same as the target data set name. */
Important: Construct expressions with both data sets and data elements to avoid generating unnecessary catalog source links.
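The effect of leaving data sets out of a rule can be shown with a small Python sketch. The catalog contents below are hypothetical, and the two predicate functions are Python analogues of inclusion rules, not expressions that you can paste into the editor.

```python
from itertools import product

# Hypothetical catalog contents: (data set name, data element name).
source_elements = [("CUSTOMER", "id"), ("CUSTOMER", "email"), ("ORDERS", "id")]
target_elements = [("CUSTOMER", "id"), ("CUSTOMER", "email"), ("ORDERS", "id")]


def element_only_rule(src, tgt):
    # Analogue of: srcDataElement.name == tgtDataElement.name
    return src[1] == tgt[1]


def full_rule(src, tgt):
    # Analogue of: srcDataElement.name == tgtDataElement.name
    #              and srcDataSet.name == tgtDataSet.name
    return src[1] == tgt[1] and src[0] == tgt[0]


def links(rule):
    """Return every source and target pair that the rule includes."""
    return [(s, t) for s, t in product(source_elements, target_elements)
            if rule(s, t)]


loose = links(element_only_rule)
strict = links(full_rule)
print(len(loose), len(strict))  # 5 3
```

The element-only rule also links `CUSTOMER.id` to `ORDERS.id` and the reverse, because both data sets contain an element named `id`. Adding the data set comparison removes those two extra links.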
8. Click Validate to validate your expression.
If the validation is successful, a success message appears.
9. To save and run the configuration, click Save, and then click Run.
A Lineage Generation job is created to link catalog sources and to generate catalog source links. Check the status of the job on the Monitor page.