Creating a catalog source

1. On the Configuration page, select one or more capabilities to enable for the catalog source.

The following table describes the capabilities of the catalog source:

Capability	Description
Incremental metadata extraction	An incremental metadata extraction extracts only the changed and new objects since the last catalog source job run. Incremental metadata extraction doesn’t remove deleted objects from the catalog and doesn’t extract metadata of code-based objects if applicable.
Serverless Runtime Environment	A serverless runtime environment is an advanced serverless deployment solution that doesn't require downloading, installing, configuring, or maintaining a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a Secure Agent when you configure a catalog source.
Advanced Programming Language Parsing	Advanced Programming Language Parsing parses the source system code in addition to extracting objects from the source system.
Lineage Discovery	Builds the complete lineage of a catalog source by recommending endpoint catalog source objects to assign to reference catalog source connections. When you run the catalog source job, Metadata Command Center assigns the reference catalog source connections to CLAIRE-recommended endpoint catalog source objects. You can then view the list of CLAIRE recommendations and accept or reject them.
Data Profiling and Quality	- Data Profiling. Assesses source metadata and analyzes the collected statistics to discover content and structure, such as value distribution, patterns, and data types. - Data Quality. Measures the reliability of the data and enables data usage. - Data Observability. Identifies anomalies in the characteristics of the data.
Data Classification	Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of the data. Classifying data can help your organization manage risks, compliance, and data security.
Relationship Discovery	The relationship discovery capability identifies pairs of similar columns and relationships between tables within a catalog source.
Glossary Association	You can associate terms that are in the glossary with technical assets to provide user-friendly business names to technical assets. Glossary Association automatically associates glossary terms with technical assets or recommends glossary terms that you can manually associate with technical assets in Data Governance and Catalog.
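
The incremental metadata extraction behavior described in the table can be pictured with a minimal sketch. It assumes each source object exposes a change timestamp; the `SourceObject` class, the `modified_at` field, and the `incremental_extract` function are hypothetical names used only for illustration and are not part of Metadata Command Center.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SourceObject:
    name: str
    modified_at: datetime  # hypothetical change timestamp reported by the source


def incremental_extract(source_objects, catalog, last_run):
    """Return the catalog contents after an incremental run (illustrative only)."""
    updated = dict(catalog)  # start from the existing catalog; nothing is removed
    for obj in source_objects:
        if obj.name not in catalog or obj.modified_at > last_run:
            updated[obj.name] = obj  # new, or changed since the last job run
    return updated


last_run = datetime(2024, 1, 10)
catalog = {
    "ORDERS": SourceObject("ORDERS", datetime(2024, 1, 5)),
    "LEGACY": SourceObject("LEGACY", datetime(2023, 12, 1)),  # since deleted from the source
}
source = [
    SourceObject("ORDERS", datetime(2024, 1, 12)),     # changed after the last run
    SourceObject("CUSTOMERS", datetime(2024, 1, 11)),  # new object
]

result = incremental_extract(source, catalog, last_run)
print(sorted(result))  # ['CUSTOMERS', 'LEGACY', 'ORDERS']
```

In this sketch `LEGACY` stays in the catalog even though it no longer exists in the source, which mirrors the statement that incremental extraction doesn't remove deleted objects.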

2. On the Metadata Extraction tab, select any runtime environment that is available in Informatica Intelligent Cloud Services to run the metadata extraction task.

A runtime environment is either Informatica Cloud Secure Agent or a serverless runtime environment.

3. Use the Metadata Change Option to choose whether the catalog retains or deletes objects that have been deleted from the source system.

- Retain. Retains objects that are deleted from the source system in the catalog. If you update or add a filter, the catalog retains objects extracted from the previous job and extracts additional objects that match the current filter. Objects deleted from the source system are not deleted from the catalog. Enrichments added on deleted objects and relationships are retained.
- Delete. Deletes metadata from the catalog based on objects deleted from the source system and changes you make to the filter. Enrichments added on deleted objects and relationships are also permanently lost. Objects renamed in the source system are removed and recreated in the catalog.

Note: You can also change the configured metadata change option when you run a catalog source.
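
The difference between the two options can be sketched with plain sets of object names. This is a simplified model under stated assumptions: the `apply_metadata_change` function and the object names are hypothetical, and enrichments and relationships are not modeled.

```python
def apply_metadata_change(catalog, extracted, option):
    """Return the object names in the catalog after a job run (illustrative only).

    catalog   -- names currently in the catalog
    extracted -- names that exist in the source and match the current filter
    option    -- "retain" or "delete"
    """
    if option == "retain":
        # Previously extracted objects stay; newly matching objects are added.
        return set(catalog) | set(extracted)
    if option == "delete":
        # Objects missing from the source or excluded by the filter are dropped.
        return set(extracted)
    raise ValueError(f"unknown metadata change option: {option!r}")


catalog = {"HR.EMPLOYEES", "HR.JOBS", "HR.OLD_BONUS"}
extracted = {"HR.EMPLOYEES", "HR.JOBS", "HR.REGIONS"}  # HR.OLD_BONUS was dropped from the source

print(sorted(apply_metadata_change(catalog, extracted, "retain")))
# ['HR.EMPLOYEES', 'HR.JOBS', 'HR.OLD_BONUS', 'HR.REGIONS']
print(sorted(apply_metadata_change(catalog, extracted, "delete")))
# ['HR.EMPLOYEES', 'HR.JOBS', 'HR.REGIONS']
```

Under this model, a renamed object behaves like one deletion plus one addition when the option is Delete, which is consistent with renamed objects being removed and recreated in the catalog.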

4. To extract metadata from a specific set of objects in the source system, add filter conditions. In the Filters section, select Filter Conditions to add filters.

The system extracts the source metadata based on the object types and conditions specified on this page.

a. From the first list, choose to include or exclude specific metadata in an extraction run.
b. From the second list, choose the type of object from which you want to include or exclude metadata.

The object type can be a table, view, schema, path, file or folder depending on the catalog source type that you configure.

c. From the third list, choose the filter condition based on the object type that you have selected.

You can choose to specify the name or pattern of the path to an object that you want to include or exclude.

d. Click Select to enter one or more values for the specified object type that you want to include or exclude from the extraction run.

Note: The Select option appears depending on the catalog source type that you configure.

The values that you enter differ based on the catalog source types. The following table shows the different values that you can enter for different catalog source types:

Catalog Source Type	Values
Relational database catalog sources, such as Oracle and Azure SQL Server	Full names of tables, schemas, or views. You can use wildcard characters in the name.
ETL catalog sources, such as Azure Data Factory	Fully qualified paths to the objects. You can use wildcard characters in the pattern.
File system based catalog sources, such as Amazon S3 and Azure Data Lake Storage	Full names of files or folders. You can use wildcard characters in the name.

e. You can add multiple filter conditions for each catalog source to include and exclude specific metadata from the source systems. To add more filters, click the Add icon.

The following images are a few examples of filters:

▪ The following filter extracts metadata from a table named Employee_Table that belongs to the Ora1 schema in an Oracle source system:

The image shows the filter condition for an Oracle catalog source to include metadata from an Employee_Table table located in the Ora1 schema.

▪ The following filter excludes metadata from an activity named Activity1 located at AzureDatafactoryName/dev_pipeline/Activity1 in a Microsoft Azure Data Factory source system:

The image shows the filter condition for a Microsoft Azure Data Factory catalog source to exclude metadata from the activity named Activity1 located in the AzureDataFactory1/Pipeline1/Activity1 path.

Note: For File System catalog sources, if you add multiple filters with different wildcard usage for the same object type, only the last wildcard condition is considered.
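
The way include and exclude conditions narrow an extraction run can be sketched as follows. Several points here are assumptions rather than documented behavior: the wildcard syntax is taken to be shell-style (`*`, `?`), a table's full name is written as `schema.table`, an exclude condition wins over an include condition, and objects with no include condition for their type are extracted. The `FilterCondition` class and `is_extracted` function are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FilterCondition:
    action: str       # "include" or "exclude"
    object_type: str  # for example "table" or "path"
    pattern: str      # name or path; may contain shell-style wildcards (assumed syntax)


def is_extracted(object_type, name, conditions):
    """Decide whether an object would be extracted (illustrative only)."""
    relevant = [c for c in conditions if c.object_type == object_type]
    includes = [c for c in relevant if c.action == "include"]
    excludes = [c for c in relevant if c.action == "exclude"]
    if any(fnmatchcase(name, c.pattern) for c in excludes):
        return False  # assumption: an exclude condition takes precedence
    # Assumption: with no include conditions, everything not excluded is extracted.
    return not includes or any(fnmatchcase(name, c.pattern) for c in includes)


# Include only Employee_Table in the Ora1 schema of an Oracle source.
oracle_filters = [FilterCondition("include", "table", "Ora1.Employee_Table")]
print(is_extracted("table", "Ora1.Employee_Table", oracle_filters))  # True
print(is_extracted("table", "Ora1.Dept_Table", oracle_filters))      # False

# Exclude one activity in an Azure Data Factory source by its path.
adf_filters = [
    FilterCondition("exclude", "path", "AzureDatafactoryName/dev_pipeline/Activity1")
]
print(is_extracted("path", "AzureDatafactoryName/dev_pipeline/Activity1", adf_filters))  # False
print(is_extracted("path", "AzureDatafactoryName/dev_pipeline/Activity2", adf_filters))  # True

# A wildcard pattern that covers every table in Ora1 whose name starts with "Emp".
wildcard_filters = [FilterCondition("include", "table", "Ora1.Emp*")]
print(is_extracted("table", "Ora1.Employee_Table", wildcard_filters))  # True
```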

Some catalog sources have additional parameters that you can configure on this tab.

Note: Use expert parameters when Informatica Global Customer Support recommends it.

5. On the Lineage Discovery tab, enable the capability to build the complete lineage of a catalog source by recommending endpoint catalog source objects to assign to reference catalog source connections.

Optionally, you can add filters to include or exclude catalog sources for lineage discovery based on filter parameters such as catalog source types, catalog source names, and asset groups.

Note: The Lineage Discovery tab appears if it is available for the catalog source.

For more information about lineage discovery, see Lineage discovery.

6. On the Data Profiling and Quality tab, enable the capabilities and enter values to determine the type of data that you want the data profiling and quality task to collect, the scope of the data profile and quality run, and the sample rows on which you want to run the data profiling and quality task.

Optionally, you can add filter conditions to create subsets of metadata that you can use to run a data profiling task on a catalog source.

For more information about data profiling and data quality configuration options, see the Metadata Command Center Administration help.

7. On the Data Classification tab, enable the capability and select one or both of the following options:

Option	Description
Generated Data Classifications	A CLAIRE-powered solution. The system automatically generates data classifications for the data elements in the catalog.
Data Classification Rules	Choose from predefined or custom data classifications. Click Add Data Classification and select the data element classification that you want to apply to the catalog source. Note: Data classifications that you create using the New > Data Classification menu appear in this list.

For more information about creating data classifications, see the Administrator help.

8. On the Relationship Discovery tab, enable the capability and enter values for the properties that determine column similarity and joinable table relationships.

The following table describes the properties:

Property	Description
Relationship Inference Model	Select the predefined relationship inference model that Metadata Command Center provides for discovering column similarity relationships within the catalog source. You can also choose a relationship inference model that you have imported.
Containment Score Threshold	Specify a score from 0 to 1 inclusive to identify joinable table relationships within the catalog source. This score is an indicator of the data overlap between the two given columns which determines whether the tables are joinable. A higher score means more similarity of data and a greater probability of the tables being joinable.

For more information about relationship discovery, see the Administrator help.
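
To make the Containment Score Threshold concrete, the sketch below uses one common definition of containment: the fraction of distinct values in one column that also occur in the other. Whether Metadata Command Center computes the score this way is an assumption, as is the use of `>=` for the threshold comparison. The `containment` and `is_joinable` functions and the sample columns are hypothetical.

```python
def containment(column_a, column_b):
    """Fraction of distinct values in column_a that also occur in column_b."""
    a, b = set(column_a), set(column_b)
    if not a:
        return 0.0  # an empty column overlaps with nothing
    return len(a & b) / len(a)


def is_joinable(column_a, column_b, threshold):
    """Compare the containment score with a threshold from 0 to 1 inclusive."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be from 0 to 1 inclusive")
    return containment(column_a, column_b) >= threshold


orders_customer_id = [101, 102, 102, 103, 104]
customers_id = [101, 102, 103, 105, 106, 107]

print(containment(orders_customer_id, customers_id))       # 0.75
print(is_joinable(orders_customer_id, customers_id, 0.7))  # True
print(is_joinable(orders_customer_id, customers_id, 0.9))  # False
```

Three of the four distinct order values appear in the customer column, so the score is 0.75. Raising the threshold from 0.7 to 0.9 drops this pair from the joinable set, which illustrates why a higher threshold yields fewer but more reliable relationships.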

9. On the Glossary Association tab, enable the capability and configure settings for a catalog source to automatically associate or recommend glossary terms as business names for data elements in technical assets.

The following table describes the settings for a catalog source:

Property	Description
Enable auto-acceptance	When enabled, this option automatically associates glossary terms with data elements based on the threshold limit that you specify. The automatically accepted glossary terms appear as business names of data elements in Data Governance and Catalog.
Confidence Score Threshold for Auto-Acceptance	Specify a percentage from 80 to 100 inclusive to set a threshold limit. If a glossary term matches a data element within the threshold specified, Metadata Command Center automatically assigns the matching glossary term to the data element. The name and description of the glossary term with the highest confidence score appear as the name and description of the data element asset in Data Governance and Catalog.
Enable below-threshold recommendations	If you enable auto-acceptance, you can select this option to receive glossary association recommendations below the auto-acceptance threshold.
Confidence score threshold for recommendations	If you enable auto-acceptance, specify a percentage between 80% and the selected confidence score threshold for auto-acceptance. If you disable auto-acceptance, specify a percentage between 80% and 100%.
Assign business names and descriptions	Choose to automatically assign business names and descriptions to technical assets.
Keep existing business names and descriptions	Applicable if you choose to assign business names and descriptions. Choose to retain existing assignments and only assign business names and descriptions to assets that don't have assignments, or allow overwrite of existing assignments. By default, existing assignments are retained.
Ignore Keywords	Choose to ignore specific parts of data elements when making recommendations. You can enter multiple unique prefix and suffix keywords. Keyword values are case insensitive.
Glossary Association Scope	Choose specific top-level business glossary assets to associate with technical assets. Selecting a top-level asset selects its child assets as well.
Use Abbreviation and Synonym Definitions	Enable this option to use a lookup table with abbreviations and synonyms to improve glossary association accuracy. To upload a lookup synonym file with the abbreviations and synonyms, select Yes to enable, and then click Select.

For more information about the glossary association settings, see the Administrator help.
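
The interaction between the auto-acceptance and recommendation thresholds can be sketched as a small decision function. The class, field, and function names below are hypothetical, and treating the ranges as inclusive and the comparisons as `>=` is an assumption; only the 80 to 100 ranges and the dependency between the two thresholds come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryAssociationSettings:
    auto_acceptance: bool
    auto_acceptance_threshold: int = 100           # percent, 80 to 100 inclusive
    below_threshold_recommendations: bool = False
    recommendation_threshold: int = 80             # percent

    def validate(self):
        upper = 100
        if self.auto_acceptance:
            if not 80 <= self.auto_acceptance_threshold <= 100:
                raise ValueError("auto-acceptance threshold must be from 80 to 100")
            # With auto-acceptance on, recommendations sit at or below its threshold.
            upper = self.auto_acceptance_threshold
        if not 80 <= self.recommendation_threshold <= upper:
            raise ValueError(f"recommendation threshold must be from 80 to {upper}")


def classify(score, settings):
    """Return 'auto-accept', 'recommend', or 'ignore' for a confidence score in percent."""
    settings.validate()
    if settings.auto_acceptance:
        if score >= settings.auto_acceptance_threshold:
            return "auto-accept"
        if not settings.below_threshold_recommendations:
            return "ignore"
    if score >= settings.recommendation_threshold:
        return "recommend"
    return "ignore"


settings = GlossaryAssociationSettings(
    auto_acceptance=True,
    auto_acceptance_threshold=95,
    below_threshold_recommendations=True,
    recommendation_threshold=85,
)
print(classify(97, settings))  # auto-accept
print(classify(90, settings))  # recommend
print(classify(82, settings))  # ignore

manual_only = GlossaryAssociationSettings(auto_acceptance=False, recommendation_threshold=90)
print(classify(97, manual_only))  # recommend
```

With auto-acceptance disabled, even a high-scoring match is only recommended, so a user still has to accept it manually.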

10. Click Next to move to the Associations step.
