Step 2. Configure capabilities

When you configure the Microsoft Fabric Data Warehouse catalog source, you define the settings for the metadata extraction capability and other optional capabilities.

The metadata extraction capability extracts source metadata from external source systems. You can also configure other capabilities that the catalog source includes.

You can save the catalog source configuration at any point after you enter the connection information. After you save the catalog source, you can choose to run the catalog source job. To run the job once, click Run. To run metadata extraction and other capabilities on a recurring schedule, configure schedules on the Schedule tab.

Configure metadata extraction

When you configure a Microsoft Fabric Data Warehouse catalog source, you choose a runtime environment, define filters, and enter configuration parameters for metadata extraction.

Before you configure metadata extraction, configure runtime environments in the IDMC Administrator.

1. In the Connection and Runtime area, choose a serverless runtime environment or the Secure Agent group where you want to run catalog source jobs.

Note:

Serverless runtime environment options are available if the catalog source works with a serverless runtime environment.

2. Use the Metadata Change Option to choose whether the catalog retains, deletes, or deprecates objects that are deleted from the source system.

- Retain. Retains objects that are deleted from the source system in the catalog. If you update or add a filter, the catalog retains objects extracted from the previous job and extracts additional objects that match the current filter. Objects deleted from the source system are not deleted from the catalog. Enrichments added on deleted objects and relationships are retained.
- Delete. Deletes metadata from the catalog based on objects deleted from the source system and changes you make to the filter. Enrichments added on deleted objects and relationships are also permanently lost. Objects renamed in the source system are removed and recreated in the catalog.
- Deprecate. The lifecycle of objects imported into the catalog moves to Obsolete based on objects deleted from the source system and changes you make to the filter. This does not impact enrichments added on deprecated objects and relationships. Objects renamed in the source system are removed and recreated in the catalog. When you run the catalog source job again for other capabilities such as data classification, relationship discovery, or glossary association, the job doesn't consider obsolete objects. Obsolete objects remain in the catalog until they are purged when you run a Purge Obsolete Objects job on the Explore page.

Note:

You can also change the configured metadata change option when you run a catalog source.
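The following Python sketch illustrates how the three metadata change options might treat catalog objects after a job run. It is a simplified model under stated assumptions, not the Metadata Command Center implementation: the `CatalogObject` class, its default lifecycle value, and the `reconcile` function are hypothetical, objects are identified by path only, and renamed objects are not modeled.

```python
from dataclasses import dataclass, field, replace


@dataclass
class CatalogObject:
    # Hypothetical, simplified catalog entry identified by its path.
    path: str
    lifecycle: str = "Active"  # hypothetical default lifecycle state
    enrichments: list = field(default_factory=list)


def reconcile(catalog, extracted_paths, option):
    """Return a new catalog after applying a metadata change option.

    catalog: dict that maps an object path to a CatalogObject.
    extracted_paths: set of paths that exist in the source and match the current filter.
    option: "Retain", "Delete", or "Deprecate".
    """
    if option not in ("Retain", "Delete", "Deprecate"):
        raise ValueError(f"unknown metadata change option: {option}")
    result = {}
    for path, obj in catalog.items():
        if path in extracted_paths or option == "Retain":
            # Still extracted, or Retain keeps the object and its enrichments.
            result[path] = obj
        elif option == "Deprecate":
            # Keep the object and its enrichments, but move it to Obsolete.
            result[path] = replace(obj, lifecycle="Obsolete")
        # Delete: the object and its enrichments are not carried over.
    # Objects that are new to the catalog are added.
    for path in sorted(extracted_paths - set(catalog)):
        result[path] = CatalogObject(path)
    return result
```

For example, with Deprecate, an object that no longer exists in the source stays in the catalog with its enrichments but moves to Obsolete, until a Purge Obsolete Objects job removes it.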

3. In the Filters area, define one or more filter conditions to apply for metadata extraction:

To define filters, you can either select an object type and enter the path to the object as the filter value, or select an object from a list of objects available in the source system.

a. Select Yes to view filter options.
b. From the Include/Exclude list, choose to include or exclude metadata based on the filter parameters.
c. Perform one of the following steps:

▪ From the Object type list, select Tables, Views, Stored procedures, or All depending on the object that you want to extract metadata from. Enter the path to the object as the filter value.
▪ In the filter value field, click the Search button and select an object from a list of objects available in the source system.

The Object type field updates based on the selected object.

If you select an object type and then click the Search button, the list of objects includes all object types, but you can only select objects that match the selected object type.

You can edit the filter value after you select an object from the list.

Note:

You can only search for object types that work with the search functionality. If you don't see the Search button for the selected object, enter the object path as the filter value.

Note:

If the object metadata is available in Data Governance and Catalog, a check mark appears next to the object.

Note:

To select an object, you need to have permissions on the connection to the source system.

Filters can contain the following wildcards:

▪ Question mark. Represents a single character.
▪ Asterisk. Represents multiple characters or empty text.

For object hierarchies, use a dot as a separator.

When you enter values for filters, enclose them in double quotes if you use a space or a dot in a single segment.

The following image shows the filter condition options, which let you include or exclude metadata based on specific object types.

Examples:

▪ To include or exclude all object types in the ‘Schema1’ schema in the ‘dwName1’ data warehouse, select All as the object type and enter dwName1.Schema1 in the value field.
▪ To include or exclude the ‘Table1’ table in the ‘Schema1’ schema in the ‘dwName1’ data warehouse, select Tables as the object type and enter dwName1.Schema1.Table1 in the value field.
▪ To include or exclude all views in the ‘dwName1’ data warehouse, select Views as the object type and enter dwName1 in the value field.
▪ To include or exclude all stored procedures in the ‘dwName1’ data warehouse, select Stored procedures as the object type and enter dwName1 in the value field.
▪ To include or exclude all tables in data warehouses with names that start with ‘dwName’ followed by one additional character, select Tables as the object type and enter dwName? in the value field.
▪ To include or exclude all tables in the ‘Schema1’ schema in the ‘dwName1’ data warehouse, select Tables as the object type and enter dwName1.Schema1 in the value field.
▪ To include or exclude all stored procedures in schemas with names that start with ‘abc’ followed by one additional character, in data warehouses with names that start with ‘dwName’ followed by any number of characters, select Stored procedures as the object type and enter dwName*.abc? in the value field.

d. Optionally, to define an additional filter with an OR condition, click the Add icon.
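The following Python sketch shows one way to read the filter rules above: split a value into dot-separated segments, treat a double-quoted segment as a single segment, and match ? and * within each segment. It is an illustration under assumptions, not the product's matching logic. The assumptions are that wildcards don't cross a dot, that a filter value with fewer segments matches every object below that path, that matching is case-sensitive, and that an object is extracted when it matches any include filter (or no include filter exists) and no exclude filter. The `Filter` class and all function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    # Hypothetical representation of one filter condition.
    mode: str         # "Include" or "Exclude"
    object_type: str  # "Tables", "Views", "Stored procedures", or "All"
    value: str        # for example, "dwName1.Schema1"


def split_path(value):
    """Split on dots that are outside double quotes and drop the quotes."""
    segments, current, quoted = [], [], False
    for ch in value:
        if ch == '"':
            quoted = not quoted
        elif ch == "." and not quoted:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


def segment_regex(segment):
    # ? matches exactly one character; * matches any run, including empty text.
    return "".join(
        "." if ch == "?" else ".*" if ch == "*" else re.escape(ch) for ch in segment
    )


def matches(flt, object_type, path):
    if flt.object_type not in ("All", object_type):
        return False
    pattern, segments = split_path(flt.value), split_path(path)
    if len(pattern) > len(segments):
        return False
    # A value with fewer segments matches every object below that path.
    return all(re.fullmatch(segment_regex(p), s) for p, s in zip(pattern, segments))


def is_extracted(filters, object_type, path):
    includes = [f for f in filters if f.mode == "Include"]
    excludes = [f for f in filters if f.mode == "Exclude"]
    # Filters added with the Add icon are combined with OR.
    included = not includes or any(matches(f, object_type, path) for f in includes)
    excluded = any(matches(f, object_type, path) for f in excludes)
    return included and not excluded
```

For example, `matches(Filter("Include", "Tables", "dwName1.Schema1"), "Tables", "dwName1.Schema1.Table1")` is true, which mirrors the example of including all tables in a schema.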

4. Optionally, in the Configuration Parameters area, enter properties to override default context values and job parameters.

Note:

Click Show Advanced to view all configuration parameters.

The following table describes the properties that you enter for Catalog Source Configuration Options:

Parameter	Description
Default variables values	Specify a default value for variables used in the programmable objects.
MetaTables Include Filter	Advanced parameter. When you process PL/SQL statements, Metadata Command Center does not read table or view content by default. If you want to use the content, for example, to process dynamic SQL statements, use the MetaTables Include Filter parameter. This parameter prompts the database for the required metadata. Verify that the user has SELECT permissions on the metatables. Note: Don't use this option to specify filters for tables that you want to include or exclude during the metadata extraction run.

The following table describes the property that you can enter for Additional settings:

Note:

The Additional settings section appears when you click Show Advanced.

Property	Description
Expert parameters	Enter additional configuration options to be passed at runtime. Required if you need to troubleshoot the catalog source job. Caution: Use expert parameters only when Informatica Global Customer Support recommends them.

5. To configure additional capabilities for the catalog source, click the corresponding tabs.

Configure lineage discovery

Enable the lineage discovery capability and use CLAIRE to build complete lineage by recommending endpoint catalog source objects to assign to reference catalog source connections.

1. Click the Lineage Discovery tab.

2. Select Enable Lineage Discovery.

3. In the Filters area, define one or more filter conditions to apply for lineage discovery.

To define filters, you can select catalog source types or asset groups, or you can enter a catalog source name or search for catalog sources in a list.

a. Select Yes to view filter options.
b. From the Include/Exclude list, choose to include or exclude catalog sources for lineage discovery based on the filter parameters.
c. From the filter type list, select Catalog Source Type, Catalog Source Name, or Asset Group.
d. In the filter value field, select the required catalog source types, or click the Search button and select catalog sources or asset groups.

Filters can contain the asterisk wildcard to represent multiple characters or empty text.

Examples:

▪ To include or exclude all Oracle catalog sources, select Catalog Source Type as the filter type and select Oracle in the filter value field.
▪ To include or exclude the 'Oracle_Retail' catalog source, select Catalog Source Name as the filter type and search for the catalog source or enter Oracle_Retail in the filter value field.
▪ To include or exclude all catalog sources with names that start with 'Oracle', select Catalog Source Name as the filter type and search for the catalog source or enter Oracle* in the filter value field.
▪ To include or exclude all catalog sources with names that end with 'Retail', select Catalog Source Name as the filter type and search for the catalog source or enter *Retail in the filter value field.
▪ To include or exclude all catalog sources with names that contain 'Ret', select Catalog Source Name as the filter type and search for the catalog source or enter *Ret* in the filter value field.
▪ To include or exclude all catalog sources that are part of the 'Financial Group' asset group, select Asset Group as the filter type and search for Financial Group in the filter value field.

Note:

You can't add more than one include or exclude filter for the same filter type.

e. Optionally, to define an additional filter with an AND condition, click the Add icon.
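The following Python sketch models the lineage discovery filters under assumptions: only the asterisk is a wildcard, matching is case-sensitive, a filter matches when any of its selected values matches, and filters added with the Add icon are combined with AND. The `CatalogSource` and `LineageFilter` classes and the function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class CatalogSource:
    # Hypothetical, simplified catalog source description.
    name: str
    source_type: str
    asset_groups: tuple = ()


@dataclass
class LineageFilter:
    mode: str         # "Include" or "Exclude"
    filter_type: str  # "Catalog Source Type", "Catalog Source Name", or "Asset Group"
    values: tuple     # one or more selected or entered values


def wildcard_match(pattern, text):
    # Only the asterisk is a wildcard; every other character is literal.
    return re.fullmatch(re.escape(pattern).replace(r"\*", ".*"), text) is not None


def filter_matches(flt, source):
    if flt.filter_type == "Catalog Source Type":
        candidates = [source.source_type]
    elif flt.filter_type == "Catalog Source Name":
        candidates = [source.name]
    elif flt.filter_type == "Asset Group":
        candidates = list(source.asset_groups)
    else:
        raise ValueError(f"unknown filter type: {flt.filter_type}")
    hit = any(wildcard_match(v, c) for v in flt.values for c in candidates)
    return hit if flt.mode == "Include" else not hit


def in_scope(filters, source):
    # Filters added with the Add icon are combined with AND.
    return all(filter_matches(f, source) for f in filters)
```

For example, an include filter on the Oracle catalog source type combined with an exclude filter on the name `*Test*` keeps Oracle sources whose names don't contain "Test".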

For more information about lineage discovery, see Lineage discovery.

Configure data profiling and quality

Enable the data profiling capability to evaluate the quality of metadata extracted from the Microsoft Fabric Data Warehouse source system.

1. Click the Data Profiling and Quality tab.

2. Expand Data Profiling and select Enable Data Profiling.

Note:

Ensure that you have permissions on all the staging connections that you use in your data profiling configuration. You can't run the job if you don't have permissions on the connections that you use. Select connections that you have access to, or ask the administrator to grant the necessary permissions on the connections that you want to use.

3. Optional. In the Filters area, specify filters in addition to the metadata extraction filters:

a. Select Yes to view filter options.
b. From the Include or Exclude metadata list, choose to include or exclude metadata based on the filter parameters.
c. From the object type list, select Tables or Views depending on the object that you want to extract metadata from. Select All to extract metadata from all objects in the schema.
d. Enter the path to the object as the filter value.

Examples:

▪ You extracted metadata of all tables and views from a schema, and now you want to profile a few selected tables or views from the schema. Select All, Views, or Tables from the Object type option, and then enter the data warehouse name followed by the schema name and the table or view name in the input field. For example, Datawarehouse_name.Schema_Name.TABLE_NAME or Datawarehouse_name.Schema_Name.VIEW_NAME
▪ You extracted metadata from multiple schemas, and now you want to run a profile on all the objects in a particular schema. Select All from the Object type option, and then enter the data warehouse name followed by the schema name in the input field. For example, Datawarehouse_name.Schema_Name1 or Datawarehouse_name.Schema_Name2

To include or exclude multiple objects, click the Add icon to add filters with the OR condition.

4. In the Parameters area, configure the following parameters based on your requirements:

Parameter	Description
Modes of Run	Determines the type of data that you want the data profiling task to collect. Choose one of the following options: - Keep Signatures Only. Collects only aggregate information such as data types, average, standard deviation, and patterns. - Keep Signatures and Values. Collects both signatures and data values.
Profiling Scope	Determines whether to run data profiling only on the changes made to the source system or on the entire source system. Choose one of the following options: - Incremental. Includes only source metadata that is changed or updated since the last profile run. - Full. Includes the entire metadata that is extracted based on the filters applied for extraction.
Sampling Type	Determines the sample rows on which you want to run the data profiling task. Choose one of the following options: - All rows. Runs data profiling on all rows in the metadata. - Limit N Rows. Runs data profiling on a limited number of rows. - Custom Query. Provides an SQL clause to select sample rows to run the data profiling task. For example, where column1='X'; TABLESAMPLE(X ROWS); TABLESAMPLE(X PERCENT)
No of rows to limit	Required if you select Limit N Rows in Sampling Type. Specify the number of rows that you want to run the profile on. Default is 1000.
Maximum Precision of String Fields	The maximum precision set for profiling fields of the string data type.
Text Qualifier	The character that defines string boundaries. If you select a quote character, profiling ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.
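To illustrate the difference between the two run modes and the Limit N Rows sampling type, the following Python sketch profiles a single column of string values. The signature fields, the pattern notation (digits become 9, letters become X), and the function names are hypothetical; the actual data profiling task collects its own set of statistics.

```python
import re
import statistics
from collections import Counter


def pattern_of(value):
    # Hypothetical pattern notation: digits become 9, letters become X.
    return re.sub(r"[A-Za-z]", "X", re.sub(r"[0-9]", "9", value))


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_column(values, mode="Keep Signatures Only", sampling="All rows", limit=1000):
    """Profile one column of string values (simplified illustration)."""
    rows = values[:limit] if sampling == "Limit N Rows" else list(values)
    numbers = [float(v) for v in rows if is_number(v)]
    signature = {
        # Numeric only if every sampled value parses as a number.
        "inferred_type": "numeric" if rows and len(numbers) == len(rows) else "string",
        "patterns": Counter(pattern_of(v) for v in rows),
    }
    if len(numbers) >= 2:
        signature["average"] = statistics.mean(numbers)
        signature["std_dev"] = statistics.stdev(numbers)
    profile = {"signature": signature}
    if mode == "Keep Signatures and Values":
        # Only this mode keeps the data values themselves.
        profile["values"] = list(rows)
    return profile
```

With Keep Signatures Only, the result holds only aggregate information. With Keep Signatures and Values, it also holds the sampled values.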

5. Expand Data Quality and select Enable Data Quality.

Note:

You can click Use Data Profiling Parameters to use the same parameters as in the Data Profiling section.

Note:

Ensure that you have permissions on all the staging and flat file connections that you use in your data quality configuration. You can't run the job if you don't have permissions on the connections that you use. Select connections that you have access to, or ask the administrator to grant the necessary permissions on the connections that you want to use.

6. In the Parameters area, configure the following parameters based on your requirements:

Parameter	Description
Data Quality Rule Automation	Enable the option to automatically create or update rule occurrences for data elements in the catalog source. Choose one of the following options: - Apply on Data Elements linked with Business Dataset. Creates rule occurrences for all data elements that are linked with business data sets in the catalog source. - Apply on all Data Elements. Creates rule occurrences for all data elements in the catalog source.
Cache Result	Select Agent Cache if you want to generate a cache file in the runtime environment and to preview the cached results faster in subsequent data preview runs. The results are cached for seven days by default after the first run in the runtime environment. Select No Cache if you don't want to cache the preview results and view the live results.
Run Rule Occurrence Frequency	Specify whether you want to run data quality rules based on the frequency defined for the rule occurrence in Data Governance and Catalog.
Sampling Type	Determines the sample rows on which you want to run the data quality task. Choose one of the following options: - All rows. Runs the data quality task on all rows in the metadata. - Limit N Rows. Runs the data quality task on a limited number of rows. - Custom Query. Provides an SQL clause to select sample rows to run the data quality task. For example, where column1='X'; TABLESAMPLE(X ROWS); TABLESAMPLE(X PERCENT)
No of rows to limit	Required if you select Limit N Rows in Sampling Type. Specify the number of rows that you want to run the data quality task on. Default is 1000.
Maximum Precision of String Fields	The maximum precision set for profiling fields of the string data type.
Text Qualifier	The character that defines string boundaries. If you select a quote character, profiling ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.

Configure data observability

Enable the data observability capability to receive anomaly notifications for metadata extracted from the source system.

To include profiling anomalies, enable data profiling for the catalog source. For accurate results, set the Profiling Scope option to Full in the Data Profiling and Quality configuration.

1. Click the Data Observability tab.

2. Select Enable Data Observability.

3. In the Parameters area, enter a value to specify the minimum number of catalog source job runs required to start detecting anomalies. For example, if you enter a value of 5, the job starts to detect anomalies from the sixth run.

Note:

This option does not apply to anomalies that require a minimum of two runs to generate.

4. Specify the maximum number of anomaly events to generate for each catalog source run. Default is 1000.
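The following Python sketch expresses the run threshold and the event limit from steps 3 and 4. It assumes that runs are numbered from 1 and that the first events are kept when a run exceeds the limit. Both functions are hypothetical, and the exception for anomalies that require a minimum of two runs is not modeled.

```python
def detection_active(run_number, min_runs):
    """Return True if anomaly detection applies to this job run.

    run_number counts catalog source job runs from 1 (an assumption).
    With min_runs = 5, runs 1 through 5 only build history, and
    detection starts on run 6.
    """
    return run_number > min_runs


def cap_events(events, max_events=1000):
    # Keep at most max_events anomaly events for one run (default 1000).
    # Keeping the first events is an assumption made for this sketch.
    return events[:max_events]
```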

5. Optionally, in the Asset Filters area, enable filters to run data observability on specific assets.

- Specify volume filters if you want to extract volume anomaly data for specific metadata extracted.
- Specify data profiling filters if you want to override the data profiling filter and extract profiling anomalies for assets based on a different filter.

6. To add a volume filter, perform the following steps:

a. Select Yes to view filter options.
b. From the Include/Exclude list, choose to include or exclude metadata.
c. Enter a value to specify the object location.

For example, to observe volume metrics for a specific table in an extracted schema, enter the schema name followed by the table name in the input field.

Example: Schema_Name.Table_Name

d. To include or exclude multiple objects, click the Add icon to add filters with the OR condition.

7. To add a data profiling filter, perform the following steps:

a. Select Yes to view filter options.
b. From the Include/Exclude list, choose to include or exclude metadata.
c. From the Specify data profiling filters list, select an object type.
d. Enter a value to specify the object location.

For example, if you ran a profile on multiple schemas and now want to generate profiling anomalies for a specific schema, enter the schema name in the input field.

e. To include or exclude multiple objects, click the Add icon to add filters with the OR condition.

8. Optionally, in the Metric Filter area, specify filters to generate events based on specific metrics. If you don't include a filter, the job detects both profiling and other anomalies applicable to the catalog source.

a. Select Filter conditions.
b. From the Include Metric list, choose to include or exclude a metric.
c. From the All Metrics list, select a metric for which you want to generate events.
d. From the Sensitivity list, select the sensitivity of the anomaly.
e. From the Select detection rules list, select rules that you apply to detect anomalies.

The following table describes the detection rules that you can select:


Option	Description
Static Data	Detects the following anomalies: - Percentage variation. If percentage values other than 0% or 100% remain constant for three or more profile runs, any subsequent change generates a Percentage Variation anomaly. - Count variation. A change in a value that is usually constant indicates that there is an anomaly.
100% or 0% Change Detection	Detects the following anomalies: - Drop from maximum. A drop from 100% to a lower value generates a Drop from Maximum anomaly. - Surge from minimum. A spike from 0% to any higher value generates a Surge from Minimum anomaly.
Standard Deviation	Detects the following anomalies: - Drop in transition. A drop from a value that has been decreasing at a constant rate to a significantly lower value generates a Drop in Transition anomaly. - Surge in transition. A spike from a value that has been increasing at a constant rate to a significantly higher value generates a Surge in Transition anomaly. - Deviation. If the percentage values of a record change with each profile run, the algorithm calculates an expected range based on the changes observed in each profile run. A new value that falls outside of this expected range generates a Deviation anomaly.
Breaking Trends	Detects the following types of count-based anomalies: - Drop. A drop in a value that has been increasing to a lower value generates a Drop anomaly. - Surge. A spike in a value that has been decreasing to a higher value generates a Surge anomaly.
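The following Python sketch gives simplified versions of the detection rules in the table. Each function takes a history of metric values from earlier runs and a new value. It is not the product's algorithm: the three-run window for static data, the expected range of `k` standard deviations around the mean, and the strict increase or decrease test for breaking trends are assumptions, and count variation and the transition anomalies are not modeled. The `k` parameter is a hypothetical stand-in for the width of the expected range.

```python
import statistics


def static_percentage_variation(history, new):
    # A percentage other than 0% or 100% that stayed constant for three runs, then changed.
    last = history[-3:]
    return (len(last) == 3 and len(set(last)) == 1
            and last[0] not in (0, 100) and new != last[0])


def zero_hundred_change(history, new):
    prev = history[-1]
    if prev == 100 and new < 100:
        return "Drop from Maximum"
    if prev == 0 and new > 0:
        return "Surge from Minimum"
    return None


def deviation(history, new, k=3.0):
    # Expected range: mean +/- k sample standard deviations of earlier values.
    if len(history) < 2:
        return False
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return abs(new - mean) > k * sd


def breaking_trend(history, new):
    if len(history) < 2:
        return None
    pairs = list(zip(history, history[1:]))
    if all(b > a for a, b in pairs) and new < history[-1]:
        return "Drop"   # an increasing value falls
    if all(b < a for a, b in pairs) and new > history[-1]:
        return "Surge"  # a decreasing value rises
    return None
```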

9. If you include data volume anomalies in the metric filters, choose how you want to measure metadata volume in the Metric Configuration option.

You can choose one of the following options:

- Statistics. Uses the data observability volume that is extracted from the objects in the source system. This volume might be outdated.
- Calculated. Uses the data observability volume that is measured when a data observability job runs on the catalog source. This volume is more accurate.

Configure data classification

Enable the data classification capability to identify and organize data into relevant categories based on the functional meaning of the data.

1. Click the Data Classification tab.

2. Select Enable Data Classification.

3. Choose one or both of the following options:

- Generated Data Classifications. CLAIRE automatically generates data classifications for the data elements.
- Data Classification Rules. Choose from predefined or custom data classifications.

a. Click Add Data Classification. The Select Data Classifications dialog box appears.

The Select Data Classifications dialog box includes a list of data classifications with a preview panel, the OK button, and the Cancel button.

b. Select the data classifications that you want to use.
c. Click OK.

Configure relationship discovery

Enable the relationship discovery capability to identify pairs of similar columns and relationships between tables within a catalog source.

Before you configure relationship discovery, perform the following tasks:

• Import a relationship inference model. For more information about importing a relationship inference model, see Import a relationship inference model.
• Enable data profiling on the Data Profiling and Quality tab, and select Keep Signatures and Values as the run mode in the Parameters section. These configurations enable you to retain values of the columns in the profiling results and discover relationships.

1. Click the Relationship Discovery tab.

2. Select Enable Relationship Discovery.

3. In the Column Similarity area, select the Relationship Inference Model.

Note:

The relationship inference models that you imported appear in the Relationship Inference Model field.

4. In the Joinable Tables Relationship area, specify the Containment Score Threshold to identify joinable table relationships within the catalog source. This score indicates the data overlap between two columns, which determines whether the tables are joinable.

Note:

A higher score means that the objects have more overlapping data, and a lower score means that the objects have less overlapping data. A containment score threshold lower than 0.4 might result in a large number of false positives.
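A common way to measure the data overlap that a containment score describes is set containment: the fraction of distinct values in one column that also appear in the other column. The following Python sketch assumes that definition and a hypothetical rule that compares the larger of the two directional scores with the threshold. Metadata Command Center might compute the score differently.

```python
def containment(a, b):
    """Fraction of the distinct values in column a that also appear in column b."""
    distinct_a = set(a)
    if not distinct_a:
        return 0.0
    return len(distinct_a & set(b)) / len(distinct_a)


def joinable(a, b, threshold):
    # Hypothetical decision rule: the larger directional score must reach the threshold.
    return max(containment(a, b), containment(b, a)) >= threshold
```

For example, if every customer ID in an orders column also appears in a customers column, the containment of the orders column in the customers column is 1.0, even when the customers column holds many more IDs.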

After you run the catalog source job, you can view the inferred relationships on the Relationships tab of the extracted assets in Data Governance and Catalog.

Configure glossary associations

Enable the glossary association capability to associate glossary terms with technical assets, or to get recommendations for glossary terms that you can manually associate with technical assets in Data Governance and Catalog.

Metadata Command Center considers all published business terms in the glossary when it recommends terms to associate with your technical assets.

1. Click the Glossary Association tab.

2. Select Enable Glossary Association.

3. Select Enable auto-acceptance to automatically accept glossary association recommendations.

4. Specify the Confidence Score Threshold for Auto-Acceptance. The glossary association capability automatically accepts recommended glossary terms with a confidence score above this threshold.

Note:

Specify a percentage from 80 to 100. If the score is higher than the specified limit, the glossary association capability automatically assigns a matching glossary term to the data element.

5. If you enable auto-acceptance, you can select Enable Below-threshold Recommendations to also receive glossary association recommendations below the auto-acceptance threshold.

6. Specify the Confidence Score Threshold for Recommendations to set the threshold that the glossary association capability uses to make recommendations.

If you enable auto-acceptance, specify a percentage from 80 to the selected auto-acceptance threshold. You can accept or reject the recommended glossary terms that fall within this range in Data Governance and Catalog.

If you disable auto-acceptance, specify a percentage from 80 to 100 inclusive.
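The following Python sketch summarizes how the thresholds in steps 4 through 6 interact, based on the ranges described above. It assumes that a score equal to the auto-acceptance threshold is recommended rather than accepted, because the capability auto-accepts scores higher than the limit, and that the recommendation threshold is inclusive. The `classify` function is hypothetical.

```python
def classify(score, recommend_threshold, auto_accept_threshold=None, below_threshold=True):
    """Return "auto-accepted", "recommended", or "no action" for a confidence score (percent)."""
    if auto_accept_threshold is None:
        # Auto-acceptance disabled: the recommendation threshold is from 80 to 100.
        if not 80 <= recommend_threshold <= 100:
            raise ValueError("recommendation threshold must be from 80 to 100")
        return "recommended" if score >= recommend_threshold else "no action"
    if not 80 <= auto_accept_threshold <= 100:
        raise ValueError("auto-acceptance threshold must be from 80 to 100")
    if not 80 <= recommend_threshold <= auto_accept_threshold:
        raise ValueError("recommendation threshold must be from 80 to the auto-acceptance threshold")
    if score > auto_accept_threshold:
        return "auto-accepted"
    if below_threshold and score >= recommend_threshold:
        return "recommended"
    return "no action"
```

For example, with an auto-acceptance threshold of 90 and a recommendation threshold of 85, a term with a score of 88 appears as a recommendation that you can accept or reject in Data Governance and Catalog.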

7. Choose to automatically assign business names and descriptions to technical assets. You can then choose to retain existing assignments and only assign business names and descriptions to assets that don't have assignments, or allow overwrite of existing assignments.

By default, existing assignments are retained.

8. Optional. Choose to ignore specific parts of data elements when making recommendations. Select Yes and enter prefix and suffix keyword values as needed.

Click Select to enter a keyword. You can enter multiple unique prefix and suffix keywords. Keyword values are case insensitive.
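The following Python sketch illustrates case-insensitive prefix and suffix removal before a data element name is compared with glossary terms. It assumes that at most one prefix and one suffix are removed and that the longest matching keyword wins. The function name is hypothetical.

```python
def strip_keywords(name, prefixes=(), suffixes=()):
    """Remove one matching prefix and one matching suffix, ignoring case."""
    # Try longer keywords first so that "STG_" wins over "S".
    for p in sorted(prefixes, key=len, reverse=True):
        if p and name[:len(p)].lower() == p.lower():
            name = name[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):
        if s and len(name) >= len(s) and name[-len(s):].lower() == s.lower():
            name = name[:-len(s)]
            break
    return name
```

For example, with the prefix keyword tmp_ and the suffix keyword _v2, the data element TMP_CUSTOMER_NAME_V2 is compared as CUSTOMER_NAME.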

9. Optional. Choose specific top-level business glossary assets to associate with technical assets. Selecting a top-level asset selects its child assets as well. Select Top-level Glossary Assets and specify the assets on the Select Assets page.

10. Optional. Choose to use abbreviations and synonym definitions from lookup tables for accurate glossary association. Select Yes to enable, and then click Select to upload a lookup table.

11. Click Next.

The Associations page appears.