
Step 2. Configure capabilities

When you configure the Microsoft SharePoint Online catalog source, you define the settings for the metadata extraction capability and other optional capabilities.
The metadata extraction capability extracts source metadata from external source systems. You can also configure other capabilities that the catalog source includes.
You can save the catalog source configuration at any point after you enter the connection information. After you save the catalog source, you can choose to run the catalog source job. To run the job once, click Run. To run metadata extraction and other capabilities on a recurring schedule, configure schedules on the Schedule tab.

Configure metadata extraction

When you configure the Microsoft SharePoint Online catalog source, you choose a runtime environment, define filters, and enter configuration parameters for metadata extraction.
    1. In the Connection and Runtime area, choose a serverless runtime environment or the Secure Agent group where you want to run catalog source jobs.
    Note:
    Serverless runtime environment options are available if the catalog source works with a serverless runtime environment.
    2. With the Metadata Change Option, choose whether to retain, delete, or deprecate catalog objects that are deleted from the source system.
    Note:
    You can also change the configured metadata change option when you run a catalog source.
    3. In the Filters area, define one or more filter conditions to apply for metadata extraction:
        a. Select Yes to view filter options.
        b. From the Include/Exclude list, choose to include or exclude metadata based on the filter parameters.
        c. From the Object type list, select an object type, depending on the object that you want to extract metadata from.
        d. Enter the path to the object as the filter value.
        [Image: The filter conditions for the catalog source with the File, Folder, and Path object types visible.]
        e. To define an additional filter with an OR condition, click the Add icon.
    4. In the Configuration Parameters area, enter the catalog source configuration options and enable or disable partition detection in files.
    You can enter the following properties:
    Extract Group Elements from Hierarchical Files
        Select one of the following options to extract group or leaf elements from hierarchical files:
        - Yes. Extracts group elements from hierarchical files with the complete hierarchy of hierarchical fields. You can view the hierarchy of hierarchical files in the Hierarchy tab of assets in Data Governance and Catalog.
        - No. Extracts only leaf elements from hierarchical files without the complete hierarchy of hierarchical fields.
        You can extract group elements from hierarchical files for the following file types:
        - AVRO. Extracts and groups hierarchical files and hierarchical fields.
        - Parquet. Extracts and groups hierarchical files and hierarchical fields.
        - JSON. Extracts and groups hierarchical files and hierarchical fields.
        - XML. Extracts and groups elements and attributes. For XML file types, a maximum depth of 1000 elements is permitted within a single element in the hierarchy.
        - XSD. Extracts and groups elements and attributes.
        If you modify the Extract Group Elements from Hierarchical Files field and run the catalog source again, the asset page doesn't display the hierarchical elements in the correct hierarchy groups. If you modify the property value, purge the catalog source before you run it again.
    Enable Extension-Based File Type Detection
        Select one of the following options to detect file types by file extensions or by parsing the file contents:
        - Yes. Detects file types by file extensions.
        - No. Parses the file contents to detect file types.
        Note:
        You can detect file types by file extensions for the following file types:
        - CSV
        - TSV
        - TXT
        - XML
    Use First Row as Header of Delimited Files
        Select one of the following options to use the first row as the header or detect headers automatically for delimited files:
        - Yes. Detects column headers based on the following rules:
          - Duplicate headers are suffixed with '#' followed by a number, for example, ABC#1, ABC#2. Duplicate detection is not case-sensitive.
          - Empty column header values appear as UnknownColumn<position>, for example, UnknownColumn2.
          - The header row in the file is detected even if it has a different number of columns than the data rows.
        - No. Detects headers automatically for delimited files.
    Headers of Delimited Files
        Specify values to determine headers of delimited files. Separate multiple values with commas. If any value from the list is found in the first row of the delimited file, the first row is used as the header.
        Note:
        This parameter appears only if you choose No for the Use First Row as Header of Delimited Files parameter.
    Treat Files Without Extension As
        Select one of the following options to identify files without an extension:
        - Parquet
        - AVRO
        - JSON
    Enter File Delimiter
        Specify the file delimiter if the file from which you extract metadata uses a delimiter that is not in the following default list:
        - Comma (,)
        - Horizontal tab (\t)
        - Semicolon (;)
        - Colon (:)
        - Pipe symbol (|)
        Enclose the delimiter in single quotes, such as '$'. Use a comma to separate multiple delimiters.
        Note:
        Adding a custom delimiter overrides the default list of delimiters. If you specify a delimiter, characters from the default list are not considered as delimiters.
    Files to be excluded during partition discovery
        Specify regular expressions that match the files that you want to exclude during partition discovery.
        Enclose each regular expression in double quotes, such as ".*json","Customer.csv","Parquet.*". Use a comma to separate multiple regular expressions.
    Partition Detection
        Enable or disable detection of partitions in files.
        Enable partition detection to identify horizontally partitioned files in a directory, and files organized in hierarchical Hive-style directory structures, and publish them as a single partitioned file.
    5. Click Next.
    The Associations page appears.
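The header rules described for the Use First Row as Header of Delimited Files and Headers of Delimited Files properties can be illustrated with a short Python sketch. This is not the product's implementation. The function names are hypothetical, and several details are assumptions that the description does not settle: column positions are counted from 1, every member of a duplicate group receives a suffix (including the first occurrence), and the configured header values are compared to first-row cells by exact match after trimming whitespace.

```python
from collections import Counter


def normalize_headers(raw_headers):
    """Apply the documented naming rules to a header row (illustrative only)."""
    # Empty header values become UnknownColumn<position>.
    # Assumption: positions are 1-based.
    named = [
        header.strip() if header.strip() else f"UnknownColumn{position}"
        for position, header in enumerate(raw_headers, start=1)
    ]

    # Duplicate detection is not case-sensitive, so count lowercased names.
    totals = Counter(name.lower() for name in named)

    seen = Counter()
    result = []
    for name in named:
        key = name.lower()
        if totals[key] > 1:
            # Assumption: every duplicate, including the first, gets '#<n>'.
            seen[key] += 1
            result.append(f"{name}#{seen[key]}")
        else:
            result.append(name)
    return result


def first_row_is_header(first_row, header_values):
    """Return True if any configured header value appears in the first row.

    header_values is the comma-separated string entered for the
    Headers of Delimited Files property.
    Assumption: exact match after trimming whitespace.
    """
    candidates = {value.strip() for value in header_values.split(",") if value.strip()}
    return any(cell.strip() in candidates for cell in first_row)


print(normalize_headers(["id", "ABC", "", "abc"]))
print(first_row_is_header(["cust_id", "name"], "id, name, email"))
```

In this sketch, the two columns named ABC and abc are treated as duplicates because the comparison ignores case, and the empty third column is named after its position.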

Filter guidelines and examples

You can add metadata extraction filters when you configure the catalog source. To create a filter, you can choose from file names, folder names, or paths.
Consider the following rules and guidelines when you enter filter values:
Important:
It is recommended that you use either a path filter alone or a combination of a folder filter and a file filter.

Examples

You can include or exclude metadata from folders, files, or paths.
Folder filters
Folder filters apply to folders included in the source system.
For example:
File filters
File filters apply to the files included in folders that you filter. The file filter is recursive. If you don’t provide any folder filters, the file filters apply to the entire folder hierarchy.
For example:
Path filters
Path filters apply to the files and folders in the path that you filter. The path filter is non-recursive. If you provide only file or folder names, the path filters apply to the first-level files or directories.
For example:

Configure data classification

Enable the data classification capability to identify and organize data into relevant categories based on the functional meaning of the data.
    1. Click the Data Classification tab.
    2. Select Enable Data Classification.
    3. Choose one or both of the following options:

Configure glossary association

Enable the glossary association capability to associate glossary terms with technical assets, or to get recommendations for glossary terms that you can manually associate with technical assets in Data Governance and Catalog.
Metadata Command Center considers all published business terms in the glossary when it makes recommendations for associating glossary terms with your technical assets.
    1. Click the Glossary Association tab.
    2. Select Enable Glossary Association.
    3. Select Enable auto-acceptance to automatically accept glossary association recommendations.
    4. Specify the Confidence Score Threshold for Auto-Acceptance to set a threshold limit based on which the glossary association capability automatically accepts the recommended glossary terms.
    Note:
    Specify a percentage from 80 to 100. If the score is higher than the specified limit, the glossary association capability automatically assigns a matching glossary term to the data element.
    5. If you enabled auto-acceptance, select Enable Below-threshold Recommendations to also receive glossary association recommendations that score below the auto-acceptance threshold.
    6. Specify the Confidence Score Threshold for Recommendations to set a threshold based on which the glossary association capability makes recommendations.
    If you enable auto-acceptance, specify a percentage from 80 to the selected auto-acceptance threshold. You can accept or reject the recommended glossary terms that fall within this range in Data Governance and Catalog.
    If you disable auto-acceptance, specify a percentage from 80 to 100 inclusive.
    7. Choose to automatically assign business names and descriptions to technical assets. You can then choose either to retain existing assignments and assign business names and descriptions only to assets that don't have assignments, or to overwrite existing assignments.
    By default, existing assignments are retained.
    8. Optional. Choose to ignore specific parts of data elements when making recommendations. Select Yes and enter prefix and suffix keyword values as needed.
    Click Select to enter a keyword. You can enter multiple unique prefix and suffix keywords. Keyword values are case-insensitive.
    9. Optional. Choose specific top-level business glossary assets to associate with technical assets. Selecting a top-level asset selects its child assets as well. Select Top-level Glossary Assets and specify the assets on the Select Assets page.
    10. Optional. Choose to use abbreviations and synonym definitions from lookup tables for more accurate glossary association. Select Yes to enable, and then click Select to upload a lookup table.
    11. Click Next.
    The Associations page appears.
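The way the two confidence score thresholds interact can be sketched in Python. This is an illustration of the ranges described in the steps above, not the logic that Metadata Command Center runs. The function names and return labels are hypothetical. The sketch also assumes that a score equal to the recommendation threshold still produces a recommendation, which the steps do not state.

```python
def validate_thresholds(recommendation_threshold, auto_accept_threshold=None):
    """Check the threshold ranges described in the steps.

    Pass auto_accept_threshold=None when auto-acceptance is disabled.
    """
    if auto_accept_threshold is None:
        # Auto-acceptance disabled: recommendations range from 80 to 100.
        upper = 100
    else:
        if not 80 <= auto_accept_threshold <= 100:
            raise ValueError("auto-acceptance threshold must be from 80 to 100")
        # Auto-acceptance enabled: recommendations range from 80 up to
        # the selected auto-acceptance threshold.
        upper = auto_accept_threshold
    if not 80 <= recommendation_threshold <= upper:
        raise ValueError(f"recommendation threshold must be from 80 to {upper}")


def classify_match(score, recommendation_threshold=None, auto_accept_threshold=None):
    """Label a glossary match by its confidence score (a percentage).

    Pass recommendation_threshold=None to model the case where
    below-threshold recommendations are not enabled.
    """
    # A term is auto-accepted only if the score is higher than the limit.
    if auto_accept_threshold is not None and score > auto_accept_threshold:
        return "auto-accepted"
    # Assumption: the recommendation threshold itself is inclusive.
    if recommendation_threshold is not None and score >= recommendation_threshold:
        return "recommended"
    return "ignored"


validate_thresholds(recommendation_threshold=80, auto_accept_threshold=90)
for score in (95, 90, 85, 70):
    print(score, classify_match(score, 80, 90))
```

With an auto-acceptance threshold of 90 and a recommendation threshold of 80, a score of 90 is recommended rather than auto-accepted, because auto-acceptance applies only to scores higher than the limit.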