
Step 2. Configure capabilities

When you configure the Apache Atlas catalog source, you define the settings for the metadata extraction capability.
The metadata extraction capability extracts source metadata from external source systems. You can also configure other capabilities that the catalog source includes.
You can save the catalog source configuration at any point after you enter the connection information. After you save the catalog source, you can choose to run the catalog source job. To run the job once, click Run. To run metadata extraction and other capabilities on a recurring schedule, configure schedules on the Schedule tab.

Configure metadata extraction

When you configure the Apache Atlas catalog source, you choose a runtime environment, define filters, and enter configuration parameters for metadata extraction.
    1. In the Connection and Runtime area, choose a serverless runtime environment or the Secure Agent group where you want to run catalog source jobs.
    Note:
    Serverless runtime environment options are available if the catalog source works with a serverless runtime environment.
    2. Use the Metadata Change Option to choose whether the catalog retains, deletes, or deprecates objects that were deleted from the source system.
    Note:
    You can also change the configured metadata change option when you run a catalog source.
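The effect of the three Metadata Change Option values can be pictured with a small model. This is a hypothetical sketch and not Metadata Command Center's implementation. It assumes that the catalog is a dictionary that maps object IDs to statuses, and the "active" and "deprecated" status strings are illustrative assumptions.

```python
from enum import Enum
from typing import Dict, Set


class MetadataChangeOption(Enum):
    RETAIN = "retain"
    DELETE = "delete"
    DEPRECATE = "deprecate"


def reconcile(catalog: Dict[str, str], source_ids: Set[str],
              option: MetadataChangeOption) -> Dict[str, str]:
    """Return a new catalog snapshot after a hypothetical extraction run.

    catalog maps object IDs to statuses; source_ids holds the IDs that still
    exist in the source system. The status strings are illustrative only.
    """
    result: Dict[str, str] = {}
    for obj_id, status in catalog.items():
        if obj_id in source_ids or option is MetadataChangeOption.RETAIN:
            result[obj_id] = status          # still in the source, or kept anyway
        elif option is MetadataChangeOption.DEPRECATE:
            result[obj_id] = "deprecated"    # kept in the catalog, but flagged
        # DELETE: objects missing from the source are left out of the result.
    for obj_id in source_ids - set(catalog):
        result[obj_id] = "active"            # newly extracted objects
    return result
```

For example, suppose a catalog holds t1 and t2 and the source now holds t1 and t3. With DEPRECATE, t2 stays in the catalog but is flagged. With DELETE, t2 disappears.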
    3. In the Filters area, define one or more filter conditions to apply for metadata extraction:
      a. From the Include or Exclude metadata list, choose to include or exclude metadata based on the filter parameters.
      b. From the Object type list, select Hive Database, HDFS Path, or HBase Namespace.
      c. Enter the filter values.
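        Filters can contain the following wildcards:
        - Question mark (?). Represents a single character.
        - Asterisk (*). Represents multiple characters or empty text.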
        Image: The Filters area for the Apache Atlas catalog source, where you include or exclude metadata from a Hive database or HDFS path and enter a value that specifies the object location.
      d. To define an additional filter with an OR condition, click the Add icon.
    For example, you can define filter conditions that include metadata related to Hive tables in the HR database with names that start with EMP followed by a single character, include metadata related to the table named HbaseTable in the HbaseNS namespace, and exclude metadata related to all files in the hdfsfolder1 folder and its subfolders.
    Metadata Command Center applies exclude filter conditions only if the assets in the include filter conditions are not related to, or linked through lineage to, the excluded assets. For example, you add a filter condition to include metadata related to all tables named EMP across all databases (*.EMP), and then add another filter condition to exclude metadata related to the EMP table in the HR database (HR.EMP). The exclude filter condition is applied because the assets are not related or linked through lineage.
    Metadata Command Center ignores exclude filter conditions if the assets in the include filter conditions are related to, or linked through lineage to, the excluded assets. For example, you add a filter condition to include metadata related to the EMP table in the HR database (HR.EMP), and then add another filter condition to exclude metadata related to the SAL table in the same database (HR.SAL). The exclude filter condition is ignored because the EMP and SAL tables are linked through lineage.
    If you add a filter condition to include metadata from a table deleted from the Apache Atlas source system, Metadata Command Center ignores the filter condition.
    If the value of the HDFS Path filter contains special characters, replace the special characters with an asterisk wildcard character. For example, replace /Test$~^!()*<>_Folder with /Test*Folder.
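The following Python sketch can help you check a filter value before you enter it. It assumes two wildcards: a question mark, which matches exactly one character, and an asterisk, which matches multiple characters or empty text. It also assumes case-sensitive, whole-name matching against values such as HR.EMP1. The sanitizing helper assumes that any character other than a letter, a digit, or / counts as special, which reproduces the /Test*Folder example. This is an illustration of the wildcard rules, not the matcher that Metadata Command Center uses.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a filter value with ? and * wildcards into a regex.

    ? matches exactly one character; * matches any run of characters,
    including empty text. Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    # Assumption: matching is case-sensitive and must cover the whole name.
    return wildcard_to_regex(pattern).fullmatch(name) is not None


def sanitize_hdfs_filter(path: str) -> str:
    """Replace each run of special characters in an HDFS path with one *.

    Assumption: any character other than a letter, a digit, or / is special.
    """
    return re.sub(r"[^A-Za-z0-9/]+", "*", path)
```

For example, matches("HR.EMP?", "HR.EMP1") is true and matches("HR.EMP?", "HR.EMP12") is false. sanitize_hdfs_filter("/Test$~^!()*<>_Folder") returns /Test*Folder, which still matches the original folder name.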
    4. In the Configuration Parameters area, enter configuration properties.
    Note:
    Click Show Advanced to view all configuration parameters.
    The following table describes the properties that you can enter:
    Property
    Description
    Lineage Direction
    The direction of data flow between the assets that you extract from Apache Atlas. The value is passed to the direction parameter of the Lineage REST API.
    Select one of the following options:
    - BOTH. Extracts both input and output data flow between assets.
    - INPUT. Extracts only input data flow between assets.
    - OUTPUT. Extracts only output data flow between assets.
    Lineage Depth
    The number of lineage hops to extract from Apache Atlas for the filtered assets. The value is passed to the depth parameter of the Lineage REST API.
    Default is 3.
    Page Result Limit
    Advanced parameter. The maximum number of search result entries per page in a fetch. The value is passed to the limit parameter of the Discovery REST API.
    Default is 1000.
    Entity Bulk Fetch Count
    Advanced parameter. The maximum number of entities to include in a single fetch from the Bulk Entity REST API.
    Default is 100.
    Connection Timeout
    Advanced parameter. The maximum amount of time, in milliseconds, that the Secure Agent waits to establish an HTTP connection with the Apache Atlas server and receive a response.
    Default is -1, which disables the timeout.
    Parallel Lineage Fetch Count
    Advanced parameter. The maximum number of Lineage REST API calls that can run concurrently to retrieve lineage data.
    Default is 5.
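The properties in the table correspond to request parameters of the Apache Atlas v2 REST API. The following Python sketch shows how each value could shape the requests. It only builds URLs and calls caller-supplied functions, so it performs no network calls. The server address and the fetch_page and fetch callbacks are hypothetical. The endpoint paths (/lineage/{guid} with direction and depth, /entity/bulk with repeated guid parameters, and a limit/offset page size) follow the public Atlas v2 REST API. The exact requests that the Secure Agent sends may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional
from urllib.parse import quote, urlencode

# Hypothetical server address; substitute your Apache Atlas host and port.
ATLAS_BASE = "http://atlas.example.com:21000/api/atlas/v2"


def lineage_url(guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Lineage Direction and Lineage Depth become the direction and depth query parameters."""
    if direction not in ("BOTH", "INPUT", "OUTPUT"):
        raise ValueError(f"unsupported lineage direction: {direction}")
    query = urlencode({"direction": direction, "depth": depth})
    return f"{ATLAS_BASE}/lineage/{quote(guid)}?{query}"


def bulk_entity_urls(guids: List[str], batch_size: int = 100) -> List[str]:
    """Entity Bulk Fetch Count caps the number of guid parameters in one bulk call."""
    urls = []
    for start in range(0, len(guids), batch_size):
        batch = guids[start:start + batch_size]
        query = urlencode({"guid": batch}, doseq=True)  # guid=a&guid=b&...
        urls.append(f"{ATLAS_BASE}/entity/bulk?{query}")
    return urls


def paged_search(fetch_page: Callable[[int, int], List[dict]],
                 limit: int = 1000) -> Iterator[dict]:
    """Page Result Limit is the page size; fetch_page(limit, offset) is hypothetical."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # a short page means there are no more results
            return
        offset += limit


def timeout_seconds(connection_timeout_ms: int) -> Optional[float]:
    """A Connection Timeout of -1 disables the timeout.

    None means "wait indefinitely" in clients such as the requests library.
    """
    if connection_timeout_ms == -1:
        return None
    return connection_timeout_ms / 1000


def fetch_all_lineage(guids: List[str], fetch: Callable[[str], dict],
                      parallel: int = 5) -> List[dict]:
    """Parallel Lineage Fetch Count bounds how many lineage calls run at once.

    fetch(url) is a hypothetical caller-supplied function that performs the GET.
    Results are returned in the same order as guids.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda guid: fetch(lineage_url(guid)), guids))
```

A thread pool with max_workers set to the Parallel Lineage Fetch Count keeps at most that many lineage requests in flight. The remaining GUIDs queue until a worker is free.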
    5. Optional. In the Configuration Parameters area, enter additional settings.
    Note:
    The Additional Settings section appears when you click Show Advanced.
    The following table describes the property that you enter for additional settings:
    Property
    Description
    Expert Parameters
    Enter additional configuration options to be passed at runtime. Required if you need to troubleshoot the catalog source job.
    Caution:
    Use expert parameters only when Informatica Global Customer Support recommends them.
    6. To configure additional capabilities for the catalog source, click the corresponding tabs.

Configure lineage discovery

Enable the lineage discovery capability so that CLAIRE can help you build complete lineage by recommending endpoint catalog source objects to assign to reference catalog source connections.
    1. Click the Lineage Discovery tab.
    2. Select Enable Lineage Discovery.
    3. In the Filters area, define one or more filter conditions to apply for lineage discovery.
    To define filters, you can select catalog source types or asset groups, enter a catalog source name, or search for catalog sources in a list.
      a. Select Yes to view the filter options.
      b. From the Include/Exclude list, choose to include or exclude catalog sources for lineage discovery based on the filter parameters.
      c. From the filter type list, select catalog source type, catalog source name, or asset group.
      d. In the filter value field, select the required catalog source types, or click the Search button and select catalog sources or asset groups.
        Filters can contain the asterisk wildcard to represent multiple characters or empty text.
        Image: The filter options, which include multiple filter conditions that you can choose.
      Note:
      You can't add more than one include or exclude filter for the same filter type.
      e. Optional. To define an additional filter with an AND condition, click the Add icon.
    For more information about lineage discovery, see Lineage discovery.