
Create catalog sources in Metadata Command Center

Use Metadata Command Center to configure a catalog source for Amazon Redshift and run the catalog source job.
When you configure a catalog source, you define the source system that you want to extract metadata from. Configure filters to include or exclude source system metadata before you run the job. Optionally, configure other capabilities, such as data profiling and quality, data classification, relationship discovery, and glossary association.
To provide stakeholders access to technical assets, you can assign access through stakeholder roles. You can also associate technical assets extracted from the catalog source to asset groups. If your catalog source references other source systems, you can create a connection assignment to the endpoint catalog source to view complete lineage.

Step 1. Register a catalog source

When you register a catalog source, provide general information and connection values.
    1. Log in to Informatica Intelligent Cloud Services.
    The My Services page appears.
    2. Click Metadata Command Center.
    The following image shows the Metadata Command Center box on the My Services page:
    The Metadata Command Center home page appears.
    3. Click New.
    4. Select Catalog Source from the list of asset types.
    5. Select Amazon Redshift from the list of catalog source types.
    The following image shows the Amazon Redshift catalog source:
    6. Click Create.
    The New Catalog Source page opens.
    The following image shows the Registration tab on the New Catalog Source page:
    7. In the General Information section, enter a name and an optional description for the catalog source.
    Note: You can rename a catalog source after you create it, but to apply the change to all associated objects, you must rerun the metadata extraction job.
    8. In the Connection Information area, select the connection that you created in Administrator.
    9. Click Connection Properties to expand and view the connection properties for the selected connection.
    10. Click Test Connection to test your connection to the source system.
    11. Click Next.
The Configuration page appears.

Step 2. Configure capabilities

When you configure the Amazon Redshift catalog source, you define the settings for the metadata extraction capability and other optional capabilities.
The metadata extraction capability extracts source metadata from external source systems. You can also configure other capabilities that the catalog source includes.
You can save the catalog source configuration at any point after you enter the connection information. After you save the catalog source, you can choose to run the catalog source job. To run the job once, click Run. To run metadata extraction and other capabilities on a recurring schedule, configure schedules on the Schedule tab.

Configure metadata extraction

When you configure the Amazon Redshift catalog source, you choose a runtime environment, define filters, and enter configuration parameters for metadata extraction.
    1. In the Connection and Runtime area, choose a serverless runtime environment or the Secure Agent group where you want to run catalog source jobs.
    Note: Serverless runtime environment options are available only if the catalog source works with a serverless runtime environment.
    2. Use the Metadata Change Option to choose whether to retain or delete catalog objects that were deleted from the source system.
    Note: You can also change the configured metadata change option when you run a catalog source.
    3. In the Filters area, define one or more filter conditions to apply to metadata extraction:
    a. Select Filter conditions.
    b. From the Include or Exclude metadata list, choose to include or exclude metadata based on the filter parameters.
    c. From the Object type list, select Tables, Views, External tables, or Stored procedures, depending on the objects that you want to extract metadata from. Select All to extract metadata from all objects.
    d. Enter the object location as the filter value.
    Filters can contain wildcards. For object hierarchies, use a dot as a separator. When you enter values for filters, enclose a segment in double quotes if the segment contains a space or a dot.
    The following image shows the filter condition options:
    4. To define an additional filter with an OR condition, click the Add icon.
    The following image shows a filter that includes metadata from all objects in schemas with names that start with Schema and excludes metadata from all tables in the Schema1 schema with names that start with table followed by one additional character.
    5. Optional. In the Configuration Parameters area, enter properties to override default content values and job parameters. Click Show Advanced to view all configuration parameters.
    You can enter the following Catalog Source Configuration properties:
    - Default variables values. Specify a default value for variables used in the programmable objects.
    - MetaTables Include Filter. Advanced parameter. When it processes PL/SQL statements, Metadata Command Center does not read table or view content by default. If you want to use the content, for example, to process dynamic SQL statements, use the MetaTables Include Filter parameter. This parameter prompts the database for the required metadata. Verify that the user has SELECT permissions on the metatables.
      Note: Don't use this option to specify filters for tables that you want to include or exclude during the metadata extraction run.
    6. Optional. In the Configuration Parameters area, enter additional settings.
    Note: The Additional Settings section appears when you click Show Advanced.
    You can enter the following additional setting:
    - Expert Parameters. Additional configuration options to pass at runtime. Required if you need to troubleshoot the catalog source job.
      Caution: Use expert parameters only when Informatica Global Customer Support recommends them.
    7. To configure additional capabilities for the catalog source, click the corresponding tabs.
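A minimal sketch of how include and exclude filter conditions like the ones above might be evaluated, using Python's fnmatch wildcards. The object names, patterns, and exact matching semantics here are illustrative assumptions, not the product's documented rules:

```python
from fnmatch import fnmatchcase

# Hypothetical objects in a dot-separated Schema.Table hierarchy.
objects = ["Schema1.orders", "Schema1.table1", "Schema2.table9", "Other.sales"]

# One include condition and one exclude condition, as in the example filter:
# include everything under schemas whose names start with "Schema", then
# exclude tables named "table" plus one extra character in Schema1.
include_patterns = ["Schema*.*"]
exclude_patterns = ["Schema1.table?"]

def is_extracted(name):
    # An object is extracted if any include pattern matches it
    # and no exclude pattern matches it.
    included = any(fnmatchcase(name, p) for p in include_patterns)
    excluded = any(fnmatchcase(name, p) for p in exclude_patterns)
    return included and not excluded

extracted = [o for o in objects if is_extracted(o)]
print(extracted)  # ['Schema1.orders', 'Schema2.table9']
```

Adding another entry to `include_patterns` corresponds to clicking the Add icon to define an additional filter with an OR condition.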

Configure data profiling and quality

Enable the data profiling capability to evaluate the quality of metadata extracted from the Amazon Redshift source system.
    1. Click the Data Profiling and Quality tab.
    2. Expand Data Profiling and select Enable Data Profiling.
    3. In the Connection and Runtime area, choose the Secure Agent group where you want to run catalog source jobs.
    4. Optionally, specify data profiling filters to run the profile on a subset of the metadata that you extract.
    a. Select Yes to view filter options.
    b. From the Include/Exclude list, choose to include or exclude metadata.
    c. From the Object type list, select Schema.
    d. Enter a value to specify the object location.
    To include or exclude multiple objects, click the Add icon to add filters with the OR condition.
    5. In the Parameters area, configure the parameters.
    You can configure the following parameters:
    - Modes of Run. Determines the type of data that the data profiling task collects. Choose one of the following options:
      - Keep signatures only. Collects only aggregate information such as data types, average, standard deviation, and patterns.
      - Keep signatures and values. Collects both signatures and data values.
    - Profiling Scope. Determines whether data profiling runs only on the changes made to the source system or on the entire source system. Choose one of the following options:
      - Incremental. Includes only source metadata that changed or was updated since the last profile run.
      - Full. Includes all metadata that is extracted based on the filters applied for extraction.
    - Sampling Type. Determines the sample rows on which the data profiling task runs. Choose one of the following options:
      - All Rows. Runs data profiling on all rows in the metadata.
      - Limit N Rows. Runs data profiling on a limited number of rows.
      - Random N Rows. Runs data profiling on the selected number of random rows.
    - No of rows to limit. Required if you select Limit N Rows in Sampling Type. Specify the number of rows on which you want to run data profiling.
    - No of random rows to limit. Required if you select Random N Rows in Sampling Type. Specify the number of random rows on which you want to run data profiling.
    - S3 Bucket Name. The path to the Amazon S3 bucket that is used to store staging data.
    - Maximum Precision of String Fields. The maximum precision value for profiles on the string data type. You can set a maximum precision value of 255 characters. Default is 50.
    - Text Qualifier. The character that defines string boundaries. If you select a quote character, profiling ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.
    6. Expand Data Quality and select Enable Data Quality.
    Note: You can click Use Data Profiling Parameters to use the same parameters as in the Data Profiling section.
    7. In the Connection and Runtime area, choose the Secure Agent group where you want to run catalog source jobs.
    8. In the Parameters area, configure the parameters.
    You can configure the following parameters:
    - Data Quality Automation. Enable this option to automatically create or update rule occurrences for data elements in the catalog source. Choose one of the following options:
      - Apply on Data Elements linked with Business Dataset. Creates rule occurrences for all data elements that are linked with business data sets in the catalog source.
      - Apply on all Data Elements. Creates rule occurrences for all data elements in the catalog source.
    - Cache Result. Specify how you want to preview rule occurrence results. Select Agent Cache to generate a cache file in the runtime environment and preview the cached results faster in subsequent data preview runs. By default, the results are cached for seven days after the first run in the runtime environment. Select No Cache if you don't want to cache the preview results and want to view live results.
    - Run Rule Occurrence Frequency. Specify whether you want to run data quality rules based on the frequency defined for the rule occurrence in Data Governance and Catalog.
    - Sampling Type. Determines the sample rows on which the data quality task runs. Choose one of the following options:
      - All Rows. Runs data quality on all rows in the metadata.
      - Limit N Rows. Runs data quality on a limited number of rows.
      - Random N Rows. Runs data quality on the selected number of random rows.
    - No of rows to limit. Required if you select Limit N Rows in Sampling Type. Specify the number of rows on which you want to run data quality.
    - No of random rows to limit. Required if you select Random N Rows in Sampling Type. Specify the number of random rows on which you want to run data quality.
    - S3 Bucket Name. The path to the Amazon S3 bucket that is used to store staging data.
    - Maximum Precision of String Fields. The maximum precision value for profiles on the string data type. You can set a maximum precision value of 255 characters. Default is 50.
    - Text Qualifier. The character that defines string boundaries. If you select a quote character, data quality ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.
    9. To enable the data observability capability, expand Data Observability and select Enable Data Observability.
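The three sampling types that appear in both the profiling and quality parameters can be sketched as follows. This is an illustrative sketch only: the function name is mine, and the assumption that Limit N Rows takes the first N rows is not documented behavior.

```python
import random

def sample_rows(rows, sampling_type, n=None, seed=0):
    """Illustrative sketch of the three sampling options."""
    if sampling_type == "All Rows":
        # Run on every row in the metadata.
        return list(rows)
    if sampling_type == "Limit N Rows":
        # Assumed: take the first N rows (requires "No of rows to limit").
        return list(rows)[:n]
    if sampling_type == "Random N Rows":
        # Sample N rows without replacement
        # (requires "No of random rows to limit").
        rng = random.Random(seed)
        return rng.sample(list(rows), min(n, len(rows)))
    raise ValueError(f"unknown sampling type: {sampling_type}")

rows = list(range(1, 11))
print(sample_rows(rows, "Limit N Rows", n=3))  # [1, 2, 3]
```

The trade-off is the usual one: All Rows gives exact statistics at full cost, while the two N-row options bound the work on large tables.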

Configure data classification

Enable the data classification capability to identify and organize data into relevant categories based on the functional meaning of the data.
    1. Click the Data Classification tab.
    2. Select Enable Data Classification.
    3. Choose one or both of the following options:

Configure relationship discovery

Enable the relationship discovery capability to identify pairs of similar columns and relationships between tables within a catalog source.
Before you configure relationship discovery, perform the following tasks:
    1. Click the Relationship Discovery tab.
    2. Select Enable Relationship Discovery.
    3. In the Column Similarity area, select the Relationship Inference Model.
    Note: The relationship inference models that you imported appear in the Relationship Inference Model field.
    4. In the Joinable Tables Relationship area, specify the Containment Score Threshold to identify joinable table relationships within the catalog source. The containment score indicates the data overlap between any two given columns, which determines whether the tables are joinable.
    Note: A higher score means that the objects have more overlapping data, and a lower score means less overlapping data between the two objects. A containment score threshold lower than 0.4 might result in a large number of false positives.
After you run the catalog source job, you can view the inferred relationships on the Relationships tab of the extracted assets in Data Governance and Catalog.
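One common way to define a containment score like the one described above is the fraction of one column's distinct values that also appear in the other column. The exact formula the product uses is not stated here, so treat this as an illustrative assumption:

```python
def containment_score(col_a, col_b):
    # Fraction of distinct values in col_a that also appear in col_b.
    # Illustrative definition only; the product's formula may differ.
    a, b = set(col_a), set(col_b)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical columns: every order references an existing customer.
orders_customer_id = [1, 2, 3, 4, 5]
customers_id = [1, 2, 3, 4, 5, 6, 7, 8]

print(containment_score(orders_customer_id, customers_id))  # 1.0, joinable
print(containment_score(customers_id, orders_customer_id))  # 0.625
```

Under this definition the score ranges from 0.0 (no overlap) to 1.0 (full containment), which is why a threshold well below 0.4 starts admitting weakly overlapping column pairs as false positives.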

Configure glossary associations

Enable the glossary association capability to associate glossary terms with technical assets, or to get recommendations for glossary terms that you can manually associate with technical assets in Data Governance and Catalog.
Metadata Command Center considers all published business terms in the glossary while making recommendations to associate your technical assets.
    1. Click the Glossary Association tab.
    2. Select Enable Glossary Association.
    3. Select Enable auto-acceptance to automatically accept glossary association recommendations.
    Note: If this option is disabled, you must manually accept glossary association recommendations.
    4. Specify the Confidence Score Threshold for Auto-Acceptance to set the threshold at which the glossary association capability automatically accepts recommended glossary terms.
    Note: Specify a percentage from 80 to 100. If the score is higher than the specified limit, the glossary association capability automatically assigns a matching glossary term to the data element.
    5. Optional. Choose to ignore specific parts of data elements when making recommendations. Select Yes and enter prefix and suffix keyword values as needed.
    Click Select to enter a keyword. You can enter multiple unique prefix and suffix keywords. Keyword values are case insensitive.
    6. Optional. Choose specific top-level business glossary assets to associate with technical assets. Selecting a top-level asset also selects its child assets. Select Top-level Glossary Assets and specify the assets on the Select Assets page.
    7. Click Next.
    The Associations page appears.
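The auto-acceptance behavior can be sketched as a simple threshold check. The data shape and the exact boundary comparison are assumptions for illustration:

```python
def auto_accept(recommendations, threshold=80):
    # Accept a recommended glossary term only if its confidence score
    # (a percentage) meets or exceeds the configured threshold.
    # Recommendation shape is hypothetical: (term, score) pairs.
    accepted, pending = [], []
    for term, score in recommendations:
        (accepted if score >= threshold else pending).append(term)
    return accepted, pending

recs = [("Customer ID", 92), ("Order Date", 75), ("Revenue", 88)]
accepted, pending = auto_accept(recs, threshold=80)
print(accepted)  # ['Customer ID', 'Revenue'] are associated automatically
print(pending)   # ['Order Date'] awaits manual review
```

With a higher threshold, fewer terms are auto-accepted and more recommendations remain for manual review in Data Governance and Catalog.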

Step 3. Associate stakeholders and asset groups

Associate users or user groups within a stakeholder role as stakeholders for technical assets in Data Governance and Catalog. Also, you can choose to assign technical assets extracted from the catalog source to asset groups. You can then use access policies to control permissions on assets that are assigned to asset groups.
Verify that the administrator assigned users and user groups to the stakeholder role that you want to associate with technical assets.
    1. To associate users or user groups as stakeholders with technical assets extracted from the catalog source, perform the following steps:
    a. On the Associations page, click Stakeholders.
    b. Select Assign Stakeholders.
    c. Select a stakeholder role.
    d. Click Select to add users and user groups from the stakeholder role as stakeholders for the technical assets.
    The Add Users & User Groups dialog box displays a list of users and user groups assigned to the selected stakeholder role.
    e. Select one or more users or user groups to assign as stakeholders for the technical assets, and click OK.
    Only the selected users and user groups that belong to the specified stakeholder role are granted permissions on the technical assets.
    f. To assign users or user groups from another stakeholder role, click Add and repeat the steps.
    2. To assign asset groups to technical assets extracted from the catalog source, perform the following steps:
    a. On the Associations page, click Asset Groups.
    b. Select Assign Asset Groups.
    c. Click Select.
    The Select Asset Groups dialog box displays the list of asset groups.
    If you enabled an access policy that includes an asset group, you can view only the assets that belong to that asset group.
    3. Select the asset groups to which you want to assign the technical assets extracted from the catalog source, and click OK.
    4. Choose to save and run the job or to schedule a recurring job.

Step 4. Run or schedule the job

Choose to run a catalog source job manually, or configure it to run on schedule.
Note: You can't run multiple jobs simultaneously.
You can choose to perform a full or an incremental metadata extraction. A full metadata extraction extracts all objects from the source to the catalog. An incremental metadata extraction extracts only the changed and new objects since the last successful catalog source job run. Incremental metadata extraction doesn’t remove deleted objects from the catalog and doesn’t extract metadata of code-based objects if applicable.
When you run an incremental metadata extraction job with a filter to include metadata from objects, the job extracts only the objects that have the latest timestamp since the last successful job.
Note: The incremental extraction option appears if it is available for the catalog source.
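The difference between full and incremental extraction can be sketched as a timestamp filter against the last successful run. The record shape and field names are hypothetical:

```python
from datetime import datetime

# Hypothetical source objects with last-modified timestamps.
objects = [
    {"name": "Schema1.orders",    "modified": datetime(2024, 5, 1)},
    {"name": "Schema1.customers", "modified": datetime(2024, 6, 15)},
    {"name": "Schema2.events",    "modified": datetime(2024, 6, 20)},
]

def select_for_extraction(objects, last_successful_run=None):
    # Full extraction when there is no previous successful run;
    # otherwise extract only objects changed since that run (incremental).
    if last_successful_run is None:
        return objects
    return [o for o in objects if o["modified"] > last_successful_run]

incremental = select_for_extraction(objects, datetime(2024, 6, 1))
print([o["name"] for o in incremental])  # ['Schema1.customers', 'Schema2.events']
```

This also mirrors the scheduling fallback described below: when no last-run details exist, the same call degrades to a full extraction.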

Run the job manually

Click Save to save the catalog source, and then click Run. In the Run Catalog Source Job window, click Run to run the job.
Note: You can choose incremental metadata extraction for subsequent runs only after one full metadata extraction job completes successfully.

Run the job on a schedule

You can choose to run metadata extraction and other capabilities on a recurring schedule. You can't choose incremental metadata extraction and full metadata extraction in the same schedule. To create a schedule for incremental metadata extraction, you must have completed at least one full metadata extraction job successfully. If not, first create a schedule for a full metadata extraction.
If an incremental metadata extraction is scheduled to run when the last run details aren't available, the job first performs a full metadata extraction, followed by incremental metadata extraction on subsequent runs.
For example, this can happen in the following scenarios:
  1. On the Schedule tab, select Run on Schedule.
  The Schedule configuration page opens.
  2. Select the checkbox for each capability that you want to include in the schedule.
  3. Enter the start date, time zone, and the interval at which you want to run the job.
  4. Manage additional schedules using the following options:
  Note: You can create a maximum of one schedule per capability that you enable. If you purged a catalog source or did not run the metadata extraction job, the catalog source job runs metadata extraction before running other scheduled capabilities.
  5. Click Save to save the schedule.

Monitor job status

After the job runs, you can monitor the status of the job on the Overview page of the job.
For more information about job monitoring, see Monitor jobs for technical assets.

Step 5. Assign reference catalog source connections to endpoint catalog source objects

When you run the catalog source job, if the catalog source references another source system, a reference catalog source and a connection that point to the reference source system are created. To view complete lineage for your catalog source, you can assign the reference catalog source connection to the objects in the reference source system. A reference source system might be a cloud service, such as Informatica Intelligent Cloud Services. You must first create and run an endpoint catalog source that connects to the reference source system.
Before you assign a connection, ensure that you have created and run an endpoint catalog source for each reference source system.
Note: If the source schema contains case-sensitive tables or if the reference objects contain multiple objects with the same name in different cases, perform case-sensitive connection assignment to get correct lineage.
    1. On the Configure page, select the Lineage tab and then select the Assign Connections tab.
    The Assign Connections panel displays a list of assigned and unassigned connections along with details for each connection.
    2. Select the connection to the reference source system and click Assign.
    The Assign Connection dialog box appears with a list of objects of the endpoint catalog sources.
    3. Select one or more objects from the endpoint catalog sources and click Assign.
    The connection name appears prefixed to the reference catalog source name on the Hierarchy tab of your catalog source in Data Governance and Catalog.
    You can connect to the following referenced source systems:
    The referenced catalog source must belong to the Database class type.
When you click Assign, Metadata Command Center creates links between matching objects in the connected catalog sources, and it calculates the percentage of matched and unmatched objects. The higher the percentage of matched objects, the more accurate the lineage that you view in Data Governance and Catalog.
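The matched-object percentage, and the effect of the case-sensitive connection assignment noted earlier, can be illustrated with a simple sketch; the product's actual link-resolution logic is more involved:

```python
def match_percentage(source_objects, endpoint_objects, case_sensitive=True):
    # Illustrative match rate between reference objects and endpoint
    # catalog source objects; names and logic are assumptions.
    norm = (lambda s: s) if case_sensitive else str.lower
    endpoint = {norm(o) for o in endpoint_objects}
    matched = sum(1 for o in source_objects if norm(o) in endpoint)
    return 100.0 * matched / len(source_objects) if source_objects else 0.0

# Hypothetical names that differ only in case between the two systems.
refs = ["Schema1.Orders", "Schema1.Customers", "Schema2.Events"]
endpoint = ["schema1.orders", "schema1.customers"]

print(match_percentage(refs, endpoint))                           # 0.0
print(round(match_percentage(refs, endpoint, case_sensitive=False), 1))  # 66.7
```

As the sketch suggests, choosing the wrong case handling can collapse the match rate to zero even when the objects are present, which is why the case-sensitivity note above matters for lineage accuracy.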