
Informatica Data Quality

When you run a profile in Informatica Analyst or Informatica Developer, the Data Integration Service stores the profile results in the profiling warehouse. To view these results in Enterprise Data Catalog, create an Informatica Data Quality resource in Catalog Administrator and associate the profiling warehouse with the resource. When you run the resource, the Informatica Data Quality scanner extracts the profile results from the associated profiling warehouse and migrates them to the catalog. The migrated profile results include column profile results, data domain discovery results, rule profiling results, scorecards, and value frequencies. You can migrate profile results from the current version or earlier versions on any Informatica domain. The Informatica Data Quality resource supports relational database systems.
In Catalog Administrator, you can create multiple profiling warehouse resources. When you create a profiling warehouse resource, you associate a profiling warehouse with the resource. Enterprise Data Catalog scans the associated profiling warehouse and lists all the schemas that have profile results. You can choose one or more schemas for which the Informatica Data Quality scanner migrates the results to the catalog.
When you run the Informatica Data Quality scanner, the catalog compares the timestamp of the results in the catalog with the timestamp of the migrated results. Enterprise Data Catalog displays the latest profile results and curation results based on the timestamp, and merges the inference results with the existing results in the catalog.
You can view the value frequency for a column or rule in the Developer tool or Analyst tool. The Informatica Data Quality scanner migrates the value frequencies for the columns along with the profile results to the catalog. The profiling warehouse stores a maximum of 16,000 values for a column. The scanner identifies the top 1,000 maximum values for a column and migrates these values to Enterprise Data Catalog.
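The value frequency limits above can be illustrated with a short Python sketch. It is not the scanner's implementation. It assumes that "top" means ranked by frequency count, which the text does not state explicitly, and the function name `top_values` and the sample data are hypothetical.

```python
from collections import Counter

PROFILING_WAREHOUSE_LIMIT = 16_000  # maximum values stored per column (from the text)
CATALOG_LIMIT = 1_000               # values migrated per column (from the text)


def top_values(frequencies: dict[str, int], limit: int = CATALOG_LIMIT) -> list[tuple[str, int]]:
    """Return up to `limit` (value, count) pairs, highest count first.

    Assumption: values are ranked by how often they occur in the column.
    """
    return Counter(frequencies).most_common(limit)


# Hypothetical column with 16,000 distinct values; value "v<i>" occurs i + 1 times.
stored = {f"v{i}": i + 1 for i in range(PROFILING_WAREHOUSE_LIMIT)}
migrated = top_values(stored)
print(len(migrated), migrated[0])  # 1000 ('v15999', 16000)
```

Under this assumption, a column that has fewer than 1,000 stored values would be migrated in full.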
Example
Assume that you have more than 1 million tables spread across 700 schemas and multiple databases in your enterprise. Over the years, you have run profiles on these schemas in the Developer tool or Analyst tool. All the profiling results reside in one or more profiling warehouse databases. Now, you want to implement Enterprise Data Catalog in your enterprise and want to access and view the existing profile results for the schemas in the catalog. Instead of running the profiles on the schemas and databases in Catalog Administrator, which is a time-consuming and resource-intensive effort, you decide to migrate the results to the catalog. Additionally, you want the developers and analysts to continue using the Developer tool and Analyst tool.
In this case, you can run the core scanners to extract the metadata from the schemas and databases to the catalog. Then, you can create a profiling warehouse resource, choose the schemas and connections for which you want to migrate the profile results, and run the profiling warehouse resource. The scanner migrates the profile results of the selected schemas to the catalog. This action saves time and effort. An added advantage is that the users of the Developer tool and Analyst tool can continue to use the tools and you can migrate the results as and when required.

Extracting Rule Profile Results

You can apply rules and run rule profiles in the Developer tool and the Analyst tool. You can use the following methods to create or apply rules in the column profiles:
When you run the Informatica Data Quality resource, the resource extracts the rule profile results to the catalog. You cannot apply expression rules to the profile. You cannot associate business terms with a rule asset in Enterprise Data Catalog. To learn how to create and apply rules to the profiles in the Developer tool and Analyst tool, see the Informatica Data Discovery Guide.

Extracting Scorecard Results

You can create and edit a scorecard in the Developer tool and Analyst tool. A scorecard has multiple components, such as metrics, metric groups, scores, and thresholds. Scorecards help an enterprise measure the value of data quality at the metric level. You can create metric groups to group related metrics. After you run a profile, you can add columns from the profile results as metrics to a scorecard. When you run a scorecard, the Analyst tool and Developer tool generate a weighted average value for each metric group. When you run the Informatica Data Quality resource, the resource extracts the scorecard results to the catalog.
You cannot associate business terms with a scorecard asset in Enterprise Data Catalog. To learn how to create, edit, and run scorecards in the Developer tool and Analyst tool, see the Informatica Data Discovery Guide.

Objects Extracted

The Informatica Data Quality scanner extracts the following profiling metrics from the profiling warehouse to the catalog:

Prerequisites

Perform the following step to complete the prerequisites:

Basic Information

The General tab includes the following basic information about the resource:

Information       Description
Name              The name of the resource.
Description       The description of the resource.
Resource type     The type of the resource.
Execute On        You can choose to execute on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:

Property                      Description
Extract                       To extract the results from the profiling warehouse, choose one of the following options:
                              - Rule
                              - Scorecard
                              - Profile
                              - Profile and Value Frequency
Domain Name                   Name of the Informatica domain.
                              Note: This property applies to versions 10.2.0 HotFix 2 and later.
Domain User Name              User name that the Data Integration Service uses to access the Model Repository Service.
Domain Password               Password for the Model repository user.
Security Domain               Name of the security domain to which the domain user belongs.
Domain Host                   Informatica domain host name.
Domain Port                   Informatica domain port number.
Node Name                     Name of the node on which the Data Integration Service runs.
Repository Service Name       Name of the Model Repository Service.
Profiling Warehouse Name      Name of the profiling warehouse.
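You enter these properties in Catalog Administrator. As a planning aid only, the following Python sketch records the same values and runs a few basic sanity checks before you type them in. The class name, field names, and sample values are hypothetical and are not part of any Informatica API. The port range check is a general TCP constraint, not a documented product rule.

```python
from dataclasses import dataclass

# The four Extract options listed in the properties above.
EXTRACT_OPTIONS = ("Rule", "Scorecard", "Profile", "Profile and Value Frequency")


@dataclass
class IdqResourceConnection:
    """Hypothetical record of the General tab properties; not a product API."""
    extract: str
    domain_name: str
    domain_user_name: str
    domain_password: str
    security_domain: str
    domain_host: str
    domain_port: int
    node_name: str
    repository_service_name: str
    profiling_warehouse_name: str

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the values look plausible."""
        issues = []
        if self.extract not in EXTRACT_OPTIONS:
            issues.append(f"extract must be one of {EXTRACT_OPTIONS}")
        if not 1 <= self.domain_port <= 65535:
            issues.append("domain_port must be between 1 and 65535")
        for name, value in vars(self).items():
            if isinstance(value, str) and not value.strip():
                issues.append(f"{name} is empty")
        return issues


# Placeholder values for illustration only.
conn = IdqResourceConnection(
    extract="Profile and Value Frequency",
    domain_name="Domain_example",
    domain_user_name="catalog_user",
    domain_password="change-me",
    security_domain="Native",
    domain_host="infa.example.com",
    domain_port=6005,
    node_name="node01",
    repository_service_name="MRS_example",
    profiling_warehouse_name="PWH_example",
)
print(conn.problems())  # []
```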

The Metadata Load Settings tab includes the following properties:

Property                      Description
Enable Source Metadata        Select the option to extract metadata from the data sources.
Profiled Schema Connections   Select one or more schemas in the Select Profiled Schema Connections dialog box. The profiling warehouse scanner migrates the profiling results of the selected schemas to the catalog.
Cumulative                    Select or clear the option as necessary.
                              - Select the option if you want the scanner to scan all the profile results for a data source and extract the latest column results based on the timestamp. For more information about the option, see the examples in the Troubleshooting topic.
                              - Clear the option if you want to fetch the latest profile results.
Auto Accept Percentage        Enter a value from 0 to 100. Enterprise Data Catalog automatically accepts the data domains when the inference percentage exceeds the configured number.
Incremental                   Select or clear the option as necessary.
                              - Select the option if you want to migrate the delta of profile results in each run. In the first profiling warehouse resource run, the scanner migrates the profile results for all the tables. In subsequent resource runs, the scanner migrates only the delta of profile results. For more information about the option, see the examples in the Troubleshooting topic.
                              - Clear the option if you want to fetch the latest profile results.
Memory                        Specify the memory required to run the scanner job. Select one of the following values based on the size of the migrated data set:
                              - Low
                              - Medium
                              - High
                              Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal.
Custom Options                Enter the JVM parameters to configure the scanner container.
Track Data Source Changes     Select the option to view the metadata source change notification in Enterprise Data Catalog.
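The Auto Accept Percentage and Incremental settings can be read as two small decision rules. The following Python sketch illustrates them under stated assumptions and is not the scanner's code. The function names are hypothetical. It assumes that "exceeds" means strictly greater than the configured value, and that the delta consists of results profiled after the previous resource run, which the text does not spell out.

```python
from datetime import datetime
from typing import Optional


def auto_accept(inference_pct: float, auto_accept_pct: float) -> bool:
    """Accept a data domain only when its inference percentage exceeds the threshold."""
    if not 0 <= auto_accept_pct <= 100:
        raise ValueError("Auto Accept Percentage must be from 0 to 100")
    return inference_pct > auto_accept_pct


def select_for_migration(results: dict[str, datetime], last_run: Optional[datetime]) -> list[str]:
    """Pick the tables whose profile results are migrated in this run.

    Assumption: the delta is everything profiled after the previous run.
    `last_run` is None on the first run, so every table is selected.
    """
    if last_run is None:
        return sorted(results)
    return sorted(t for t, profiled_at in results.items() if profiled_at > last_run)


print(auto_accept(85.0, 80.0))  # True
print(auto_accept(80.0, 80.0))  # False: the value must exceed the threshold

# Hypothetical tables and the time each was last profiled.
profiled = {"EMPLOYEE": datetime(2019, 5, 1), "ORDERS": datetime(2019, 5, 3)}
print(select_for_migration(profiled, None))                  # first run: ['EMPLOYEE', 'ORDERS']
print(select_for_migration(profiled, datetime(2019, 5, 2)))  # later run: ['ORDERS']
```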

Informatica Data Quality Resource Tasks

When you run the Informatica Data Quality resource, the scanner performs the following tasks:
    1. Verifies that the catalog has the source metadata for the resource.
    For example, to migrate Oracle profile results, you must first create and run the Oracle resource in Catalog Administrator.
    2. Verifies that the matching resource has profile results.
    3. Verifies that the matching resource has curation decisions.
    4. If curation results do not exist, the scanner migrates the results to the catalog.
    5. If curation results exist, the scanner compares the timestamp of the curation decisions in the catalog with the timestamp of the curation decisions in the profiling warehouse. The scanner migrates the curation results and overwrites the results in the catalog if the curation results in the profiling warehouse have the later timestamp.
    For example, you reject the SSN data domain for a column in the Employee table in the catalog on 05/01/19. You accept the SSN data domain for the same column in the Developer tool on 05/02/19. You run the Informatica Data Quality scanner on 05/03/19. The curation decision for the data domain SSN in the profiling warehouse has the later timestamp. The scanner migrates the decision and overwrites the curation decision in the catalog. The data domain SSN displays as accepted in the catalog.
    Note: Enterprise Data Catalog displays SYSTEM in the Assigned By column for a data domain asset when you curate a data domain in the Developer tool or Analyst tool and the decision is migrated to the catalog.
    6. Merges the data domain inferred results.
    For example, in the Developer tool, you accept the data domains SSN and Age in the Employee table. In Enterprise Data Catalog, you accept the data domains Address and City in the same Employee table. When you run the Informatica Data Quality scanner, the scanner merges the accepted data domain results and displays SSN, Age, Address, and City as inferred data domains for the Employee table in the catalog.
    7. Automatically accepts a data domain if the inference percentage exceeds the Auto Accept Percentage value.
    8. Identifies the top 1,000 maximum values for a column and migrates the values to the catalog.
    Note: In Catalog Administrator, when you click the Missing Report link tab in the Connection Assignment section, the exported file does not display the details of source and target objects that are not linked.
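The curation and merge behavior in steps 4 through 6 can be sketched in Python as follows. This is a minimal illustration of the described rules, not the scanner's implementation. The `Curation` class and both function names are hypothetical, the dates in the example are read as month/day/year, and keeping the catalog decision when two timestamps are equal is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Curation:
    """Hypothetical record of one curation decision for a data domain."""
    domain: str
    decision: str    # "accepted" or "rejected"
    decided_on: date


def resolve_curation(catalog: Optional[Curation], warehouse: Curation) -> Curation:
    """Steps 4 and 5: use the warehouse decision when the catalog has none or when it is newer."""
    if catalog is None or warehouse.decided_on > catalog.decided_on:
        return warehouse
    return catalog  # assumption: on a tie, the catalog decision is kept


def merge_domains(from_warehouse: set[str], in_catalog: set[str]) -> set[str]:
    """Step 6: the merged data domains are the union of both sets."""
    return from_warehouse | in_catalog


# The SSN example: rejected in the catalog on 05/01/19, accepted in the Developer tool on 05/02/19.
in_catalog = Curation("SSN", "rejected", date(2019, 5, 1))
in_warehouse = Curation("SSN", "accepted", date(2019, 5, 2))
print(resolve_curation(in_catalog, in_warehouse).decision)  # accepted

# The Employee table example: domains accepted in each tool are combined.
print(sorted(merge_domains({"SSN", "Age"}, {"Address", "City"})))  # ['Address', 'Age', 'City', 'SSN']
```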