Enterprise Data Catalog Scanner Configuration Guide > Configuring Informatica Platform Resources > Informatica Data Quality
  

Informatica Data Quality

When you run a profile in Informatica Analyst or Informatica Developer, the Data Integration Service stores the profile results in the profiling warehouse. To view these results in Enterprise Data Catalog, you can create an Informatica Data Quality resource in Catalog Administrator and associate the profiling warehouse to the resource. When you run the resource, the Informatica Data Quality scanner extracts and migrates the profile results in the associated profiling warehouse to the catalog. The migrated profile results include column profile results, data domain discovery results, rule profiling results, scorecards and value frequencies. You can migrate the profile results from the current or earlier versions on any Informatica domain. Informatica Data Quality resource supports relational database systems.
In Catalog Administrator, you can create multiple profiling warehouse resources. When you create a profiling warehouse resource, you associate a profiling warehouse to the resource. Enterprise Data Catalog scans the associated profiling warehouse and lists all the schemas with profile results. You can choose one or more schemas for which the Informatica Data Quality scanner migrates the results to the catalog.
When you run the Informatica Data Quality scanner, the catalog compares the timestamp of the results in the catalog with the migrated results. Enterprise Data Catalog displays the latest profile results and curation results based on the timestamp. Enterprise Data Catalog merges the inference results with the existing results in the catalog.
You can view the value frequency for a column or rule in the Developer tool or Analyst tool. The Informatica Data Quality scanner migrates the value frequencies for the columns along with the profile results to the catalog. The profiling warehouse stores a maximum of 16,000 values for a column. The scanner identifies the top 1,000 maximum values for a column and migrates these values to Enterprise Data Catalog.
Example
Assume that you have more than 1 million tables spread across 700 schemas and multiple databases in your enterprise. Over the years, you have run profiles on these schemas in the Developer tool or Analyst tool. All the profiling results reside in one or more profiling warehouse databases. Now, you want to implement Enterprise Data Catalog in your enterprise and want to access and view the existing profile results for the schemas in the catalog. Instead of running the profiles on the schemas and databases in Catalog Administrator, which is a time-consuming and resource-intensive effort, you decide to migrate the results to the catalog. Additionally, you want the developers and analysts to continue using the Developer tool and Analyst tool.
In this case, you can run the core scanners to extract the metadata from the schemas and databases to the catalog. Then, you can create a profiling warehouse resource, choose the schemas and connections for which you want to migrate the profile results, and run the profiling warehouse resource. The scanner migrates the profile results of the selected schemas to the catalog. This action saves time and effort. An added advantage is that the users of the Developer tool and Analyst tool can continue to use the tools and you can migrate the results as and when required.

Extracting Rule Profile Results

You can apply rules and run rule profiles in the Developer tool and the Analyst tool. You can use the following methods to create or apply rules in the column profiles:
When you run Informatica Data Quality resource, the resource extracts the rule profile results to the catalog. You cannot apply expression rules to the profile. You cannot associate business terms with rule asset in the Enterprise Data Catalog. To learn how to create and apply rules to the profiles in the Developer tool and Analyst tool, see Informatica Data Discovery Guide.

Extracting Scorecard Results

You can create and edit a scorecard in the Developer tool and Analyst tool. A scorecard has multiple components, such as metrics, metric groups, scores, and thresholds. Scorecards help an enterprise to measure the value of data quality at the metric level. You can create metric groups to group related metrics. After you run profile, you can add columns from the profile results as metric to a scorecard. When you run a scorecard, the Analyst tool and Developer tool generates weighted average value for each metric group. When you run data quality resource, the resource extracts the scorecard results to the catalog.
You cannot associate business terms with scorecard asset in the Enterprise Data Catalog. To learn how to create, edit, and run scorecards in the Developer tool and Analyst tool, see Informatica Data Discovery Guide.

Objects Extracted

The Informatica Data Quality scanner extracts the following profiling metrics from the profiling warehouse to the catalog:

Prerequisites

Perform the following step to complete the prerequisites:

Resource Connection Properties

The General tab includes the following properties:
Property
Description
Name
Enter a name for the profiling warehouse resource.
Description
Optionally, you can enter a brief description for the resource.
Resource type
Choose Informatica Data Quality option as the resource type.
Profiling Warehouse Database Type
Choose Oracle, IBM DB2, or Microsoft SQL Server as the profiling warehouse database type.
User
Enter the user name to access the profiling warehouse database.
Password
Enter the password to access the profiling warehouse database.
Host
Enter the host name or IP address of the Informatica domain, or enter the fully qualified JDBC connection string.
For sample JDBC connection strings, see the Troubleshooting topic.
Port
Enter the Informatica domain port number, or the port number of the database engine service. The default value is 1521.
Service
Enter the Oracle service name.
This option appears when the profiling warehouse database is Oracle.
Database
Enter the Microsoft SQL Server database name.
The scanner uses the value as a constraint to import the Microsoft SQL Server database. You can enter one or more database names separated by commas. When you specify a database name, the scanner imports only the schemas and tables in the database. For example: if you enter AdvDW, the scanner imports tables and schemas from the AdvDW database.
This option appears when the profiling warehouse database is Microsoft SQL Server or IBM DB2.
Schema Name
Enter the schema name.
The scanner uses the value as a constraint to specify the non-default schema of the Microsoft SQL Server database.
This option appears when the profiling warehouse database is Microsoft SQL Server.
Extract Results of
Choose rule, scorecard, profile, or profile and value frequency to extract the results from the profiling warehouse.
Domain Name
Name of the Informatica domain.
Note: This property applies to the versions 10.2.0 HotFix 2 and later.
Domain User Name
User name that the Data Integration Service uses to access the Model Repository Service.
Domain Password
Password for the Model repository user.
Security Domain
Name of the security domain to which the domain user belongs.
Domain Host
Host name of the master gateway node.
Domain Port
Port number of the master gateway node.
Repository Service Name
Name of the Model Repository Service.
Warehouse connection Name
Name of the Profiling Warehouse.
The Metadata Load Settings tab includes the following properties:
Property
Description
Enable Source Metadata
Select the option to extract metadata from the data sources.
Profiled Schema Connections
Select one or more schemas in the Select Profiled Schema Connections dialog box. The profiling warehouse scanner migrates the profiling results of the selected schemas to the catalog.
Cumulative
Select or clear the option as necessary.
  • - Select the option if you want the scanner to scan all the profile results for a data source to extract the latest column results based on the timestamp.
  • For more information about the option, see the examples in the Troubleshooting topic.
  • - Clear the option if you want to fetch the latest profile results.
Auto Accept Percentage
Enter a value from 0 to 100. Enterprise Data Catalog automatically accepts the data domains when the inference percentage exceeds the configured number.
Incremental
Select or clear the option as necessary.
  • - Select the option if you want to migrate the delta of profile results in each run. In the first profiling warehouse resource run, the scanner migrates the profile results for all the tables.. In subsequent resource runs, only the delta of profile results are migrated.
  • For more information about the option, see the examples in the Troubleshooting topic.
  • - Clear the option if you want to fetch the latest profile results.
Memory
Specify the memory required to run the scanner job.
Select one of the following values based on the migrated data set size:
  • - Low
  • - Medium
  • - High
Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on How To-Library Articles tab in the Informatica Doc Portal
JVM Options
Enter the JVM parameters to configure the scanner container.
Track Data Source Changes
Select the option to view the metadata source change notification in Enterprise Data Catalog.