
Data Domain Discovery Options in Informatica Developer

You can select the source columns, data domains, and inference options when you create a profile to perform data domain discovery. You can also choose to omit columns from data domain discovery based on their data types and data length.
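
The following Python sketch shows one way to omit columns by data type and data length before discovery runs. The `Column` record, the `columns_for_discovery` helper, and the choice to treat "data length" as a minimum precision are hypothetical. They model the idea in this paragraph, not the exact controls or rules in the Developer tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata, modeled on the Column Selection fields."""
    name: str
    data_type: str
    precision: int  # assumption: precision stands in for the column's data length

def columns_for_discovery(
    columns: Iterable[Column],
    excluded_types: Set[str],
    min_precision: int,
) -> List[Column]:
    # Keep a column only if its data type is not excluded and its
    # precision is at least min_precision.
    return [
        c for c in columns
        if c.data_type not in excluded_types and c.precision >= min_precision
    ]

columns = [
    Column("ID", "integer", 10),
    Column("Comments", "string", 255),
    Column("Flag", "string", 1),
    Column("Created", "date", 29),
]
selected = columns_for_discovery(columns, excluded_types={"date"}, min_precision=2)
print([c.name for c in selected])  # ['ID', 'Comments']
```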

Data Domain Selection in Informatica Developer

The Data Domain Selection options list all the data domains in the data domain glossary. You can search for specific data domains and select the ones to include in data domain discovery.
The following table describes the Data Domain Selection options for data domain discovery:

| Option | Description |
|---|---|
| Enabled as part of the "Run Profile" action | Includes the data domain discovery options when you run the profile. |
| Name | Data domain name. |
| Description | Description for the data domain. |
| Data Domain Group | Name of the data domain groups to which the data domain belongs. |
| Show data domain group in hierarchy | Lists all data domain groups with the data domains grouped under each data domain group. |

Data Domain Column Selection in Informatica Developer

You use the Column Selection options to choose the columns to include in data domain discovery.
The following table describes the Column Selection options for data domain discovery:

| Option | Description |
|---|---|
| Column | Column name. |
| Data type | Data type of the column. |
| Precision | Maximum precision for the column. |
| Scale | Scale of the column. |
| Nullable | Indicates whether the column can contain null values. |
| Description | Description of the column. |

Data Domain Inference Options in Informatica Developer

The inference options determine whether data domain discovery runs on column data, column names, or both. You can specify whether the profile processes all rows in the data source or only up to a maximum number of rows. You can also choose a conformance criterion for a data domain match and choose to exclude null values from data domain discovery.
The following table describes the Inference options for data domain discovery:

| Option | Description |
|---|---|
| Override the default inference options | Enables you to change the predefined inference options. |
| Data | The profile runs on column data. |
| Column name | The profile runs on column names. |
| Data and Column name | The profile runs on both column data and column names. |
| Maximum rows to profile | The maximum number of rows that the profile runs on. The Developer tool selects the rows starting from the first row in the source. |
| Minimum percentage of rows | The minimum percentage of conforming rows in the data set required for a data domain match. |
| Minimum number of rows | The minimum number of conforming rows in the data set required for a data domain match. |
| Exclude null values from data domain discovery | Excludes null values in the data set from data domain discovery. |
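
The "Maximum rows to profile" behavior, which takes rows starting from the first row of the source up to the limit, can be sketched in Python. The helper name `rows_to_profile` is hypothetical; it only models the documented row selection and is not part of Informatica Developer.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

Row = TypeVar("Row")

def rows_to_profile(rows: Iterable[Row], max_rows: int) -> List[Row]:
    # Take rows in source order, starting from the first row,
    # and stop once max_rows rows have been collected.
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    return list(islice(rows, max_rows))

source = ["row1", "row2", "row3", "row4", "row5"]
print(rows_to_profile(source, 3))  # ['row1', 'row2', 'row3']
```

Using `islice` means the sketch never reads past the limit, which matters when the source is a large or streaming iterator rather than an in-memory list.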

Minimum Conformance Percentage

You can choose a minimum percentage of rows in the data set as the conformance criterion for data domain discovery.
The conformance percentage is the number of matching rows divided by the total number of rows, expressed as a percentage.
Note: The Developer tool considers null values as nonmatching rows. Columns containing a high number of null values might not result in data domain inference unless you specify a low value for minimum conformance percentage.
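
A minimal Python sketch of the conformance percentage, assuming a column is a list of values in which `None` stands for a null and a data domain rule is any function that returns True for a matching value. The function names and the SSN regular expression are hypothetical stand-ins, not the rules or APIs that Informatica ships.

```python
import re
from typing import Callable, Optional, Sequence

def conformance_percentage(
    values: Sequence[Optional[str]],
    matches: Callable[[str], bool],
) -> float:
    # Nulls (None) never match, but they still count toward the total,
    # which lowers the percentage for columns with many nulls.
    if not values:
        return 0.0
    matching = sum(1 for v in values if v is not None and matches(v))
    return 100.0 * matching / len(values)

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")  # hypothetical SSN rule

def is_ssn(value: str) -> bool:
    return SSN_PATTERN.fullmatch(value) is not None

column = ["123-45-6789", None, "call me", "987-65-4321"]
print(conformance_percentage(column, is_ssn))  # 50.0
```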

Example

You have a data source with 10,000 rows, and the Comments column contains Social Security Numbers in 2,500 rows. You create a profile that performs column profiling and data domain discovery, and you set the minimum percentage of rows to 30% as the conformance criterion. When you run the profile, the profile results do not display Social Security Number as an inferred data domain. Only 25% of the rows (2,500 of 10,000) conform, and the criterion requires 30% of the rows, or 3,000 rows.

Minimum Conforming Rows

You can choose a minimum number of rows in the data set as the conformance criterion for data domain discovery.

Example

You have a data source with 10,000 rows, and the Comments column contains email addresses in three rows. You create a profile that performs column profiling and data domain discovery, and you set the minimum number of rows to 1 as the conformance criterion. When you run the profile, the profile results display email address as an inferred data domain with three conforming rows, along with the other inferred data domains.
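
A minimal Python sketch of the minimum-number-of-rows criterion, assuming the criterion is met when the count of conforming rows is at least the configured minimum. The function name and the 1% comparison threshold are hypothetical, chosen only to show why a row-count criterion suits sparse values such as these three email addresses.

```python
def domain_inferred_by_row_count(matching_rows: int, minimum_rows: int) -> bool:
    # The data domain is inferred when at least minimum_rows rows conform.
    return matching_rows >= minimum_rows

total_rows = 10_000
email_rows = 3
print(domain_inferred_by_row_count(email_rows, minimum_rows=1))  # True

# The same three rows are only 0.03% of the source, so a hypothetical
# percentage criterion of 1% would not infer the domain.
print(100 * email_rows / total_rows)  # 0.03
```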

Exclude Null Values

You can exclude null values when you perform data domain discovery on a data source. When you use the minimum percentage of rows criterion with the exclude null values option, the conformance percentage is the number of matching rows divided by the total number of rows minus the number of null values in the column.
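
The following Python sketch contrasts the two denominators that this paragraph describes. The `conformance` function, its `exclude_nulls` flag, and the SSN pattern are hypothetical illustrations of the formula, not Informatica APIs or Informatica's SSN rule; `None` stands for a null value.

```python
import re
from typing import Callable, Optional, Sequence

def conformance(
    values: Sequence[Optional[str]],
    matches: Callable[[str], bool],
    exclude_nulls: bool = False,
) -> float:
    # With exclude_nulls, null rows leave the denominator entirely;
    # without it, they stay in the denominator as nonmatching rows.
    matching = sum(1 for v in values if v is not None and matches(v))
    nulls = sum(1 for v in values if v is None)
    denominator = len(values) - nulls if exclude_nulls else len(values)
    if denominator == 0:
        return 0.0
    return 100.0 * matching / denominator

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")  # hypothetical SSN rule

def is_ssn(value: str) -> bool:
    return SSN_PATTERN.fullmatch(value) is not None

column = ["123-45-6789", None, None, "n/a"]
print(conformance(column, is_ssn))                      # 25.0
print(conformance(column, is_ssn, exclude_nulls=True))  # 50.0
```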
The data domain discovery process differs when you choose the Exclude null values from data domain discovery option together with sampling options or filters.
The following scenarios explain the data domain discovery results when you choose the exclude null values option along with a sampling option and filters:

Example

You have a data source with 10,000 rows, and 3,000 rows contain Social Security Numbers in the Comments column. You create a profile that performs column profiling and data domain discovery, and you choose the following options:
When you run the profile, the profile runs on the data set and ignores the null values for data domain discovery.