Configuration Options for Enterprise Discovery

Data Domain Discovery Settings

The data domain discovery settings determine whether data domain discovery runs on column data, column names, or both. You can choose the data domains to discover and specify whether data domain discovery processes all the rows in the data source. You can also choose a conformance criterion and exclude null values from data domain discovery.

Option	Description
Enable data domain discovery	Performs data domain discovery as part of enterprise discovery.
Run data domain discovery on data	Performs data domain discovery on column data.
Run data domain discovery on column name	Performs data domain discovery on the name of each column.
Minimum conformance percentage	The minimum percentage of rows in the data set that must conform for a data domain match. The conformance percentage is the number of matching rows divided by the total number of rows, expressed as a percentage. Note: The Analyst tool considers null values as nonmatching rows.
Minimum conforming rows	The minimum number of rows in the data set required for a data domain match.
Exclude null values from data domain discovery	Excludes the null values from the data set for data domain discovery.
Exclude columns with approved data domains	Excludes columns with approved data domains from the data domain inference of the profile run.
All rows	Performs data domain discovery on all source rows.
First	The maximum number of rows the profile can run on. The Analyst tool chooses rows starting from the first row in the source. You can choose a maximum of 2,147,483,647 rows.
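
The conformance options above can be illustrated with a short Python sketch. It is not the Analyst tool's implementation: the function names, the `matches` predicate, and the sample column are hypothetical. The sketch also assumes that excluding null values removes them from the row count used as the denominator, and it evaluates the percentage criterion and the row-count criterion separately.

```python
from typing import Callable, Iterable, Optional, Tuple


def conformance(
    values: Iterable[Optional[str]],
    matches: Callable[[str], bool],
    exclude_nulls: bool = False,
) -> Tuple[int, int, float]:
    """Return (matching_rows, total_rows, conformance_percentage)."""
    rows = list(values)
    if exclude_nulls:
        # Assumption: excluded nulls no longer count toward the total.
        rows = [v for v in rows if v is not None]
    total = len(rows)
    # Nulls that remain in the data set count as nonmatching rows.
    matching = sum(1 for v in rows if v is not None and matches(v))
    percentage = 100.0 * matching / total if total else 0.0
    return matching, total, percentage


def meets_min_percentage(percentage: float, minimum: float) -> bool:
    """Check the 'Minimum conformance percentage' criterion."""
    return percentage >= minimum


def meets_min_rows(matching: int, minimum: int) -> bool:
    """Check the 'Minimum conforming rows' criterion."""
    return matching >= minimum


# Hypothetical column data and a toy "looks like an email" rule.
column = ["a@x.com", "b@y.org", None, "n/a"]
looks_like_email = lambda v: "@" in v

with_nulls = conformance(column, looks_like_email)
without_nulls = conformance(column, looks_like_email, exclude_nulls=True)
```

With the null row counted, two of four rows match. With nulls excluded, the same two matches are measured against three rows, so a column can pass a percentage threshold that it would otherwise fail.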

Column Profile Settings

Option	Description
Enable column profiling	Runs a column profile as part of enterprise discovery.
Exclude approved data types and data domains from the data type and data domain inference in the subsequent profile runs	Excludes approved data types and data domains from data type and data domain inference in subsequent profile runs.

Run-time environment	Description
Native	The Analyst tool submits the profile jobs to the Profiling Service Module. The Profiling Service Module then breaks down the profile jobs into a set of mappings. The Data Integration Service runs these mappings and writes the profile results to the profiling warehouse.
Blaze	The Data Integration Service pushes the profile logic to the Blaze engine on the Hadoop cluster to run profiles.
Spark	The Data Integration Service pushes the profile logic to the Spark engine on the Hadoop cluster to run profiles.

Sampling option	Description
All Rows	Runs a column profile on all rows in the data source. Supported on the Native, Blaze, and Spark run-time environments.
First <number> Rows	Runs a profile on a sample of rows taken from the beginning of the data object. You can choose a maximum of 2,147,483,647 rows. Supported on the Native and Blaze run-time environments.
Limit n <number> Rows	Runs a profile on the specified number of rows in the data object. When you choose to run a profile in the Hadoop validation environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size. The Limit n sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit n sampling option. You can select a maximum of 2,147,483,647 rows. Supported on the Spark run-time environment.
Random percentage	Runs a profile on a percentage of rows in the data object. Supported on the Spark run-time environment.
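
The support matrix in the table above can be summarized as a small lookup. The following Python sketch is only an illustration: the option keys, engine names, and the `is_valid_sampling` helper are hypothetical and are not part of any Informatica API. The lower bound of one row is an assumption, because the table states only the maximum.

```python
from typing import Optional

# Maximum row count from the table; it equals 2**31 - 1.
MAX_SAMPLE_ROWS = 2_147_483_647

# Hypothetical keys that mirror the sampling options and the
# run-time environments each one is documented to support.
SUPPORTED_ENGINES = {
    "all_rows": {"native", "blaze", "spark"},
    "first_n": {"native", "blaze"},
    "limit_n": {"spark"},
    "random_percentage": {"spark"},
}

# Options that take an explicit row count.
ROW_COUNT_OPTIONS = {"first_n", "limit_n"}


def is_valid_sampling(option: str, engine: str, rows: Optional[int] = None) -> bool:
    """Return True if the option is supported on the engine and the
    row count, where one is required, is within the documented maximum."""
    if engine not in SUPPORTED_ENGINES.get(option, set()):
        return False
    if option in ROW_COUNT_OPTIONS:
        # Assumption: at least one row must be requested.
        return rows is not None and 1 <= rows <= MAX_SAMPLE_ROWS
    return True
```

For example, `is_valid_sampling("first_n", "spark", 100)` returns `False` because the First <number> Rows option is listed only for the Native and Blaze run-time environments.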
