Option | Description |
---|---|
Enable data domain discovery | Performs data domain discovery as part of enterprise discovery. |
Run data domain discovery on data | Performs data domain discovery on column data. |
Run data domain discovery on column name | Performs data domain discovery on the name of each column. |
Minimum conformance percentage | The minimum conformance percentage of rows in the data set required for a data domain match. The conformance percentage is the number of matching rows divided by the total number of rows, expressed as a percentage (see the sketch after this table). Note: The Analyst tool counts null values as nonmatching rows. |
Minimum conforming rows | The minimum number of rows in the data set required for a data domain match. |
Exclude null values from data domain discovery | Excludes the null values from the data set for data domain discovery. |
Exclude columns with approved data domains | Excludes columns with approved data domains from the data domain inference of the profile run. |
All rows | Performs data domain discovery on all source rows. |
First | The maximum number of rows the profile can run on. The Analyst tool chooses rows starting from the first row in the source. You can choose a maximum of 2,147,483,647 rows. |
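The following is a minimal sketch of the conformance calculation described in the table above. The function and parameter names are hypothetical and are not part of any Informatica API; the sketch only illustrates how the conformance percentage is computed and how null values are treated as nonmatching rows unless they are excluded from the data set.

```python
def conformance_percentage(matching_rows, total_rows, null_rows,
                           exclude_nulls=False):
    """Hypothetical helper that computes the conformance percentage.

    Null rows count as nonmatching unless they are excluded from the
    data set, mirroring the options described above.
    """
    # Optionally remove null rows from the data set before the calculation.
    denominator = total_rows - null_rows if exclude_nulls else total_rows
    if denominator == 0:
        return 0.0

    # Number of matching rows divided by the total number of rows.
    return (matching_rows / denominator) * 100


# Example: 950 matching rows in a 1,000-row data set that contains 20 nulls.
print(conformance_percentage(950, 1000, 20))                      # 95.0
print(conformance_percentage(950, 1000, 20, exclude_nulls=True))  # ~96.9
```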
Option | Description |
---|---|
Enable column profiling | Runs a column profile as part of enterprise discovery. |
Exclude approved data types and data domains from the data type and data domain inference in the subsequent profile runs | Excludes approved data types and data domains from data type and data domain inference in subsequent profile runs. |
Option | Description |
---|---|
Native | The Analyst tool submits the profile jobs to the Profiling Service Module. The Profiling Service Module then breaks down the profile jobs into a set of mappings. The Data Integration Service runs these mappings and writes the profile results to the profiling warehouse. |
Blaze | The Data Integration Service pushes the profile logic to the Blaze engine on the Hadoop cluster to run profiles. |
Spark | The Data Integration Service pushes the profile logic to the Spark engine on the Hadoop cluster to run profiles. |
Option | Description |
---|---|
All Rows | Runs a column profile on all rows in the data source. Supported in the Native, Blaze, and Spark run-time environments. |
First <number> Rows | Runs a profile on sample rows from the beginning of the data object. You can choose a maximum of 2,147,483,647 rows. Supported in the Native and Blaze run-time environments. |
Limit n <number> Rows | Runs a profile on up to the specified number of rows in the data object. When you run a profile in the Hadoop validation environment, the Spark engine collects samples from multiple partitions of the data object and pushes them to a single node to compute the sample size. The Limit n sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit n sampling option. You can select a maximum of 2,147,483,647 rows. Supported in the Spark run-time environment. |
Random percentage | Runs a profile on a percentage of rows in the data object. Supported in the Spark run-time environment. (See the sketch after this table for a comparison of the sampling options.) |
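As a rough illustration of how the sampling options narrow the rows that a profile reads, here is a minimal pandas sketch. The file name and row counts are made up, and the engines select rows internally rather than through this API; in particular, Limit n does not guarantee rows from the beginning of the source, unlike First <number> Rows.

```python
import pandas as pd

# Hypothetical data source; the engines read from the actual data object.
df = pd.read_csv("source.csv")

all_rows = df                      # All Rows: profile every row
first_n = df.head(10_000)          # First <number> Rows: rows from the top of the source
limit_n = df.iloc[:10_000]         # Limit n: up to n rows; the engine decides which rows
random_pct = df.sample(frac=0.10)  # Random percentage: roughly 10% of rows, chosen at random
```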