
Data profiling configuration options

Based on your requirements, configure the options to determine the type of data that you want the data profiling task to collect, the scope of the profile run, and the sample rows on which you want to run the data profiling task.
After you enable Data Profiling on the Configuration wizard while creating a catalog source, you can configure the following options on the Data Profiling tab:

Profile filters

You can use filter conditions to reduce the scope of a data profiling task. Profile filters help you create subsets of metadata on which to run a data profiling task on a catalog source. For example, you might choose to extract metadata from six schemas but run a profiling task on only three of them.
You can add multiple filter conditions for each catalog source. Any user with the create, edit, or run catalog source permissions can configure the profile filters. You can apply filters on all relational and file system catalog sources that support profiling capabilities.
The profile filters depend on the filters that you apply for metadata extraction. For example, your source includes a schema named Cust_Emp with tables such as Customer1, Customer2, Employee1, and Employee2. You have applied a metadata extraction filter to include metadata from the Customer1, Customer2, and Employee1 tables in the schema. Now, when you apply a profile filter, only the Customer1, Customer2, and Employee1 tables are available to run profiles on.
The following are types of profile filters:
Profile filter examples for relational catalog sources
Profile filter examples for file system catalog sources
The image shows a profile filter condition that prevents a profile from running on the metadata extracted from the CUSTOMER.CSV file:

Runtime environment

Select a runtime environment in which you can run data profiling tasks on a Secure Agent. If you don't select a runtime environment, the data profiling task runs in the runtime environment that your organization administrator selected when they created the connection.
Note: You can run data profiling and data quality tasks on a Windows Secure Agent configured with NTLMv2 proxy authentication.

Mode of run

To determine the type of data that you want the data profiling task to collect, choose Keep Signatures Only or Keep Signatures and Values. If you choose Keep Signatures Only, the data profiling task collects only aggregate information such as data types, averages, standard deviations, and patterns; no data values are collected. If you want the data profiling task to collect both signatures and data values, choose Keep Signatures and Values. Data values include the minimum value, maximum value, frequent values, and more.
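To illustrate the distinction between the two modes, here is a minimal Python sketch. This is not the product's implementation; the column data and result dictionaries are hypothetical, chosen only to show which kinds of results are signatures (aggregates) and which are data values.

```python
import statistics

# Hypothetical numeric column to profile.
column = [12, 15, 15, 20, 33, 15, 20]

# "Keep Signatures Only": aggregate information, no raw values retained.
signatures = {
    "data_type": "int",
    "average": statistics.mean(column),
    "std_dev": statistics.pstdev(column),
}

# "Keep Signatures and Values": signatures plus value-level results.
values = {
    "min_value": min(column),
    "max_value": max(column),
    # Most frequent value(s) in the column.
    "frequent_values": statistics.multimode(column),
}

print(signatures)
print(values)  # min 12, max 33, frequent value 15
```

With Keep Signatures Only, nothing in the first dictionary reveals any individual value from the column, which is why that mode suits sources with sensitive data.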

Profile run scope

Determine whether you want to run data profiling only on the changes made to the source system or on the entire source system. Choose one of the following options:
When you enable incremental profiling for a catalog source and run data profiling for the first time, the data profiling capability profiles all of the metadata, based on the filters applied for metadata extraction. In subsequent runs, incremental profiling detects changes to the metadata when the metadata extraction capability is enabled and runs profiling only on those changes. For example, if you rename a column after the first run and enable incremental profiling in the subsequent run, only the table that contains the renamed column is profiled.
Incremental profiling discovers the following changes to the metadata during the data profiling run:

Connection

Select the SAP Table connection to run data profiling tasks on SAP ERP objects.

Sampling type

Determine the sample rows on which you want to run the data profiling task. The sampling options vary based on the catalog source that you create. Choose one of the following options:

Elastic runtime environment

Select a runtime environment in which you can run data profiling tasks on an advanced cluster. Select an elastic runtime environment for complex file types such as Avro and Parquet.
Note: This option is available when you configure data profiling for Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage Gen2 catalog sources.
To run a profile on an Avro or Parquet file, connect to the following types of advanced cluster in your organization:
For more information about setting up AWS, Google Cloud, and Microsoft Azure for local and fully managed clusters, see Advanced clusters in the Administrator documentation.

Staging connection

Applicable only to elastic profile runs, that is, for Parquet and Avro sources in Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Google Cloud Storage source systems.
Select the staging connection where profiling results are temporarily stored while the job runs.

Maximum precision of string fields

The maximum precision value for profiling the string data type. Enter a value from 1 through 255. Default is 50.

Text qualifier

The character that defines string boundaries. If you select a quote character, the profile ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.
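The effect of a text qualifier can be sketched with Python's standard csv module. This is an illustration of the parsing behavior, not the profiling engine itself; the sample data is hypothetical, and the double quote matches the documented default qualifier.

```python
import csv
import io

# Sample delimited data: the second field contains a comma inside
# double quotes.
data = 'id,address\n1,"12 Main St, Springfield"\n'

# With the double-quote text qualifier, the comma inside the quotes
# is treated as data, not as a field delimiter.
reader = csv.reader(io.StringIO(data), delimiter=",", quotechar='"')
rows = list(reader)
print(rows[1])  # ['1', '12 Main St, Springfield']
```

Without the qualifier, the same row would split into three fields, so the profile would read a malformed record.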

Code page for delimited files

Select a code page that the Secure Agent can use to read and write data. Use this option to ensure that profile results for assets with non-English characters don't include junk characters. Default value is UTF-8.
Choose one of the following options:
Note: This option is available when you configure data profiling for the following catalog sources:
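The junk characters that a mismatched code page produces can be reproduced with a short Python sketch. This is only an illustration of the encoding mismatch, not product behavior; the sample string and the Latin-1 code page are assumptions for the example.

```python
# A string with non-English characters, as it would be written to a
# delimited file encoded in UTF-8 (the default code page).
text = "Müller, José"
raw = text.encode("utf-8")

# Reading with the correct code page recovers the original text.
assert raw.decode("utf-8") == "Müller, José"

# Reading the same bytes with the wrong code page produces junk
# characters, which would then surface in the profile results.
print(raw.decode("latin-1"))  # 'MÃ¼ller, JosÃ©'
```

Selecting the code page that matches how the file was written avoids this mismatch.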

Escape character for delimited files

You can specify an escape character if you need to override the default escape character. An escape character causes a delimiter character in an unquoted string to be read as part of the string value instead of as a field separator.
If you specify an escape character, the data profiling task overrides the default escape character that the Metadata Extraction job detects and uses the specified character instead, reading the escaped delimiter as part of the string value. If you don't specify an escape character, the data profiling task uses the default escape character that the Metadata Extraction job detects and likewise reads the escaped delimiter as part of the string value.
Note: This option is available when you configure data profiling for the following catalog sources:
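The escape-character behavior can be sketched with Python's csv module. This is an illustration only, not the profiling engine; the backslash here stands in for whatever escape character you specify or the Metadata Extraction job detects, and the sample data is hypothetical.

```python
import csv
import io

# Unquoted field where the delimiter (comma) is part of the value,
# protected by a backslash escape character.
data = 'id,label\n1,a\\,b\n'

# With the escape character configured, the escaped comma is read as
# part of the string value rather than as a field separator.
reader = csv.reader(io.StringIO(data), delimiter=",", escapechar="\\")
rows = list(reader)
print(rows[1])  # ['1', 'a,b']
```

Without the escape character, the second row would split into three fields and the value would be broken across columns.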