Data quality configuration options

Based on your requirements, configure the options that determine the type of data that the data quality task collects, the scope of the data quality run, and the sample rows on which the task runs.
After you enable Data Quality on the Configuration wizard while creating a catalog source, you can configure the following options on the Data Profiling and Quality tab:

Runtime environment

Select a runtime environment in which you can run data quality tasks on a Secure Agent. If you don't select a runtime environment, the data quality task runs in the runtime environment that your organization administrator selected when they created the connection.
Note: You can run data profiling and data quality tasks on a Windows Secure Agent configured with NTLMv2 proxy authentication.

Data Quality Automation

Select to enable data quality automation for assets in the catalog source. When you enable data quality automation and run the catalog source job in Metadata Command Center, a data quality automation job is triggered. Rule occurrences are then automatically created and associated with all data elements that are linked to corresponding glossary business assets in Data Governance and Catalog.
The following table describes the options that influence the data quality automation process:
| isAutomated option on rule templates in Data Governance and Catalog | Data quality option in Metadata Command Center | Data quality automation option in Metadata Command Center | Result |
|---|---|---|---|
| Yes | Yes | Yes | Creates rule occurrences for all data elements that are associated with glossary business assets. |
| Yes | Yes | No | Does not create new rule occurrences for data elements or update existing rule occurrences. Does not affect the execution of existing rule occurrences in Data Governance and Catalog. |
| Yes | No | Not applicable | Does not create any rule occurrences for data elements. Data quality execution stops for existing rule occurrences that are associated with assets of the catalog source. |
| No | Yes | Yes | Does not create rule occurrences for data elements. Does not affect the execution of existing rule occurrences in Data Governance and Catalog. |
For more information about data quality automation, see the Asset Details in the Data Governance and Catalog help system.
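The decision table above can be sketched as a small function. This is a hypothetical illustration only; the function name and boolean parameters are assumptions for readability, not part of any Informatica API:

```python
def automation_outcome(is_automated, data_quality, dq_automation):
    """Map the three options to the rule-occurrence outcome.

    Illustrative sketch of the decision table; parameter names
    are assumptions, not product identifiers.
    """
    if is_automated and data_quality and dq_automation:
        return "create occurrences"
    if is_automated and data_quality and not dq_automation:
        return "no new occurrences; existing unaffected"
    if is_automated and not data_quality:
        return "no occurrences; execution stops for existing"
    # isAutomated = No: nothing is created, existing occurrences keep running
    return "no occurrences; existing unaffected"

print(automation_outcome(True, True, True))  # create occurrences
```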

Cache Result

Specify how you want to preview the rule occurrence results in Data Governance and Catalog.
Choose one of the following options:
Note: Run the catalog source again whenever you change the Cache Result option from Agent Cache to No Cache.

Connection

Select the SAP Table connection to run data quality tasks on SAP ERP objects.

Run Rule Occurrence Frequency

Specify whether you want to run data quality rules based on the frequency defined for the rule occurrence in Data Governance and Catalog.
Choose one of the following options:
Note: Ensure that the data quality schedule has not expired. The data quality rules don't run if the data quality schedule that you configured for the catalog source has expired.

Sampling type

Determine the sample rows on which you want to run the data quality task. The sampling options vary based on the catalog source that you create.
Choose one of the following options:
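As a rough illustration of row sampling, the sketch below implements three common sampling strategies. The strategy names (`all`, `first_n`, `random_n`) are assumptions for illustration and do not match the product's option labels, which vary by catalog source:

```python
import random

def sample_rows(rows, sampling_type="all", n=1000, seed=42):
    """Return the rows to profile under an illustrative sampling strategy.

    'all' keeps every row, 'first_n' keeps the first n rows, and
    'random_n' draws n rows at random (seeded for repeatability).
    """
    if sampling_type == "all":
        return list(rows)
    if sampling_type == "first_n":
        return list(rows)[:n]
    if sampling_type == "random_n":
        rows = list(rows)
        random.Random(seed).shuffle(rows)
        return rows[:n]
    raise ValueError(f"unknown sampling type: {sampling_type}")

data = list(range(10))
print(sample_rows(data, "first_n", n=3))  # [0, 1, 2]
```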

Elastic runtime environment

Select a runtime environment in which you can run data quality tasks on an advanced cluster. Select an elastic runtime environment for complex file types such as Avro and Parquet.
Note: This option is available when you configure data quality for Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage Gen2 catalog sources.
To run data quality on an Avro or Parquet file, connect to the following types of advanced cluster in your organization:
For more information about setting up AWS, Google Cloud, and Microsoft Azure for local and fully managed clusters, see Advanced clusters.

Staging connection

Applicable only to elastic data quality runs, that is, Parquet and Avro sources in Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Google Cloud Storage source systems.
The staging connection is where data quality results are stored temporarily during the run.

Maximum precision of string fields

The maximum precision that profiles use for fields of the string data type. Enter a value from 1 through 255.
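A minimal sketch of how such a limit might be applied, assuming that string values are considered only up to the configured precision; the helper names and the truncation behavior are hypothetical, not confirmed product behavior:

```python
MAX_PRECISION = 255  # documented upper bound for the option

def effective_precision(value):
    """Validate a configured precision against the documented 1-255 range
    (illustrative helper, not a product API)."""
    if not 1 <= value <= MAX_PRECISION:
        raise ValueError("precision must be between 1 and 255")
    return value

def truncate_for_profile(s, precision):
    """Assumed behavior: consider string values only up to the precision."""
    return s[:effective_precision(precision)]

print(truncate_for_profile("abcdefgh", 5))  # abcde
```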

Text qualifier

The character that defines string boundaries. If you select a quote character, the data quality task ignores delimiters within the quotes. Select a qualifier from the list. Default is Double Quote.
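Python's csv module illustrates the qualifier behavior described above (it is not the product's parser): with a double-quote qualifier, a delimiter inside quotes is read as part of the field value.

```python
import csv
import io

# A comma-delimited record whose second field contains commas,
# protected by the double-quote text qualifier.
data = 'id,comment\n1,"red, green, blue"\n'
reader = csv.reader(io.StringIO(data), delimiter=",", quotechar='"')
rows = list(reader)
print(rows[1])  # ['1', 'red, green, blue']
```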

Code page for delimited files

Select a code page that the Secure Agent can use to read and write data. Use this option to ensure that rule results for assets with non-English characters don't include junk characters. Default value is UTF-8.
Choose one of the following options:
Note: This option is available when you configure data quality for the following catalog sources:
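The note about junk characters comes down to decoding bytes with the wrong code page. A short Python illustration (not product code) of the same bytes read under two encodings:

```python
# "café" as written by a system that uses the Latin-1 code page
raw = "café".encode("latin-1")

# Reading it back with the matching code page recovers the text
print(raw.decode("latin-1"))                  # café

# Reading the same bytes as UTF-8 produces a junk character
print(raw.decode("utf-8", errors="replace"))  # caf�
```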

Escape character for delimited files

You can specify an escape character if you need to override the default escape character. An escape character causes the task to treat a delimiter character in an unquoted string as part of the string value.
If you specify an escape character, the data quality task overrides the default escape character that the Metadata Extraction job detects, uses the specified character instead, and reads the escaped delimiter character as part of the string value. If you don't specify an escape character, the data quality task uses the default escape character that the Metadata Extraction job detects and reads the escaped delimiter character as part of the string value.
Note: This option is available when you configure data quality for the following catalog sources:
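Python's csv module shows the escape-character behavior (an illustration, not the product's parser): with a backslash escape, a delimiter inside an unquoted string is read as part of the value.

```python
import csv
import io

# Unquoted field in which the comma delimiter is escaped with "\"
line = "a\\,b,c\n"
reader = csv.reader(io.StringIO(line), delimiter=",", escapechar="\\")
fields = next(reader)
print(fields)  # ['a,b', 'c']
```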