Identifying the source data set

As the first step in an exception management process, you select the data set in which you expect to find exception records. The structure of the data set that you select determines the configuration of the assets that you use in exception management.

For example, you might decide to search the output from an earlier data process for records that contain unresolved data quality issues. The data might include records that contain duplicate information, records with incorrect or non-standard data values, or records with null values in one or more columns. To begin the exception management process, identify the columns in the data set that are likely to contain the data quality issues that you're interested in. You then select those columns in the assets that you configure.

The assets that you configure for exception management read the data set in the following ways:

• A rule specification analyzes one or more columns and generates exception values that report on the quality of the column values. You select one or more rule specifications in a profiling task.
• An exception task selects records from the data set based on the exception values that the rule specifications report. You create the exception task from the profiling task that includes the rule specifications.
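
The relationship between the two assets can be pictured with a short conceptual sketch. It is not the product's API: the sample records, the column names, the rule names, and the "Valid"/"Invalid" exception values are all assumptions made for illustration. Each rule function stands in for a rule specification, `run_profiling` stands in for the profiling task, and `select_exceptions` stands in for the exception task.

```python
# Conceptual sketch only. Every name and value below is hypothetical.

# Sample source data set with a null value and a non-standard value.
records = [
    {"id": 1, "name": "Ann Lee", "country": "US"},
    {"id": 2, "name": None, "country": "US"},
    {"id": 3, "name": "Bo Chen", "country": "usa"},
]

STANDARD_COUNTRIES = {"US", "CA", "MX"}


def name_not_null(record):
    """Stand-in rule specification: flags a null value in the name column."""
    return "Valid" if record["name"] is not None else "Invalid"


def country_is_standard(record):
    """Stand-in rule specification: flags a non-standard country value."""
    return "Valid" if record["country"] in STANDARD_COUNTRIES else "Invalid"


rule_specifications = {
    "name_not_null": name_not_null,
    "country_is_standard": country_is_standard,
}


def run_profiling(rows, rules):
    """Stand-in profiling task: one exception value per rule, per record."""
    return [{name: rule(row) for name, rule in rules.items()} for row in rows]


def select_exceptions(rows, results):
    """Stand-in exception task: keep records that any rule marked Invalid."""
    return [row for row, res in zip(rows, results) if "Invalid" in res.values()]


results = run_profiling(records, rule_specifications)
exceptions = select_exceptions(records, results)
exception_ids = [row["id"] for row in exceptions]
print(exception_ids)
```

In this sketch, the exception task never inspects the column values directly. It reads only the exception values that the rules produced, which mirrors the dependency of the exception task on the profiling task.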

Note: You configure a rule specification with inputs that represent the columns in the source data set. Create an input for each column in the data set that you want to analyze. The data properties that you configure for rule specification inputs must match the data properties of the corresponding columns in the data set.
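
As a rough illustration of the matching requirement in the note, the following sketch compares the properties of hypothetical rule specification inputs with the properties of the source columns. It assumes that "data properties" means a data type and a precision, and that each input is paired with the column of the same name. Both are simplifications for the example, not a description of how the product stores or maps inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProperties:
    """Hypothetical data properties: a type name and a precision."""
    data_type: str
    precision: int


# Hypothetical source columns and the inputs created for two of them.
source_columns = {
    "id": DataProperties("integer", 10),
    "name": DataProperties("string", 50),
    "country": DataProperties("string", 3),
}
rule_inputs = {
    "name": DataProperties("string", 50),
    "country": DataProperties("string", 2),  # precision differs from the column
}


def find_mismatches(inputs, columns):
    """Return a message for each input that has no matching column properties."""
    problems = []
    for name, props in inputs.items():
        if name not in columns:
            problems.append(f"{name}: no matching column in the data set")
        elif props != columns[name]:
            problems.append(
                f"{name}: input {props} does not match column {columns[name]}"
            )
    return problems


mismatches = find_mismatches(rule_inputs, source_columns)
for message in mismatches:
    print(message)
```

The `id` column has no input in this example because the sketch assumes that it is not analyzed. Only the columns that you want to analyze need an input.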