Exception Management Process Flow
The tasks that you work on in the Analyst tool represent a stage in a data quality cycle. The cycle begins when the organization decides to verify the quality of the data in a data set. The cycle ends when the organization is satisfied with the quality of the data. An organization might run a data quality cycle on a continuous basis.
The exception management stage often occurs toward the end of the data quality cycle. Earlier stages might use profiles and mappings to measure and to enhance the quality of the organization's data. Exception management defines the manual operations that users can perform on the records that fall short of the data quality targets in the current cycle.
You might work in a team of data stewards that implements the data quality process. Or, you might be a business user who defines the data quality standards that the data must meet. In both cases, you can own exception management tasks in the Analyst tool.
Note: In many organizations, data stewards combine the data stewardship role with other roles. You might be responsible for a data quality objective as part of a larger role in the organization. As a data steward, you might perform a task on a data set and pass the results to a colleague who assumes the data stewardship duties.
Bad Records Example
You are a data steward at a retail organization. You are concerned that the product inventory records might contain errors. The errors might cause the organization to order more products or fewer products than it can sell.
You define the following process to investigate and correct the errors:
1. You ask a developer to configure one or more mappings to find and fix the errors in the data set.
   The mappings also calculate a numeric score for each record in the data set. The scores represent the data quality of the records. Some records have marginal scores that indicate that the mappings cannot verify all of the data quality issues that the records contain.
2. The developer configures an additional mapping that reads the numeric scores. The developer adds the mapping to a workflow that includes a Mapping task and a Human task.
   - The Mapping task runs the mapping and writes the records to different tables based on the scores that they contain. The sketch after these steps illustrates this score-based routing.
   - The Human task distributes the records with marginal scores to tasks that you and other users can open in the Analyst tool.
3. You log in to the Analyst tool, and you open a task. The task organizes the exception records in one or more tables. Each table can contain up to 100 records.
   You perform one of the following actions on each record:
   - You correct the errors in the record, or you decide that the record is already correct.
     You update the record status to indicate that the record is valid.
   - You determine that the record does not contain any valid data.
     You update the record status to indicate that the record is not valid.
   - You decide that you cannot verify the accuracy of the record.
     You update the record status to indicate that the record needs further analysis by another user or by another Informatica process.
Note: Before you update a record, verify that the task is open in edit mode. To enter edit mode, click the Edit button in the open task.
4. When you finish work on all of the records in the task, you update the task status. The task status indicates that the records are ready for the next stage in the data quality process.
   The next stage for the data depends on the configuration of the Human task. For example, the Human task might include additional steps that assign the records to other users for review.
When the Human task completes, the next stage of the workflow begins.
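The workflow in this example reduces to a three-way triage on each record's quality score: high scores pass automatically, low scores fail automatically, and marginal scores go to a Human task where a steward assigns one of the three record statuses. The following Python sketch illustrates that triage. It is not Informatica code, and the thresholds, table names, and status labels are hypothetical assumptions.

```python
# Illustrative sketch only. In the product, the mapping calculates the
# scores and the Mapping task performs the routing; the thresholds, table
# names, and status labels below are hypothetical assumptions.
from enum import Enum

class RecordStatus(Enum):
    VALID = "valid"                    # record is correct, or you corrected it
    NOT_VALID = "not_valid"            # record contains no valid data
    NEEDS_ANALYSIS = "needs_analysis"  # accuracy cannot be verified yet

# Hypothetical score bands: scores run from 0 to 100, higher means better.
GOOD_THRESHOLD = 90  # at or above: accept automatically
BAD_THRESHOLD = 40   # below: reject automatically
# Records that score in [BAD_THRESHOLD, GOOD_THRESHOLD) are "marginal" and
# land in the manual-review table that feeds the Human task.

def route_record(score: float) -> str:
    """Return the name of the target table for a scored record."""
    if score >= GOOD_THRESHOLD:
        return "accepted_records"
    if score < BAD_THRESHOLD:
        return "rejected_records"
    return "manual_review_records"  # distributed to Analyst tool tasks

if __name__ == "__main__":
    inventory = [
        ({"sku": "A100", "qty": "25"}, 97.0),
        ({"sku": "A101", "qty": "-3"}, 12.5),
        ({"sku": "A102", "qty": "2O"}, 61.0),  # "2O" vs "20": marginal
    ]
    for record, score in inventory:
        print(record["sku"], score, "->", route_record(score))
    # A steward reviewing the marginal record might then set its status:
    print("A102 ->", RecordStatus.NEEDS_ANALYSIS.value)
```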
Duplicate Records Example
You are a data steward at a bank. You are concerned that multiple records in the customer account tables might contain the same information. The duplicate records might represent data entry errors, or they might represent fraudulent customer activity.
You define the following process to find the duplicate records and to identify a single preferred version of each set of duplicate records:
1. You ask a developer to configure one or more mappings to identify the duplicate records.
   The mappings calculate a set of numeric scores that represent the levels of duplication between the data values in the records. High scores indicate duplicate records, and low scores indicate unique records. Some records have marginal scores that indicate that the duplicate status of the records is uncertain.
2. The developer configures an additional mapping that reads the numeric scores. The developer adds the mapping to a workflow that includes a Mapping task and a Human task.
   - The Mapping task runs the mapping and writes the records to different tables based on the scores that they contain.
   - The Human task distributes the records with marginal scores to tasks that you and other users can open in the Analyst tool.
3. You log in to the Analyst tool, and you open a task.
   The Analyst tool organizes the records in a series of clusters. Each cluster contains two or more records that contain similar information. By default, the first record in a cluster is the preferred record.
4. You open a cluster and analyze the records that it contains.
   You perform the following actions in each cluster:
   - You examine the data values in each column of record data. You select the most accurate value in each column and promote the value to the preferred record. The sketch after these steps illustrates this consolidation.
     You can edit the values that you select, and you can search other clusters for records that contain common values.
   - If a record does not belong in the current cluster, you move it to another cluster or you create a new cluster for the record.
   - You update the cluster status to indicate that you reviewed the cluster. You complete the task when you verify the preferred record in every cluster.
Note: Before you update a record, verify that the task is open in edit mode. To enter edit mode, click the Edit button in the open task.
5. When you finish work on all of the clusters in the task, you update the task status. The task status indicates that the records are ready for the next stage in the data quality process.
   The next stage for the data depends on the configuration of the Human task. For example, the Human task might include additional steps that assign the clusters to other users for review.
When the Human task completes, the next stage of the workflow begins.
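Two ideas drive the cluster review in this example: a match score that measures how alike two records are, and a consolidation step, sometimes called survivorship, that promotes the best value in each column to the preferred record. The Python sketch below illustrates both. It is not Informatica code, and the similarity measure, the survivorship rule, and the field names are hypothetical assumptions.

```python
# Illustrative sketch only. In the product, the mappings calculate the
# match scores and a steward consolidates each cluster interactively; the
# similarity measure, survivorship rule, and field names below are
# hypothetical assumptions.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], standing in for the duplication
    scores that the mappings calculate. Marginal values (for example,
    0.6 to 0.9) would send the pair to a Human task for review."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def build_preferred_record(cluster: list[dict]) -> dict:
    """Start from the first record, the default preferred record, and for
    each column promote the 'best' value from the cluster. Taking the
    longest non-empty value is a hypothetical survivorship rule; in the
    Analyst tool, a steward makes this choice column by column."""
    preferred = dict(cluster[0])
    for column in preferred:
        candidates = [r.get(column, "") for r in cluster]
        preferred[column] = max(candidates, key=len)
    return preferred

if __name__ == "__main__":
    cluster = [
        {"name": "J. Smith", "street": "12 High St", "phone": ""},
        {"name": "John Smith", "street": "12 High Street", "phone": "555-0101"},
    ]
    score = similarity(cluster[0]["name"], cluster[1]["name"])
    print(f"name similarity: {score:.2f}")
    print("preferred record:", build_preferred_record(cluster))
```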