Duplicate analysis operations

You define duplicate analysis operations in a deduplicate asset in Data Quality and in the Deduplicate transformation that reads the asset in Data Integration.
At a high level, you complete the following steps:
Deduplicate asset steps
1. Select the type of identity information that the transformation will analyze.
2. Configure the search criteria that the transformation will apply to the input data.
Deduplicate transformation steps
1. Add the deduplicate asset to the transformation.
2. Select the fields that contain the relevant identity data.
3. Select a field on which the transformation can sort the input records at run time.
The deduplicate asset provides a list of identity types, and you must choose one of them. Each identity type is optimized for a different type of information. When you configure the Deduplicate transformation, you map the identity fields that the asset specifies to the input fields on the transformation.
Additionally, you configure the Deduplicate transformation to sort the input records into groups based on the values in a field that you select. In duplicate analysis, a group is a set of records that contain identical values in a given field. At run time, the Deduplicate transformation analyzes records exclusively within each group and combines the results from each group into a single output data set. When you create groups on an appropriate field, you reduce the overall number of comparisons that the Deduplicate transformation must perform without any meaningful loss of accuracy in the duplicate analysis.
The GroupKey field in the Deduplicate transformation identifies the field on which the transformation sorts the records. For more information on groups in duplicate analysis, see the Deduplicate transformation chapter in the Transformations module of the Data Integration documentation.
Note: To analyze all of your input records in a single group, create the group from a field that contains the same value for all records.
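The effect of grouping on the number of comparisons can be illustrated with a short sketch. This is a conceptual illustration only, not the Deduplicate transformation's implementation: the record layout, the field names (zip, source), and the helper functions group_records and candidate_pairs are hypothetical. The sketch counts the record pairs that would be compared when records are grouped on a selective field and when every record falls into a single group.

```python
from collections import defaultdict
from itertools import combinations


def group_records(records, group_key):
    """Bucket records by the value of the group_key field (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return groups


def candidate_pairs(records, group_key):
    """Yield only the record pairs that share the same group key value."""
    for members in group_records(records, group_key).values():
        yield from combinations(members, 2)


# Illustrative sample data; the fields are assumptions, not product field names.
records = [
    {"id": 1, "name": "Ann Smith", "zip": "10001", "source": "crm"},
    {"id": 2, "name": "Anne Smith", "zip": "10001", "source": "crm"},
    {"id": 3, "name": "Bob Jones", "zip": "94105", "source": "crm"},
    {"id": 4, "name": "Robert Jones", "zip": "94105", "source": "crm"},
    {"id": 5, "name": "Cara Diaz", "zip": "60601", "source": "crm"},
    {"id": 6, "name": "Dan Wu", "zip": "60601", "source": "crm"},
]

# Grouping on a selective field: records are compared only within each zip value.
grouped_pairs = list(candidate_pairs(records, "zip"))

# Grouping on a field with one value for all records: a single group, all pairs.
single_group_pairs = list(candidate_pairs(records, "source"))

print(len(grouped_pairs))       # 3 comparisons
print(len(single_group_pairs))  # 15 comparisons
```

With six records, grouping on zip yields three comparisons, while the single group that results from the constant source field yields all fifteen possible pairs. The sketch also shows the trade-off: records that are duplicates but carry different group key values are never compared, so the choice of group field matters.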

Rules and guidelines for duplicate analysis operations

When you configure duplicate analysis operations, consider the following rules and guidelines: