Configuring the deduplication process

To define duplicate analysis operations in the deduplicate asset, configure the options on the Deduplication tab.
    1. Select the Deduplication tab on the asset.
    2. Select an objective. The objective represents the type of identity that the Deduplicate transformation will look for during duplicate analysis.
    Tip: The objective also indicates the set of inputs that the transformation expects to read at run time. You can preview the fields in the Test Data panel.
    3. Select an index key.
    The index key represents the field on which the transformation builds the identity data index. The objective that you select determines the set of keys from which you can select the index key.
    4. Select the data locale in which the data set originates.
    The duplicate analysis process reads identity reference data for the locale that you select.
    5. Select or clear the option to define optional fields for the objective.
    Select the option if your source data contains one or more columns of relevant data that the objective does not specify. For example, your source data might contain a discrete field for corporate suffixes.
    6. Select or clear the option to filter exact duplicates.
    When you select the option, the transformation passes records that are exact duplicates of each other directly to the consolidation stage or to the downstream objects in the mapping.
    You might select the option when the input data contains many identical rows.
    Note: The output from the analysis contains the same records whether you select or clear the option. However, the Deduplicate transformation might assign different scores to the output records depending on whether the option is selected or cleared.
    7. Select a level of performance for the duplicate analysis.
    The performance level describes the relationship between the speed and the granularity of the analysis. Faster analysis is less granular and might miss some duplicate records.
    8. Optionally, review or update the criteria that apply for a performance option.
    To review the criteria, select the option and expand the Advanced Options.
    To customize the criteria, select the Custom option as the performance level. For example, you might decide to update the threshold score value.
    9. Save the asset.
After you configure the deduplication process, you can optionally configure a consolidation process for the duplicate records that deduplication identifies.
For more information about the deduplication options, see Deduplication tab options.
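
As a conceptual illustration only, the following Python sketch shows how an exact-duplicate filter and a threshold score can interact during duplicate analysis. It is not the Deduplicate transformation's algorithm or API. The function names, the 0.85 threshold, and the use of difflib string similarity as the score are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def filter_exact_duplicates(records):
    """Group identical records (hypothetical helper).

    Returns the index of one representative per group and the groups themselves.
    """
    groups = {}
    for index, record in enumerate(records):
        groups.setdefault(record, []).append(index)
    representatives = [indices[0] for indices in groups.values()]
    return representatives, groups


def find_duplicates(records, threshold=0.85, filter_exact=True):
    """Return (i, j, score) tuples for record pairs whose score meets the threshold.

    The score is a plain string similarity ratio, chosen only for illustration.
    """
    pairs = []
    if filter_exact:
        candidates, groups = filter_exact_duplicates(records)
        # Identical rows skip the similarity comparison and are paired directly.
        for indices in groups.values():
            first = indices[0]
            pairs.extend((first, other, 1.0) for other in indices[1:])
    else:
        candidates = list(range(len(records)))

    # Compare the remaining candidates pairwise and keep pairs above the threshold.
    for i, j in combinations(candidates, 2):
        score = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def flagged_records(pairs):
    """Collect the indexes of every record that appears in at least one pair."""
    return {index for i, j, _ in pairs for index in (i, j)}


if __name__ == "__main__":
    sample = ["Acme Corp", "Acme Corp", "Acme Corp.", "Zenith Ltd"]
    for i, j, score in find_duplicates(sample):
        print(f"{sample[i]!r} ~ {sample[j]!r} (score {score:.2f})")
```

In this sketch, filtering exact duplicates reduces the number of pairwise comparisons but flags the same records as duplicates. Raising the threshold, as you might with the Custom performance level, makes the analysis stricter and flags fewer near-duplicate pairs.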