Duplicate Records Overview

To correct duplicate records, examine a cluster of duplicate records and determine which record to store in the database and which records to discard. You can update the preferred record with values from other records in the cluster.

A cluster is a set of records in which each record matches at least one other record based on a match score. Each cluster has a preferred record. The preferred record contains the most accurate representation of the information in the cluster, and it is the record you want to store in the database. The other records in the cluster are redundant. By default, the Analyst tool selects the first record in the cluster as the preferred record. When you edit the cluster, you update the preferred record with the most accurate field values from the other duplicate records in the cluster.
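The following Python sketch illustrates the cluster concept outside the Analyst tool. It is not how the Analyst tool is implemented. The function names `build_clusters` and `consolidate`, the `threshold` parameter, and the sample records and scores are all hypothetical. The sketch groups records so that each record in a multi-record cluster matches at least one other record in that cluster, takes the first record as the preferred record, and then copies selected field values from other records into it.

```python
from collections import defaultdict


def build_clusters(record_ids, match_scores, threshold):
    """Group record IDs so that each record in a multi-record cluster
    matches at least one other record in it with score >= threshold.

    match_scores maps (id_a, id_b) pairs to a score. The threshold is an
    assumption of this sketch, not an Analyst tool setting.
    """
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the cluster root, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), score in match_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for r in record_ids:
        clusters[find(r)].append(r)
    return list(clusters.values())


def consolidate(cluster_records, overrides=None):
    """Return the preferred record for a cluster.

    The first record is the default preferred record. overrides maps a
    field name to the index of the record that supplies the better value.
    """
    preferred = dict(cluster_records[0])
    for field, source_index in (overrides or {}).items():
        preferred[field] = cluster_records[source_index][field]
    return preferred


# Hypothetical sample data.
records = {
    1: {"name": "J. Smith", "city": "Boston", "phone": ""},
    2: {"name": "John Smith", "city": "Boston", "phone": "555-0100"},
    3: {"name": "John Smyth", "city": "Boston", "phone": "555-0100"},
    4: {"name": "Ann Lee", "city": "Denver", "phone": "555-0199"},
}
scores = {(1, 2): 0.92, (2, 3): 0.88, (3, 4): 0.40}

clusters = build_clusters(list(records), scores, threshold=0.85)
first_cluster = [records[r] for r in clusters[0]]
preferred = consolidate(first_cluster, overrides={"name": 1, "phone": 1})
```

In this sample, records 1 and 3 have no direct match score, but both match record 2, so all three fall into one cluster. Record 4 matches nothing above the threshold and forms its own cluster.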

If a record is not a duplicate of another record in the cluster, you can remove it from the cluster. You can move a record from one cluster to another cluster. You can create a cluster with one record if the record is unique.
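The cluster edits described above can be sketched as operations on lists of record IDs. This is an illustration only. The function `move_record` and its `target` parameter are hypothetical, and the sketch assumes that a record you remove from a cluster is placed in a new single-record cluster.

```python
def move_record(clusters, record_id, target=None):
    """Return a new list of clusters with record_id relocated.

    target is the index of the destination cluster in the original list.
    When target is None, the record goes into a new single-record
    cluster (an assumption of this sketch).
    """
    if not any(record_id in cluster for cluster in clusters):
        raise ValueError(f"record {record_id} is not in any cluster")

    # Copy every cluster without the record, leaving the input unchanged.
    updated = [[r for r in cluster if r != record_id] for cluster in clusters]

    if target is None:
        updated.append([record_id])
    else:
        updated[target].append(record_id)

    # Drop clusters that became empty after the move.
    return [cluster for cluster in updated if cluster]


clusters = [[1, 2, 3], [4]]

# Move record 3 into the second cluster.
moved = move_record(clusters, 3, target=1)

# Record 2 is unique, so place it in its own cluster.
split = move_record(clusters, 2)
```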

Complete the task after you review all the duplicate record clusters and accept one preferred record for each cluster of duplicate records.

Note: Two or more records are duplicates when they contain the same business information. Records can contain similar data but not represent the same information to the business. Your organization must determine the business rules that define duplicate data. For example, your organization might intentionally maintain more than one account record for the same customer, in which case the records contain similar data but are not duplicates.