Prerequisites for configuring match and merge

Before you configure the match and merge process, analyze your data to identify its key characteristics, and understand the specific requirements for matching and merging. Insights into your data help you make better decisions when you configure declarative match rules.
Ensure that you understand the business objectives and requirements for matching and merging duplicate data. You can also analyze the attributes and quality of the data to identify anomalies and patterns that the match and merge process must account for.
The following table describes different tasks that you can perform to analyze data:
Task
Description
Analyze data
Analyze the data samples and identify the fields that you can use for matching. You can use SQL queries, a data profiler, or other tools to analyze data.
When you analyze the data samples, you can identify the following details:
  • The most critical attributes, including attributes that hold unique identifiers, such as phone numbers and social security numbers, and attributes with distinct patterns, such as postal codes and email addresses.
  • The size of the data set, which can affect the performance of the match and merge process.
  • The match population. Choosing the right match population improves match accuracy because it accommodates the variations and errors that are likely to appear in data for a particular population.
A data profiler or a similar tool can also reveal granular details, such as the cardinality between fields and the completeness and uniqueness of each field.
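Simple SQL aggregate queries are often enough to measure data set size, completeness, and uniqueness before a full profiling tool is involved. The following minimal sketch uses Python's built-in sqlite3 module with an in-memory table; the customer table, its columns, and the helper name profile_column are hypothetical stand-ins for your own source data.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return row count, non-null count, distinct count, completeness, and uniqueness for one column."""
    # Identifiers are interpolated directly into the SQL, so pass only trusted table and column names.
    total, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return {
        "rows": total,
        "non_null": non_null,  # COUNT(column) ignores NULLs
        "distinct": distinct,  # COUNT(DISTINCT column) also ignores NULLs
        "completeness": non_null / total if total else 0.0,
        "uniqueness": distinct / non_null if non_null else 0.0,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (first_name TEXT, last_name TEXT, phone TEXT, postal_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [
            ("Ann", "Lee", "555-0100", "94105"),
            ("Anne", "Lee", "555-0100", "94105"),
            ("Raj", "Patel", None, "10001"),
            ("Maria", None, "555-0199", None),
        ],
    )
    for col in ("first_name", "last_name", "phone", "postal_code"):
        print(col, profile_column(conn, "customer", col))
```

A field with high completeness and high uniqueness, such as a phone number here, is a stronger identifier candidate than a field with many repeated or missing values.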
Collect data samples
Collect a representative data sample from various sources, excluding test system data, and ensure its size is manageable for analysis.
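One way to keep a sample both representative and manageable is stratified sampling: draw a capped number of records from each source system and skip test systems entirely. This is a sketch under assumptions, not a prescribed method; it assumes every record carries a hypothetical "source" field and that test systems can be recognized by name.

```python
import random
from collections import defaultdict


def sample_by_source(records, per_source, excluded_sources=frozenset({"TEST"}), seed=42):
    """Draw up to per_source records from each source system, skipping excluded sources."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible between runs
    by_source = defaultdict(list)
    for rec in records:
        if rec["source"] not in excluded_sources:
            by_source[rec["source"]].append(rec)

    sample = []
    for source in sorted(by_source):
        group = by_source[source]
        # random.sample raises ValueError if k exceeds the population, so cap k.
        sample.extend(rng.sample(group, min(per_source, len(group))))
    return sample


if __name__ == "__main__":
    records = (
        [{"id": i, "source": "CRM"} for i in range(10)]
        + [{"id": 100 + i, "source": "ERP"} for i in range(3)]
        + [{"id": 200, "source": "TEST"}]
    )
    print(len(sample_by_source(records, per_source=5)))
```

Capping per source keeps a very large system from dominating the sample, while small systems still contribute all of their records.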
Determine match requirements
Talk to users to determine the high-level match requirements and understand what qualifies as a match based on the sample data set.
Identify groups of similar records
Identify similar records by running simple group statistics on one or more fields.
The group statistics help you identify the following details:
  • Large groups of identical records.
  • Appropriate candidate selection criteria. A large number of candidates might affect match performance.
  • The viability of the intended match key fields.
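Group statistics can be computed by counting records per value of a candidate key. The sketch below also estimates the cost of comparing every pair of records inside each group, n * (n - 1) / 2, to illustrate why large candidate groups can slow matching. The field names, the normalization, and the large-group threshold are illustrative assumptions, and this pair count describes naive pairwise comparison rather than how any particular match engine works.

```python
from collections import Counter


def _normalize(value):
    # Missing values become "", so a large empty-key group flags a weak match key.
    return "" if value is None else str(value).strip().upper()


def group_statistics(records, key_fields, large_group_threshold=3):
    """Group records on the given fields and summarize the group sizes."""
    sizes = Counter(tuple(_normalize(r.get(f)) for f in key_fields) for r in records)
    large = {k: n for k, n in sizes.items() if n >= large_group_threshold}
    # Comparing every pair inside a group of n records costs n * (n - 1) / 2 comparisons.
    comparisons = sum(n * (n - 1) // 2 for n in sizes.values())
    return {"groups": len(sizes), "large_groups": large, "pair_comparisons": comparisons}


if __name__ == "__main__":
    records = [
        {"last_name": "Lee", "postal_code": "94105"},
        {"last_name": "lee ", "postal_code": "94105"},
        {"last_name": "Patel", "postal_code": "10001"},
    ]
    print(group_statistics(records, ["last_name", "postal_code"], large_group_threshold=2))
```

Because comparisons grow quadratically with group size, a single group of 1,000 records costs far more than 100 groups of 10.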
Verify data completeness
Determine how often a field has a non-null value, and check data completeness for individual fields and for combinations of fields.
For example, determine the percentage of records that have both first and last name values. If you want to use a field such as postal code as an exact match field, more than 50% of the records in the data set should contain a value for that field.
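The completeness check for a field or a combination of fields can be sketched as the fraction of records in which every listed field has a value, compared against the 50% guideline for exact match fields. The helper names and field names are hypothetical, and treating whitespace-only strings as missing is an assumption you may want to adjust.

```python
def _has_value(value):
    # Treat None and whitespace-only strings as missing; other values, including 0, count as present.
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Return the fraction of records in which every listed field has a value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if all(_has_value(r.get(f)) for f in fields))
    return filled / len(records)


def suitable_for_exact_match(records, field, minimum=0.5):
    """Apply the guideline that more than 50% of records should contain the field."""
    return completeness(records, [field]) > minimum


if __name__ == "__main__":
    records = [
        {"first_name": "Ann", "last_name": "Lee", "postal_code": "94105"},
        {"first_name": "Raj", "last_name": "", "postal_code": "10001"},
        {"first_name": "Mia", "last_name": "Wong", "postal_code": None},
    ]
    print(completeness(records, ["first_name", "last_name"]))
    print(suitable_for_exact_match(records, "postal_code"))
```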
Determine data quality
Ensure that the data is accurate. For example, a gender field must contain only valid gender values. You can use data quality rule associations to refine the data.
You can also use pattern analysis to assess the quality of data in a field. Pattern analysis is useful for data that must conform to certain formats or data types, such as postal codes or email addresses.
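Both quality checks can be approximated in a few lines: a domain check that lists values outside an allowed set, and a pattern analysis that reduces each value to a shape (digits become 9, ASCII letters become A) and counts how often each shape occurs. The pattern notation and the allowed gender codes are illustrative assumptions; this is not how data quality rule associations are implemented.

```python
import re
from collections import Counter


def value_pattern(value):
    """Map a value to a pattern: digits become 9, ASCII letters become A, other characters stay."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def pattern_frequencies(values):
    """Count how often each pattern occurs, ignoring missing values."""
    return Counter(value_pattern(v) for v in values if v is not None)


def invalid_domain_values(values, allowed):
    """Return values outside the allowed set, compared case-insensitively."""
    allowed_upper = {a.upper() for a in allowed}
    return [v for v in values if v is not None and v.strip().upper() not in allowed_upper]


if __name__ == "__main__":
    postal_codes = ["94105", "10001", "9410", None, "SW1A 1AA"]
    print(pattern_frequencies(postal_codes).most_common())
    print(invalid_domain_values(["M", "f", "X", None, "Unknown"], {"M", "F"}))
```

Rare patterns, such as a four-digit value among five-digit postal codes, are good candidates for manual review.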
Remove extraneous data
Identify and remove extraneous data by reviewing data manually or by using a data profiling tool.
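One common kind of extraneous data is placeholder or filler values, which can make unrelated records look alike to a match rule. The following sketch, offered as an assumption rather than a product feature, blanks such values so they cannot contribute to a match; the placeholder list is hypothetical and should come from your own profiling results.

```python
# Hypothetical placeholder tokens; replace with values found while profiling your data.
PLACEHOLDERS = {"", "N/A", "NA", "NONE", "NULL", "UNKNOWN", "TEST"}


def strip_placeholders(record, placeholders=PLACEHOLDERS):
    """Return a copy of the record with placeholder string values replaced by None."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str) and value.strip().upper() in placeholders:
            cleaned[field] = None
        else:
            cleaned[field] = value  # non-string values, such as numbers, are kept unchanged
    return cleaned


if __name__ == "__main__":
    print(strip_placeholders({"name": "Ann", "phone": "n/a", "email": " UNKNOWN ", "age": 0}))
```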