Prerequisites for configuring match and merge

Task	Description
Analyze data	Analyze the data samples and identify the fields that you can use for matching. You can use SQL queries, a data profiler, or other tools to analyze data. When you analyze the data samples, perform the following tasks: - Identify the most critical attributes that hold unique identifiers, such as phone numbers and social security numbers, and distinct patterns, such as postal codes and email addresses. - Determine the size of the data set, which can affect the performance of the match and merge process. - Determine the match population, which improves match accuracy by accommodating the variations and errors that are likely to appear in data for a particular population. A data profiler or similar tool also helps you identify granular details, such as the cardinality between fields and the completeness and uniqueness of fields.
Collect data samples	Collect a representative data sample from various sources, excluding test system data, and ensure its size is manageable for analysis.
Determine match requirements	Talk to users to determine the high-level match requirements and understand what qualifies as a match based on the sample data set.
Identify groups of similar records	Identify similar records by running simple group statistics on data from a single field or from multiple fields. The group statistics help you identify the following details: - Large groups of identical records. - Appropriate candidate selection criteria. A large number of candidates might impact match performance. - The viability of the intended match key fields.
Verify data completeness	Determine how often a field has a non-null value and check data completeness by field and by combination of fields. For example, determine the percentage of records that have both first name and last name values. If you want to use a field, such as postal code, as an exact match field, more than 50% of the records in the data set should contain a value for this field.
Determine data quality	Ensure that the data is accurate. For example, a gender field must contain only gender values. You can use data quality rule associations to refine the data. You can also use pattern analysis to assess the quality of data in a field. Pattern analysis is useful for analyzing data that conforms to certain formats or data types, such as postal codes or email addresses.
Remove extraneous data	Identify and remove extraneous data by reviewing data manually or by using a data profiling tool.
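
The completeness, group statistics, and pattern analysis checks described in the table can be approximated with a short script when a data profiler is not at hand. The following Python sketch is illustrative only: the sample records, the field names, and the helper functions (`completeness`, `group_sizes`, `value_pattern`, `pattern_counts`) are hypothetical and are not part of any product API.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"first_name": "Ana", "last_name": "Diaz", "postal_code": "94105"},
    {"first_name": "Ana", "last_name": "Diaz", "postal_code": "94105"},
    {"first_name": "Bo", "last_name": None, "postal_code": "9410"},
    {"first_name": "Cy", "last_name": "Lee", "postal_code": ""},
]


def is_populated(value):
    """Treat None and blank strings as missing values."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Return the percentage of records in which every listed field is populated."""
    if not records:
        return 0.0
    filled = sum(all(is_populated(r.get(f)) for f in fields) for r in records)
    return 100.0 * filled / len(records)


def group_sizes(records, fields):
    """Count records per combination of field values (simple group statistics)."""
    return Counter(tuple(r.get(f) for f in fields) for r in records)


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value)
    )


def pattern_counts(records, field):
    """Count the shapes of the populated values in one field."""
    return Counter(
        value_pattern(r[field]) for r in records if is_populated(r.get(field))
    )
```

With the sample records above, `completeness(RECORDS, ["first_name", "last_name"])` returns `75.0`, `group_sizes(RECORDS, ["first_name", "last_name"]).most_common(1)` surfaces the duplicated ("Ana", "Diaz") pair with a count of 2, and `pattern_counts(RECORDS, "postal_code")` shows two values shaped `99999` and one shaped `9999`, which flags the short postal code for review. Treating blank strings as missing is a design choice of this sketch; adjust `is_populated` if your sources encode missing values differently.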