Task | Description |
---|---|
Analyze data | Analyze the data samples and identify the fields that you can use for matching. You can use SQL queries, a data profiler, or other tools to analyze the data. A data profiler or similar tool also helps you identify granular details, such as the cardinality that exists between fields and the completeness and uniqueness of fields. |
Collect data samples | Collect a representative data sample from various sources, excluding test system data, and ensure its size is manageable for analysis. |
Determine match requirements | Talk to users to determine the high-level match requirements and understand what qualifies as a match based on the sample data set. |
Identify groups of similar records | Identify similar records by running simple group statistics on the data in a single field or in a combination of fields. |
Verify data completeness | Determine how often a field has a non-null value, and check data completeness for individual fields and for combinations of fields. For example, determine the percentage of records that have both first name and last name values. If you want to use a field, such as postal code, as an exact match field, more than 50% of the records in the data set should contain a value for this field. |
Determine data quality | Ensure that the data is accurate. For example, a gender field must contain only gender values. You can use data quality rule associations to refine the data. You can also use pattern analysis to assess the quality of the data in a field. Pattern analysis is useful for analyzing data that conforms to certain formats or data types, such as postal codes or email addresses. |
Remove extraneous data | Identify and remove extraneous data by reviewing data manually or by using a data profiling tool. |
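
The completeness, group statistics, and pattern analysis tasks in the table can be approximated with a few lines of Python when no data profiler is at hand. The following is a minimal sketch, assuming the data sample is already loaded as a list of dictionaries. The field names, the sample records, and the helper functions (`completeness`, `group_counts`, `pattern_counts`) are hypothetical and exist only for illustration; they are not part of any product API.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are assumptions for illustration.
SAMPLE = [
    {"first_name": "Ann", "last_name": "Lee", "postal_code": "94105"},
    {"first_name": "Ann", "last_name": "Lee", "postal_code": None},
    {"first_name": "Bob", "last_name": "", "postal_code": "9410"},
    {"first_name": "Cy", "last_name": "Ray", "postal_code": "10001"},
]


def is_populated(value):
    """Treat None and blank strings as missing values."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Percentage of records in which every listed field is populated."""
    if not records:
        return 0.0
    hits = sum(all(is_populated(r.get(f)) for f in fields) for r in records)
    return 100.0 * hits / len(records)


def group_counts(records, fields):
    """Number of records per distinct value combination of the listed fields."""
    return Counter(tuple(r.get(f) for f in fields) for r in records)


def pattern_of(value):
    """Replace every digit with 9 and every ASCII letter with A."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)


def pattern_counts(records, field):
    """Frequency of each value pattern among the populated values of a field."""
    return Counter(
        pattern_of(r[field]) for r in records if is_populated(r.get(field))
    )


if __name__ == "__main__":
    # Completeness of a single field and of a field combination.
    print(completeness(SAMPLE, ["postal_code"]))
    print(completeness(SAMPLE, ["first_name", "last_name"]))
    # Groups with more than one record are candidate similar records.
    print(group_counts(SAMPLE, ["first_name", "last_name"]).most_common())
    # Rare patterns often point at malformed values.
    print(pattern_counts(SAMPLE, "postal_code").most_common())
```

In this sketch, a field would qualify as an exact match candidate under the guideline in the table when `completeness` returns a value above 50. Groups from `group_counts` that contain more than one record are worth reviewing as potential matches, and infrequent patterns from `pattern_counts` are worth reviewing as potential data quality issues.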