Developer Transformation Guide > Match Transformations in Field Analysis > Field Match Analysis Example
  

Field Match Analysis Example

You are a data steward at a retail bank. You receive a set of account records for the customers who opened bank accounts in the past seven days. You want to verify that the data set does not contain duplicate records. You design a mapping to search the records for duplicate data.

Create the Mapping

Create a mapping that looks for duplicate data in multiple fields.
The mapping performs the following tasks:
The following image shows the mapping in the Developer tool:
The mapping that you create contains the following objects:
Object Name
Description
Read_Match_Customers
Data source.
Contains the account holder details, including names, addresses, and account numbers.
Key_Generator
Key Generator transformation.
Adds group key values and sequence identifier values to the source data.
Match
Match transformation.
Analyzes the levels of duplication in the account data.
Write_match_target
Data target.
Contains the results of the field analysis.

Input Data Sample

The data set contains the account number, name, address, and employer for each customer. You create a data source from the data set in the Model repository. You add the data source to the mapping.
The following data fragment shows a sample of the customer account data:
CustomerID
Lastname
City
State
ZIP
15954467
JONES
SCARSDALE
NY
10583
10110907
JONES
MINNEAPOLIS
MN
55437
19131127
JONES
INDIANAPOLIS
IN
46240
10112097
JONES
HOUSTON
TX
77036
19133807
JONES
PLANTATION
FL
33324
10112447
JONES
SCARSDALE
NY
10583
15952487
JONES
HOUSTON
TX
77002
10112027
JONES
OAKLAND
CA
94623

Key Generator Transformation Configuration

When you configure the Key Generator transformation, connect the data source ports that you want to analyze. Specify the port that contains the group key data. If the records do not include a unique identifier, use the sequence ID port to add unique identifiers to the record.
When you specify the group key port, consider the following guidelines:
In the current example, select the City port as the group key. If an account name appears more than once in a city, the accounts might contain duplicate data. If an account name appears more than once in different cities, the accounts are unlikely to be duplicates.
Tip: Run a column profile on the data source before you select the group key port. The profile results can indicate the number of times each value appears on a port.

Match Transformation Configuration

Add a nonreusable Match transformation to the mapping to perform the field analysis.
Complete the following tasks to configure the Match transformation:
  1. 1. Select the type of match analysis to perform.
  2. 2. Connect the input ports to the transformation.
  3. 3. Configure strategies to compare the record data.
  4. 4. Select the type of match output data that the transformation creates.
  5. 5. Connect the output ports to a data target.

Select the Type of Match Operation

Use the options on the Match Type view to select the match operation. To compare the account data, configure the transformation to perform single-source field analysis.

Connect the Input Ports

Connect the customer account data ports to the Match transformation.
The Match transformation uses preset input ports to determine the order in which it processes the records. The transformation uses a sequence identifier to track the records from the input ports to the matched pairs or clusters that it writes as output. The transformation uses a group key to sort the records that it processes.
Connect the following preset ports on the Match transformation to the preset output ports on the Key Generator transformation:

Configure the Strategies for Field Analysis

Use the options on the Strategies view to configure the strategies. The strategies determine the types of analysis that the transformation performs on the record data.
Create the following strategies:

Select the Match Output Type

Use the options on the Match Output view to define the output format for the results of the match analysis.
The following image shows the Match Output view for single-source field analysis:
The Match Output view provides different output options, including options to organize the output records into clusters or matched pairs. The current transformation generates matched pairs.
You configure the transformation to organize the output records in matched pairs. Because the transformation returns records in pairs, the results of the analysis do not include unique records.

Connect the Output Ports

Connect the Match transformation output ports to the data target in the mapping. Select the ports that contain the record data that you want to write to the data target.
The transformation includes preset ports that identify the records in each matched pair. The preset port names are RowId and RowId1. Each row ID value identifies a record in the output data.
The row ID values correspond to the ports that you select in a match strategy. When you configure a strategy, you select port names with the suffix _1 or _2. The RowId value identifies the record that contains a port with the suffix _1. The RowId1 value identifies the record that contains a port the suffix _2.
You can use other output ports to review the relationships between the records. The link port values and driver port values indicate the extent of the similarity between the records in each cluster.
In the current example, you connect all the ports to the data target. To view the output data on the ports, run the Data Viewer.

Run the Data Viewer

Run the Data Viewer to review the results of the match analysis. By default, the Data Viewer shows all the output ports on the Match transformation. When you run the mapping, you update the data target with the data from the output ports.
The following image shows the output data in the Data Viewer:
The Data Viewer shows the output data from the Match transformation output ports, including the row I.D. ports.
The Data Viewer verifies that the customer account data includes one or more duplicate records.
Consider the following data in the Data Viewer:

Conclusion

The results of the match analysis indicate that the customer account data set includes at least one pair of duplicate records. You contact the local branch for the accounts that you need to verify. You verify that the other records in the data set uniquely identify a customer account.