Match Mapping Performance

You can preview the data factors that determine the performance of the Match transformation before you run the mapping that contains the transformation. You can verify that the system has the resources to run the mapping. You can also verify that you configured the transformation correctly to measure the levels of similarity in the input data.
Use the Match Performance Analysis option to verify that the system has the required resources. Use the Match Cluster Analysis option to verify that the mapping can accurately measure the levels of similarity in the input data.
Run match performance analysis and match cluster analysis on any Match transformation that reads a single data source. Run match performance analysis on any Match transformation that performs dual-source field match analysis. Do not run match performance analysis or match cluster analysis on an identity match strategy that connects to index tables.

Drill-down on Match Performance Analysis

You can drill down on the match analysis data to view the record pairs that meet or exceed the match threshold. Double-click a record in the Details view, and use the Data Viewer to view the records that match the record that you select. The Data Viewer displays the data for each pair of records on a single row. The row contains the row identifier of each record in the pair.
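As an illustration of what the drill-down shows, the following Python sketch filters scored record pairs against a match threshold. It is not part of the Developer tool. The `ScoredPair` type, the `pairs_at_or_above` function, and the sample scores are hypothetical names and values chosen for the example.

```python
from typing import List, NamedTuple


class ScoredPair(NamedTuple):
    """One row of drill-down data: both row identifiers and the match score.

    Hypothetical structure for illustration; the Data Viewer columns may differ.
    """
    row_id_1: int
    row_id_2: int
    score: float


def pairs_at_or_above(pairs: List[ScoredPair], threshold: float) -> List[ScoredPair]:
    """Keep the pairs whose score meets or exceeds the match threshold."""
    return [pair for pair in pairs if pair.score >= threshold]


# Sample data (invented for the example).
pairs = [
    ScoredPair(1, 2, 0.95),
    ScoredPair(1, 3, 0.80),
    ScoredPair(2, 3, 0.90),
]
matches = pairs_at_or_above(pairs, threshold=0.90)
```

With a threshold of 0.90, the sketch keeps the pairs (1, 2) and (2, 3) and drops the pair (1, 3).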

Drill-down on Match Cluster Analysis

You can drill down on the cluster analysis data to view the records in each cluster. Double-click a cluster in the Details view and read the data in the Data Viewer. The Data Viewer displays one cluster at a time. The cluster data includes the score options that you selected, such as the driver score, link score, driver identifier, or link identifier.

Match Transformation Logging

When you run a mapping that uses a Match transformation, the Developer tool log tracks the number of comparison calculations that the mapping performs. To view the log data, select the Show Log option in the Data Viewer.
The mapping updates the log every 100,000 calculations.
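The update interval gives a rough way to anticipate how many log updates a run produces. The Python sketch below assumes that the log updates exactly once per 100,000 completed calculations and that no extra update is written for a final partial batch. The function name `expected_log_updates` is hypothetical.

```python
LOG_INTERVAL = 100_000  # From the text: the log updates every 100,000 calculations.


def expected_log_updates(total_comparisons: int, interval: int = LOG_INTERVAL) -> int:
    """Estimate the number of log updates for a given comparison count.

    Assumption: one update per full interval, none for a trailing partial batch.
    """
    if total_comparisons < 0:
        raise ValueError("total_comparisons must not be negative")
    return total_comparisons // interval


updates = expected_log_updates(1_250_000)
```

Under these assumptions, a mapping that performs 1,250,000 comparisons updates the log 12 times.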

Viewing Match Cluster Analysis Data

You can view statistical data on the clusters that the transformation can create. The cluster statistics summarize the level of record duplication in the data set based on the current mapping configuration.
Before you run the analysis, validate the mapping that contains the transformation.
To view the data, right-click the Match transformation in the mapping canvas and select Match Cluster Analysis.
Match cluster analysis displays data for the following properties:
Property: Description
Source: The number of input data rows.
Last run: The date and time of the analysis.
Total number of discovered clusters: The number of clusters that the match analysis generates when the mapping runs.
Minimum cluster size: The number of records in the cluster or clusters that contain the fewest records. If the minimum cluster size is 1, the data set contains at least one unique record.
Maximum cluster size: The number of records in the cluster or clusters that contain the most records. If this value greatly exceeds the average cluster size, the largest cluster might contain false duplicates.
Number of unique records: The number of records in the data set that do not match another record with a score that meets the match threshold.
Number of duplicate records: The number of records in the data set that match another record with a score that meets the match threshold.
Total comparisons: The number of comparison operations that the mapping performs.
Average cluster size: The average number of records in a cluster.
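To show how the cluster properties relate to one another, the following Python sketch derives them from a list that holds one cluster identifier per input record. It is an illustration only and does not reproduce how the Match transformation calculates the values. It assumes that a record in a single-record cluster counts as unique and that every other record counts as a duplicate. The function name `cluster_statistics` and the dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def cluster_statistics(cluster_ids: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Summarize cluster sizes from one cluster identifier per input record."""
    ids = list(cluster_ids)
    if not ids:
        raise ValueError("at least one record is required")

    sizes = list(Counter(ids).values())  # number of records in each cluster
    # Assumption: a cluster that holds one record represents a unique record.
    unique = sum(1 for size in sizes if size == 1)

    return {
        "source_rows": len(ids),
        "total_clusters": len(sizes),
        "min_cluster_size": min(sizes),
        "max_cluster_size": max(sizes),
        "unique_records": unique,
        "duplicate_records": len(ids) - unique,
        "average_cluster_size": len(ids) / len(sizes),
    }


# Six records in three clusters: A has 3 records, B has 1, C has 2.
stats = cluster_statistics(["A", "A", "A", "B", "C", "C"])
```

In this sample, the minimum cluster size is 1, which corresponds to the single unique record in cluster B. The maximum cluster size is 3 and the average cluster size is 2.0.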

Viewing Match Performance Analysis Data

You can view statistical data on the record groups that the mapping reads as input data.
Before you run the analysis, validate the mapping that contains the transformation.
To view the data, right-click the Match transformation in the mapping canvas and select Match Performance Analysis.
Match performance analysis displays data for the following properties:
Property: Description
Source: The number of input data rows.
Last run: The date and time of the analysis.
Total number of discovered groups: The number of groups defined for the data set, based on the selected group key value.
Throughput (records per minute): Variable value that estimates the speed of the match analysis. You set this value. Use the value to estimate the time required to run the match analysis.
Estimated time to match records: The estimated time required to analyze all records in the data set, based on the Match transformation configuration.
Total number of pairs generated: The number of comparisons that the transformation must perform, based on the number of input data rows and the number of groups.
Minimum group size: Variable value that indicates the minimum number of records a group can contain. You set this value for the performance analysis. Use the value to verify that the mapping will create groups of a usable size. Note: The minimum group size value does not determine the size of the groups created when the mapping runs.
Number of groups below minimum threshold: The number of groups that contain fewer records than the minimum group size value. If many groups are below the minimum group size, you might need to edit the transformation and select a different group key.
Maximum group size: Variable value that indicates the maximum number of records a group can contain. You set this value for the performance analysis. Use the value to verify that the mapping will create groups of a usable size. Note: The maximum group size value does not determine the size of the groups created when the mapping runs.
Number of groups above maximum threshold: The number of groups that contain more records than the maximum group size value. If many groups are above the maximum group size, you might need to edit the transformation and select a different group key.
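The following Python sketch shows how the group properties can relate to one another in a single-source analysis. It is an illustration, not the calculation that the Developer tool performs. It makes two assumptions: each record is compared once with every other record in its group, so a group of n records generates n(n-1)/2 pairs, and the estimated time equals the number of input rows divided by the throughput value. The function name `performance_statistics` and the dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def performance_statistics(
    group_keys: Iterable[str],
    throughput_per_minute: float,
    min_group_size: int,
    max_group_size: int,
) -> Dict[str, Union[int, float]]:
    """Summarize group sizes from one group key value per input record."""
    keys = list(group_keys)
    if throughput_per_minute <= 0:
        raise ValueError("throughput_per_minute must be positive")

    sizes = list(Counter(keys).values())  # number of records in each group

    return {
        "source_rows": len(keys),
        "total_groups": len(sizes),
        # Assumption: every record pairs once with each other record in its group.
        "total_pairs": sum(n * (n - 1) // 2 for n in sizes),
        "groups_below_minimum": sum(1 for n in sizes if n < min_group_size),
        "groups_above_maximum": sum(1 for n in sizes if n > max_group_size),
        # Assumption: time in minutes = input rows / throughput (records per minute).
        "estimated_minutes": len(keys) / throughput_per_minute,
    }


# Seven records grouped on a hypothetical postcode key: groups of 4, 1, and 2.
keys = ["10001"] * 4 + ["10002"] + ["10003"] * 2
report = performance_statistics(
    keys, throughput_per_minute=14, min_group_size=2, max_group_size=3
)
```

In this sample, the three groups generate 6 + 0 + 1 = 7 pairs. One group falls below the minimum group size and one group exceeds the maximum group size. Because the pair count grows roughly with the square of the group size, a few oversized groups can dominate the total, which is one reason to consider a different group key.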