
Training metrics

When you train a machine learning (ML) model, you need to evaluate the effectiveness of the model. After the model achieves the effectiveness that your business requires, you can complete the training.
To evaluate the effectiveness of the ML model, review the training metrics that appear on the Machine Learning Model page.
The following ML model training metrics help answer your questions about model effectiveness:

Error matrix

An error matrix summarizes the performance of a machine learning (ML) model based on the training data set. Use the matrix to determine how successful the predictions of the model are by understanding the correlation between the actual labels and the predictions of the model.
The following 2 × 2 matrix explains what each quadrant of an ML model's error matrix represents:

                          Predicted as Match      Predicted as Not a Match
Labeled as Match          True positive (TP)      False negative (FN)
Labeled as Not a Match    False positive (FP)     True negative (TN)

True positive (TP): An outcome where the model correctly predicts a record pair as a match.
False negative (FN): An outcome where the model incorrectly predicts a record pair as not a match.
False positive (FP): An outcome where the model incorrectly predicts a record pair as a match.
True negative (TN): An outcome where the model correctly predicts a record pair as not a match.
The error matrix is based on only 30% of the total record pairs that are labeled as a match or not a match. For example, if you label 20 record pairs as a match and 10 record pairs as not a match, the error matrix shows values based on only 9 of those record pairs.
The following image shows an error matrix with 60.6% true positives, 30.3% true negatives, 3% false negatives, and 6.1% false positives.
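The quadrant counts can be tallied directly from the labeled record pairs and the model's predictions. The following Python sketch is a minimal illustration, not part of the product: it assumes each record pair is reduced to a boolean label and a boolean prediction, and the function and variable names are hypothetical.

```python
from collections import Counter

def error_matrix(labels, predictions):
    """Tally the four error-matrix quadrants.

    labels and predictions are parallel sequences of booleans where
    True means "match" and False means "not a match". Hypothetical
    helper for illustration only.
    """
    counts = Counter()
    for labeled_match, predicted_match in zip(labels, predictions):
        if labeled_match and predicted_match:
            counts["TP"] += 1   # correctly predicted as a match
        elif labeled_match:
            counts["FN"] += 1   # incorrectly predicted as not a match
        elif predicted_match:
            counts["FP"] += 1   # incorrectly predicted as a match
        else:
            counts["TN"] += 1   # correctly predicted as not a match
    return counts

# Three labeled pairs: two actual matches, one non-match
print(error_matrix([True, True, False], [True, False, False]))
# Counter({'TP': 1, 'FN': 1, 'TN': 1})
```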

Accuracy

Accuracy measures how well an ML model correctly classifies records as a match or not a match.
Accuracy is the percentage of correct predictions out of the total number of predictions.
For example, if the accuracy of an ML model is 95%, it correctly identifies 95% of the record pairs as a match or not a match out of the total number of record pairs.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / Total number of record pairs
Where TP indicates true positives, TN indicates true negatives, FP indicates false positives, and FN indicates false negatives.
When there is a large imbalance between the number of record pairs labeled as a match and the number labeled as not a match, accuracy alone can be misleading because the model can score well by predicting only the majority label. In that case, use the precision and recall of the model to decide whether to continue the training.
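As a quick check of the formula, the following Python sketch computes accuracy from the four quadrant counts; the counts shown are made-up example values.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up counts: 30 correct predictions out of 33 record pairs
print(round(accuracy(tp=20, tn=10, fp=1, fn=2), 3))  # 0.909, about 91% accuracy
```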

Precision

Precision measures how many of the matches that the ML model predicts are actually correct matches.
Precision is the percentage of matches that the ML model correctly predicts out of the total number of matches it predicts.
For example, if the precision of an ML model is 80%, then 80% of the record pairs that it predicts as a match are actual matches.
Precision = TP / (TP + FP) = TP / Total number of predicted matches
Where TP indicates true positives and FP indicates false positives.
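A minimal Python sketch of the precision formula, using made-up counts rather than product code:

```python
def precision(tp, fp):
    """Fraction of predicted matches that are actual matches."""
    return tp / (tp + fp)

# Made-up counts: 80 of the 100 predicted matches are actual matches
print(precision(tp=80, fp=20))  # 0.8, or 80% precision
```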

Recall

Recall measures how well an ML model can find all the correct matches.
Recall is the percentage of matches that the ML model correctly predicts out of the total number of actual matches.
For example, if the recall of an ML model is 50%, it correctly identifies 50% of the record pairs that are actual matches.
Recall = TP / (TP + FN) = TP / Number of actual matches
Where TP indicates true positives and FN indicates false negatives.
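Similarly, a minimal Python sketch of the recall formula with made-up counts:

```python
def recall(tp, fn):
    """Fraction of actual matches that the model finds."""
    return tp / (tp + fn)

# Made-up counts: the model finds 50 of the 100 actual matches
print(recall(tp=50, fn=50))  # 0.5, or 50% recall
```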