Match candidates are record pairs that are possible matches. The criteria for selecting match candidates depend on business entity fields.
Configure only the candidate selection criteria that are necessary to improve the quality of the match candidates. Multiple criteria improve quality but can reduce the performance of the candidate selection process.
Note: When you modify the default population or the parameters in the candidate selection criteria section, ensure that you publish and regenerate the match keys for the records.
For more information about publishing and regenerating match keys, see Publish a match model.
A candidate selection criterion consists of the following parameters:
Field name
Field name is the name of the business entity field to use for generating match keys, and then identifying match candidates.
For example, you can select a business entity field, such as Full Name. When you select a field, you need to select a corresponding field type to indicate the type of data the field contains.
Field type
A field type indicates the type of data contained in the business entity field that you select as your candidate selection criteria. The fuzzy matching process is effective only when you configure an appropriate field type for the business entity fields.
The following table describes the field types:
Field Type                      Description
Address_Part1                   Contains address details, such as building name, street number, street name, street type, and apartment details.
Code                            Contains numeric or alphanumeric values, such as license number or vehicle registration number.
CreditCard                      Contains credit card numbers.
Date                            Contains dates, such as date of birth, expiry date, date of contract, date of change, or issued date.
Email                           Contains email addresses.
Geocode                         Contains geographic coordinates, such as latitude, longitude, and elevation.
ISBN10                          Contains the 10-digit International Standard Book Number (ISBN).
ISBN13                          Contains the 13-digit ISBN.
Organization_Name|Company_Name  Contains organization names, such as business names, institution names, department names, agency names, or trading names.
Person_Name                     Contains names of individuals.
Product_Description             Contains description of products.
Product_Name                    Contains names of products.
Telephone_Number                Contains telephone numbers.
VIN                             Contains Vehicle Identification Numbers (VINs).
Filter candidates
You can use filters to identify potential matches within a subset of data based on specified criteria. When you add a filter, you might get fewer match candidates, which improves the performance and accuracy of matching.
For example, if you select Full Name as the candidate selection field and Country as the filter field, the candidate selection process selects match candidates with similar full names within each country. If the records have country values of Germany and France, a record with the country value Germany is compared only with other records that also have the country value Germany, rather than with all records. This process generates fewer candidates and improves the performance of matching.
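The effect of a filter on the number of candidate pairs can be sketched in Python. This is a minimal illustration, not the product's implementation: the function `candidate_pairs`, the record layout, and the field names are hypothetical, and the sketch ignores name similarity entirely. It only shows that pairing records within each filter value produces fewer pairs than pairing all records.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, filter_field):
    """Pair each record only with records that share its filter value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[filter_field]].append(record["id"])
    pairs = []
    for ids in groups.values():
        # Pairs are formed inside one filter group, never across groups.
        pairs.extend(combinations(ids, 2))
    return pairs


# Hypothetical sample data: two records per country.
records = [
    {"id": 1, "full_name": "Anna Schmidt", "country": "Germany"},
    {"id": 2, "full_name": "Ana Schmidt", "country": "Germany"},
    {"id": 3, "full_name": "Anna Schmitt", "country": "France"},
    {"id": 4, "full_name": "Anne Schmidt", "country": "France"},
]

filtered = candidate_pairs(records, "country")
unfiltered = list(combinations([r["id"] for r in records], 2))
print(len(filtered), "pairs with the filter,", len(unfiltered), "without")
```

With four records split evenly across two countries, the filter reduces six possible pairs to two. The saving grows quickly as data volume increases.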
Key generation level
The key generation level defines the thoroughness with which the match model generates match keys to identify candidates for matching. The key generation levels are extended, limited, and standard.
Decide on a key generation level based on the following considerations:
•Size and quality of the data
•Reliability of the matched records
•Processing time for key generation
The following table describes the key generation levels:
Key Generation Level
Description
Extended
Performs a stringent analysis of your data to generate many match keys, which might result in a large number of match candidates. This level ensures reliable matches, but at the cost of long processing time to generate match keys.
If your data volume is not too large and the data quality is poor, use the extended key generation level.
Limited
Performs a quick and lenient analysis of your data to generate few match keys, which might result in few match candidates. This level trades some match reliability for a faster search of match candidates compared to when you use the extended key generation level.
If your data volume is extremely large with variations in word order, use the limited key generation level. The results of the limited key generation level are a subset of the standard level.
Standard
Generates an optimum number of match keys, which in turn results in an optimum number of match candidates. This level balances the reliability of matches and the processing time to generate match keys.
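The key generation algorithm itself is not described here, so the following Python sketch uses a made-up scheme (word prefixes and word-pair prefixes) purely to illustrate how a more thorough level yields more keys per record. The function `match_keys` and its key formats are hypothetical. The sketch preserves the stated property that limited keys are a subset of standard keys. Treating extended keys as a superset of standard keys is an additional assumption.

```python
LEVELS = ("limited", "standard", "extended")


def match_keys(name, level):
    """Return a toy set of match keys for a name at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown key generation level: {level}")
    words = name.upper().split()
    if not words:
        return set()
    # Limited: a single key taken from the first word only.
    keys = {words[0][:4]}
    if level in ("standard", "extended"):
        # Standard: one prefix key per word.
        keys |= {word[:4] for word in words}
    if level == "extended":
        # Extended: additional keys built from every ordered pair of words.
        keys |= {a[:2] + b[:2] for a in words for b in words if a != b}
    return keys


for level in LEVELS:
    print(level, len(match_keys("John Michael Smith", level)))
```

In this toy scheme the name produces 1, 3, and 9 keys for the limited, standard, and extended levels. More keys per record mean more chances for two records to share a key, and therefore more match candidates and longer processing.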
Candidate search level
The match model uses the candidate search level to determine how stringently and thoroughly to search for match candidates. Later, the match model applies the declarative match rules to the match candidates only, not to the entire data set. You can configure an exhaustive, extreme, narrow, or typical candidate search level.
The match model uses the key generation level and the candidate search level that you define to determine the records that are potential match candidates.
Decide on a candidate search level based on the following considerations:
•Size and quality of the data
•Criticality of the matches
•Time constraints
The following table describes the candidate search levels:
Candidate Search Level
Description
Exhaustive
Performs a lenient search that returns many match candidates. Appropriate for small volumes of data that is unreliable and incomplete. You can also use the exhaustive level for the initial phase of matching and a more stringent search level for the later phase.
This search level can return more match candidates than most other search levels, which can cause overmatching, and it processes slowly.
Extreme
Performs a lenient search that returns many more match candidates than the exhaustive search level. Appropriate for small volumes of data that is unreliable and incomplete, or when it is critical to find the highest possible number of match candidates. You can also use the extreme level for the initial phase of matching and a more stringent search level for the later phase.
This search level returns more match candidates than any other search level, which can cause overmatching, and it processes slowly.
Narrow
Performs a stringent search that returns the fewest match candidates. Appropriate for large volumes of data that is reliable and complete or that contains a large number of duplicates.
This search level returns fewer match candidates than the other search levels, which can cause undermatching, and it processes faster.
Typical
Use in most scenarios to search for an optimum number of match candidates.
Scenarios for candidate selection criteria configuration
When you configure candidate selection criteria, consider the industry, the size and quality of the data, the consequences of missing a match or overmatching, and the processing time.
The following scenarios illustrate recommended settings for the key generation level and the candidate search level:
Flight reservation system
An operations manager wants to check whether a passenger is on the no-fly list, and a missed match can be critical. In this scenario, you can use the extended key generation level and the extreme search level, which produce the maximum number of keys and candidates so that no potential match is missed. The process might take a while to run, but processing time matters less than avoiding a missed match.
Retail store
When a cashier searches for the loyalty account of a customer with other customers waiting, speed is essential but a missed match isn't critical. Use of standard key generation and a typical search level balances quick response times with adequate matching accuracy and minimal impact if a match is missed.
Financial institution
A financial institution wants to merge customer records, and an incorrect merge might compromise customer data. In this scenario, use the limited key generation level and the narrow search level. Limited key generation produces only a few key permutations, and a narrow search returns only extremely similar candidates, which helps you avoid overmatching records.
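The three scenarios above can be summarized as a small lookup, sketched here in Python. The scenario keys, the `Levels` type, and the `recommend` function are hypothetical names introduced for illustration. Only the level values come from the scenarios, and they are starting points rather than fixed rules.

```python
from typing import NamedTuple


class Levels(NamedTuple):
    key_generation_level: str
    candidate_search_level: str


# Settings taken from the scenarios above; the dictionary keys are illustrative.
RECOMMENDED_LEVELS = {
    "flight_reservation": Levels("extended", "extreme"),   # a missed match is critical
    "retail_store": Levels("standard", "typical"),         # speed matters, a miss is tolerable
    "financial_institution": Levels("limited", "narrow"),  # an incorrect merge is critical
}


def recommend(scenario):
    """Return the suggested levels for a known scenario."""
    try:
        return RECOMMENDED_LEVELS[scenario]
    except KeyError:
        raise ValueError(f"no recommendation for scenario: {scenario}") from None


print(recommend("retail_store"))
```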
Guidelines for configuring candidate selection criteria
Consider the following guidelines when you configure candidate selection criteria:
•If data contains names of organizations, use the Organization_Name field to generate match keys. If data contains names of individuals, use the Person_Name field to generate match keys. If data contains addresses, use the Address_Part1 field to generate match keys.
•If a single field might not identify all the candidates, use multiple fields to generate match keys.
•Use a match key width that provides you the best performance without affecting your match quality. You might have to trade off some match quality for increased performance.
•Use the narrowest search level that still generates acceptable matches. In most cases, the typical search level might suffice.
•Ensure that you clean or suppress data that returns a large number of match candidates. Look for large groups of identical keys, which occur when records contain identical values. For example, high-frequency phone numbers, email addresses, words, or phrases can return identical keys.
•Consider filtering candidates to minimize the number of candidates for large data volumes. For example, you can filter candidates using a state or country code.
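One way to find data worth cleaning or suppressing is to count how often each value occurs in a field before you generate match keys. The following Python sketch assumes records are available as dictionaries. The function `high_frequency_values`, the field name, and the threshold are hypothetical, and the right threshold depends on your data.

```python
from collections import Counter


def high_frequency_values(records, field, threshold):
    """Return values of `field` that occur at least `threshold` times."""
    # Skip records where the field is missing or empty.
    counts = Counter(r[field] for r in records if r.get(field))
    return {value: n for value, n in counts.items() if n >= threshold}


# Hypothetical sample data with a placeholder phone number repeated.
records = [
    {"id": 1, "phone": "000-000-0000"},
    {"id": 2, "phone": "555-0134"},
    {"id": 3, "phone": "000-000-0000"},
    {"id": 4, "phone": "000-000-0000"},
    {"id": 5, "phone": ""},
]

print(high_frequency_values(records, "phone", threshold=3))
```

Values flagged this way, such as placeholder phone numbers, are likely to produce large groups of identical keys and inflate the number of match candidates.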