Candidate selection criteria

Match candidates are record pairs that are possible matches. The criteria for selecting match candidates depend on business entity fields.

Configure only the candidate selection criteria that are necessary to improve the quality of the match candidates. Multiple criteria improve candidate quality but slow down the candidate selection process.

Each candidate selection criterion consists of the parameters described in the sections that follow.

Note: When you modify the default population and the parameters in the candidate selection criteria section, ensure that you publish and regenerate the match keys for the records.

For more information about publishing and regenerating match keys, see Publish a match model.

Field name

The field name is the name of the business entity field that the match process uses to generate match keys and then identify match candidates.

For example, you can select a business entity field, such as Full Name. When you select a field, you need to select a corresponding field type to indicate the type of data the field contains.

Field type

A field type indicates the type of data contained in the business entity field that you select as your candidate selection criteria. The fuzzy matching process is effective only when you configure an appropriate field type for the business entity fields.

The following table describes the field types:

Field Type	Description
Address_Part1	Contains address details, such as building name, street number, street name, street type, and apartment details.
Code	Contains numeric or alphanumeric values, such as license number or vehicle registration number.
CreditCard	Contains credit card numbers.
Date	Contains dates, such as date of birth, expiry date, date of contract, date of change, or issued date.
Email	Contains email addresses.
Geocode	Contains geographic coordinates, such as latitude, longitude, and elevation. For more information about the Geocode field type, see Geocode field type.
ISBN10	Contains the 10-digit International Standard Book Number (ISBN).
ISBN13	Contains the 13-digit ISBN.
Organization_Name|Company_Name	Contains organization names, such as business names, institution names, department names, agency names, or trading names.
Person_Name	Contains names of individuals.
Product_Description	Contains descriptions of products.
Product_Name	Contains names of products.
Telephone_Number	Contains telephone numbers.
VIN	Contains Vehicle Identification Numbers (VINs).

Geocode field type

You can match addresses based on the geographic coordinates instead of relying only on the textual address data for matching. The match process calculates the distance between the addresses based on the latitude, longitude, and elevation to determine the proximity.

Based on the proximity of the addresses, the match process considers them either as a match or not a match. You can either manually specify geographic coordinates in the addresses of records or use the enriched address values from the address verification DaaS provider. For more information about address verification, see DaaS rule association for real-time processing.

For example, you work for a telecommunication company that aims to group customer records based on the location of their customer care centers to enhance customer support. Based on the number of customers around each customer care center, you want to adjust the number of support personnel. To achieve this goal, you can identify customer addresses that fall within a radius of 1000 meters from their customer care centers based on the geographic coordinates.

To match records based on geographic coordinates, configure a match model that uses the Geocode fields (latitude, longitude, and elevation) in both the candidate selection criteria and the directed AI match rules. The Geocode field type contains the Latitude, Longitude, and Elevation fields. When you add the Geocode field type as a candidate selection criterion, the match process adds the Latitude, Longitude, and Elevation fields from the Address field group by default. If required, you can specify other business entity fields as the Latitude, Longitude, and Elevation fields. When you specify these fields, ensure that they all belong to the same field group or are all root fields.

When you use the Geocode field type as candidate selection criteria, include other field types as additional candidate selection criteria to identify good candidates.

Latitude and longitude fields

You can specify the latitude and longitude field values in the Decimal Degrees (D.D°) or Degrees Minutes Seconds (D° M' S") format. For example, you can specify the values either as 13.0843° N or 13°05′3.48″ N. Additionally, you can use the following variations when you specify these values:

•13.0843° N
•13.0843N
•13.0843 N
•N13.0843
•13.0843
•-13.0843
•13d05’3.48”
•13d5’3.48”N
•13:5:3.48N
•13:5:3.48

If a value has no directional indicator (N, S, E, or W), the match process defaults to north for latitude and east for longitude. For signed values, the match process treats positive and negative latitudes as north and south, and positive and negative longitudes as east and west.
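The notations above all reduce to a signed decimal-degree value. The following Python sketch shows one way to normalize them before loading data. The function name `parse_coordinate` and its parsing rules are illustrative assumptions, not the parser that the match process uses.

```python
import re

# Hypothetical helper (not part of the product): converts the coordinate
# notations listed above into signed decimal degrees.
_HEMISPHERE_SIGN = {"N": 1, "E": 1, "S": -1, "W": -1}


def parse_coordinate(text: str) -> float:
    s = text.strip().upper()
    sign = 1  # no indicator: north/east, matching the documented default

    # A directional indicator can lead or trail the value.
    if s and s[0] in _HEMISPHERE_SIGN:
        sign, s = _HEMISPHERE_SIGN[s[0]], s[1:].strip()
    elif s and s[-1] in _HEMISPHERE_SIGN:
        sign, s = _HEMISPHERE_SIGN[s[-1]], s[:-1].strip()

    # A leading minus sign marks south or west.
    if s.startswith("-"):
        sign, s = -sign, s[1:]

    # Degree, minute, and second markers (°, d, :, quotes) act as separators.
    parts = [p for p in re.split(r"[^0-9.]+", s) if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unrecognized coordinate: {text!r}")

    degrees = float(parts[0])
    minutes = float(parts[1]) if len(parts) > 1 else 0.0
    seconds = float(parts[2]) if len(parts) > 2 else 0.0
    return sign * (degrees + minutes / 60 + seconds / 3600)
```

For example, `parse_coordinate("13:5:3.48N")` and `parse_coordinate("13.0843")` both evaluate to approximately 13.0843, and `parse_coordinate("13.0843 S")` evaluates to approximately -13.0843.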

Elevation field

You can specify a numerical value for the elevation field. For example, you can specify the value as 100m. Additionally, you can use the following variations when you specify a value:

•100m
•100ft
•100km
•100miles
•100

The default unit is meter. If you don’t configure the Elevation field, the match process excludes the field from matching.
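To compare elevation values written with different units, you can normalize them to meters first. The following Python sketch is illustrative only: the function name `parse_elevation_meters` is hypothetical, the unit spellings come from the list above, and the conversion factors are the standard international definitions of the foot and the mile.

```python
import re

# Conversion factors to meters for the unit suffixes listed above.
_TO_METERS = {"m": 1.0, "ft": 0.3048, "km": 1000.0, "miles": 1609.344}


def parse_elevation_meters(text):
    """Return the elevation in meters, or None when no value is provided."""
    if text is None or not text.strip():
        return None  # mirrors "excluded from matching" for an unset field

    match = re.fullmatch(r"\s*([+-]?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"Unrecognized elevation: {text!r}")

    value, unit = match.groups()
    unit = unit.lower() or "m"  # a bare number defaults to meters
    if unit not in _TO_METERS:
        raise ValueError(f"Unsupported unit in elevation: {text!r}")
    return float(value) * _TO_METERS[unit]
```

With this sketch, `"100"` and `"100m"` both normalize to 100 meters, and `"100km"` normalizes to 100,000 meters.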

Radius

You can specify a radius to determine the proximity of addresses. The value can range from 1 to 10,000 meters. The default is 1000 meters.
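To get a feel for what a given radius covers, you can estimate the distance between two coordinate pairs yourself. The following Python sketch is an approximation under stated assumptions: this topic does not document the distance formula that the match process uses, so the sketch uses the haversine great-circle distance and, when both elevations are present, combines the elevation difference with the surface distance. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def distance_m(lat1, lon1, lat2, lon2, elev1=None, elev2=None):
    """Approximate distance in meters between two points (assumed formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine formula for the distance along the Earth's surface.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    surface = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

    # Skip elevation when either value is missing.
    if elev1 is None or elev2 is None:
        return surface
    return math.hypot(surface, elev2 - elev1)


def within_radius(lat1, lon1, lat2, lon2, radius_m=1000,
                  elev1=None, elev2=None):
    """Check proximity using the documented range of 1 to 10,000 meters."""
    if not 1 <= radius_m <= 10_000:
        raise ValueError("Radius must be from 1 to 10,000 meters")
    return distance_m(lat1, lon1, lat2, lon2, elev1, elev2) <= radius_m
```

For instance, two points that differ by 0.005 degrees of latitude are roughly 556 meters apart in this model, so they fall within the default radius of 1000 meters.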

Filter candidates

You can use filters to identify potential matches within a subset of data based on specified criteria. When you add a filter, the candidate selection process typically returns fewer match candidates, which can improve the performance and accuracy of matching.

For example, if you select Full Name as the candidate selection field and Country as the filter field, the candidate selection process looks for match candidates with similar full names within each country. If the records have country values of Germany and France and you match a record with the country value Germany, the candidate selection process compares it only with other records whose country value is also Germany, rather than with all records. This process generates fewer candidates and improves the performance of matching.
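The effect of a filter field can be sketched in Python. This is a simplified illustration, not the product's algorithm: the function `candidate_pairs` is hypothetical, and a sorted, lowercased token string stands in for a real fuzzy match key.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_field, filter_field=None):
    """Pair records that share a key and, optionally, a filter value."""
    buckets = defaultdict(list)
    for rec in records:
        # Stand-in for a real match key: lowercase, order-insensitive tokens.
        key = " ".join(sorted(rec[key_field].lower().split()))
        bucket = (key, rec[filter_field]) if filter_field else (key,)
        buckets[bucket].append(rec["id"])

    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(ids, 2))
    return sorted(pairs)


records = [
    {"id": 1, "Full Name": "Anna Schmidt", "Country": "Germany"},
    {"id": 2, "Full Name": "Schmidt Anna", "Country": "Germany"},
    {"id": 3, "Full Name": "Anna Schmidt", "Country": "France"},
    {"id": 4, "Full Name": "Jean Dupont", "Country": "France"},
]

unfiltered = candidate_pairs(records, "Full Name")
filtered = candidate_pairs(records, "Full Name", filter_field="Country")
```

In this sample, `unfiltered` contains three pairs, (1, 2), (1, 3), and (2, 3), while `filtered` contains only (1, 2), because record 3 has a different country value.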

Key generation level

The key generation level defines the thoroughness with which the match model generates match keys to identify candidates for matching. The key generation levels are extended, limited, and standard.

Decide on a key generation level based on the following considerations:

•Size and quality of the data
•Reliability of the matched records
•Processing time for key generation

The following table describes the key generation levels:

Key Generation Level	Description
Extended	Performs a stringent analysis of your data to generate many match keys, which might result in a large number of match candidates. This level ensures reliable matches, but at the cost of long processing time to generate match keys. If your data volume is not too large and the data quality is poor, use the extended key generation level.
Limited	Performs a quick and lenient analysis of your data to generate few match keys, which might result in few match candidates. This level trades some match reliability for a faster search of match candidates compared to when you use the extended key generation level. If your data volume is extremely large with variations in word order, use the limited key generation level. The results of the limited key generation level are a subset of the standard level.
Standard	Generates an optimum number of match keys, which in turn results in an optimum number of match candidates. This level balances the reliability of matches and the processing time to generate match keys.

Candidate search level

The match model uses the candidate search level to determine how stringently and thoroughly to search for match candidates. Later, the match model applies the directed AI match rules on the match candidates and not on the entire data. You can configure an exhaustive, extreme, narrow, or typical candidate search level.

The match model uses the key generation level and the candidate search level that you define to determine the records that are potential match candidates.

Decide on a candidate search level based on the following considerations:

•Size and quality of the data
•Criticality of the matches
•Time constraints

The following table describes the candidate search levels:

Candidate Search Level	Description	Processing Time	When to Use
Extreme	Performs a lenient search that returns many more match candidates than the exhaustive search level. This search level results in a larger number of match candidates than the other search levels, possibly resulting in overmatching, and processes slowly.	Highest	Appropriate for small volumes of data that are unreliable and incomplete, or if it is critical to find the highest possible number of match candidates. You can also use the extreme level for the initial phase of matching and use a more stringent search level for the later phase.
Exhaustive	Performs a lenient search that returns many match candidates. This search level can result in more match candidates than most other search levels, possibly resulting in overmatching, and processes slowly.	High	Appropriate for small volumes of data that are unreliable and incomplete. You can also use the exhaustive level for the initial phase of matching and a more stringent search level for the later phase.
Typical	Performs a balanced search that returns an optimum number of match candidates. Use this search level in most scenarios.	Optimal	Appropriate for average volumes of data that are moderately critical.
Narrow	Performs a stringent search that returns the fewest match candidates. This search level results in fewer match candidates than other search levels, possibly resulting in undermatching, and processes faster.	Low	Appropriate for large volumes of data that are reliable and complete or contain a large number of duplicates.

Scenarios for candidate selection criteria configuration

When you configure candidate selection criteria, consider the industry, the size and quality of the data, the consequences of missing a match or overmatching, and the processing time.

The following scenarios illustrate recommended settings for the key generation level and the candidate search level:

Flight reservation system: An operations manager wants to check whether a passenger is on the no-fly list, and a missed match can be critical. In this scenario, you can use the extended key generation level and the extreme search level, which produce the maximum number of keys and candidates so that no potential match is missed. The process might take a while to run, but processing time matters less than avoiding a missed match.
Retail store: When a cashier searches for the loyalty account of a customer while other customers wait, speed is essential but a missed match isn't critical. The standard key generation level and the typical search level balance quick response times with adequate matching accuracy, and a missed match has minimal impact.
Financial institution: A financial institution wants to merge customer records, and an incorrect merge might compromise customer data. In this scenario, use the limited key generation level, which produces only a few key permutations, and the narrow search level, which returns only extremely similar candidates. These settings help you avoid overmatching of records.
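If you track these recommendations in scripts or configuration reviews, a small lookup table can keep them consistent. The following Python sketch only restates the three scenarios above. The scenario labels, dictionary keys, and function name are hypothetical and do not correspond to any product API.

```python
from enum import Enum


class KeyGenerationLevel(Enum):
    EXTENDED = "extended"
    STANDARD = "standard"
    LIMITED = "limited"


class CandidateSearchLevel(Enum):
    EXTREME = "extreme"
    EXHAUSTIVE = "exhaustive"
    TYPICAL = "typical"
    NARROW = "narrow"


# Hypothetical scenario labels; the level pairs come from the scenarios above.
RECOMMENDED_LEVELS = {
    "flight_reservation": (KeyGenerationLevel.EXTENDED,
                           CandidateSearchLevel.EXTREME),
    "retail_store": (KeyGenerationLevel.STANDARD,
                     CandidateSearchLevel.TYPICAL),
    "financial_institution": (KeyGenerationLevel.LIMITED,
                              CandidateSearchLevel.NARROW),
}


def recommended_levels(scenario):
    """Return the recommended level names for a known scenario label."""
    key_level, search_level = RECOMMENDED_LEVELS[scenario]
    return {
        "key_generation_level": key_level.value,
        "candidate_search_level": search_level.value,
    }
```

Treat the table as a starting point: the right levels for your data still depend on its volume and quality and on the cost of a missed or incorrect match.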

Guidelines for configuring candidate selection criteria

Consider the following guidelines when you configure candidate selection criteria:

•If data contains names of organizations, use the Organization_Name field type to generate match keys. If data contains names of individuals, use the Person_Name field type to generate match keys. If data contains addresses, use the Address_Part1 field type to generate match keys.
•If multiple fields can identify candidates that a single field might miss, use multiple fields to generate match keys.
•Use a match key width that provides you the best performance without affecting your match quality. You might have to trade off some match quality for increased performance.
•Use the narrowest possible search level to generate acceptable matches. In most cases, the typical search level might suffice.
•Ensure that you clean or suppress data that returns a large number of match candidates. You can look for large groups of identical keys when there are identical records. For example, high-frequency phone numbers, email addresses, words, or phrases can return identical keys.
•Consider filtering candidates to minimize the number of candidates for large data volumes. For example, you can filter candidates using a state or country code.
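One way to spot values that are likely to produce large groups of identical keys is to count how often each value occurs before you run matching. The following Python sketch assumes that records are available as dictionaries. The function name `high_frequency_values` and the threshold are illustrative choices, not product settings.

```python
from collections import Counter


def high_frequency_values(records, field, threshold):
    """Return values of `field` that occur at least `threshold` times."""
    counts = Counter(rec[field] for rec in records if rec.get(field))
    return {value: n for value, n in counts.items() if n >= threshold}


records = [
    {"id": 1, "phone": "000-000-0000"},
    {"id": 2, "phone": "000-000-0000"},
    {"id": 3, "phone": "000-000-0000"},
    {"id": 4, "phone": "555-0100"},
    {"id": 5, "phone": ""},
]

suspects = high_frequency_values(records, "phone", threshold=3)
```

Here `suspects` contains only the placeholder number `000-000-0000`, which is a candidate for cleaning or suppression before you generate match keys.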