
Probabilistic Models Overview

A probabilistic model is a reference data object. Use a probabilistic model to understand the contents of a data string that contains multiple data values. A probabilistic model identifies the types of information in each value in the string.
You can add a probabilistic model to a Labeler transformation and to a Parser transformation.
Probabilistic models use natural language processing techniques to identify the type of information in a string. These techniques detect the relevant terms in the input string and disregard terms that are not relevant.
You compile a probabilistic model in the Developer tool. When you compile a model, you create associations between similar data values in the model. The Labeler and Parser transformations use the compiled data to analyze the values in the input strings.
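One way to picture compilation, offered only as a toy analogy and not as the Developer tool's actual algorithm, is to count how often each term in labeled reference data appears under each label and convert the counts into probabilities. In the Python sketch below, the function names `compile_model` and `label_terms` and the training terms are hypothetical. The sketch labels each known term with its most probable label and skips any term it has never seen, a much cruder stand-in for the way a probabilistic model disregards terms that are not relevant.

```python
from collections import Counter, defaultdict


def compile_model(labeled_terms):
    """Build P(label | term) estimates from (term, label) pairs.

    Hypothetical toy stand-in for model compilation, not the Developer tool's algorithm.
    """
    counts = defaultdict(Counter)
    for term, label in labeled_terms:
        counts[term.upper()][label] += 1
    model = {}
    for term, label_counts in counts.items():
        total = sum(label_counts.values())
        model[term] = {label: n / total for label, n in label_counts.items()}
    return model


def label_terms(model, text):
    """Return (term, label, probability) for each term in text that the model knows."""
    results = []
    for term in text.upper().split():
        probs = model.get(term)
        if probs is None:
            continue  # unseen term: this toy treats it as not relevant
        best = max(probs, key=probs.get)
        results.append((term, best, probs[best]))
    return results


# Hypothetical labeled reference terms.
training = [
    ("SECURITIES", "organization"),
    ("FINANCIAL", "organization"),
    ("GROUP", "organization"),
    ("PATRICK", "contact"),
    ("PATRICK", "contact"),
    ("PATRICK", "organization"),  # e.g. a term that also appears in a company name
]
model = compile_model(training)
print(label_terms(model, "Patrick McKinnie"))
print(label_terms(model, "Eagle Financial Group"))
```

Because PATRICK appears twice as a contact and once in an organization name, the sketch labels it contact with probability 2/3, and it skips MCKINNIE and EAGLE because they are not in the training terms. Natural language models typically also weigh context, such as neighboring terms, which this per-term count ignores.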

Labeler Transformation Example

The customer database at an insurance organization contains multiple data entry errors. You are a data steward at the insurance organization. You configure a mapping with a Labeler transformation to determine the different types of data that each column contains.
The following table describes sample data from the customer database:
Row ID  Field 1            Field 2                         Field 3
1       19132954           AIM SECURITIES                  PETRIE TAYBRO
2       10110169           JASE TRAPANI                    BANK OF NEW YORK
3       10111786           WANGER ASSET MANAGEMENT, LLP    JAN SEEDORF
4       10112299           FELIX LEVENGER                  HARVARD MAGAZINE
5       10112036           DESCHÊNES & FILS LTÉE (QUEBEC)  RICHARD TREMBLAY
6       BERGER ASSOCIATES  10111101                        DAREEN HULSMAN
7       19131385           EAGLE FINANCIAL GROUP INC       PATRICK MCKINNIE
8       LAKENYA PASKETT    WHITEHALL FINANCIAL GROUP       15954710
When you run the mapping, the Labeler transformation compares the input data with the reference data in the probabilistic model and assigns a label to each input value. The transformation writes the labels to an output port, so each output row contains a set of labels that describes the data structure of the corresponding input row.
The following table describes the labels that the Labeler transformation adds to the output port:
Row ID  Output Labels
1       number organization contact
2       number contact organization
3       number organization contact
4       number contact organization
5       number organization contact
6       organization number contact
7       number organization contact
8       contact organization number
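The Python sketch below illustrates only the shape of the Labeler output: one label per field, joined in field order for each row. It deliberately swaps the probabilistic model for three hand-written rules (a value made only of digits is a number, a value that contains a term from the hypothetical `ORG_KEYWORDS` set is an organization, and anything else is a contact), so it says nothing about how a probabilistic model actually reaches its decisions. `label_value` and `label_row` are also hypothetical names.

```python
import re

# Hypothetical stand-in for a compiled model: hand-picked organization terms.
# These are fixed rules, NOT a probabilistic model.
ORG_KEYWORDS = {"SECURITIES", "BANK", "MANAGEMENT", "LLP", "MAGAZINE",
                "LTÉE", "ASSOCIATES", "INC", "GROUP"}


def label_value(value):
    """Assign one label to a field value using simple rules."""
    if value.isdigit():
        return "number"
    terms = set(re.findall(r"\w+", value.upper()))  # \w is Unicode-aware for str patterns
    if terms & ORG_KEYWORDS:
        return "organization"
    return "contact"


def label_row(fields):
    """Return the space-separated labels for one input row, in field order."""
    return " ".join(label_value(field) for field in fields)


# Sample rows from the customer database example.
rows = [
    ("19132954", "AIM SECURITIES", "PETRIE TAYBRO"),
    ("10110169", "JASE TRAPANI", "BANK OF NEW YORK"),
    ("10111786", "WANGER ASSET MANAGEMENT, LLP", "JAN SEEDORF"),
    ("10112299", "FELIX LEVENGER", "HARVARD MAGAZINE"),
    ("10112036", "DESCHÊNES & FILS LTÉE (QUEBEC)", "RICHARD TREMBLAY"),
    ("BERGER ASSOCIATES", "10111101", "DAREEN HULSMAN"),
    ("19131385", "EAGLE FINANCIAL GROUP INC", "PATRICK MCKINNIE"),
    ("LAKENYA PASKETT", "WHITEHALL FINANCIAL GROUP", "15954710"),
]

for row_id, fields in enumerate(rows, start=1):
    print(row_id, label_row(fields))
```

A fixed keyword list like this fails as soon as an organization name lacks every listed term, which is one reason to rely on a model compiled from reference data instead of fixed rules.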

Parser Transformation Example

A supermarket stores product descriptions in a single column in a database table. The product descriptions contain multiple data values that represent different types of information. You are a data steward at the supermarket. You want to create columns for the different types of information in the product descriptions.
You configure a mapping with a Parser transformation to organize the data values into the correct fields.
The following data fragment contains the product description for orange juice:
Sunnydream Orange Juice Unsweetened 12 oz
The following table describes the output data that the Parser transformation creates from the input data:
Product Name  Product Type  Product Details  Product Size
Sunnydream    Orange Juice  Unsweetened      12 oz
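A minimal Python sketch of the parsing step, assuming a hypothetical term-to-label table (`TERM_LABELS`) in place of a compiled probabilistic model: the sketch labels each term in the description and appends it to the output field for that label, so terms that share a label, such as Orange and Juice, end up in the same field. The function names and the rule that treats a bare number as part of the product size are also assumptions.

```python
# Hypothetical term labels standing in for a compiled probabilistic model.
TERM_LABELS = {
    "SUNNYDREAM": "Product Name",
    "ORANGE": "Product Type",
    "JUICE": "Product Type",
    "UNSWEETENED": "Product Details",
    "OZ": "Product Size",
}

OUTPUT_FIELDS = ["Product Name", "Product Type", "Product Details", "Product Size"]


def label_term(term):
    """Look up a term's label; treat a bare number as part of the size (assumption)."""
    if term.isdigit():
        return "Product Size"
    return TERM_LABELS.get(term.upper())


def parse_description(description):
    """Write each labeled term to its output field; ignore terms with no label."""
    fields = {name: [] for name in OUTPUT_FIELDS}
    for term in description.split():
        label = label_term(term)
        if label is not None:
            fields[label].append(term)
    return {name: " ".join(terms) for name, terms in fields.items()}


print(parse_description("Sunnydream Orange Juice Unsweetened 12 oz"))
```

Because the sketch routes terms by label rather than by position, the description `12 oz Sunnydream Orange Juice` still yields Sunnydream as the product name and 12 oz as the size, with an empty Product Details field.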