Probabilistic Models Overview

A probabilistic model is a reference data object. Use a probabilistic model to understand the contents of a data string that contains multiple data values. A probabilistic model identifies the types of information in each value in the string.

You can add a probabilistic model to a Labeler transformation or a Parser transformation.

Probabilistic models use natural language processes to identify the type of information in a string. Natural language processes detect relevant terms in the input string and disregard terms that are not relevant.

You compile a probabilistic model in the Developer tool. When you compile a model, you create associations between similar data values in the model. The Labeler and Parser transformations use the compiled data to analyze the values in the input strings.

Labeler Transformation Example

The customer database at an insurance organization contains multiple data entry errors. You are a data steward at the insurance organization. You configure a mapping with a Labeler transformation to determine the different types of data that each column contains.

The following table shows sample records from the customer database:

Row ID	Field 1	Field 2	Field 3
1	19132954	AIM SECURITIES	PETRIE TAYBRO
2	10110169	JASE TRAPANI	BANK OF NEW YORK
3	10111786	WANGER ASSET MANAGEMENT, LLP	JAN SEEDORF
4	10112299	FELIX LEVENGER	HARVARD MAGAZINE
5	10112036	DESCHÊNES & FILS LTÉE (QUEBEC)	RICHARD TREMBLAY
6	BERGER ASSOCIATES	10111101	DAREEN HULSMAN
7	19131385	EAGLE FINANCIAL GROUP INC	PATRICK MCKINNIE
8	LAKENYA PASKETT	WHITEHALL FINANCIAL GROUP	15954710

When you run the mapping, the Labeler transformation compares the input data with the probabilistic model reference data. The Labeler transformation assigns a label to each input value. The transformation writes the labels to an output port. Each output row contains a set of labels that defines the data structure on the corresponding input row.

The following table describes the labels that the Labeler transformation adds to the output port:

Row ID	Output Labels
1	number organization contact
2	number contact organization
3	number organization contact
4	number contact organization
5	number organization contact
6	organization number contact
7	number organization contact
8	contact organization number
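
The labeling step can be approximated with a short Python sketch. It is not the Labeler transformation or the probabilistic model algorithm. It is a rule-based stand-in in which the ORG_TERMS set and the label_value and label_row functions are hypothetical names chosen for illustration. A compiled probabilistic model would replace the fixed term list.

```python
import re

# Hypothetical reference terms that suggest an organization name.
# A compiled probabilistic model would take the place of this fixed set.
ORG_TERMS = {"SECURITIES", "BANK", "MANAGEMENT", "MAGAZINE", "FILS",
             "ASSOCIATES", "GROUP", "INC", "LLP"}


def label_value(value):
    """Return a label for one field value using simple rules."""
    if value.strip().isdigit():
        return "number"
    # Split the value into word tokens and look for organization terms.
    tokens = re.findall(r"\w+", value.upper())
    if any(token in ORG_TERMS for token in tokens):
        return "organization"
    # Anything else is treated as a contact name in this sketch.
    return "contact"


def label_row(fields):
    """Return the space-separated labels for one input row."""
    return " ".join(label_value(field) for field in fields)


rows = [
    ["19132954", "AIM SECURITIES", "PETRIE TAYBRO"],
    ["BERGER ASSOCIATES", "10111101", "DAREEN HULSMAN"],
    ["LAKENYA PASKETT", "WHITEHALL FINANCIAL GROUP", "15954710"],
]
for row in rows:
    print(label_row(row))
```

Each printed line gives the order of data types in one input row, which shows where values were entered in the wrong column.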

Parser Transformation Example

A supermarket stores product descriptions in a single column in a database table. The product descriptions contain multiple data values that represent different types of information. You are a data steward at the supermarket. You want to create columns for the different types of information in the product descriptions.

You configure a mapping with a Parser transformation to organize the data values into the correct fields.

The following table describes the output data that the Parser transformation creates from the input data:

Product Name	Product Type	Product Details	Product Size
Sunnydream	Orange Juice	Unsweetened	12 oz
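
The parsing step can be sketched in Python in a similar way. The input string "Sunnydream Orange Juice Unsweetened 12 oz" is an assumed example, because the input column is not shown here. The term lists, the size pattern, and the parse_description function are hypothetical. The sketch uses fixed lookups, whereas the Parser transformation uses the compiled model data.

```python
import re

# Hypothetical reference data: known terms for two of the output fields.
PRODUCT_TYPES = ["Orange Juice", "Apple Juice"]
PRODUCT_DETAILS = ["Unsweetened", "Sweetened"]
# Assumed size format: a number followed by a unit.
SIZE_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*(?:oz|ml|l)\b", re.IGNORECASE)


def parse_description(description):
    """Split one product description into four named fields."""
    fields = {"Product Name": "", "Product Type": "",
              "Product Details": "", "Product Size": ""}
    remaining = description

    # Extract the size first, then remove it from the working string.
    size = SIZE_PATTERN.search(remaining)
    if size:
        fields["Product Size"] = size.group(0)
        remaining = remaining[:size.start()] + " " + remaining[size.end():]

    # Match whole terms only, so "Sweetened" does not match "Unsweetened".
    for field, terms in (("Product Type", PRODUCT_TYPES),
                         ("Product Details", PRODUCT_DETAILS)):
        for term in terms:
            match = re.search(r"\b" + re.escape(term) + r"\b", remaining)
            if match:
                fields[field] = term
                remaining = remaining[:match.start()] + " " + remaining[match.end():]
                break

    # Whatever text is left over is treated as the product name.
    fields["Product Name"] = " ".join(remaining.split())
    return fields


print(parse_description("Sunnydream Orange Juice Unsweetened 12 oz"))
```

The returned dictionary has one key for each output column in the table above.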