Probabilistic Models Overview
A probabilistic model is a reference data object. Use a probabilistic model to understand the contents of a data string that contains multiple data values. A probabilistic model identifies the types of information in each value in the string.
You can add a probabilistic model to a Labeler transformation and a Parser transformation:
- Use a probabilistic model in a Labeler transformation to assign a descriptive label to each value in a data string. The Labeler transformation writes the labels to an output port in the same format as the input string.
- Use a probabilistic model in a Parser transformation to write each value in an input string to a new port. The Parser transformation creates an output port for each data category that you define in the probabilistic model.
Probabilistic models use natural language processes to identify the type of information in a string. Natural language processes detect relevant terms in the input string and disregard terms that are not relevant.
You compile a probabilistic model in the Developer tool. When you compile a model, you create associations between similar data values in the model. The Labeler and Parser transformations use the compiled data to analyze the values in the input strings.
Labeler Transformation Example
The customer database at an insurance organization contains multiple data entry errors. You are a data steward at the insurance organization. You configure a mapping with a Labeler transformation to determine the different types of data that each column contains.
The following table describes sample data from the customer database:
Row ID | Field 1 | Field 2 | Field 3 |
---|---|---|---|
1 | 19132954 | AIM SECURITIES | PETRIE TAYBRO |
2 | 10110169 | JASE TRAPANI | BANK OF NEW YORK |
3 | 10111786 | WANGER ASSET MANAGEMENT, LLP | JAN SEEDORF |
4 | 10112299 | FELIX LEVENGER | HARVARD MAGAZINE |
5 | 10112036 | DESCHÊNES & FILS LTÉE (QUEBEC) | RICHARD TREMBLAY |
6 | BERGER ASSOCIATES | 10111101 | DAREEN HULSMAN |
7 | 19131385 | EAGLE FINANCIAL GROUP INC | PATRICK MCKINNIE |
8 | LAKENYA PASKETT | WHITEHALL FINANCIAL GROUP | 15954710 |
When you run the mapping, the Labeler transformation compares the input data with the probabilistic model reference data. The Labeler transformation assigns a label to each input value. The transformation writes the labels to an output port. Each output row contains a set of labels that defines the data structure on the corresponding input row.
The following table describes the labels that the Labeler transformation adds to the output port:
Row ID | Output Labels |
---|---|
1 | number organization contact |
2 | number contact organization |
3 | number organization contact |
4 | number contact organization |
5 | number organization contact |
6 | organization number contact |
7 | number organization contact |
8 | contact organization number |
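The labeling step can be illustrated outside the product with a short Python sketch. It is not the Labeler transformation and does not use a compiled probabilistic model: the keyword set `ORG_KEYWORDS` and the functions `label_value` and `label_row` are hypothetical, hand-written stand-ins that only show how one label per input value produces an output string that mirrors the structure of the input row.

```python
import re

# Hypothetical keyword list standing in for compiled reference data.
# A real probabilistic model learns these associations; this set is
# hand-written for the sample rows only.
ORG_KEYWORDS = {"SECURITIES", "BANK", "MANAGEMENT", "MAGAZINE",
                "ASSOCIATES", "GROUP", "INC", "LLP"}


def label_value(value: str) -> str:
    """Return a label for one field value using simple rules."""
    if value.strip().isdigit():
        return "number"
    # Split the value into upper-case word tokens and look for a keyword.
    tokens = set(re.findall(r"\w+", value.upper()))
    if tokens & ORG_KEYWORDS:
        return "organization"
    # Anything else is treated as a contact name in this sketch.
    return "contact"


def label_row(fields) -> str:
    """Label each field and join the labels in input order."""
    return " ".join(label_value(field) for field in fields)


# Rows 1, 6, and 8 from the sample data.
rows = [
    ("19132954", "AIM SECURITIES", "PETRIE TAYBRO"),
    ("BERGER ASSOCIATES", "10111101", "DAREEN HULSMAN"),
    ("LAKENYA PASKETT", "WHITEHALL FINANCIAL GROUP", "15954710"),
]

for fields in rows:
    print(label_row(fields))
```

Because the labels follow the order of the input fields, rows where the number, organization, and contact values appear in the wrong columns stand out immediately in the output.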
Parser Transformation Example
A supermarket stores product descriptions in a single column in a database table. The product descriptions contain multiple data values that represent different types of information. You are a data steward at the supermarket. You want to create columns for the different types of information in the product descriptions.
You configure a mapping with a Parser transformation to organize the data values into the correct fields.
The following data fragment contains the product description for orange juice:
Sunnydream Orange Juice Unsweetened 12 oz
The following table describes the output data that the Parser transformation creates from the input data:
Product Name | Product Type | Product Details | Product Size |
---|---|---|---|
Sunnydream | Orange Juice | Unsweetened | 12 oz |
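The parsing step can be sketched in Python in a similar way. This is an illustration under stated assumptions, not the Parser transformation itself: the lookup table `TOKEN_CATEGORY`, the unit set `UNITS`, and the functions `categorize` and `parse_description` are hypothetical and stand in for the associations that a compiled probabilistic model would supply. The sketch assigns each token to a category, collects the tokens for each category in order, and drops tokens that match no category.

```python
# Output categories, one per output port in the example table.
CATEGORIES = ["Product Name", "Product Type", "Product Details", "Product Size"]

# Hypothetical token-to-category lookup standing in for compiled model data.
TOKEN_CATEGORY = {
    "sunnydream": "Product Name",
    "orange": "Product Type",
    "juice": "Product Type",
    "unsweetened": "Product Details",
}

# Hypothetical list of size units.
UNITS = {"oz", "ml", "l", "g", "kg"}


def categorize(token: str):
    """Return the category for a token, or None if the token is not relevant."""
    lowered = token.lower()
    if lowered.isdigit() or lowered in UNITS:
        return "Product Size"
    return TOKEN_CATEGORY.get(lowered)


def parse_description(description: str) -> dict:
    """Write each recognized token to the output field for its category."""
    ports = {category: [] for category in CATEGORIES}
    for token in description.split():
        category = categorize(token)
        if category is not None:  # disregard tokens that are not relevant
            ports[category].append(token)
    return {category: " ".join(tokens) for category, tokens in ports.items()}


result = parse_description("Sunnydream Orange Juice Unsweetened 12 oz")
for category in CATEGORIES:
    print(f"{category}: {result[category]}")
```

A dictionary keyed by category keeps one output field per data category, which mirrors the way the Parser transformation creates one output port for each category defined in the model.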