Probabilistic Model Structure
A probabilistic model contains a column of reference data values and one or more columns of label values. The reference data values represent the different values that can appear in the input data. The labels represent the types of information that the input data values can contain.
A Labeler or Parser transformation uses the label values to analyze the structure of the data in an input row. The structure of the row determines the type of information in each data value. Each input row can have a different structure. When you assign labels to the reference data values, you define the possible order in which the input data values might appear.
A probabilistic model also contains compilation data. The transformations use the compilation data to calculate similarities between the reference data and the input data. When you compile a model, you create or update the compilation data.
The following figure shows a probabilistic model in the Developer tool:
When you use a probabilistic model in a Labeler transformation, the Labeler assigns a label value to each value in the input row. For example, the transformation labels the string "Franklin Delano Roosevelt" as "FIRSTNAME MIDDLENAME LASTNAME."
When you use a probabilistic model in a Parser transformation, the Parser writes each input value to an output port based on the label that matches the value. For example, the Parser writes the string "Franklin Delano Roosevelt" to FIRSTNAME, MIDDLENAME, and LASTNAME output ports.
Compiling the Probabilistic Model
Each time you update a probabilistic model, you can compile the model. Compile the model to update the model logic with the current data values and label values.
Before you compile the model, verify that all label values identify at least one data value.
- •To compile the model, open the model in the Developer tool and click Compile.