Probabilistic Model Structure
A probabilistic model contains rows of reference data values and label values. The reference data values represent the different values that might appear in the transformation input data. The label values identify the types of information that you expect the input data to contain.
A probabilistic model also contains compilation data. The Labeler transformation and the Parser transformation use the compilation data to measure the similarities between the reference data in the model and the transformation input data. When you compile a probabilistic model, you create or update the compilation data.
A data row can contain a single value or multiple values, and each data row can have a different structure. You can assign the same label to different values in a data row. You can also assign different labels to identical values that appear in different positions in a row. The Data Integration Service considers the relative positions of the values in the input string when the mapping runs. Assign each label to at least one data value before you compile the probabilistic model.
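The row and label rules above can be pictured with a small sketch. The following Python is an illustration only and does not represent the model file format or any Informatica API. The names `DataRow` and `unassigned_labels` and the sample labels are hypothetical. The sketch models each data row as an ordered list of value and label pairs, and it checks that every label is assigned to at least one value, as required before compilation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class DataRow:
    # Ordered (value, label) pairs. List position mirrors position in the row.
    entries: List[Tuple[str, str]]


def unassigned_labels(labels: Set[str], rows: List[DataRow]) -> Set[str]:
    """Return the labels that are not assigned to any value in any row."""
    used = {label for row in rows for _, label in row.entries}
    return labels - used


# Hypothetical labels and rows, for illustration only.
labels = {"FIRSTNAME", "LASTNAME", "CITY"}
rows = [
    # The same label assigned to different values in one row.
    DataRow([("Mary", "FIRSTNAME"), ("Ann", "FIRSTNAME"), ("Smith", "LASTNAME")]),
    # An identical value that takes a different label in a different position.
    DataRow([("Jordan", "FIRSTNAME"), ("Jordan", "LASTNAME")]),
]

# CITY has no data value yet, so this sketch reports it as unassigned.
missing = unassigned_labels(labels, rows)
print(sorted(missing))
```

In this sketch, a non-empty result means that at least one label still needs a data value before you compile the model.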
The Developer tool writes the reference data values, the label values, and the compilation data to a file in the Informatica directory structure. The probabilistic model object in the Model repository stores the file name. When you save a probabilistic model, you write the current reference data values and the label values to the file. When you compile the model, you update the compilation data in the file. You can read the file name from the model properties in the Developer tool.
Note: To get the best results from the probabilistic model, verify that each data row contains multiple reference data values. The order of the values in each row must correspond as closely as possible to the order in which the values occur in the transformation input data. If the data rows contain single reference data values, the Labeler transformation or the Parser transformation cannot apply natural language processing during the probabilistic analysis.
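The recommendation in the note can be checked outside the Developer tool before you enter the reference data. The following Python is a minimal sketch that assumes the rows are available as lists of strings. The function name `single_value_rows` and the sample data are hypothetical and are not part of any Informatica interface.

```python
from typing import List


def single_value_rows(rows: List[List[str]]) -> List[int]:
    """Return the indexes of rows that hold fewer than two reference values."""
    return [i for i, row in enumerate(rows) if len(row) < 2]


# Hypothetical reference data rows, for illustration only.
rows = [
    ["John", "Smith", "New York"],
    ["Boston"],  # a single value, with no surrounding context
    ["Jane", "Doe"],
]

# Indexes of the rows to extend with additional values.
flagged = single_value_rows(rows)
print(flagged)
```

Rows that the sketch flags are candidates for additional values, entered in the order in which the values occur in the input data.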