Probabilistic Model Structure
A probabilistic model contains a set of data rows and a corresponding set of labels. The data rows contain examples of the different values that might appear in the transformation input data. The labels identify the types of information that you expect the input values to contain.
When you define a probabilistic model, you assign each label to one or more values in the data rows. When you compile the model, the Developer tool generates a reference data object that represents the relationships between the label values and the data values.
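It can help to picture the relationships described above as a small data structure. The following Python sketch is hypothetical. The class and method names (`ProbabilisticModel`, `add_row`, `assign`, `compile`) are illustrative assumptions and are not part of the Developer tool or any Informatica API. The dictionary that `compile` returns only stands in for the reference data object that the Developer tool generates.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: these names are illustrative and are not an Informatica API.
@dataclass
class ProbabilisticModel:
    data_rows: list[list[str]] = field(default_factory=list)
    # Maps (row index, value position) to the label assigned to that value.
    assignments: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_row(self, values: list[str]) -> int:
        """Append a data row and return its index."""
        self.data_rows.append(list(values))
        return len(self.data_rows) - 1

    def assign(self, label: str, row: int, position: int) -> None:
        """Assign a label to the value at the given row and position."""
        if not (0 <= row < len(self.data_rows) and 0 <= position < len(self.data_rows[row])):
            raise IndexError(f"no data value at row {row}, position {position}")
        self.assignments[(row, position)] = label

    def compile(self) -> dict[str, list[str]]:
        """Return a label -> data values mapping, a stand-in for the
        reference data object that the Developer tool generates."""
        reference: dict[str, list[str]] = {}
        for (row, position), label in sorted(self.assignments.items()):
            reference.setdefault(label, []).append(self.data_rows[row][position])
        return reference


model = ProbabilisticModel()
r0 = model.add_row(["John", "Smith", "Chicago"])
model.assign("FirstName", r0, 0)
model.assign("LastName", r0, 1)
model.assign("City", r0, 2)
r1 = model.add_row(["Mary", "Boston"])
model.assign("FirstName", r1, 0)
model.assign("City", r1, 1)
reference = model.compile()
```

In this sketch, assignments are keyed by position rather than by value, so identical values in one row can carry different labels.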
The Developer tool stores the label values and the data values in a data file on the Informatica services host machine. The data file also contains the metadata that defines the associations between the label values and the data values. When you compile a probabilistic model, you refresh the links between the label values and the data values.
If you delete the data file that a probabilistic model uses, the probabilistic model becomes read-only. You cannot compile a read-only probabilistic model.
A data row can contain a single value or multiple values, and the data rows do not need to share the same structure. You can assign the same label to multiple values in a data row. You can also assign different labels to identical values that appear in different positions in a row. Before you compile the probabilistic model, assign each label to at least one data value.
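The assignment rules above can be expressed as a pre-compile check. This is a minimal sketch that assumes labels and assignments are held in plain Python collections. The function names `unassigned_labels` and `check_before_compile`, and the sample labels, are hypothetical and do not reflect how the Developer tool validates a model.

```python
# Hypothetical pre-compile check; the names are illustrative, not an Informatica API.
Position = tuple[int, int]  # (row index, value position)


def unassigned_labels(labels: set[str], assignments: dict[Position, str]) -> set[str]:
    """Return the labels that are not assigned to any data value."""
    return labels - set(assignments.values())


def check_before_compile(rows: list[list[str]], labels: set[str],
                         assignments: dict[Position, str]) -> None:
    """Raise an error if an assignment points at a missing value or a label is unused."""
    for row, pos in assignments:
        if not (0 <= row < len(rows) and 0 <= pos < len(rows[row])):
            raise IndexError(f"no data value at row {row}, position {pos}")
    missing = unassigned_labels(labels, assignments)
    if missing:
        raise ValueError(f"assign these labels before compiling: {sorted(missing)}")


# The rows do not share a structure: one has three values, the other five.
rows = [
    ["Mary", "Ann", "Smith"],
    ["10", "Main", "Street", "Unit", "10"],
]
labels = {"FirstName", "LastName", "HouseNumber", "Street", "UnitType", "UnitNumber", "City"}
assignments = {
    (0, 0): "FirstName", (0, 1): "FirstName",     # same label on two values in a row
    (0, 2): "LastName",
    (1, 0): "HouseNumber", (1, 4): "UnitNumber",  # identical "10", different labels
    (1, 1): "Street", (1, 2): "Street",
    (1, 3): "UnitType",
}
```

Here the `City` label is not assigned to any value, so `check_before_compile(rows, labels, assignments)` raises a `ValueError` until you assign it or remove it from the label set.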
Note: For best results, verify that each data row contains multiple values, and arrange the values in each row to match, as closely as possible, the order in which they occur in the source data. If the data rows contain single values, the Labeler transformation or Parser transformation cannot apply natural language processing when it analyzes the input data.
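Before you build a model, you might review the data rows against these recommendations. The following Python sketch is hypothetical: the helpers `single_value_rows` and `matches_source_order` are illustrative, it assumes a source record can be split into values on whitespace, and it does not model how the Labeler or Parser transformation actually analyzes data.

```python
# Hypothetical review helpers; not part of the Developer tool.

def single_value_rows(rows: list[list[str]]) -> list[int]:
    """Return the indexes of data rows that contain fewer than two values."""
    return [i for i, row in enumerate(rows) if len(row) < 2]


def matches_source_order(row: list[str], record_tokens: list[str]) -> bool:
    """Return True if the row's values appear in record_tokens in the same order.

    Each `in` test consumes the shared iterator, so later values must be
    found after earlier ones (a subsequence check).
    """
    it = iter(record_tokens)
    return all(value in it for value in row)


rows = [["John", "Smith"], ["Chicago"], ["Smith", "John"]]
source_tokens = "John Smith Chicago".split()  # assumes whitespace-separated source values

flagged = single_value_rows(rows)
order_ok = [matches_source_order(row, source_tokens) for row in rows]
```

With this sample, row 1 is flagged because it holds a single value, and row 2 fails the order check because its values are reversed relative to the source record.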