Probabilistic Model Label Data
The label values in a probabilistic model represent the types of information that the reference data values might contain. When you add reference data rows to a model, assign a label to each value in each row. The labels that you add to the model appear in the Label view and in the menu options in the Data view.
You can assign any label in the model to any reference data value. If the same value has different meanings in different rows of reference data, you can assign a different label to each value in each row.
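To picture how one value can carry different labels in different rows, the sketch below stores each reference data row as a list of (value, label) pairs. The rows, the `ReferenceRow` alias, and the helper functions are hypothetical illustrations. They are not the product's internal storage format or API.

```python
from typing import List, Set, Tuple

# One reference data row: an ordered list of (value, label) pairs.
# These rows are hypothetical examples, not data from a real model.
ReferenceRow = List[Tuple[str, str]]

reference_rows: List[ReferenceRow] = [
    [("Park", "Street_Name"), ("Avenue", "Street_Descriptor")],
    [("Central", "Street_Name"), ("Park", "Street_Descriptor")],
]


def model_labels(rows: List[ReferenceRow]) -> Set[str]:
    """Collect every label assigned anywhere in the reference data."""
    return {label for row in rows for _, label in row}


def labels_for_value(rows: List[ReferenceRow], value: str) -> Set[str]:
    """Return all labels that the reference data assigns to one value."""
    return {label for row in rows for v, label in row if v == value}


print(sorted(labels_for_value(reference_rows, "Park")))
# ['Street_Descriptor', 'Street_Name']
```

The value "Park" is a street name in the first row and a street descriptor in the second, so the sketch reports both labels for it.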
The range of label values can correspond to the range of input ports that the Labeler transformation or the Parser transformation reads during probabilistic analysis. The probabilistic model must contain at least one label value that the transformation can apply to the data values on each input port.
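Before you run probabilistic analysis, you might check that every input port has at least one applicable label in the model. The following is a minimal sketch under stated assumptions: `port_labels` is a hypothetical mapping that you would assemble yourself, from each input port to the labels that could describe its values, and `uncovered_ports` is not a product function.

```python
from typing import Dict, List, Set


def uncovered_ports(port_labels: Dict[str, Set[str]], labels_in_model: Set[str]) -> List[str]:
    """Return the input ports that have no applicable label in the model.

    port_labels maps each input port to the labels that could describe the
    values on that port. Both arguments are hypothetical inputs, not APIs.
    """
    return [port for port, candidates in port_labels.items()
            if not candidates & labels_in_model]


# Hypothetical ports and candidate labels.
ports = {
    "Street": {"Street_Name", "Street_Descriptor"},
    "City": {"City_Name"},
}
print(uncovered_ports(ports, {"Street_Name", "Street_Descriptor"}))
# ['City']
```

An empty result means that the model has at least one candidate label for each port.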
For example, a warehouse might store inventory data in a comma-delimited file that defines eight columns. You design a mapping that parses the inventory data and writes it to a database table. You create a probabilistic model with a label value for each data column. When you run the mapping, the Parser transformation writes each value in the input data to the correct column in the target table.
The following table shows the columns of inventory data and the label values that you might create in a probabilistic model:
| Inventory Column Name | Label Name |
| --- | --- |
| Product_Name | Product_Name |
| Quantity | Quantity |
| Location | Location |
| Barcode | Barcode |
| SKU | Stock_Keeping_Unit |
| Arrival_Date | Arrival_Date |
| Cost_Price | Cost_Price |
Note: You can use the input column names as label names, or you can use other names. The label names do not need to match the column names.
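Because label names and column names can differ, the mapping from labels back to target columns is what matters. The sketch below copies the column and label pairs from the inventory table and routes (value, label) pairs to target column names. It assumes that the pairs have already been produced by probabilistic analysis; `COLUMN_TO_LABEL` and `to_target_row` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

# Column and label pairs from the inventory table. SKU maps to a label
# with a different name, which the model allows.
COLUMN_TO_LABEL: Dict[str, str] = {
    "Product_Name": "Product_Name",
    "Quantity": "Quantity",
    "Location": "Location",
    "Barcode": "Barcode",
    "SKU": "Stock_Keeping_Unit",
    "Arrival_Date": "Arrival_Date",
    "Cost_Price": "Cost_Price",
}
# Reverse lookup: label name -> target column name.
LABEL_TO_COLUMN: Dict[str, str] = {label: col for col, label in COLUMN_TO_LABEL.items()}


def to_target_row(labeled: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Route (value, label) pairs to target columns.

    Values whose label has no target column are returned as overflow.
    """
    row: Dict[str, str] = {}
    overflow: List[str] = []
    for value, label in labeled:
        column = LABEL_TO_COLUMN.get(label)
        if column is None:
            overflow.append(value)
        else:
            row[column] = value
    return row, overflow


print(to_target_row([("Widget", "Product_Name"), ("AB-1001", "Stock_Keeping_Unit")]))
# ({'Product_Name': 'Widget', 'SKU': 'AB-1001'}, [])
```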
Overflow Label
When a transformation cannot apply a label to an input data value, the transformation treats the data value as overflow data. The Labeler transformation applies an overflow label to any data value that it cannot identify. The Parser transformation writes any data value that it cannot identify to an overflow port.
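The two overflow behaviors differ in where the unidentified value ends up. In the following sketch, a dictionary lookup stands in for probabilistic analysis, which is a deliberate simplification. A Labeler-style result keeps every value and tags unknown ones with an overflow label. A Parser-style result sends unknown values to an overflow port. The names `OVERFLOW_LABEL`, `OVERFLOW_PORT`, and `KNOWN` are hypothetical.

```python
from typing import Dict, List, Tuple

OVERFLOW_LABEL = "Overflow"  # hypothetical name for the overflow label
OVERFLOW_PORT = "Overflow"   # hypothetical name for the overflow port

# Hypothetical stand-in for what the model can identify.
KNOWN: Dict[str, str] = {"Madison": "Street_Name", "Square": "Street_Descriptor"}


def labeler_output(values: List[str], known: Dict[str, str]) -> List[Tuple[str, str]]:
    """Labeler-style result: each value gets a label; unknown values get the overflow label."""
    return [(v, known.get(v, OVERFLOW_LABEL)) for v in values]


def parser_output(values: List[str], known: Dict[str, str]) -> Dict[str, List[str]]:
    """Parser-style result: known values go to the port for their label;
    unknown values go to the overflow port."""
    ports: Dict[str, List[str]] = {}
    for v in values:
        ports.setdefault(known.get(v, OVERFLOW_PORT), []).append(v)
    return ports


values = ["Madison", "Square", "Garden"]
print(labeler_output(values, KNOWN))
# [('Madison', 'Street_Name'), ('Square', 'Street_Descriptor'), ('Garden', 'Overflow')]
print(parser_output(values, KNOWN))
# {'Street_Name': ['Madison'], 'Street_Descriptor': ['Square'], 'Overflow': ['Garden']}
```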
The following table shows how a Parser transformation might use an overflow port to parse address data elements that a probabilistic model does not recognize:
| Input Data | Street_Name port | Street_Descriptor port | Overflow port |
| --- | --- | --- | --- |
| Park Place | Park | Place | No overflow data |
| Park Avenue | Park | Avenue | No overflow data |
| Madison Avenue | Madison | Avenue | No overflow data |
| Central Park | Central | Park | No overflow data |
| Washington Square Park | Washington | Square | Park |
| Madison Square Garden | Madison | Square | Garden |
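The address table can be reproduced with a positional stand-in. The model in the table has two labels, so any third token lands on the overflow port. This sketch is not how probabilistic analysis assigns labels. It only mimics the outcome in the table. `LABELS` and `parse_positional` are hypothetical names, and an empty string represents "No overflow data".

```python
from typing import Dict, Sequence

# Labels (output ports) in the hypothetical address model, in order.
LABELS = ("Street_Name", "Street_Descriptor")


def parse_positional(text: str, labels: Sequence[str] = LABELS) -> Dict[str, str]:
    """Assign tokens to labels in order; tokens beyond the last label overflow."""
    tokens = text.split()
    ports: Dict[str, str] = {label: "" for label in labels}
    for label, token in zip(labels, tokens):
        ports[label] = token
    # Everything past the last label goes to the overflow port.
    ports["Overflow"] = " ".join(tokens[len(labels):])
    return ports


for address in ["Park Place", "Park Avenue", "Madison Avenue", "Central Park",
                "Washington Square Park", "Madison Square Garden"]:
    print(address, "->", parse_positional(address))
```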
The Parser transformation also writes values to an overflow port when the number of input values is greater than the number of labels in the model. Before you use a probabilistic model in a transformation, review the input data and verify that the model contains the correct number of label values.
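One way to review the input data before you use the model is to count the values in each input row and compare the count with the number of labels. The sketch below assumes that values are separated by whitespace, or by a delimiter such as "," for comma-delimited data. The helper names are hypothetical.

```python
from typing import Iterable, List, Optional


def rows_that_overflow(rows: Iterable[str], label_count: int,
                       sep: Optional[str] = None) -> List[str]:
    """Return input rows that contain more values than the model has labels."""
    return [row for row in rows if len(row.split(sep)) > label_count]


def max_value_count(rows: Iterable[str], sep: Optional[str] = None) -> int:
    """Return the largest number of values in any input row (0 for no rows)."""
    return max((len(row.split(sep)) for row in rows), default=0)


sample = ["Park Place", "Washington Square Park"]
print(rows_that_overflow(sample, label_count=2))
# ['Washington Square Park']
print(max_value_count(sample))
# 3
```

If `max_value_count` exceeds the number of labels, either add labels to the model or expect some values to reach the overflow port.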