Probabilistic Model Label Data
A probabilistic model contains descriptive labels for the types of information in the reference data. When you create a model or add reference data to a model, assign a label to each reference data value.
The labels you create appear as columns in the probabilistic model. When you assign a label to a data value, the model adds the value to the label column. You can assign any label in the model to any reference data value. If the same value has different meanings in two rows of reference data, you can assign different labels to the value in each row.
You can define the same combination of labels for multiple input strings. Multiple examples of a label increase the likelihood that the probabilistic model assigns the correct label to an input data value.
Address Data Example
You can build a probabilistic model to parse address data values. The probabilistic model determines the address data type from the value and also from its position in the input string. For example, the model can determine when the same value is a street name or an address suffix.
The following table shows how you can assign labels to address data values in different combinations:
Reference Data | Label 1 - Street Names | Label 2 - Address Suffixes |
---|---|---|
Park Place | Park | Place |
Park Avenue | Park | Avenue |
Madison Avenue | Madison | Avenue |
Central Park | Central | Park |
State Street | State | Street |
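The table above can be pictured as rows of token and label pairs. The following Python sketch is only an illustration of that idea, not the format that the Developer tool uses to store a model. The names `REFERENCE_ROWS` and `labels_for`, and the label identifiers `Street_Names` and `Address_Suffixes`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory picture of the labeled reference data above.
# Each row is a list of (token, label) pairs in input order.
REFERENCE_ROWS = [
    [("Park", "Street_Names"), ("Place", "Address_Suffixes")],
    [("Park", "Street_Names"), ("Avenue", "Address_Suffixes")],
    [("Madison", "Street_Names"), ("Avenue", "Address_Suffixes")],
    [("Central", "Street_Names"), ("Park", "Address_Suffixes")],
    [("State", "Street_Names"), ("Street", "Address_Suffixes")],
]


def labels_for(token, rows=REFERENCE_ROWS):
    """Return {label: set of zero-based positions} observed for a token."""
    seen = defaultdict(set)
    for row in rows:
        for position, (value, label) in enumerate(row):
            if value == token:
                seen[label].add(position)
    return dict(seen)


# "Park" carries a different label depending on where it appears in the row.
print(labels_for("Park"))
# {'Street_Names': {0}, 'Address_Suffixes': {1}}
```

The lookup shows why position matters: the same value is labeled as a street name in the first position and as an address suffix in the second.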
The Labeler transformation can return any of the label combinations that you define in the model. Organize the label columns from left to right in the order in which you want the labels to appear in the output data.
Note: If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you invalidate the parsing operation that uses the model. You must delete and re-create the operation that uses the probabilistic model.
If a probabilistic model contains a label that is not assigned to any data value, you cannot compile the model.
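The compile restriction can be thought of as a check for labels that no data value uses. The sketch below assumes the same hypothetical row layout of (token, label) pairs. The function `unused_labels` and the `City` label are invented for illustration and are not part of the product.

```python
def unused_labels(model_labels, rows):
    """Return labels defined in the model that no data value uses."""
    used = {label for row in rows for _, label in row}
    return [label for label in model_labels if label not in used]


# Hypothetical model: three labels defined, but only two are assigned.
rows = [
    [("Park", "Street_Names"), ("Place", "Address_Suffixes")],
    [("Central", "Street_Names"), ("Park", "Address_Suffixes")],
]
labels = ["Street_Names", "Address_Suffixes", "City"]

print(unused_labels(labels, rows))  # ['City']
```

In this picture, a model would be ready to compile only when the returned list is empty.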
Overflow Label
When a transformation cannot assign a label that you define to an input data value, the transformation assigns an overflow label to the data.
The Labeler transformation assigns an overflow label to any data value that it cannot identify. The Parser transformation creates an overflow column for unassigned data.
A transformation can fail to recognize an input value if the number of values in the input row exceeds the number of labels in the probabilistic model. Before you use a model in a mapping, review the mapping source data and verify that the model contains the correct number of label values.
The following table shows how a Parser transformation uses an overflow port to parse data that a probabilistic model cannot recognize:
Input Data | Street_Names port | Address_Suffixes port | Overflow port |
---|---|---|---|
Park Place | Park | Place | |
Park Avenue | Park | Avenue | |
Madison Avenue | Madison | Avenue | |
Central Park | Central | Park | |
Washington Square Park | Washington | Square | Park |
Madison Square Garden | Madison | Square | Garden |
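The overflow behavior in the table can be mimicked with a purely positional sketch. This is a simplification under stated assumptions: the real Parser transformation uses the probabilistic model to decide which label each value receives, whereas the hypothetical `parse_with_overflow` function below only fills the ports from left to right and sends any remaining tokens to an overflow field.

```python
def parse_with_overflow(input_value, ports):
    """Assign tokens to ports left to right; extra tokens go to Overflow."""
    tokens = input_value.split()
    result = {port: "" for port in ports}
    for port, token in zip(ports, tokens):
        result[port] = token
    # Tokens beyond the number of ports have no label to receive them.
    result["Overflow"] = " ".join(tokens[len(ports):])
    return result


ports = ["Street_Names", "Address_Suffixes"]
print(parse_with_overflow("Park Place", ports))
# {'Street_Names': 'Park', 'Address_Suffixes': 'Place', 'Overflow': ''}
print(parse_with_overflow("Washington Square Park", ports))
# {'Street_Names': 'Washington', 'Address_Suffixes': 'Square', 'Overflow': 'Park'}
```

The second call has three input values but only two labels, so the third value lands in the overflow field, as in the last two rows of the table.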
Assigning Labels to Probabilistic Model Data
Assign a label to every data value in every row of the model.
You can assign different labels to the same data value if the data value appears in multiple locations in the input data. For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith," so that "Blake" is labeled LASTNAME in the first name and FIRSTNAME in the second.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Verify that the model contains the reference data that you need.
4. Right-click an input data row and select New Label. Enter a column name in the New Label dialog box.
The label appears in the model.
5. Right-click an input data row and select View tokens and labels as rows.
The Labels panel appears under the input data column. The panel displays each reference data value as a data row.
6. In the Tokens column, select a reference data value.
7. In the Labels column, select a label to assign to the data value.
8. Save the probabilistic model.
Note: A label is a structural element in a model. If you add or remove a label after you add the model to a transformation, you invalidate the operation that uses the model. Delete and re-create the transformation operation.
Adding a Label to a Probabilistic Model
Add a label for every type of data value in the Data column. If you use the probabilistic model in a Parser transformation, add a label for each output port that you expect the transformation to create.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. From the Label menu, select New.
4. In the New Label dialog box, enter a label name.
5. Click OK to add the label to the model.
Deleting a Label from a Probabilistic Model
When you delete a label from a model, any data value associated with the label remains in the model but no longer has a label. After you delete the label, assign another label to each of these data values.
1. Open the probabilistic model in the Developer tool.
To open the model, select the model name in the content set and click Edit.
2. From the Label menu, select Edit.
3. In the Edit Label dialog box, select a label name.
4. Click Delete to delete the label.
5. Click OK.
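As a rough illustration of why data values need new labels after a deletion, the following sketch removes a label from a set of rows and lists the values left without one. It assumes a hypothetical layout of (token, label) pairs. The functions `delete_label` and `tokens_needing_labels` are invented names and do not correspond to any Developer tool API.

```python
def delete_label(rows, label):
    """Return rows with `label` removed; affected tokens stay in place
    but carry None until another label is assigned."""
    return [
        [(token, None if assigned == label else assigned) for token, assigned in row]
        for row in rows
    ]


def tokens_needing_labels(rows):
    """List (row index, token) pairs that currently have no label."""
    return [
        (index, token)
        for index, row in enumerate(rows)
        for token, assigned in row
        if assigned is None
    ]


rows = [
    [("Park", "Street_Names"), ("Place", "Address_Suffixes")],
    [("Central", "Street_Names"), ("Park", "Address_Suffixes")],
]
after_delete = delete_label(rows, "Address_Suffixes")
print(tokens_needing_labels(after_delete))  # [(0, 'Place'), (1, 'Park')]
```

The data values survive the deletion, and the returned list corresponds to the values that need another label.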