Probabilistic Model Label Data
The label values in a probabilistic model represent the types of information that the reference data values might contain. When you add data rows to a model, assign a label to each data value in each row. The labels that you add to the model appear in the Label view and in menu options in the Data view.
You can assign any label in the model to any reference data value. If the same value has different meanings in two rows of reference data, you can assign different labels to the value in each row.
You can define the same combination of labels for multiple input strings. Multiple examples of a label increase the accuracy of the probabilistic model.
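As a rough illustration of the points above, the following Python sketch models reference data rows as lists of (data value, label) pairs. The structure and the function names `labels_for` and `label_counts` are assumptions made for this example. They are not an Informatica file format or API. The sketch shows one value ("Park") carrying different labels in different rows, and counts how many examples each label has.

```python
from collections import Counter

# Hypothetical in-memory stand-in for reference data rows: each row is a
# list of (data value, label) pairs. This is not an Informatica format.
rows = [
    [("Park", "Street_Name"), ("Place", "Street_Descriptor")],
    [("Central", "Street_Name"), ("Park", "Street_Descriptor")],
    [("Madison", "Street_Name"), ("Avenue", "Street_Descriptor")],
]

def labels_for(value, rows):
    """Return the set of labels assigned to a value across all rows."""
    return {label for row in rows for v, label in row if v == value}

def label_counts(rows):
    """Count how many data values carry each label."""
    return Counter(label for row in rows for _, label in row)

park_labels = labels_for("Park", rows)   # both labels appear for "Park"
counts = label_counts(rows)              # examples available per label
print(sorted(park_labels), dict(counts))
```

A label with a low count in `counts` has few examples in the reference data, which suggests where additional rows might help.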
Overflow Label
When a transformation cannot assign a label that you define to an input data value, the transformation assigns an overflow label to the data.
The Labeler transformation assigns an overflow label to any data value that it cannot identify. The Parser transformation creates an overflow column for unassigned data.
The following table shows how a Parser transformation uses an overflow port to parse address data elements that a probabilistic model does not recognize:
Input Data | Street_Name port | Street_Descriptor port | Overflow port |
---|---|---|---|
Park Place | Park | Place | |
Park Avenue | Park | Avenue | |
Madison Avenue | Madison | Avenue | |
Central Park | Central | Park | |
Washington Square Park | Washington | Square | Park |
Madison Square Garden | Madison | Square | Garden |
The transformation assigns values to the Overflow port when the number of input data values is greater than the number of labels in the probabilistic model. Before you use a model in a transformation, review the mapping source data and verify that the model contains the correct number of label values.
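The overflow behavior in the table can be approximated with a short Python sketch. It assigns whitespace-separated values to ports by position, which is a deliberate simplification: the actual Parser transformation uses the probabilistic model to choose labels, and the functions `parse_with_overflow` and `inputs_that_overflow` are hypothetical names used only here. The second function mirrors the suggested review of source data by listing inputs that contain more values than the model has labels.

```python
def parse_with_overflow(text, ports):
    """Assign values to ports by position; extra values go to Overflow.

    Positional assignment is an assumption for illustration only. It is
    not how the probabilistic model selects labels.
    """
    tokens = text.split()
    result = {port: "" for port in ports}
    for port, token in zip(ports, tokens):
        result[port] = token
    # Any values beyond the number of ports are collected as overflow.
    result["Overflow"] = " ".join(tokens[len(ports):])
    return result

def inputs_that_overflow(inputs, ports):
    """Return the inputs that hold more values than there are ports."""
    return [text for text in inputs if len(text.split()) > len(ports)]

ports = ["Street_Name", "Street_Descriptor"]
inputs = ["Park Place", "Central Park", "Washington Square Park"]

for text in inputs:
    print(text, "->", parse_with_overflow(text, ports))

print(inputs_that_overflow(inputs, ports))
```

If `inputs_that_overflow` returns many rows for a given source, the model might need an additional label before it is used in a transformation.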
Assigning Labels to Probabilistic Model Data
Assign a label to every data value in a probabilistic model.
You can assign different labels to the same data value if the data value appears in different locations in the data rows. If a data value uses an incorrect label, you can update the label.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Select the Data view.
4. Find the data value that does not have a label or that has an incorrect label.
5. Select the data row that contains the data value.
The row appears in the editor.
6. Right-click a data value in the editor and select a label from the context menu.
The Developer tool assigns the label to the data value.
7. Save the probabilistic model.
After you save the model, optionally compile the model.
Adding a Label to a Probabilistic Model
Add a label for every type of information that the model data values represent. If you use the probabilistic model in a Parser transformation, add a label for each output port that you expect the transformation to create.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. In the Data view or the Label view, click Manage Labels.
The Manage Labels dialog box appears.
4. In the Manage Labels dialog box, click New.
A label appears in the first empty row in the dialog box.
5. Edit the label name. Optionally, update the color for the label.
6. Click OK to add the label to the model.
7. Save the probabilistic model.
After you add the label, you must assign the label to at least one data value. Compile the model before you use it in a transformation.
Deleting a Label from a Probabilistic Model
When you delete a label from a model, any data value associated with the label remains in the model. Assign another label to each affected data value.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. In the Data view or the Label view, click Manage Labels.
The Manage Labels dialog box appears.
4. In the Manage Labels dialog box, click Delete.
5. Click OK to delete the label.
6. Save the probabilistic model.
Note: A label is a structural element in a probabilistic model. If you add or remove a label after you add the model to a transformation, you invalidate the operation that uses the model. To use the model that you updated, delete and re-create the transformation operation.
Compiling the Probabilistic Model
Each time you update a probabilistic model, you can compile the model. Compile the model to update the model logic with the current data values and label values.
Before you compile the model, verify that all label values identify at least one data value.
- To compile the model, open the model in the Developer tool and click Compile.
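The check before compiling can be pictured with a small Python sketch. It assumes the model data has been copied into a list of rows of (data value, label) pairs, with `None` standing for a missing label. That representation and the names `unused_labels` and `unlabeled_values` are hypothetical. The Developer tool does not expose this as an API, and the sketch only illustrates the logic of the check.

```python
def unused_labels(labels, rows):
    """Return labels that are not assigned to any data value."""
    used = {label for row in rows for _, label in row if label is not None}
    return sorted(set(labels) - used)

def unlabeled_values(rows):
    """Return (row index, data value) pairs that have no label."""
    return [
        (index, value)
        for index, row in enumerate(rows)
        for value, label in row
        if label is None
    ]

# Hypothetical model content: three labels, one of them never assigned,
# and one data value that still needs a label.
labels = ["Street_Name", "Street_Descriptor", "City"]
rows = [
    [("Park", "Street_Name"), ("Avenue", "Street_Descriptor")],
    [("Madison", "Street_Name"), ("Square", None)],
]

print(unused_labels(labels, rows))
print(unlabeled_values(rows))
```

An empty result from both functions corresponds to a model in which every label identifies at least one data value and every data value has a label.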