
Probabilistic Model Label Data

A probabilistic model contains descriptive labels for the types of information in the reference data. When you create a model or add reference data to a model, assign a label to each reference data value.
The labels you create appear as columns in the probabilistic model. When you assign a label to a data value, the model adds the value to the label column. You can assign any label in the model to any reference data value. If the same value has different meanings in two rows of reference data, you can assign different labels to the value in each row.
You can define the same combination of labels for multiple input strings. Multiple examples of a label increase the likelihood that the probabilistic model assigns the correct label to an input data value.

Address Data Example

You can build a probabilistic model to parse address data values. The probabilistic model determines the address data type from the value and also from its position in the input string. For example, the model can determine when the same value is a street name or an address suffix.
The following table shows how you can assign labels to address data values in different combinations:
Reference Data    Label 1 - Street Names    Label 2 - Address Suffixes
Park Place        Park                      Place
Park Avenue       Park                      Avenue
Madison Avenue    Madison                   Avenue
Central Park      Central                   Park
State Street      State                     Street
The Labeler transformation can return any of the label combinations that you define in the model. Organize the label columns from left to right in the order in which you want the labels to appear in the output data.
Note: If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you invalidate the parsing operation that uses the model. You must delete and re-create the operation that uses the probabilistic model.
If a probabilistic model contains a label value that does not identify a data value, you cannot compile the model.
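The following Python sketch illustrates how label columns relate to reference data rows. It is not the Developer tool's internal format. The names ROWS, LABELS, label_columns, unused_labels, and position_counts are hypothetical and exist only for this illustration. The sketch collects the values under each label, reports any label that identifies no data value, and counts how often a value receives each label at each position in the input string.

```python
from collections import defaultdict

# Hypothetical in-memory form of the reference data: each row is a list of
# (value, label) pairs in input order. This is an assumption for illustration.
LABELS = ["Street Names", "Address Suffixes"]

ROWS = [
    [("Park", "Street Names"), ("Place", "Address Suffixes")],
    [("Park", "Street Names"), ("Avenue", "Address Suffixes")],
    [("Madison", "Street Names"), ("Avenue", "Address Suffixes")],
    [("Central", "Street Names"), ("Park", "Address Suffixes")],
    [("State", "Street Names"), ("Street", "Address Suffixes")],
]


def label_columns(rows, labels):
    """Collect the values assigned to each label, one column per label."""
    columns = {label: [] for label in labels}
    for row in rows:
        for value, label in row:
            columns[label].append(value)
    return columns


def unused_labels(rows, labels):
    """Return the labels that do not identify any data value."""
    columns = label_columns(rows, labels)
    return [label for label in labels if not columns[label]]


def position_counts(rows):
    """Count how often each (value, position) pair receives each label."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for position, (value, label) in enumerate(row):
            counts[(value, position)][label] += 1
    return counts
```

In this sketch, "Park" appears under Street Names when it is the first value in a row and under Address Suffixes when it is the second, which mirrors the table above. A label that unused_labels returns corresponds to the case in which the model cannot be compiled.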

Overflow Label

When a transformation cannot assign a label that you define to an input data value, the transformation assigns an overflow label to the data.
The Labeler transformation assigns an overflow label to any data value that it cannot identify. The Parser transformation creates an overflow column for unassigned data.
A transformation can fail to recognize an input value if the number of values in the input row exceeds the number of labels in the probabilistic model. Before you use a model in a mapping, review the mapping source data and verify that the model contains the correct number of label values.
The following table shows how a Parser transformation uses an overflow port to parse data that a probabilistic model cannot recognize:
Input Data                Street_Names port    Address_Suffixes port    Overflow port
Park Place                Park                 Place
Park Avenue               Park                 Avenue
Madison Avenue            Madison              Avenue
Central Park              Central              Park
Washington Square Park    Washington           Square                   Park
Madison Square Garden     Madison              Square                   Garden
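The overflow behavior in the table can be approximated with a short Python sketch. The sketch is a simplification: it assigns values to ports by position only, whereas the probabilistic model also considers the value itself. The names PORTS, parse_row, and rows_exceeding_labels are hypothetical and are not part of any Informatica API.

```python
# Hypothetical port names that match the table above.
PORTS = ["Street_Names", "Address_Suffixes"]


def parse_row(text, ports):
    """Assign values to ports by position; surplus values go to Overflow."""
    tokens = text.split()
    parsed = {port: "" for port in ports}
    for port, token in zip(ports, tokens):
        parsed[port] = token
    # Any value beyond the number of defined ports is unassigned data.
    parsed["Overflow"] = " ".join(tokens[len(ports):])
    return parsed


def rows_exceeding_labels(rows, ports):
    """Return the source rows that contain more values than there are ports."""
    return [text for text in rows if len(text.split()) > len(ports)]


source = ["Park Place", "Central Park", "Washington Square Park"]
for text in source:
    print(parse_row(text, PORTS))
print(rows_exceeding_labels(source, PORTS))
```

A check like rows_exceeding_labels is one way to review source data before you use a model in a mapping. Any row that the check returns contains more values than the model has labels.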

Assigning Labels to Probabilistic Model Data

Assign a label to every data value in every row of the model.
You can assign different labels to the same data value if the data value appears in multiple locations in the input data. For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith." In the first name, "Blake" takes the LASTNAME label. In the second name, "Blake" takes the FIRSTNAME label.
    1. Open the content set that contains the model.
    2. Select the model name and click Edit.
    3. Verify that the model contains the reference data that you need.
    4. Right-click an input data row and select New Label. Enter a column name in the New Label dialog box.
    The label appears in the model.
    5. Right-click an input data row and select View tokens and labels as rows.
    The Labels panel displays under the input data column. The panel displays each reference data value as a data row.
    6. In the Tokens column, select a reference data value.
    7. In the Labels column, select a label to assign to the data value.
    8. Save the probabilistic model.
Note: A label is a structural element in a model. If you add or remove a label after you add the model to a transformation, you invalidate the operation that uses the model. Delete and re-create the transformation operation.
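The result of the procedure can be sketched in Python. The dictionary layout and the functions make_model, assign_label, and unlabeled_tokens are hypothetical and do not represent the model file format or a scripting interface in the Developer tool. The sketch assigns a label to each token in each row and then lists any token that still lacks a label.

```python
def make_model():
    """Build a small hypothetical model with two labels and two unlabeled rows."""
    return {
        "labels": ["FIRSTNAME", "LASTNAME"],
        "rows": [
            {"tokens": ["John", "Blake"], "labels": [None, None]},
            {"tokens": ["Blake", "Smith"], "labels": [None, None]},
        ],
    }


def assign_label(model, row_index, token_index, label):
    """Assign an existing label to one token in one row."""
    if label not in model["labels"]:
        raise ValueError(f"unknown label: {label}")
    model["rows"][row_index]["labels"][token_index] = label


def unlabeled_tokens(model):
    """List the (row index, token) pairs that still need a label."""
    return [
        (r, token)
        for r, row in enumerate(model["rows"])
        for token, label in zip(row["tokens"], row["labels"])
        if label is None
    ]


model = make_model()
for row_index in range(len(model["rows"])):
    assign_label(model, row_index, 0, "FIRSTNAME")
    assign_label(model, row_index, 1, "LASTNAME")
```

Because labels are stored per row and per position in this sketch, "Blake" holds the LASTNAME label in the first row and the FIRSTNAME label in the second row.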

Adding a Label to a Probabilistic Model

Add a label for every type of data value in the Data column. If you use the probabilistic model in a Parser transformation, add a label for each output port that you expect the transformation to create.
    1. Open the content set that contains the model.
    2. Select the model name and click Edit.
    3. From the Label menu, select New.
    4. In the New Label dialog box, enter a label name.
    5. Click OK to add the label to the model.

Deleting a Label from a Probabilistic Model

When you delete a label from a model, any data value associated with the label remains in the model without a label. Assign another label to each of these data values.
    1. Open the probabilistic model in the Developer tool.
    To open the model, select the model name in the content set and click Edit.
    2. From the Label menu, select Edit.
    3. In the Edit Label dialog box, select a label name.
    4. Click Delete to delete the label.
    5. Click OK.
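The effect of deleting a label can be sketched in Python. The dictionary layout and the delete_label function are hypothetical and serve only to show that the data values remain in the model and need new labels.

```python
def delete_label(model, label):
    """Remove a label and return the (row index, value) pairs it leaves unlabeled."""
    model["labels"].remove(label)
    orphaned = []
    for r, row in enumerate(model["rows"]):
        for i, assigned in enumerate(row["labels"]):
            if assigned == label:
                row["labels"][i] = None  # the value itself stays in the model
                orphaned.append((r, row["tokens"][i]))
    return orphaned


# Hypothetical model with two labeled rows.
model = {
    "labels": ["Street Names", "Address Suffixes"],
    "rows": [
        {"tokens": ["Park", "Place"], "labels": ["Street Names", "Address Suffixes"]},
        {"tokens": ["Central", "Park"], "labels": ["Street Names", "Address Suffixes"]},
    ],
}
orphaned = delete_label(model, "Address Suffixes")
```

The values in orphaned are the ones that need another label before the model is complete again.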