Probabilistic Model Configuration

The steps to configure a probabilistic model begin with the type of analysis that you want to perform. Use a probabilistic model in a Labeler transformation to identify the types of information in each value in an input string. Use a probabilistic model in a Parser transformation to parse the data values in an input string to different output ports.

You can use the same probabilistic model to label data and to parse data. When you use the model in a Labeler transformation, the transformation creates a single output port for each input port that you select. When you use the model in a Parser transformation, the transformation creates an output port for each type of input data that it identifies.
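The difference between the two output shapes can be sketched in a few lines of Python. This is an illustration only: the dictionary `TOY_MODEL`, the label names, and both functions are hypothetical, and the plain lookup stands in for a compiled model rather than showing how the transformations compute labels.

```python
# Hypothetical stand-in for a compiled probabilistic model.
# A real model is not a plain lookup table; this is for illustration only.
TOY_MODEL = {"John": "FIRSTNAME", "Smith": "LASTNAME", "Acme": "COMPANY"}


def label_string(value, delimiter=" "):
    """Labeler-style result: one output string of labels per input string."""
    tokens = value.split(delimiter)
    return delimiter.join(TOY_MODEL.get(tok, "UNKNOWN") for tok in tokens)


def parse_string(value, delimiter=" "):
    """Parser-style result: one output per type of data found in the input."""
    ports = {}
    for tok in value.split(delimiter):
        label = TOY_MODEL.get(tok, "UNKNOWN")
        ports.setdefault(label, []).append(tok)
    return {label: delimiter.join(toks) for label, toks in ports.items()}


labeled = label_string("John Smith Acme")
parsed = parse_string("John Smith Acme")
print(labeled)
print(parsed)
```

In the sketch, the labeling function returns a single value for the input, while the parsing function returns one entry for each label that it finds, which mirrors the one-port-per-input and one-port-per-data-type behavior described above.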

To create a probabilistic model, perform the following tasks:

1. Identify the reference data values and the label values to add to the model.

You can use a fragment of the data that you want to analyze. Create a data object in the Model repository that reads the data fragment.

2. Create a content set, and add a probabilistic model to the content set.
3. Add the reference data values to the model.
4. Add the label values to the model.

You can import the data from the data object in the Model repository. You can also enter a single row of reference data or a single label.

To use the probabilistic model to parse data, verify that the model contains a label value for each output port that the transformation must create.

5. Assign a label to each reference data value in each row.

You can assign a label to multiple reference data values in a single operation.

6. Compile the model.

After you compile the probabilistic model, you can use the model in a transformation.

Creating an Empty Probabilistic Model

You can create a probabilistic model object that does not contain reference data or label data. Create the empty model, and add data or import data to the model.

1. In Object Explorer, open or create a content set.

2. Select the Content view.

3. Select Probabilistic Models, and click Add.

The Probabilistic Model wizard opens.

4. Select the Probabilistic Model option.

Click Next.

5. Enter a name for the probabilistic model.

Optionally, enter a text description of the model.

6. Click Finish.

Creating a Probabilistic Model from a Data Object

You can use a data object as a source for probabilistic model data.

A probabilistic model performs optimally when you use the input data to the Labeler or Parser transformation as the source for the model reference data.

1. In Object Explorer, open or create a content set.

2. Select the Content view.

3. Select Probabilistic Models, and click Add.

The Probabilistic Model wizard opens.

4. Select the Probabilistic Model from Data Objects option.

Click Next.

5. Enter a name for the probabilistic model.

Optionally, enter a text description of the model.

6. Browse the Model repository and select the data object that contains the data to import.

Do not select a social media data object.

Click Next.

7. Review the columns on the data object, and select one or more columns to add to the model. You can add reference data columns and a label column in the same operation.

- To import a column of data as reference data, select the column name and click Data.

You can select multiple data columns. The Developer tool merges the contents of the columns that you select to a single column.

- To import a column of data as label values, select the column name and click Label.

When you import reference data and label values, the Developer tool assigns the label on each row to the reference data string on the same row. You can preview the data before you select the columns. You can change the label assignments after you create the model.

Click Next.

8. Select the number of rows to import from the data source.

By default, the Developer tool imports all rows from the data source. If you enter a number, the Developer tool imports that number of rows, counting from the start of the data set.

9. Specify the delimiters for the data values that you import.

You can specify different delimiters for reference data values and label values. The default delimiter is a space character.

10. Click Finish, and save the model.

After you create the probabilistic model, verify the label assignments and compile the model.
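The import options in the wizard can be pictured with a short Python sketch. It is not the Developer tool's implementation. The function `build_model_rows`, the record layout, and the column names are hypothetical, and the sketch assumes that merged columns are joined with the reference data delimiter. The sketch keeps the values and labels from each row together but does not try to reproduce how the tool pairs individual values with labels.

```python
def build_model_rows(records, data_columns, label_column,
                     max_rows=None, data_delim=" ", label_delim=" "):
    """Illustrative import: merge data columns, limit rows, split on delimiters."""
    # Default: use every row. Otherwise count rows from the start of the data set.
    source = records if max_rows is None else records[:max_rows]
    rows = []
    for rec in source:
        # Merge the selected data columns into a single string (assumed joiner).
        merged = data_delim.join(str(rec[col]) for col in data_columns)
        rows.append({
            "values": merged.split(data_delim),
            # Labels come from the label column on the same row.
            "labels": str(rec[label_column]).split(label_delim),
        })
    return rows


# Hypothetical source rows with two data columns and one label column.
records = [
    {"first": "John", "last": "Smith", "tags": "FIRSTNAME LASTNAME"},
    {"first": "Acme", "last": "Corp", "tags": "COMPANY COMPANY"},
    {"first": "Jane", "last": "Doe", "tags": "FIRSTNAME LASTNAME"},
]
rows = build_model_rows(records, ["first", "last"], "tags", max_rows=2)
print(rows)
```

Because the row limit counts from the start of the data set, a sample that is sorted or grouped in the source might not represent the full range of values. Consider this when you choose the number of rows to import.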

Appending Data from a Data Source to a Probabilistic Model

You can import multiple rows of reference data values and label values to a probabilistic model in a single operation.

1. Open the content set that contains the probabilistic model.

2. Select the model name, and click Edit.

3. Click Append Data.

The Probabilistic Model wizard opens.

4. Browse the Model repository and select the data object that contains the data to import.

Do not select a social media data object.

Click Next.

5. Review the columns on the data object, and select one or more columns to add to the model. You can add reference data columns and a label column in the same operation.

- To import a column of data as reference data, select the column name and click Data.

You can select multiple data columns. The Developer tool merges the contents of the columns that you select to a single column.

- To import a column of data as label values, select the column name and click Label.

Click Next.

6. Select the number of rows to import from the data source.

By default, the Developer tool imports all rows from the data source. If you enter a number, the Developer tool imports that number of rows, counting from the start of the data set.

7. Specify the delimiters for the data values that you import.

You can specify different delimiters for reference data values and label values. The default delimiter is a space character.

8. Click Finish, and save the model.

Adding a Reference Data Row to a Probabilistic Model

Use the Data view to add an empty row to a probabilistic model.

1. Open the content set that contains the model.

Select the model name, and click Edit.

2. Select the Data view.

3. To add an empty row to the model, click New.

4. Select the row that you added, and enter one or more reference data values in the row.

5. Save the probabilistic model.

After you save the model, assign a label to each value in the row. Optionally, compile the model.

Adding a Label to a Probabilistic Model

You can add a single label to a probabilistic model. Add a label for every type of information that the model data values represent. If you use the probabilistic model in a Parser transformation, add a label for each output port that you expect the transformation to create.

1. Open the content set that contains the model.

2. Select the model name, and click Edit.

3. In the Data view or the Label view, click Manage Labels.

The Manage Labels dialog box appears.

4. In the Manage Labels dialog box, click New.

A label appears in the first empty row in the dialog box.

5. Edit the label name. Optionally, update the color for the label.

6. Click OK to add the label to the model.

7. Save the probabilistic model.

After you add the label, assign the label to at least one data value.

Assigning a Label to a Reference Data Value

You can assign a label to a single data value in a reference data row.

You can assign different labels to the same data value if the data value appears in different locations in the row or in different rows.

1. Open the content set that contains the model.

2. Select the model name, and click Edit.

3. Select the Data view.

4. Find a data value that does not have a label or that has an incorrect label. Data values that use a label are color-coded.

5. Select the data row that contains the data value.

The row appears in the editor.

6. Right-click a data value in the editor and select a label from the context menu.

The Developer tool assigns the label to the data value.

7. Save the probabilistic model.

After you save the probabilistic model, optionally compile the model.
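One way to picture position-specific labels is to key each assignment on the row and the position of the value, not on the text of the value. The following Python sketch assumes that representation. The function names, the label names, and the sample rows are hypothetical and do not describe how the model stores data.

```python
def assign_label(assignments, row_index, position, label):
    """Record a label for the value at one position in one row."""
    assignments[(row_index, position)] = label


def labels_for(rows, assignments, value):
    """List every occurrence of a value with the label assigned at that spot."""
    found = []
    for r, row in enumerate(rows):
        for p, text in enumerate(row):
            if text == value:
                found.append(((r, p), assignments.get((r, p))))
    return found


# The same text appears in two rows and takes a different label in each.
rows = [["Washington", "Street"], ["George", "Washington"]]
assignments = {}
assign_label(assignments, 0, 0, "STREET_NAME")
assign_label(assignments, 1, 1, "LASTNAME")

washington = labels_for(rows, assignments, "Washington")
print(washington)
```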

Assigning a Label to Multiple Data Values

You can assign a label to multiple reference data values in a single operation.

1. Open the content set that contains the model.

2. Select the model name, and click Edit.

3. Click Assign Label.

The Assign a Label to Multiple Values dialog box opens.

4. Enter one or more characters in the Find field.

You can enter wildcard characters in the Find field.

5. Optionally, select additional search criteria.

You can select or clear the following options:

- Match case. Specifies that the search operation is case sensitive. Do not use wildcard characters with the option.
- Match full string. Specifies that the search operation looks for a complete match between the characters in the reference data value and the characters that you enter. Do not use wildcard characters with the option.
- Ignore labeled values. Specifies that the search operation skips any reference data value that uses a label.

6. Select a label to assign to the reference data values that match the search criteria.

You can also select the No Label option. Select the option to remove the label from the reference data values that include the characters that you enter.

7. Click Start.

The Developer tool assigns the label to all reference data values that match the search criteria that you define.

Note: To view the reference data values that you labeled in a single operation, use the Assigned by bulk filter in the Label view.
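The search options can be approximated in Python as follows. This is a sketch under stated assumptions, not the Developer tool's search logic. The wildcard characters `*` and `?`, the function names, and the `assigned_by` field are all hypothetical. Passing `None` as the label plays the role of the No Label option.

```python
from fnmatch import fnmatchcase

WILDCARDS = set("*?")  # assumed wildcard characters; not confirmed by the guide


def matches(text, pattern, match_case=False, match_full=False):
    """Return True if a reference data value satisfies the search criteria."""
    has_wildcard = bool(WILDCARDS & set(pattern))
    if (match_case or match_full) and has_wildcard:
        raise ValueError("do not combine wildcards with this option")
    if not match_case:
        text, pattern = text.lower(), pattern.lower()
    if match_full:
        return text == pattern
    if has_wildcard:
        return fnmatchcase(text, f"*{pattern}*")
    return pattern in text  # value includes the characters entered


def bulk_assign(values, pattern, label, match_case=False,
                match_full=False, ignore_labeled=False):
    """Assign a label (or None for No Label) to every matching value."""
    changed = 0
    for item in values:
        if ignore_labeled and item["label"] is not None:
            continue  # skip values that already use a label
        if matches(item["text"], pattern, match_case, match_full):
            item["label"] = label
            item["assigned_by"] = "bulk" if label is not None else None
            changed += 1
    return changed


values = [
    {"text": "Street", "label": None, "assigned_by": None},
    {"text": "Strasse", "label": "STREET", "assigned_by": "user"},
    {"text": "Avenue", "label": None, "assigned_by": None},
]
changed = bulk_assign(values, "str*", "STREET", ignore_labeled=True)
print(changed, values)
```

In the sample run, the Ignore labeled values option preserves the assignment that a user made individually on the second value, so the bulk operation changes the first value only.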

Deleting Rows from a Probabilistic Model

You can delete one or more reference data rows from a probabilistic model in a single action.

1. Open the content set that contains the model.

2. Select the model name, and click Edit.

3. In the Data view, select one or more reference data rows.

4. Click Delete.

The Developer tool removes the rows that you selected from the probabilistic model.

To undo the operation, press Ctrl+Z.

Deleting a Label from a Probabilistic Model

When you delete a label value from a model, any reference data value that used the label remains in the model. Assign another label value to each reference data value.

1. Open the content set that contains the model.

2. Select the model name, and click Edit.

3. In the Data view or the Label view, click Manage Labels.

4. In the Manage Labels dialog box, select a label value.

5. Click Delete.

6. Click OK to delete the label.

7. Save the probabilistic model.

Note: A label is a structural element in a probabilistic model. If you add or remove a label after you add the model to a transformation, you invalidate the operation that uses the model. To use the model that you updated, delete and re-create the transformation operation.

Compiling the Probabilistic Model

When you update the data or the label assignments in a probabilistic model, you can compile the model. Compile the model to update the model logic with the associations between the current reference data values and the current label values.

Before you compile the probabilistic model, verify that each label value identifies at least one reference data value.

- To compile the model, open the model in the Developer tool and click Compile.

Finding Data Rows in a Probabilistic Model

Use the Data view to find the reference data rows that contain a value that you enter.

1. Open the content set that contains the probabilistic model.

2. Select the model name, and click Edit.

3. Select the Data view.

4. Enter one or more characters in the Find field.

The Data view displays the first row in the model that contains the value that you entered.

5. Use the Up arrow or Down arrow to move to other rows that contain the value.

Filtering Reference Data Values by Label Assignment

Use the Label view to find the reference data values that use a label that you specify. Filter the results based on the method that you used to assign the label.

1. Open the content set that contains the probabilistic model.

2. Select the model name, and click Edit.

3. In the Label view, select a label value.

The probabilistic model displays a list of the reference data values that use the label. The model also shows the number of data values that use the label.

4. Apply a filter to the list of reference data values that use the label.

Select one of the following filters:

- All. Displays the reference data values that use the label. All is the default option.
- Assigned by user. Displays any reference data value that you selected individually when you assigned the label.
- Assigned by bulk. Displays the reference data values to which you assigned a label as part of a bulk assignment operation.

The probabilistic model displays the reference data values that satisfy the filter condition.
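The three filters behave like a simple predicate on how each label was assigned. The Python sketch below assumes that each reference data value records its assignment method in a hypothetical `assigned_by` field. The field, the function name, and the sample data are illustrative only.

```python
def filter_by_label(values, label, method="all"):
    """Return the values that use a label, optionally by assignment method."""
    if method not in {"all", "user", "bulk"}:
        raise ValueError(f"unknown filter: {method}")
    return [
        v["text"]
        for v in values
        if v["label"] == label
        and (method == "all" or v["assigned_by"] == method)
    ]


# Hypothetical model contents.
values = [
    {"text": "Street", "label": "STREET", "assigned_by": "bulk"},
    {"text": "Strasse", "label": "STREET", "assigned_by": "user"},
    {"text": "Boston", "label": "CITY", "assigned_by": "user"},
]

all_street = filter_by_label(values, "STREET")           # All (default)
by_user = filter_by_label(values, "STREET", "user")      # Assigned by user
by_bulk = filter_by_label(values, "STREET", "bulk")      # Assigned by bulk
print(all_street, by_user, by_bulk)
```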

Finding Unused Label Values

Use the Label view to find any label value that you did not assign to a reference data value. You must assign each label to at least one reference data value.

1. Open the content set that contains the probabilistic model.

2. Select the model name, and click Edit.

3. In the Label view, select a label value.

The probabilistic model displays a list of the reference data values that use the label. The model also shows the total number of data values that use the label.

If the total number of data values is zero, you did not assign the label to any reference data value in the probabilistic model.
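The same check can be expressed as a count of assignments for each label. The following Python sketch is a hypothetical illustration of that count. The function names and the sample labels are assumptions, and the sketch does not read data from a model.

```python
from collections import Counter


def label_usage(labels, assignments):
    """Count how many reference data values use each label in the model."""
    counts = Counter(lbl for lbl in assignments if lbl is not None)
    return {lbl: counts.get(lbl, 0) for lbl in labels}


def unused_labels(labels, assignments):
    """Return the labels with a total of zero assigned data values."""
    usage = label_usage(labels, assignments)
    return [lbl for lbl, total in usage.items() if total == 0]


# Labels defined in the model, and the label on each data value (None = no label).
labels = ["FIRSTNAME", "LASTNAME", "ZIP"]
assignments = ["FIRSTNAME", "LASTNAME", None, "FIRSTNAME"]

usage = label_usage(labels, assignments)
unused = unused_labels(labels, assignments)
print(usage, unused)
```

A label that appears in the unused list needs at least one assignment before you compile the model.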