Configuration summary for pattern-based parsing

To perform pattern-based parsing, create a parse asset in Data Quality and select the Pre-built parsing mode.
Optionally, test the asset configuration. Use the test results to update the pattern logic that the asset uses.

Prerequisites

Before you perform pattern-based parsing, verify that the CDQ_Name_Parsing_Reference_Data_Bundle is present in the Add-On Bundles folder on the Explore page.

Testing strategies for pattern-based parsing

You can test the asset before you use it in a mapping and after you use it in a mapping. For example, you might test the asset and update the pattern data before your first mapping run. You might then use the results of the first mapping run to update the pattern data so that the mapping parses the source data more comprehensively.
When you test the asset before you run a mapping, you add one or more patterns from the test results to the pattern-based logic in the asset. When you test the asset with the results of a mapping, you can add both the patterns and the names that the patterns identify to the pattern-based logic in the asset.

Process flow

The following steps summarize the configuration process:
  1. If necessary, install the CDQ_Name_Parsing_Reference_Data_Bundle asset bundle.
  2. Select Pre-built parsing mode.
  3. Select the locale and the format of the data that the operation will read.
     Note: After you verify the locale and the data format, the asset is ready to use in a Parse transformation. You can use the steps that follow to enhance the pattern-based logic that the asset applies to your data.
     You can test and enhance the pattern-based logic before you add the asset to the Parse transformation. Or, you can run a mapping that contains the transformation and use the mapping results to update the pattern-based logic.
  4. Test a sample of your data in the parse asset. Enter the values that you want to test, or import a file that contains the values.
     You can import a file and perform the test on your data in the following ways:
  5. Review the results of the test.
     Find any name that the test failed to parse. Copy the pattern for each name to a CSV or Microsoft Excel file. Optionally, copy the name along with the pattern.
  6. Import the file that contains the pattern data and optionally the name data to the asset. Use the Add User-Defined Patterns option to import the file data.
  7. Map the values in each pattern to appropriate fields in the user-defined pattern grid. Then, save the asset.
     Note: When you map a pattern to the appropriate fields, you train the asset to recognize names with a structure that matches the pattern.
  8. Run the test again and review the results.
     You may decide to import additional patterns to the asset in order to further improve the pattern parsing logic.
When you are satisfied with the performance of the parsing operation on your sample data, save the asset. A Data Integration user can add the asset to a Parse transformation in a mapping and run the mapping on your data.

Installing the asset bundle for pattern-based parsing

Before you perform pattern-based parsing, verify that the dictionaries that the parsing operation reads are installed for your organization. Informatica provides the dictionaries in a bundle named CDQ_Name_Parsing_Reference_Data_Bundle. Find the bundle in the Add-On Bundles folder on the Explore page.
If the bundle is not present, install the bundle. You install the bundle in the Administrator service.
Note: CDQ_Name_Parsing_Reference_Data_Bundle contains 84 dictionaries. If you find the bundle on the Explore page and the bundle contents are incomplete, uninstall the bundle from the Add-On Bundles page in Administrator. Then, install the bundle.
To install a bundle, perform the following steps:
  1. In Administrator, select Add-On Bundles.
  2. Click Available Bundles.
     The Available Bundles tab lists the public and private bundles that are available for installation or copying.
  3. If the bundle that you want to install is an unlisted bundle, enter the bundle access code in the Find field.
  4. Click the bundle name to open the Bundle Details page.
  5. Verify that the Allow field is set to Reference or to Reference and Copy.
     You cannot install a bundle that is configured for copying only.
  6. Click Install.
Administrator displays a notification to indicate the status of the installation.
You can find the installed bundle name on the Installed Bundles tab of the Add-On Bundles page in Administrator. In Data Quality, the bundle is added to the Add-On Bundles project on the Explore page.

Configuring the pre-built options for pattern-based parsing

If your data matches the default settings in Pre-built mode on the parse asset, you can save the asset for use in a Parse transformation with minimal configuration.
  1. On the Configuration tab, select the Pre-built parsing mode.
  2. In the Pre-built Pattern Properties pane, configure the following options:
  3. Save the asset.
After you complete the configuration steps, test the asset configuration.

Testing and updating the pattern-based parsing configuration

Test a parse asset to verify that data flows through the asset in the ways that you expect. You can then update the asset with the pattern for any name that the test failed to parse. The asset stores the patterns that you add and includes them in the pattern logic that the Parse transformation uses at run time.
You can update the asset with pattern data alone, or with both the patterns and the names in your input data that are associated with the patterns.
You might update an asset exclusively with pattern data when you test the asset before you add it to a Parse transformation. You might update an asset with both pattern data and name data after you run a mapping with the Parse transformation and review the output from the transformation.

Iterative testing

Testing the asset and updating the pattern-based logic for a source data set can be an iterative process. You might test the asset first with a sample of your source data and later test the asset with the name data and associated patterns that you read from the transformation output data.

Testing the parsing configuration with name data

The following steps describe the process to test the parse asset with name data that you import. When you test the data, you might find names that the asset does not parse. You can then add the patterns for the unparsed names to the asset logic.
  1. Open the parse asset that you created for pattern-based parsing.
  2. Select the Configuration tab.
  3. Select a Secure Agent to run the test.
     To refresh the list of active Secure Agents, click the Refresh icon.
  4. Import a sample of the source data that the Parse transformation will run on.
     Use the Import option in the Test Results pane to import the data from a CSV or Microsoft Excel file. The sample data must populate the first column in the file.
     The input data appears in the Inputs column.
  5. Click Test.
     The output columns display the test results.
  6. Review the results of the test:
  7. Import the file that contains the pattern to the asset.
     Use the Add User-Defined Patterns option to import the data. The patterns must populate the first column in the file.
     You can import up to 5,000 user-defined patterns to a parse asset from a file that you specify.
  8. Map the values in each pattern to an appropriate field in the pattern grid. For each value, select the field that best matches the type of information that the value represents.
     Use the number in each pattern value as a guide when you map the values. The numbers match the order in which the data values appear in the input row. The numbered values in each pattern begin at (0).
  9. After you map the pattern values to the appropriate fields, run the test again and review the results.
     You may decide to import additional patterns to further update the pattern parsing logic.
  10. When you are satisfied with the performance of the parsing operation on your sample data, save the asset.
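
The file layouts that this procedure requires can be prepared with a short script. The following Python sketch is illustrative only and is not part of Data Quality. It assumes that the import accepts a plain CSV file without a header row, which this topic does not state, and the function names and the MAX_USER_DEFINED_PATTERNS constant are hypothetical. The sketch writes sample names or patterns to the first column of a file and stops if a pattern file would exceed the 5,000-pattern import limit.

```python
import csv

# Import limit stated in this topic: up to 5,000 user-defined patterns per file.
MAX_USER_DEFINED_PATTERNS = 5000


def write_first_column(path, values):
    """Write one value per row so that the data populates the first column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for value in values:
            writer.writerow([value])


def write_pattern_file(path, patterns):
    """Drop blank and duplicate patterns, check the limit, and write the file.

    Returns the list of patterns that was written, in first-seen order.
    """
    cleaned = (pattern.strip() for pattern in patterns)
    unique = list(dict.fromkeys(p for p in cleaned if p))
    if len(unique) > MAX_USER_DEFINED_PATTERNS:
        raise ValueError(
            f"{len(unique)} patterns exceed the import limit of "
            f"{MAX_USER_DEFINED_PATTERNS}"
        )
    write_first_column(path, unique)
    return unique
```

For example, you might call write_first_column with a list of sample names before step 4, and call write_pattern_file with the patterns that you copied from the test results before step 7. Confirm the accepted file layout in Rules and guidelines for test data before you rely on the header-row assumption.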

Testing the parsing configuration with pattern and name data

The following steps describe the process to test the parse asset with pattern data and associated name data. For example, you might discover names that the Parse transformation did not parse successfully at run time. Import the names and the associated patterns from the transformation output to the asset in Data Quality.
To import the names and associated patterns, copy the data to a CSV or Microsoft Excel file. The pattern values must populate the first column in the file, and the name values must populate the second column.
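
As an illustration of the two-column layout, the following Python sketch writes a CSV file with the pattern in the first column and the name in the second column. It is a minimal sketch under stated assumptions: the function name is hypothetical, the input is assumed to be (name, pattern) pairs that you have already extracted from the transformation output, and the file is written without a header row, which this topic does not confirm.

```python
import csv


def write_pattern_name_file(path, unparsed_rows):
    """Write one row per unparsed name: pattern in column 1, name in column 2.

    `unparsed_rows` is an iterable of (name, pattern) pairs. Pairs with an
    empty pattern are skipped because the first column must hold a pattern.
    Returns the number of rows written.
    """
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for name, pattern in unparsed_rows:
            if not pattern:
                continue
            # Column order matters: pattern first, then the associated name.
            writer.writerow([pattern, name])
            written += 1
    return written
```

The csv module quotes any name that contains a comma, so a value such as a surname followed by a comma and a given name stays in a single column.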
  1. Open the parse asset that you created for pattern-based parsing.
  2. Select the Configuration tab.
  3. Select a runtime environment in which to test the configuration.
  4. Use the Add User-Defined Patterns option to import the unparsed name and pattern data that the Parse transformation wrote as output.
     You can import up to 5,000 user-defined patterns to a parse asset from a file that you specify.
     When you select the file that contains the data, the Import Patterns dialog box opens. Select the Import Input Data option in the dialog box.
     The pattern data appears in the User-Defined Patterns pane and the name data appears in the Inputs column of the Test Results pane.
  5. Map the values in each pattern to an appropriate field in the pattern grid. For each value, select the field that best matches the type of information that the value represents.
     Use the number in each pattern value as a guide when you map the values. The numbers match the order in which the data values appear in the input row. The numbered values in each pattern begin at (0).
  6. Click Test.
     The output columns display the test results.
  7. Review the results of the test:
  8. Import the file that contains the latest pattern and name data.
  9. Map the pattern values to the appropriate fields in the pattern grid.
  10. Run the test again and review the results.
     You may decide to import additional patterns to further update the pattern parsing logic.
  11. When you are satisfied with the performance of the parsing operation on your sample data, save the asset.
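
The zero-based numbering that guides the mapping step can be pictured with a small Python sketch. The sketch is conceptual and does not reproduce how the asset tokenizes data or labels patterns. It assumes that the values in an input row are separated by white space, and the function names and the field names in the usage note are hypothetical.

```python
def number_values(input_row):
    """Pair each value in an input row with its zero-based position."""
    return list(enumerate(input_row.split()))


def apply_mapping(input_row, field_by_position):
    """Assign values to fields according to a position-to-field mapping.

    `field_by_position` maps a zero-based position to a field name.
    Positions beyond the end of the row are ignored.
    """
    values = input_row.split()
    return {
        field: values[position]
        for position, field in field_by_position.items()
        if position < len(values)
    }
```

With the input row "Maria Elena Lopez", number_values returns the positions 0, 1, and 2 for the three values in order. A mapping of position 0 to a field named Given Name and position 2 to a field named Surname then yields Maria and Lopez for those fields.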
Further information
For more information about importing files, see Rules and guidelines for test data.
For more information about the meaning of the pattern label values, see Rules and guidelines for user-defined pattern labels.

Pattern selection and row selection options

When you import data to the User-Defined Patterns list, you can use the pattern and row selection options to undo any pattern match that you define and to delete unwanted rows of data from the list. Use the row selection options to clear the pattern matches from a single row or delete a single row. Use the pattern selection options to clear the pattern matches from multiple rows or delete multiple rows in a single action.
The following image shows the row selection options:
[Image: the property button at the end of a row]
The row selection options include the following properties:
  1. Row properties option
     Opens the menu of row selection options.
  2. Delete Row
     Deletes the current row of data.
  3. Clear Values
     Removes all pattern matches that you selected in the current row. The option does not delete the pattern that you imported.
The following image shows the pattern selection options:
[Image: the pattern properties menu, which lists the pattern selection options]
The pattern selection options include the following properties:
  1. Pattern properties option
     Opens the menu of pattern selection options. The option also displays the number of rows that you selected.
  2. Select All
     Selects all rows under User-Defined Patterns.
  3. Select None
     Deselects all rows under User-Defined Patterns.
     Note: You can also use the check boxes beside each row to select or deselect one or more rows.
  4. Delete Selected
     Deletes the rows of data that you selected.
  5. Clear Values from Selected
     Removes all pattern matches from the rows that you selected. The option does not delete the patterns that you imported.