When a parsing step cannot identify a value in an input field, the step writes the value to a field for unparsed data. When a parsing step successfully identifies a value but cannot assign the value to the designated output field, the step writes the value to a field for overflow data. A step writes a value to an overflow field when it has already parsed another value successfully to the designated output field.
When a step assigns a value to an unparsed data field, the value remains available for parsing from its original input field by subsequent steps in the asset. When a step writes a value to an overflow field, the value is no longer available to subsequent steps in the asset. As a result, the asset does not commit a value to the unparsed data field until every step in the asset has run.
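The following Python sketch illustrates the difference between the two behaviors. It is a simplified model and not the product's implementation: the function name run_asset, the field names, and the token 5L are hypothetical, and the sketch assumes that values are separated by spaces and that all steps share one overflow field.

```python
def run_asset(text, steps):
    """Run (name, dictionary) steps in order over space-separated values."""
    remaining = text.split()   # values still available for parsing
    outputs = {}               # one designated output field per step
    overflow = []              # single shared overflow field (assumption)
    for name, dictionary in steps:
        still_unparsed = []
        for token in remaining:
            if token not in dictionary:
                # Not identified: stays available to later steps.
                still_unparsed.append(token)
            elif name not in outputs:
                # First identified value goes to the designated output field.
                outputs[name] = token
            else:
                # Output field already filled: value is consumed as overflow.
                overflow.append(token)
        remaining = still_unparsed
    # Only values that no step identified end up as unparsed data.
    return outputs, overflow, remaining


steps = [
    ("color", {"EMERALD", "GREEN"}),
    ("product_type", {"WOODSTAIN", "PAINT"}),
]
outputs, overflow, unparsed = run_asset("EMERALD GREEN WOODSTAIN 5L", steps)
print(outputs, overflow, unparsed)
```

In this model, WOODSTAIN is unparsed after the first step but is still picked up by the second step, while GREEN is consumed as overflow and never reaches the second step.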
The number of overflow fields that the asset generates at run time depends on the types of steps that you configure and the asset properties that you define. For example, you might select the Detailed Overflow property to create a dedicated overflow field for each step. The Detailed Overflow property is available in Custom mode.
Example: using the overflow data field
You might configure a parsing step to split a field of multiple values into discrete fields based on the types of information that the values represent. You can use the fields for parsed data and overflow data to capture values that contain the same type of information.
For example, the following product description contains color and product type information:
EMERALD GREEN WOODSTAIN
The product description uses two values to define the color. You can define a dictionary step to parse the color names and the product type to discrete fields. The step reads a dictionary of color names and specifies a single overflow field.
At run time, the step identifies EMERALD and GREEN as colors that match values in the dictionary. The step writes EMERALD to the field for parsed data and writes GREEN to the overflow field. The step additionally writes WOODSTAIN to the unparsed data field. As a result, the step writes the terms EMERALD, GREEN, and WOODSTAIN to discrete fields.
You can optionally merge the fields that contain colors in a downstream transformation.
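The example above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the product's logic: the function dictionary_step, the result keys, and the extra dictionary entries are hypothetical, and values are assumed to be separated by spaces.

```python
def dictionary_step(text, dictionary):
    """Toy model of a dictionary step with one output field and one overflow field."""
    parsed = None
    overflow = []
    unparsed = []
    for token in text.split():
        if token not in dictionary:
            unparsed.append(token)      # value not identified
        elif parsed is None:
            parsed = token              # first match fills the output field
        else:
            overflow.append(token)      # later matches go to overflow
    return {
        "parsed": parsed,
        "overflow": " ".join(overflow),
        "unparsed": " ".join(unparsed),
    }


# Hypothetical color dictionary; only EMERALD and GREEN come from the example.
COLORS = {"EMERALD", "GREEN", "RED", "BLUE"}
result = dictionary_step("EMERALD GREEN WOODSTAIN", COLORS)

# Optional downstream merge of the two color fields.
merged_color = " ".join(v for v in (result["parsed"], result["overflow"]) if v)
print(result, merged_color)
```

The last statement mirrors the optional downstream merge: it joins the parsed and overflow values into a single color value.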
Example: using an unparsed data field to create new fields
You might configure a parsing step to write a uniform or consistent set of data values to an unparsed data field.
For example, the following contact field includes a name and a prefix:
PROFESSOR INDIRA SINGH
You define a step that uses a dictionary of prefix values to parse the prefix PROFESSOR to an output field. The step parses all other information to an unparsed data field.
Because the unparsed data field contains a set of values that uniformly represent a person name, you can use the field as person name data in downstream operations.
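As a rough illustration of this pattern, the following Python sketch parses a prefix and treats whatever remains as the person name. The function split_prefix and the dictionary entries other than PROFESSOR are hypothetical, and the sketch ignores overflow handling for brevity.

```python
# Hypothetical prefix dictionary; only PROFESSOR comes from the example.
PREFIXES = {"PROFESSOR", "DR", "MR", "MS"}


def split_prefix(contact):
    """Return (prefix, unparsed), where unparsed holds all other values."""
    prefix = None
    rest = []
    for token in contact.split():
        if prefix is None and token in PREFIXES:
            prefix = token          # parsed to the prefix output field
        else:
            rest.append(token)      # left as unparsed data
    return prefix, " ".join(rest)


prefix, person_name = split_prefix("PROFESSOR INDIRA SINGH")
print(prefix, "|", person_name)
```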
Rules and guidelines for overflow and unparsed data fields
The parse asset configuration determines how the associated Parse transformation creates overflow and unparsed data fields at run time.
Consider the following rules and guidelines for overflow and unparsed data fields:
• When you configure an asset in Custom mode, the Detailed Overflow option determines how the Parse transformation creates overflow fields. If you clear the option, the transformation creates a single overflow field for all overflow data from the asset. If you select the option, the transformation creates an overflow field for each step in the asset.
Find the Detailed Overflow option in the Parse Properties dialog box when you configure the asset in Custom mode. Open the properties from the Data Quality toolbar. The option is cleared by default.
• When you configure an asset in Pre-built mode, the locale that you select determines whether the Parse transformation creates an overflow field. The transformation creates an overflow field when you set the locale to Brazil or Portugal. The transformation does not create an overflow field in other locales.
• A Parse transformation creates a single field for all unparsed data when you configure the asset in Custom mode.
• The locale that the asset specifies in Pre-built mode determines whether the Parse transformation creates an unparsed data field. The transformation does not create an unparsed data field when you set the locale to Brazil, Canada, Portugal, or the United States.
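The rules above can be summarized as a small decision function. The following Python sketch is only a restatement of those rules: the function generated_fields, the mode strings, and the field names are hypothetical and do not reflect the names that the transformation actually generates. It also assumes that Pre-built mode creates an unparsed data field for any locale outside the four listed above.

```python
OVERFLOW_LOCALES = {"Brazil", "Portugal"}
NO_UNPARSED_LOCALES = {"Brazil", "Canada", "Portugal", "United States"}


def generated_fields(mode, step_names=(), detailed_overflow=False, locale=None):
    """Return hypothetical names of the overflow and unparsed fields created."""
    fields = []
    if mode == "custom":
        if detailed_overflow:
            # One overflow field for each step in the asset.
            fields += [f"overflow_{name}" for name in step_names]
        else:
            # Default: a single overflow field for the whole asset.
            fields.append("overflow")
        fields.append("unparsed")   # always a single unparsed field
    elif mode == "prebuilt":
        if locale in OVERFLOW_LOCALES:
            fields.append("overflow")
        if locale not in NO_UNPARSED_LOCALES:
            # Assumption: other locales create an unparsed data field.
            fields.append("unparsed")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return fields


custom_default = generated_fields("custom", ["color", "size"])
custom_detailed = generated_fields("custom", ["color", "size"], detailed_overflow=True)
brazil = generated_fields("prebuilt", locale="Brazil")
canada = generated_fields("prebuilt", locale="Canada")
print(custom_default, custom_detailed, brazil, canada)
```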