Pattern-Based Parsing Mode
In pattern-based parsing mode, the Parser transformation parses patterns made of multiple strings.
You can use the following methods to define patterns in pattern-based parsing mode:
- Parse input data using patterns defined in reference tables. You can create a pattern reference table from the profiled output of a Labeler transformation that uses the token labeling mode.
- Parse input data using patterns that you define.
- Parse input data using patterns that you import from a reusable pattern set in the Model repository. Changes to the reusable pattern set do not update the data you add in the Parser transformation.
You can use the "+" and "*" wildcards to define a pattern. Use "*" characters to match any string, and "+" characters to match one or more instances of the preceding string. For example, use "WORD+" to find multiple consecutive instances of a word token, and use "WORD *" to find a word token followed by one or more tokens of any type.
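The wildcard rules above can be illustrated with a small sketch. It is not the Parser transformation's implementation. It only mimics the described behavior under these assumptions: a pattern is a space-separated list of token labels, "*" stands for one or more tokens of any type, and a trailing "+" means one or more consecutive instances of the preceding label. The function names `pattern_to_regex` and `matches` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a label pattern such as "WORD+" or "WORD *" into a regex.

    Assumption: each label in the labeled sequence is followed by one space,
    so every pattern element consumes whole labels only.
    """
    parts = []
    for element in pattern.split():
        if element == "*":
            # "*" matches one or more tokens of any type.
            parts.append(r"(?:\S+ )+")
        elif element.endswith("+"):
            # "LABEL+" matches one or more consecutive LABEL tokens.
            parts.append("(?:" + re.escape(element[:-1]) + " )+")
        else:
            # A plain label matches exactly one token with that label.
            parts.append("(?:" + re.escape(element) + " )")
    return re.compile("".join(parts))


def matches(pattern: str, labels: list[str]) -> bool:
    """Return True if the whole label sequence fits the pattern."""
    sequence = "".join(label + " " for label in labels)
    return pattern_to_regex(pattern).fullmatch(sequence) is not None


print(matches("WORD+", ["WORD", "WORD", "WORD"]))     # True
print(matches("WORD+", ["WORD", "NUMBER"]))           # False
print(matches("WORD *", ["WORD", "NUMBER", "WORD"]))  # True
print(matches("WORD *", ["WORD"]))                    # False
```

In this sketch, "WORD *" does not match a single word token, because the "*" element is assumed to require at least one following token, as in the example above.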
You can use multiple instances of these methods within the Parser transformation. The transformation uses the instances in the order in which they are listed on the Configuration view.
Note: In pattern-based parsing mode, the Parser transformation requires the output of a Labeler transformation that uses token labeling mode. Create and configure the Labeler transformation before creating a Parser transformation that uses pattern-based parsing mode.
Pattern-Based Parsing Ports
Configure the pattern-based parsing ports with settings appropriate for your data.
A Parser transformation that uses the pattern-based parsing mode has the following port types:
- Label_Data
  Connect this port to the Labeled_Output port of a Labeler transformation that uses the token labeling mode.
- Tokenized_Data
  Connect this port to the Tokenized_Data output port of a Labeler transformation that uses the token labeling mode.
- Parse_Status
  If a match is found for the input pattern, this port outputs the value Matched. If no match is found, it outputs Unmatched.
- Overflow
  Successfully parsed strings that do not fit into the number of outputs defined in the transformation. For example, if only two "WORD" outputs are defined, the string "John James Smith" results in an overflow output of "Smith" by default.
- Parsed
  Successfully parsed strings in user-defined ports.
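The relationship between the Parse_Status, Parsed, and Overflow ports can be sketched as follows. This is an illustration of the default behavior described above, not the transformation's actual logic. The function name `parse_row` is hypothetical, and two details are assumptions rather than documented behavior: overflow tokens are joined with a single space, and an unmatched row produces empty Parsed and Overflow values.

```python
def parse_row(tokens: list[str], labels: list[str],
              pattern: list[str], output_count: int) -> dict:
    """Mimic the port outputs for one row of labeled, tokenized input.

    tokens       -- values as they would arrive on Tokenized_Data
    labels       -- labels as they would arrive on Label_Data
    pattern      -- the label pattern to match, for example ["WORD", "WORD", "WORD"]
    output_count -- number of user-defined Parsed outputs
    """
    if labels != pattern:
        # Assumption: nothing is written to Parsed or Overflow when unmatched.
        return {"Parse_Status": "Unmatched", "Parsed": [], "Overflow": ""}

    # The first output_count tokens fill the user-defined outputs.
    parsed = tokens[:output_count]
    # Assumption: any remaining tokens are space-joined into Overflow.
    overflow = " ".join(tokens[output_count:])
    return {"Parse_Status": "Matched", "Parsed": parsed, "Overflow": overflow}


row = parse_row(
    tokens=["John", "James", "Smith"],
    labels=["WORD", "WORD", "WORD"],
    pattern=["WORD", "WORD", "WORD"],
    output_count=2,
)
print(row)
# {'Parse_Status': 'Matched', 'Parsed': ['John', 'James'], 'Overflow': 'Smith'}
```

With only two outputs defined, the third token of "John James Smith" lands in the overflow value, which matches the example given for the Overflow port.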