A labeler asset derives information about the content and structure of data. You can configure a labeler asset to perform token labeling or character labeling. Token labeling analyzes one or more tokens, or delimited values, in an input field. Character labeling analyzes the individual characters in the input field.
A token labeling operation identifies the types of information in the input field. At run time, the asset writes a label for each type that it identifies to a corresponding output field. The label is a string of characters that indicates a type of information, such as a person name, a date, or a post code.
A character labeling operation analyzes the character structure of the data in the input field, including punctuation and spaces. At run time, the asset writes a label for each character in the input field that matches the labeling criteria. In character labeling, each label is a single character.
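The following Python sketch illustrates the difference between the two labeling types. It is not product code. The function names, the label values (`DATE`, `WORD`, `OTHER`, `A`, `9`, `_`, `P`), and the matching rules are hypothetical and exist only to show that token labeling yields one label per delimited value, while character labeling yields one single-character label per matching character.

```python
import re
import string


def label_tokens(value, delimiter=" "):
    """Return one label per delimited token. Labels are hypothetical."""
    labels = []
    for token in value.split(delimiter):
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
            labels.append("DATE")
        elif token.isalpha():
            labels.append("WORD")
        else:
            labels.append("OTHER")
    return labels


def label_characters(value):
    """Return a single-character label for each character that matches a rule.

    Assumption: characters that match no rule produce no label.
    """
    labels = []
    for ch in value:
        if ch.isalpha():
            labels.append("A")
        elif ch.isdigit():
            labels.append("9")
        elif ch == " ":
            labels.append("_")
        elif ch in string.punctuation:
            labels.append("P")
    return "".join(labels)


token_labels = label_tokens("Smith 2024-01-15 #42")
character_labels = label_characters("AB-12 c")
```

In this sketch, `token_labels` holds three labels, one for each space-delimited value, and `character_labels` holds a seven-character string that mirrors the structure of the seven-character input.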
You add a labeler asset to a Labeler transformation in Data Integration. Run a mapping with a Labeler transformation to better understand the types of information in your data fields and to identify fields that do not contain the types of information that you expect.
You can configure a labeling operation to label values in the following ways:
Use a dictionary to label values
The labeling operation compares the values in the input string to the values in a dictionary that the labeler asset specifies. When the operation finds an input value that matches a dictionary value, it writes the label that you specify for that value to the output.
You can use dictionaries in token labeling and character labeling. In token labeling, you can also configure a labeling operation to assign labels to values that do not match any dictionary value.
Use a regular expression to label values
The labeling operation applies a regular expression to the input string. When the operation finds an input value that matches the expression, it writes the label that you specify for that value to the output.
Use a regular expression to find values that match a character format or structure. You can use a predefined regular expression, or you can enter your own regular expression.
You can use regular expressions in token labeling.
Use a character set to label values
The labeling operation examines the characters in an input string and returns labels for the characters that match the character set.
You can use a predefined character set, or you can add a custom character set.
You can use character sets in character labeling, where each label is a single character.
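The three labeling methods above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. All function names, the sample dictionary entries, the sample pattern, and the labels are hypothetical. The case-insensitive dictionary comparison and the whole-token regular expression match are also assumptions made for the example.

```python
import re


def label_with_dictionary(tokens, entries, label, unmatched_label=None):
    """Label tokens found in a dictionary.

    Assumption: comparison ignores case. If unmatched_label is set,
    tokens with no dictionary match receive that label instead.
    """
    lowered = {entry.lower() for entry in entries}
    labels = []
    for token in tokens:
        if token.lower() in lowered:
            labels.append(label)
        elif unmatched_label is not None:
            labels.append(unmatched_label)
    return labels


def label_with_regex(tokens, pattern, label):
    """Label tokens that match the regular expression as a whole."""
    compiled = re.compile(pattern)
    return [label for token in tokens if compiled.fullmatch(token)]


def label_with_character_set(value, character_set, label):
    """Return a single-character label for each character in the set."""
    return "".join(label for ch in value if ch in character_set)


cities = {"paris", "london"}  # hypothetical dictionary values
city_labels = label_with_dictionary(["Paris", "Berlin"], cities, "CITY")
city_or_unknown = label_with_dictionary(
    ["Paris", "Berlin"], cities, "CITY", unmatched_label="UNKNOWN"
)
zip_labels = label_with_regex(["10001", "abc"], r"\d{5}", "ZIP")
digit_labels = label_with_character_set("A1-b", set("0123456789"), "9")
```

The `unmatched_label` parameter mirrors the token labeling option to assign labels to values that do not match any dictionary value.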
Each operation that you define in a labeler asset is called a step. In token labeling, you can combine steps that use dictionaries and steps that use regular expressions in a single asset. In character labeling, you can combine steps that use dictionaries and steps that use character sets in a single asset.
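As a rough sketch of how steps might combine in token labeling, the following Python example chains a dictionary step and a regular expression step. The names `dictionary_step`, `regex_step`, and `run_token_steps` are hypothetical. The example also assumes that steps run in order and that the first matching step supplies the label, which is an assumption for illustration and not a statement of how the asset resolves multiple matches.

```python
import re


def dictionary_step(entries, label):
    """Build a step that labels tokens found in a dictionary (case ignored)."""
    lowered = {entry.lower() for entry in entries}

    def step(token):
        return label if token.lower() in lowered else None

    return step


def regex_step(pattern, label):
    """Build a step that labels tokens matching the whole pattern."""
    compiled = re.compile(pattern)

    def step(token):
        return label if compiled.fullmatch(token) else None

    return step


def run_token_steps(value, steps, delimiter=" ", unmatched_label="UNKNOWN"):
    """Apply steps in order to each token. Assumption: first match wins."""
    labels = []
    for token in value.split(delimiter):
        label = unmatched_label
        for step in steps:
            result = step(token)
            if result is not None:
                label = result
                break
        labels.append(label)
    return labels


steps = [
    dictionary_step({"Paris", "London"}, "CITY"),  # hypothetical dictionary
    regex_step(r"\d{5}", "ZIP"),                   # hypothetical pattern
]
combined_labels = run_token_steps("Paris 75001 xyz", steps)
```

A character labeling asset could be sketched the same way, with dictionary steps and character set steps that each return a single-character label.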