The Labeler asset assigns a descriptive label to values in an input string.
The following examples describe some of the types of analysis you can perform with a Labeler asset.
Verify business information with dictionaries
A data set might contain a field of values that correspond to a finite set of known values, such as a set of stock-keeping unit (SKU) numbers in your organization. You can use a dictionary of the values to verify that the field contains the data that you expect.
Create a token labeling step that reads a dictionary, and add a dictionary that contains the SKU values to the step. Next, specify a label name for the step. For example, you might specify SKU as the label.
Add the asset that you create to a Labeler transformation in a mapping. When the mapping runs, the transformation compares the input field values to the values in the dictionary that the step specifies. The mapping writes the text label for each SKU value that it finds to an output field.
You can also configure the step to label any value that does not match a dictionary value. In this case, the mapping writes a text label to the output field when an input value does not match a value in the dictionary. You might specify a different label in this case, such as INCORRECT. To find the incorrect values, select the Exclusive option in the step.
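The dictionary comparison can be sketched in Python as follows. This is an illustration of the logic only, not the Labeler implementation: the function name label_tokens, the sample SKU values, the case-insensitive comparison, and the pass-through of unlabeled values are all assumptions made for the example.

```python
def label_tokens(values, dictionary, label="SKU", exclusive=False):
    """Return one output per input value, loosely modeled on a dictionary step.

    Assumption: matching ignores case and surrounding whitespace.
    Assumption: values that do not receive a label pass through unchanged.
    """
    known = {entry.strip().upper() for entry in dictionary}
    outputs = []
    for value in values:
        matched = value.strip().upper() in known
        # Normal mode labels the matches; exclusive mode labels the non-matches.
        if matched != exclusive:
            outputs.append(label)
        else:
            outputs.append(value)
    return outputs


sku_dictionary = ["AB-1001", "AB-1002", "CD-2001"]  # hypothetical SKU values
field_values = ["AB-1001", "ZZ-9999", "cd-2001"]

matches = label_tokens(field_values, sku_dictionary, label="SKU")
mismatches = label_tokens(
    field_values, sku_dictionary, label="INCORRECT", exclusive=True
)
print(matches)
print(mismatches)
```

The exclusive flag in the sketch plays the role of the Exclusive option: the same dictionary is used, but the label is applied to the values that the dictionary does not contain.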
Identify data by character format
A customer data set might include a column for contact data. You expect the column to contain email addresses, but users might enter other values, such as phone numbers, country names, or postal codes, into the column. You can use a regular expression to identify the values that are email addresses.
For example, you can configure the asset to label the following string as an email address:
info@informatica.com
Create a token labeling step for regular expressions, and add an expression that represents the email data format. You can enter a regular expression that describes the format that you want to find. Or, select a regular expression from the list of built-in expressions in the asset. You can specify EMAIL as the label name that the step applies to values that match the expression format.
At run time, the Labeler transformation applies the regular expression logic to the values in the input field. When the transformation finds a value with a format that matches the expression logic, it writes the label that you provided to an output field. The output field contains the label EMAIL for each well-formatted email address and contains any other value in its original form.
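A minimal Python sketch of the same idea follows. The pattern below is a simplified expression written for this example, not the built-in expression that the asset provides, and the function name label_by_format and the sample values are hypothetical.

```python
import re

# Simplified, illustrative email pattern (an assumption for this sketch).
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+"
)


def label_by_format(values, pattern=EMAIL_PATTERN, label="EMAIL"):
    """Write the label for values that fully match the pattern.

    Values that do not match are returned in their original form.
    """
    return [label if pattern.fullmatch(v.strip()) else v for v in values]


contact_values = ["info@informatica.com", "555-0100", "Ireland", "94063"]
labeled = label_by_format(contact_values)
print(labeled)
```

The sketch uses a full match rather than a search so that a value is labeled only when the whole value has the email format.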
Review the structure of your input data
An organization might store the telephone numbers of employees in the following patterns: (212)555-1212, 2125551212, and +212-555-1212. You can use a character set to verify the telephone number structures.
Create a step in character labeling mode for each telephone number structure that you support. Add a custom character set or select a built-in character set in the asset. Configure the asset to label any input character that matches the content of a character set. You might specify the following label names for your telephone data: P for punctuation characters, D for digits, and S for spaces.
When the transformation finds a character that matches a member of the character set that you define, it writes the label that you provided for the character to the output. For example, the labeling operation reads the telephone number (212)555-1212 and returns the label PDDDPDDDPDDDD.
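The character-by-character behavior can be sketched in Python as follows. The character sets here are built from the Python standard library and stand in for the custom or built-in character sets in the asset. The function name label_characters and the pass-through of unmatched characters are assumptions made for the example.

```python
import string

# Each entry pairs a label with the characters that receive it.
# These sets are stand-ins for the character sets defined in the asset.
CHARACTER_SETS = [
    ("D", set(string.digits)),
    ("S", set(" ")),
    ("P", set(string.punctuation)),
]


def label_characters(value, character_sets=CHARACTER_SETS):
    """Return one label per input character.

    Assumption: a character that belongs to no set is copied unchanged.
    """
    labels = []
    for ch in value:
        for label, members in character_sets:
            if ch in members:
                labels.append(label)
                break
        else:
            labels.append(ch)
    return "".join(labels)


numbers = ["(212)555-1212", "2125551212", "+212-555-1212"]
structures = {number: label_characters(number) for number in numbers}
for number, structure in structures.items():
    print(number, "->", structure)
```

Because the sketch emits one label per character, the label string always has the same length as the input value, which makes it straightforward to group records by structure.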
Selecting the right labeling mode for your data
Token labeling and character labeling can perform equally well in identifying correct and incorrect data values. The labeling mode that you choose can depend on the types of error that you expect to find in your data. Character labeling can be more useful when a user adds valid data to the wrong field. Token labeling can be more useful when the accuracy of the field data is paramount and you want to find inaccurate data.
Consider the following cases:
•Your organization maintains an address data set in which users can enter valid data values in the wrong fields. For example, the users may enter street name information in a field for city names.
You configure a character labeling operation that applies a dictionary of street terminology to the city name field. At run time, the operation returns a label for street terms such as STREET, ROAD, and AVENUE.
Because character labeling returns the labeled and unlabeled characters in the same field, you can determine whether the values are incorrect or simply entered in the wrong field.
•Your organization maintains a list of batch codes for ingredients in a product recipe.
You configure a token labeling operation to apply a dictionary of the code values to the code data field. At run time, the operation returns a label for each correct value in the field.
You might alternatively configure the token labeling operation to return a label for each incorrect value in the field.
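The first case can be approximated in Python as follows. This is a loose sketch and not the Labeler behavior: it replaces each dictionary term inside a value with a label and leaves the remaining characters in place, so that labeled and unlabeled content appear together in one output value. The function name label_terms_in_place, the label text, and the sample city values are hypothetical.

```python
import re

STREET_TERMS = ["STREET", "ROAD", "AVENUE"]  # hypothetical dictionary entries


def label_terms_in_place(value, terms=STREET_TERMS, label="[STREET_TERM]"):
    """Replace whole-word dictionary terms with a label; keep everything else.

    Assumption: matching ignores case and applies to whole words only.
    """
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(label, value)


city_values = ["Dublin", "Main Street", "Park Avenue North"]
labeled_cities = [label_terms_in_place(value) for value in city_values]
print(labeled_cities)
```

In the output, a value that still contains unlabeled text next to a street label is likely to be valid street data in the wrong field, while a value with no label at all is left for other checks.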