Standardizing data values using a dictionary in advanced mode
To standardize data values, you can write rule logic that searches for instances of values that appear in a dictionary and replaces each value with a corresponding valid value from the dictionary.
For example, a bank might decide to standardize address terms in customer address data so that the data is consistent and uniform. The bank wants to use abbreviated street descriptors that the postal carrier might prefer, such as St for Street, Rd for Road, and Ave for Avenue.
Your data quality assets include a dictionary of address terms. You write a rule specification that standardizes any value in the input data that matches a value in the dictionary.
At a high level, you'll complete the following steps:
1Open a rule specification in advanced mode.
2Verify that the rule specification contains the inputs and outputs that you require.
If the rule specification does not contain an input and an output that you can use, create an input and output. Configure the input and output properties to represent the type of column that contains the business data.
3 Write an IF statement to define the condition in which the rule will standardize data.
4Write a THEN statement to define the action to take when the data satisfies the IF condition.
Include a replaceWithDictionary function in the THEN statement. You can optionally add parameters to the function.
5Add an ENDIF element.
Writing the IF statement
1In the rule editor, type the IF element of the IF-THEN-ELSE statement.
2Type an input name.
3Add an in value to associate the input name with the dictionary name.
4 Type the name of the dictionary that contains the address terms.
The rule editor can suggest available dictionaries based on the text that you type.
Writing the THEN statement
1In the rule editor, type the THEN element.
2Type an output name.
3Add an = operator value to associate the output with the function that you'll add.
4Define a function that returns the dictionary values that match input values.
To define the function, use the STANDARDIZE command with a replaceWithDictionary function, add the input name, and specify the dictionary.
Use the dictionary that you selected in the IF statement. The rule editor can suggest available dictionaries and function names based on the text that you type.
Writing the ENDIF statement
To complete the rule logic, type ENDIF.
Adding optional parameters to the THEN statement function
Optionally, add one or more of the following parameters to the replaceWithDictionary function:
- Specify the dictionary column that contains the preferred versions of the address terms. If you do not identify a valid column, the rule specification uses the first column in the dictionary.
- Enter the delimiter that the rule specification will use to recognize discrete values in the input field. Specify the delimiters within square brackets. If you do not specify a delimiter, the rule specification uses a space as the delimiter.
- Enter the scope for the search operation. You can set the following options:
▪ Anywhere. Replaces any instance of the value that occurs in the input field.
▪ Start. Replaces any instance of the value that is not preceded by another value in the input field.
▪ End. Replaces any instance of the value that is not followed by another value in the input field.
- Enter NULL to indicate that you are not replacing input values with a custom value.
- Enter TRUE to indicate that the function will perform a case-sensitive comparison between the dictionary data and the input data.
Rule logic example
The following example shows rule logic that represents the standardize operation:
IF Input in dictionary:MRS.Merge_Cleanse_Strategy.AddressDictionary THEN Output = STANDARDIZE("replaceWithDictionary",Input,dictionary:MRS.Merge_Cleanse_Strategy.AddressDictionary, "Column 1",[" "],"Anywhere","NULL", TRUE) ENDIF