Lookup and Standardization Rules
Data quality rules that can be used specifically to standardize input values based on reference table lookups.
Check_Profanity
General description
Checks text for profanity and returns the profanity found, a masked version of the text, and the original text with the profanity words removed.
Input ports
Text_Value |
Text field that will be checked for profanities. |
Output ports
Profanity_Value |
Returns the profanity value(s) found in the text. |
Masked_Data |
Returns the input text with each profanity word replaced by the value [CENSORED]. |
Cleansed_Text |
Returns the input text with the profanity words removed. |
Out_Status_Code |
Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
Out_Status_Message |
Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Meta data and reference tables
profanity_infa |
Dictionary of profanity values that the rule looks up during execution. |
Error_messages_by_Language |
Contains a 4-digit error code plus a language, used to select the error message to output in the preferred language. |
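As a minimal sketch of how this table could be consulted, assume it can be treated as a mapping from a (code, language) pair to a message template. The codes "0000" and "0001", the language value "EN", the {value} placeholder and the function name error_message are all hypothetical and are not taken from the rule definition.

```python
# Hypothetical excerpt of Error_messages_by_Language:
# (4-digit error code, language) -> message template.
ERROR_MESSAGES = {
    ("0000", "EN"): "No Error",
    ("0001", "EN"): "Profanity -{value}- was found in input data.",
}


def error_message(code: str, language: str, **params: str) -> str:
    """Return the message for a 4-digit error code in the preferred language."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit error code, got {code!r}")
    # Raises KeyError if the table has no entry for this code/language pair.
    template = ERROR_MESSAGES[(code, language)]
    # Fill in rule-specific details such as the offending value.
    return template.format(**params)
```

For example, error_message("0001", "EN", value="crappy") yields the status message shown in the Check_Profanity example below.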
Example usage
Check whether the "English Long Description" (Text_Value) contains any profanity values referenced by "profanity_infa".
The "English Long Description" of the item is:
"HDTV with a crappy 50 GB hard drive"
Example output
Profanity_Value |
crappy |
Masked_Data |
HDTV with a CENSORED 50 GB hard drive |
Cleansed_Text |
HDTV with a 50 GB hard drive |
Out_Status_Code |
Failed |
Out_Status_Message |
Profanity -crappy- was found in input data. |
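The following Python sketch approximates the outputs of Check_Profanity for the example above. It is not the rule's implementation: PROFANITY_INFA is a hypothetical one-entry stand-in for profanity_infa, matching is assumed to be case-insensitive on whitespace-separated words with surrounding punctuation ignored, and the way several hits are joined in Profanity_Value and in the status message is an assumption. The mask defaults to [CENSORED] as in the Masked_Data port description.

```python
import string
from dataclasses import dataclass

# Hypothetical one-entry excerpt of profanity_infa (entries stored lowercase).
PROFANITY_INFA = {"crappy"}


@dataclass
class ProfanityResult:
    profanity_value: str
    masked_data: str
    cleansed_text: str
    status_code: str
    status_message: str


def _key(token: str) -> str:
    # Normalize a token for lookup: drop surrounding punctuation, lowercase.
    return token.strip(string.punctuation).lower()


def check_profanity(text: str, dictionary: set[str] = PROFANITY_INFA,
                    mask: str = "[CENSORED]") -> ProfanityResult:
    tokens = text.split()
    found = [_key(t) for t in tokens if _key(t) in dictionary]
    if not found:
        return ProfanityResult("", text, text, "OK", "No Error")
    # Masked_Data: swap each hit for the mask; Cleansed_Text: drop each hit.
    masked = " ".join(mask if _key(t) in dictionary else t for t in tokens)
    cleansed = " ".join(t for t in tokens if _key(t) not in dictionary)
    joined = ", ".join(found)  # assumption: multiple hits are comma-separated
    return ProfanityResult(joined, masked, cleansed, "Failed",
                           f"Profanity -{joined}- was found in input data.")
```

Splitting on whitespace keeps the sketch short, but it also collapses repeated spaces and cannot match multi-word dictionary entries.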
Parse_Color
General description
Parses color names as determined by a reference table. The rule returns up to two color values, each both as found in the data and as a cleansed, correctly cased version.
Input ports
Text_Value |
Text field that will be parsed for color values. |
Output ports
Standard_Text |
Returns the text field with standardized color values. |
Color_ID1 |
Returns the name of the first color value found. |
Color_ID2 |
Returns the name of the second color value found. |
Color_Standardized1 |
Returns the standardized color value found for the first color. |
Color_Standardized2 |
Returns the standardized color value found for the second color. |
Out_Status_Code |
Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
Out_Status_Message |
Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Meta data and reference tables
colors_infa |
Dictionary that contains the color values in a standardized format. |
Error_messages_by_Language |
Contains a 4-digit error code plus a language, used to select the error message to output in the preferred language. |
Example usage
Parse two color values out of the "English Long Description" (Text_Value) and replace them with the standardized format of the same colors (Standard_Text).
The "English Long Description" of the item is:
"This nice HDTV has a ferrari red frame and a cool black screen color."
Example output
Standard_Text |
This nice HDTV has a Ferrari Red frame and a Cool black screen color. |
Color_ID1 |
ferrari red |
Color_ID2 |
cool black |
Color_Standardized1 |
Ferrari Red |
Color_Standardized2 |
Cool black |
Out_Status_Code |
OK |
Out_Status_Message |
No Error |
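A minimal Python sketch of the Parse_Color logic, assuming colors_infa can be loaded as a mapping from a lowercase color phrase to its standardized spelling. Only the two entries used in the example come from this document; "red" and "black" are hypothetical additions, as is the function name parse_color. Longer phrases are tried first so that "ferrari red" is not reduced to a bare "red". Whether the real rule replaces every recognized color in Standard_Text or only the first two is not stated, so the sketch replaces all of them.

```python
import re

# Hypothetical excerpt of colors_infa: lowercase lookup key -> standardized color.
COLORS_INFA = {
    "ferrari red": "Ferrari Red",
    "cool black": "Cool black",
    "red": "Red",
    "black": "Black",
}


def parse_color(text: str, table: dict[str, str] = COLORS_INFA) -> dict[str, str]:
    # Try longer entries first so "ferrari red" wins over "red".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    # Colors as they appear in the text, in order of appearance; keep at most two.
    found = [m.group(0) for m in pattern.finditer(text)][:2]
    found += [""] * (2 - len(found))
    standardized = [table[f.lower()] if f else "" for f in found]
    # Replace every recognized color with its standardized spelling.
    standard_text = pattern.sub(lambda m: table[m.group(0).lower()], text)
    return {
        "Standard_Text": standard_text,
        "Color_ID1": found[0],
        "Color_ID2": found[1],
        "Color_Standardized1": standardized[0],
        "Color_Standardized2": standardized[1],
        "Out_Status_Code": "OK",
        "Out_Status_Message": "No Error",
    }
```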
Standardize_Color
General description
Returns a base color value (Base_Color) for an input color value (Text_Value).
Input ports
Text_Value |
Color value that will be standardized. |
Output ports
Base_Color |
Returns the standardized color value. |
Out_Status_Code |
Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
Out_Status_Message |
Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Meta data and reference tables
colors_base_infa |
Reference table that maps different color values to a standardized color value (e.g. "Deep sky blue" to "Blue") |
Error_messages_by_Language |
Contains a 4-digit error code plus a language, used to select the error message to output in the preferred language. |
Example usage
Derive a base color value from the color value provided by the item's manufacturer, so that the item can be classified for web shop search.
The attribute "Color" (Text_Value) provided by the supplier has the value "Midnight black".
Example output
Base_Color |
Black |
Out_Status_Code |
OK |
Out_Status_Message |
No Error |
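As a sketch, the lookup reduces to a case-insensitive dictionary access, assuming colors_base_infa can be loaded as a mapping from a normalized color value to its base color. The table excerpt, the function name standardize_color and the Failed message for unmapped values are hypothetical.

```python
# Hypothetical excerpt of colors_base_infa: lowercase color value -> base color.
COLORS_BASE_INFA = {
    "deep sky blue": "Blue",
    "midnight black": "Black",
}


def standardize_color(text_value: str,
                      table: dict[str, str] = COLORS_BASE_INFA) -> dict[str, str]:
    # Normalize case and collapse whitespace before the lookup.
    key = " ".join(text_value.split()).lower()
    base = table.get(key)
    if base is None:
        # Assumption: an unmapped value is reported as Failed (message text is invented).
        return {"Base_Color": "", "Out_Status_Code": "Failed",
                "Out_Status_Message": f"No base color found for {text_value!r}."}
    return {"Base_Color": base, "Out_Status_Code": "OK",
            "Out_Status_Message": "No Error"}
```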
Standardize_CompanyName
General description
Standardizes a company name and additionally provides its acronym if possible.
Input ports
CompanyName |
Company name that will be standardized. |
Output ports
Standardized_CompanyName |
Returns the standardized company name. |
Acronym_CompanyName |
Returns the acronym for the standardized company name. |
Out_Status_Code |
Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
Out_Status_Message |
Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Meta data and reference tables
usa_company_acronyms_infa |
Reference table that maps company names to their corresponding acronyms (e.g. "Hewlett-Packard Co" to "HP") |
usa_company_names_std_infa |
Reference table that maps company names to their standardized spelling format (e.g. "Abercrombie and Fitch" to "Abercrombie & Fitch Co") |
usa_company_sufx_abrv_infa |
Reference table that contains a set of American company name suffixes and their abbreviations (e.g. "Co.") |
Error_messages_by_Language |
Contains a 4-digit error code plus a language, used to select the error message to output in the preferred language. |
Example usage
Take the manufacturer name "Hewlett Packard" (CompanyName) attached to an item and standardize it to the format defined in the reference table.
Example output
Standardized_CompanyName |
Hewlett-Packard |
Acronym_CompanyName |
HP |
Out_Status_Code |
OK |
Out_Status_Message |
No Error |
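One way the three reference tables could work together is sketched below in Python: the input is normalized and stripped of a trailing legal suffix, looked up for its standardized spelling, and the standardized name is then looked up again for its acronym. The table excerpts, the helper core_name, the function name standardize_company_name and the fallback to the trimmed input for unknown names are assumptions, not documented behavior; status handling is simplified to always report OK.

```python
# Hypothetical excerpts of the three reference tables.
SUFFIX_ABBREVIATIONS = {"company": "Co.", "corporation": "Corp.", "incorporated": "Inc."}
RAW_NAMES_STD = {
    "Hewlett Packard": "Hewlett-Packard",
    "Abercrombie and Fitch": "Abercrombie & Fitch Co",
}
RAW_ACRONYMS = {"Hewlett-Packard Co": "HP"}

# Every spelling of a suffix, lowercase and without the trailing period.
SUFFIX_FORMS = set(SUFFIX_ABBREVIATIONS) | {
    abbr.lower().rstrip(".") for abbr in SUFFIX_ABBREVIATIONS.values()
}


def core_name(name: str) -> str:
    """Lowercase, collapse whitespace and drop one trailing legal suffix."""
    words = name.replace(",", " ").lower().split()
    if len(words) > 1 and words[-1].rstrip(".") in SUFFIX_FORMS:
        words = words[:-1]
    return " ".join(words)


# Key both lookup tables by the normalized core name.
NAMES_STD = {core_name(k): v for k, v in RAW_NAMES_STD.items()}
ACRONYMS = {core_name(k): v for k, v in RAW_ACRONYMS.items()}


def standardize_company_name(company_name: str) -> dict[str, str]:
    # Assumption: fall back to the trimmed input when the name is not in the table.
    standardized = NAMES_STD.get(core_name(company_name), company_name.strip())
    acronym = ACRONYMS.get(core_name(standardized), "")
    return {
        "Standardized_CompanyName": standardized,
        "Acronym_CompanyName": acronym,
        "Out_Status_Code": "OK",
        "Out_Status_Message": "No Error",
    }
```

Normalizing both the input and the table keys with the same function is what lets "Hewlett-Packard Co" in the acronym table match the standardized "Hewlett-Packard".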
Standardize_UOM
General description
Separates the quantity and unit of measure (UOM) found in the text and outputs both the unstandardized and standardized values. It also outputs the full text with the UOM standardized.
Input ports
Text_Value |
Text field to be checked for units of measure. |
Output ports
Standardized_Unit |
Returns the standardized unit of measure found in the text. |
Parsed_Unit |
Returns the unit of measure found in the text. |
Standardized_Text |
Returns the text field with standardized UOM values. |
Additional_Parsed_Values |
Returns any additional parsed values found in the text. |
Unparsed_Field |
Returns the part of the text that hasn't been parsed by the rule. |
Out_Status_Code |
Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
Out_Status_Message |
Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Meta data and reference tables
uom_infa |
Reference table that contains the unit values in a standardized format. |
Error_messages_by_Language |
Contains a 4-digit error code plus a language, used to select the error message to output in the preferred language. |
Example usage
The "English Long Description" of the item that will be parsed by this rule is:
"This nice HDTV weighs 10 kilogram and can be delivered in 24 hours."
Example output
Standardized_Unit |
10 kg |
Parsed_Unit |
10 kilogram |
Standardized_Text |
This nice HDTV weighs 10 kg and can be delivered in 24 hrs. |
Additional_Parsed_Values |
24 hours |
Unparsed_Field |
This nice HDTV weighs and can be delivered hours. |
Out_Status_Code |
OK |
Out_Status_Message |
No Error |
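A minimal Python sketch of the parsing step, assuming uom_infa can be loaded as a mapping from a lowercase unit spelling to its standardized abbreviation (the entries and the function name standardize_uom are hypothetical). It treats the first quantity/unit pair as Parsed_Unit and every later pair as Additional_Parsed_Values, which matches the example above; it does not attempt to reproduce Unparsed_Field.

```python
import re

# Hypothetical excerpt of uom_infa: lowercase unit spelling -> standardized unit.
UOM_INFA = {
    "kilogram": "kg", "kilograms": "kg", "kg": "kg",
    "hour": "hr", "hours": "hrs", "hrs": "hrs",
}

# Quantity (integer or decimal) followed by a known unit; longest units first.
_UNITS = "|".join(re.escape(u) for u in sorted(UOM_INFA, key=len, reverse=True))
UOM_PATTERN = re.compile(rf"\b(\d+(?:[.,]\d+)?)\s*({_UNITS})\b", re.IGNORECASE)


def standardize_uom(text: str) -> dict[str, str]:
    matches = list(UOM_PATTERN.finditer(text))

    def standard(m) -> str:
        # Rebuild "quantity unit" using the standardized unit spelling.
        return f"{m.group(1)} {UOM_INFA[m.group(2).lower()]}"

    first = matches[0] if matches else None
    return {
        "Standardized_Unit": standard(first) if first else "",
        "Parsed_Unit": first.group(0) if first else "",
        "Standardized_Text": UOM_PATTERN.sub(standard, text),
        "Additional_Parsed_Values": ", ".join(m.group(0) for m in matches[1:]),
        "Out_Status_Code": "OK",
        "Out_Status_Message": "No Error",
    }
```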