Lookup and Standardization Rules
Data quality rules that standardize input values by looking them up in reference tables.
Check_Profanity
General description
Parses the input text for inappropriate language and returns the original text with the profanities masked or removed.
Input ports
| Text_Value | Text field that will be checked for profanities. |
Output ports
| Profanity_Value | Returns the profanity value(s) found in the text. |
| Masked_Data | Returns the input text with each profanity replaced by the value [CENSORED]. |
| Cleansed_Text | Returns the input text with the profanities removed. |
| Out_Status_Code | Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
| Out_Status_Message | Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Metadata and reference tables
| profanity_infa | Dictionary of profanity values used for lookups during rule execution. |
| Error_messages_by_Language | Contains a four-digit error code plus a language to indicate which error message to output. |
Example usage
Check whether the "English Long Description" (Text_Value) contains any profanity values referenced by "profanity_infa".
The "English Long Description" of the item is:
"HDTV with a crappy 50 GB hard drive"
Example output
| Profanity_Value | crappy |
| Masked_Data | HDTV with a CENSORED 50 GB hard drive |
| Cleansed_Text | HDTV with a 50 GB hard drive |
| Status_Code | Failed |
| Status_Message | Profanity -crappy- was found in input data. |
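The behavior described above can be approximated in a few lines of Python. This is a minimal sketch, not the rule's actual implementation: the function name `check_profanity`, the one-word `PROFANITY_INFA` set standing in for the reference dictionary, and the case-insensitive whole-word matching are all assumptions. The default mask follows the Masked_Data port description, and the message format follows the sample output.

```python
import re

# Hypothetical stand-in for the profanity_infa dictionary.
PROFANITY_INFA = {"crappy"}


def check_profanity(text_value, dictionary=PROFANITY_INFA, mask="[CENSORED]"):
    """Approximate the Check_Profanity output ports for one text value."""
    found, masked, cleansed = [], text_value, text_value
    if dictionary:
        # Longest entries first so multi-word entries win over their parts.
        words = sorted(dictionary, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,  # assumption: lookups ignore case
        )
        found = pattern.findall(text_value)
        masked = pattern.sub(lambda _: mask, text_value)
        # Remove the matches, then collapse the doubled spaces left behind.
        cleansed = re.sub(r"\s{2,}", " ", pattern.sub("", text_value)).strip()
    if found:
        code = "Failed"
        # Assumption: several hits are joined with ", " in one message.
        message = "Profanity -" + ", ".join(found) + "- was found in input data."
    else:
        code, message = "OK", "No Error"
    return {
        "Profanity_Value": ", ".join(found),
        "Masked_Data": masked,
        "Cleansed_Text": cleansed,
        "Out_Status_Code": code,
        "Out_Status_Message": message,
    }


result = check_profanity("HDTV with a crappy 50 GB hard drive")
```

Passing a different `mask` argument changes the replacement token without touching the lookup logic.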
Parse_Color
General description
Parses color names as determined by a reference table. The rule returns two color values; for each, it returns the value as found in the data and a cleansed, cased version.
Input ports
| Text_Value | Text field that will be parsed for color values. |
Output ports
| Standard_Text | Returns the text field with standardized color values. |
| Color_ID1 | Returns the name of the first color value found. |
| Color_ID2 | Returns the name of the second color value found. |
| Color_Standardized1 | Returns the standardized color value found for the first color. |
| Color_Standardized2 | Returns the standardized color value found for the second color. |
| Out_Status_Code | Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
| Out_Status_Message | Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Metadata and reference tables
| colors_infa | Dictionary that contains the color values in a standardized format. |
| Error_messages_by_Language | Contains a four-digit error code plus a language to indicate which error message to output. |
Example usage
Parse out two color values from the "English Long Description" (Text_Value) and replace them with the standardized format of the same color (Standard_Text).
The "English Long Description" of the item is:
"This nice HDTV has a ferrari red frame and a cool black screen color."
Example output
| Standard_Text | This nice HDTV has a Ferrari Red frame and a Cool black screen color. |
| Color_ID1 | ferrari red |
| Color_ID2 | cool black |
| Color_Standardized1 | Ferrari Red |
| Color_Standardized2 | Cool black |
| Status_Code | OK |
| Status_Message | No Error |
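As an illustration of the parsing step, the following Python sketch reproduces the example above. It is not the rule's implementation: `parse_color` and the two-entry `COLORS_INFA` mapping are hypothetical, and the case-insensitive whole-word matching is an assumption. The status ports are left out because the description does not state when this rule reports Failed.

```python
import re

# Hypothetical stand-in for colors_infa: lowercased color -> standardized spelling.
COLORS_INFA = {"ferrari red": "Ferrari Red", "cool black": "Cool black"}


def parse_color(text_value, colors=COLORS_INFA):
    """Find the first two known colors and standardize them in the text."""
    matches = []
    if colors:
        # Longest names first so "ferrari red" wins over a shorter entry like "red".
        names = sorted(colors, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
            re.IGNORECASE,  # assumption: lookups ignore case
        )
        matches = list(pattern.finditer(text_value))[:2]

    found = [m.group(0) for m in matches]
    standardized = [colors[f.lower()] for f in found]

    # Replace from the right so earlier match offsets stay valid.
    standard_text = text_value
    for m, std in reversed(list(zip(matches, standardized))):
        standard_text = standard_text[: m.start()] + std + standard_text[m.end():]

    # Pad to two slots so both color ports always have a value.
    found += [""] * (2 - len(found))
    standardized += [""] * (2 - len(standardized))
    return {
        "Standard_Text": standard_text,
        "Color_ID1": found[0],
        "Color_ID2": found[1],
        "Color_Standardized1": standardized[0],
        "Color_Standardized2": standardized[1],
    }


result = parse_color(
    "This nice HDTV has a ferrari red frame and a cool black screen color."
)
```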
Standardize_Color
General description
Returns a base color value (Base_Color) for an input color value (Text_Value).
Input ports
| Text_Value | Color value that will be standardized. |
Output ports
| Base_Color | Returns the standardized color value. |
| Out_Status_Code | Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
| Out_Status_Message | Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Metadata and reference tables
| colors_base_infa | Reference table that maps different color values to a standardized color value (e.g. "Deep sky blue" to "Blue"). |
| Error_messages_by_Language | Contains a four-digit error code plus a language to indicate which error message to output. |
Example usage
Take the color value provided by the manufacturer of the item and derive a base color value from it that can be used for web shop search classification of that item.
The attribute "Color" provided by the supplier has the value "Midnight black".
Example output
| Base_Color | Black |
| Status_Code | OK |
| Status_Message | No Error |
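The lookup itself amounts to a single table access, as this Python sketch shows. The function name `standardize_color`, the two-entry `COLORS_BASE_INFA` mapping, and the case-insensitive match are assumptions. Returning `None` for an unknown color is also a placeholder, since the description does not say what the rule outputs in that case.

```python
# Hypothetical stand-in for colors_base_infa: lowercased color -> base color.
COLORS_BASE_INFA = {
    "deep sky blue": "Blue",
    "midnight black": "Black",
}


def standardize_color(text_value, colors_base=COLORS_BASE_INFA):
    """Return the base color for a color value, or None if it is unknown."""
    # Normalize whitespace and case before the lookup (assumption).
    key = " ".join(text_value.split()).lower()
    return colors_base.get(key)


base_color = standardize_color("Midnight black")
```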
Standardize_CompanyName
General description
Standardizes a company name and additionally provides its acronym if possible.
Input ports
| CompanyName | Company name that will be standardized. |
Output ports
| Standardized_CompanyName | Returns the standardized company name. |
| Acronym_CompanyName | Returns the acronym for the standardized company name. |
| Out_Status_Code | Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
| Out_Status_Message | Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Metadata and reference tables
| usa_company_acronyms_infa | Reference table that maps company names to their corresponding acronyms (e.g. "Hewlett-Packard Co" to "HP"). |
| usa_company_names_std_infa | Reference table that maps company names to their standardized spelling format (e.g. "Abercrombie and Fitch" to "Abercrombie & Fitch Co"). |
| usa_company_sufx_abrv_infa | Reference table that contains a set of American company name suffixes and their abbreviations (e.g. "Co."). |
| Error_messages_by_Language | Contains a four-digit error code plus a language to indicate which error message to output. |
Example usage
Take the manufacturer name "Hewlett Packard" (CompanyName) attached to an item and standardize it to the defined format as determined by the reference table.
Example output
| Standardized_CompanyName | Hewlett-Packard |
| Acronym_CompanyName | HP |
| Status_Code | OK |
| Status_Message | No Error |
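One way to picture the two lookups is the Python sketch below. It is a simplified illustration under stated assumptions, not the rule's actual logic: the function name, the table contents, and the order of the lookups (standardize first, then look up the acronym of the standardized name) are hypothetical, and suffix handling through usa_company_sufx_abrv_infa is left out entirely.

```python
# Hypothetical excerpts of the reference tables, keyed by lowercased name.
NAMES_STD = {
    "hewlett packard": "Hewlett-Packard",
    "abercrombie and fitch": "Abercrombie & Fitch Co",
}
ACRONYMS = {"hewlett-packard": "HP"}


def standardize_company_name(company_name, names_std=NAMES_STD, acronyms=ACRONYMS):
    """Standardize a company name and look up its acronym if one is known."""
    cleaned = " ".join(company_name.split())
    # Assumption: an unknown name is passed through unchanged.
    standardized = names_std.get(cleaned.lower(), cleaned)
    # Assumption: the acronym is keyed by the standardized name.
    acronym = acronyms.get(standardized.lower(), "")
    return {
        "Standardized_CompanyName": standardized,
        "Acronym_CompanyName": acronym,
    }


result = standardize_company_name("Hewlett Packard")
```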
Standardize_UOM
General description
Separates the quantity and unit of measure, and outputs both the unstandardized and the standardized values. It also outputs the full string with the unit of measure standardized.
Input ports
| Text_Value | Text field to be checked for units of measure. |
Output ports
| Standardized_Unit | Returns the standardized unit of measure found in the text. |
| Parsed_Unit | Returns the unit of measure found in the text. |
| Standardized_Text | Returns the text field with standardized UOM values. |
| Additional_Parsed_Values | Returns any additional parsed values found in the text. |
| Unparsed_Field | Returns the part of the text that hasn't been parsed by the rule. |
| Out_Status_Code | Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status) |
| Out_Status_Message | Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message) |
Metadata and reference tables
| uom_infa | Reference table that contains the unit values in a standardized format. |
| Error_messages_by_Language | Contains a four-digit error code plus a language to indicate which error message to output. |
Example usage
The "English Long Description" of the item that will be parsed by this rule is:
"This nice HDTV weighs 10 kilogram and can be delivered in 24 hours."
Example output
| Standardized_Unit | 10 kg |
| Parsed_Unit | 10 kilogram |
| Standardized_Text | This nice HDTV weighs 10 kg and can be delivered in 24 hrs. |
| Additional_Parsed_Values | 24 hours |
| Unparsed_Field | This nice HDTV weighs and can be delivered hours. |
| Status_Code | OK |
| Status_Message | No Error |
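The quantity and unit parsing can be sketched in Python as follows. This is an approximation under stated assumptions rather than the rule's implementation: `standardize_uom`, the small `UOM_INFA` mapping, and the "number followed by a known unit" pattern are hypothetical. Additional values are returned as a list because the source does not specify a delimiter, and Unparsed_Field is omitted because its exact derivation is not clear from the example.

```python
import re

# Hypothetical stand-in for uom_infa: lowercased unit -> standardized unit.
UOM_INFA = {"kilogram": "kg", "kilograms": "kg", "hour": "hr", "hours": "hrs"}


def standardize_uom(text_value, uom=UOM_INFA):
    """Parse quantity-plus-unit pairs and standardize the unit spelling."""
    # Longest units first so "kilograms" is tried before "kilogram".
    units = sorted(uom, key=len, reverse=True)
    pattern = re.compile(
        r"\b(\d+(?:\.\d+)?)\s*(" + "|".join(re.escape(u) for u in units) + r")\b",
        re.IGNORECASE,  # assumption: lookups ignore case
    )

    def to_standard(match):
        # Keep the quantity and swap in the standardized unit.
        return match.group(1) + " " + uom[match.group(2).lower()]

    matches = list(pattern.finditer(text_value))
    parsed = [m.group(0) for m in matches]
    standardized = [to_standard(m) for m in matches]
    return {
        "Standardized_Unit": standardized[0] if standardized else "",
        "Parsed_Unit": parsed[0] if parsed else "",
        "Standardized_Text": pattern.sub(to_standard, text_value),
        # Every match after the first counts as an additional parsed value.
        "Additional_Parsed_Values": parsed[1:],
    }


result = standardize_uom(
    "This nice HDTV weighs 10 kilogram and can be delivered in 24 hours."
)
```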