Lookup and Standardization Rules

Data quality rules that standardize input values by looking them up in reference tables.

Check_Profanity

General description

Checks the input text for inappropriate language and returns the profanity found, together with masked and cleansed versions of the original text.

Input ports

Text_Value

Text field that will be checked for profanities.

Output ports

Profanity_Value

Returns the profanity value(s) found in the text.

Masked_Data

Returns the input text with each profanity word replaced by the value [CENSORED].

Cleansed_Text

Returns the input text with the profanity words removed.

Out_Status_Code

Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status)

Out_Status_Message

Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message)

Meta data and reference tables

profanity_infa

Dictionary that contains the profanity values used for lookups during rule execution.

Error_messages_by_Language

Contains a four-digit error code and a language, which together indicate the preferred error message to output.

Example usage

Check whether the "English Long Description" (Text_Value) contains any profanity values referenced by "profanity_infa".

The "English Long Description" of the item is:

"HDTV with a crappy 50 GB hard drive"

Example output

Profanity_Value

crappy

Masked_Data

HDTV with a CENSORED 50 GB hard drive

Cleansed_Text

HDTV with a 50 GB hard drive

Out_Status_Code

Failed

Out_Status_Message

Profanity -crappy- was found in input data.
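
The lookup described above can be illustrated with a minimal Python sketch. It is not the rule's actual implementation: the PROFANITY set is a hypothetical stand-in for the "profanity_infa" dictionary, the function name is invented for illustration, the mask token follows the example output, and joining several findings with a comma is an assumption.

```python
# Hypothetical stand-in for the profanity_infa dictionary.
PROFANITY = {"crappy"}


def check_profanity(text_value, mask="CENSORED"):
    """Return a dict keyed by the rule's output port names (illustrative only)."""
    found = []
    masked_words = []
    cleansed_words = []
    for word in text_value.split():
        # Compare on a lowercased copy with surrounding punctuation stripped.
        key = word.strip(".,;:!?\"'").lower()
        if key in PROFANITY:
            found.append(key)
            # The whole token, including attached punctuation, is masked.
            masked_words.append(mask)
        else:
            masked_words.append(word)
            cleansed_words.append(word)
    if found:
        # Assumption: multiple findings are joined with a comma.
        status = "Failed"
        message = "Profanity -%s- was found in input data." % ", ".join(found)
    else:
        status = "OK"
        message = "No Error"
    return {
        "Profanity_Value": ", ".join(found),
        "Masked_Data": " ".join(masked_words),
        "Cleansed_Text": " ".join(cleansed_words),
        "Out_Status_Code": status,
        "Out_Status_Message": message,
    }


result = check_profanity("HDTV with a crappy 50 GB hard drive")
print(result["Masked_Data"])
print(result["Cleansed_Text"])
```

Splitting on whitespace keeps the sketch short; a dictionary with multi-word entries would need phrase matching instead.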

Parse_Color

General description

Parses color names as determined by a reference table. The rule returns up to two color values; for each, it returns the value as found in the data as well as a cleansed, correctly cased version.

Input ports

Text_Value

Text field that will be parsed for color values.

Output ports

Standard_Text

Returns the text field with standardized color values.

Color_ID1

Returns the name of the first color value found.

Color_ID2

Returns the name of the second color value found.

Color_Standardized1

Returns the standardized color value found for the first color.

Color_Standardized2

Returns the standardized color value found for the second color.

Out_Status_Code

Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status)

Out_Status_Message

Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message)

Meta data and reference tables

colors_infa

Dictionary that contains the color values in a standardized format.

Error_messages_by_Language

Contains a four-digit error code and a language, which together indicate the preferred error message to output.

Example usage

Parse two color values from the "English Long Description" (Text_Value) and replace each with the standardized format of the same color (Standard_Text).

The "English Long Description" of the item is:

"This nice HDTV has a ferrari red frame and a cool black screen color."

Example output

Standard_Text

This nice HDTV has a Ferrari Red frame and a Cool black screen color.

Color_ID1

ferrari red

Color_ID2

cool black

Color_Standardized1

Ferrari Red

Color_Standardized2

Cool black

Out_Status_Code

OK

Out_Status_Message

No Error
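
As a rough illustration of the parsing described above, the following Python sketch finds the first two color names in a text and swaps in their standardized spelling. The COLORS mapping is a hypothetical stand-in for "colors_infa", the function name is invented, and always reporting OK is an assumption; the real rule may match and report differently.

```python
import re

# Hypothetical stand-in for colors_infa: lowercase lookup key -> standardized form.
COLORS = {"ferrari red": "Ferrari Red", "cool black": "Cool black"}


def parse_color(text_value):
    """Return a dict keyed by the rule's output port names (illustrative only)."""
    # Try longer names first so multi-word colors win over shorter overlaps.
    names = sorted(COLORS, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    # Keep only the first two colors, in the order they appear.
    matches = list(pattern.finditer(text_value))[:2]
    found = [m.group(0) for m in matches]
    standardized = [COLORS[f.lower()] for f in found]

    # Rebuild the text, replacing only the matches that were kept.
    parts, pos = [], 0
    for m, std in zip(matches, standardized):
        parts.append(text_value[pos:m.start()])
        parts.append(std)
        pos = m.end()
    parts.append(text_value[pos:])

    # Pad to two entries so both port pairs are always populated.
    found += [""] * (2 - len(found))
    standardized += [""] * (2 - len(standardized))
    return {
        "Standard_Text": "".join(parts),
        "Color_ID1": found[0],
        "Color_ID2": found[1],
        "Color_Standardized1": standardized[0],
        "Color_Standardized2": standardized[1],
        "Out_Status_Code": "OK",  # assumption: the sketch never fails
        "Out_Status_Message": "No Error",
    }


text = "This nice HDTV has a ferrari red frame and a cool black screen color."
print(parse_color(text)["Standard_Text"])
```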

Standardize_Color

General description

Returns a base color value (Base_Color) for an input color value (Text_Value).

Input ports

Text_Value

Color value that will be standardized.

Output ports

Base_Color

Returns the standardized color value.

Out_Status_Code

Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status)

Out_Status_Message

Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message)

Meta data and reference tables

colors_base_infa

Reference table that maps different color values to a standardized color value (e.g. "Deep sky blue" to "Blue").

Error_messages_by_Language

Contains a four-digit error code and a language, which together indicate the preferred error message to output.

Example usage

Take the color value provided by the manufacturer of the item and derive a base color value from it that can be used to classify the item for web shop search.

The attribute "Color" provided by the supplier has the value "Midnight black".

Example output

Base_Color

Black

Out_Status_Code

OK

Out_Status_Message

No Error
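
The mapping from a supplier color to a base color amounts to a single table lookup, which can be sketched in Python as follows. COLORS_BASE is a hypothetical stand-in for "colors_base_infa", the function name is invented, and the handling of unmapped values (empty Base_Color, Failed status, placeholder message) is an assumption, since the source does not describe that case.

```python
# Hypothetical stand-in for colors_base_infa: lowercase color -> base color.
COLORS_BASE = {
    "deep sky blue": "Blue",
    "midnight black": "Black",
}


def standardize_color(text_value):
    """Return a dict keyed by the rule's output port names (illustrative only)."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(text_value.split()).lower()
    base = COLORS_BASE.get(key)
    if base is None:
        # Assumption: the behavior for unmapped colors is not documented.
        return {
            "Base_Color": "",
            "Out_Status_Code": "Failed",
            "Out_Status_Message": "No base color found (placeholder message)",
        }
    return {
        "Base_Color": base,
        "Out_Status_Code": "OK",
        "Out_Status_Message": "No Error",
    }


print(standardize_color("Midnight black")["Base_Color"])
```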

Standardize_CompanyName

General description

Standardizes a company name and additionally provides its acronym if possible.

Input ports

CompanyName

Company name that will be standardized.

Output ports

Standardized_CompanyName

Returns the standardized company name.

Acronym_CompanyName

Returns the acronym for the standardized company name.

Out_Status_Code

Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status)

Out_Status_Message

Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message)

Meta data and reference tables

usa_company_acronyms_infa

Reference table that maps company names to their corresponding acronyms (e.g. "Hewlett-Packard Co" to "HP").

usa_company_names_std_infa

Reference table that maps company names to their standardized spelling format (e.g. "Abercrombie and Fitch" to "Abercrombie & Fitch Co").

usa_company_sufx_abrv_infa

Reference table that contains a set of American suffixes for company names and their abbreviations (e.g. "Co.").

Error_messages_by_Language

Contains a four-digit error code and a language, which together indicate the preferred error message to output.

Example usage

Take the manufacturer name "Hewlett Packard" (CompanyName) attached to an item and standardize it to the defined format as determined by the reference table.

Example output

Standardized_CompanyName

Hewlett-Packard

Acronym_CompanyName

HP

Out_Status_Code

OK

Out_Status_Message

No Error
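
The two lookups behind this rule can be sketched in Python as shown below. NAMES_STD and ACRONYMS are hypothetical stand-ins for "usa_company_names_std_infa" and "usa_company_acronyms_infa", with entries chosen to match the examples in this section. The suffix table is left out for brevity, and the order of the lookups, the fallback to the input name, and the constant OK status are assumptions.

```python
# Hypothetical stand-in for usa_company_names_std_infa.
NAMES_STD = {
    "hewlett packard": "Hewlett-Packard",
    "abercrombie and fitch": "Abercrombie & Fitch Co",
}

# Hypothetical stand-in for usa_company_acronyms_infa.
ACRONYMS = {
    "hewlett-packard": "HP",
    "hewlett-packard co": "HP",
}


def standardize_company_name(company_name):
    """Return a dict keyed by the rule's output port names (illustrative only)."""
    # Normalize case and whitespace before the lookup.
    cleaned = " ".join(company_name.split())
    # Assumption: names missing from the table are passed through unchanged.
    standardized = NAMES_STD.get(cleaned.lower(), cleaned)
    # The acronym is optional; an empty string means none is known.
    acronym = ACRONYMS.get(standardized.lower(), "")
    return {
        "Standardized_CompanyName": standardized,
        "Acronym_CompanyName": acronym,
        "Out_Status_Code": "OK",  # assumption: the sketch never fails
        "Out_Status_Message": "No Error",
    }


result = standardize_company_name("Hewlett Packard")
print(result["Standardized_CompanyName"], result["Acronym_CompanyName"])
```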

Standardize_UOM

General description

Separates the quantity and the unit of measure, and outputs both the unstandardized and the standardized values. It also outputs the full text with the unit of measure standardized.

Input ports

Text_Value

Text field to be checked for units of measure.

Output ports

Standardized_Unit

Returns the standardized unit of measure found in the text.

Parsed_Unit

Returns the unit of measure found in the text.

Standardized_Text

Returns the text field with standardized UOM values.

Additional_Parsed_Values

Returns any additional parsed values found in the text.

Unparsed_Field

Returns the part of the text that hasn't been parsed by the rule.

Out_Status_Code

Returns the overall Status Code after the rule execution (OK or Failed). (QualityStatusEntry.Status)

Out_Status_Message

Returns the overall Status Message after the rule execution. (QualityStatusEntry.Message)

Meta data and reference tables

uom_infa

Reference table that contains the unit values in a standardized format.

Error_messages_by_Language

Contains a four-digit error code and a language, which together indicate the preferred error message to output.

Example usage

The "English Long Description" of the item that will be parsed by this rule is:

"This nice HDTV weighs 10 kilogram and can be delivered in 24 hours."

Example output

Standardized_Unit

10 kg

Parsed_Unit

10 kilogram

Standardized_Text

This nice HDTV weighs 10 kg and can be delivered in 24 hrs.

Additional_Parsed_Values

24 hours

Unparsed_Field

This nice HDTV weighs and can be delivered hours.

Out_Status_Code

OK

Out_Status_Message

No Error
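
A minimal Python sketch of the quantity and unit parsing is given below. The UOM mapping is a hypothetical stand-in for "uom_infa", the function name is invented, and the regular expression is only one plausible way to recognize a number followed by a unit. Unparsed_Field is omitted because the source does not spell out how it is derived, and joining additional values with a semicolon is an assumption.

```python
import re

# Hypothetical stand-in for uom_infa: lowercase unit -> standardized unit.
UOM = {"kilogram": "kg", "kilograms": "kg", "hour": "hr", "hours": "hrs"}


def standardize_uom(text_value):
    """Return a dict keyed by the rule's output port names (illustrative only)."""
    # Longer unit names first so "kilograms" is tried before "kilogram".
    units = sorted(UOM, key=len, reverse=True)
    pattern = re.compile(
        r"\b(\d+(?:\.\d+)?)\s+(" + "|".join(re.escape(u) for u in units) + r")\b",
        re.IGNORECASE,
    )

    def to_standard(match):
        # Keep the quantity and swap the unit for its standardized form.
        return "%s %s" % (match.group(1), UOM[match.group(2).lower()])

    matches = list(pattern.finditer(text_value))
    parsed = [m.group(0) for m in matches]
    return {
        "Standardized_Unit": to_standard(matches[0]) if matches else "",
        "Parsed_Unit": parsed[0] if parsed else "",
        "Standardized_Text": pattern.sub(to_standard, text_value),
        # Assumption: further quantity/unit pairs are joined with "; ".
        "Additional_Parsed_Values": "; ".join(parsed[1:]),
        "Out_Status_Code": "OK",  # assumption: the sketch never fails
        "Out_Status_Message": "No Error",
    }


text = "This nice HDTV weighs 10 kilogram and can be delivered in 24 hours."
print(standardize_uom(text)["Standardized_Text"])
```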