Standardizing Data Overview
Standardizing data improves data quality by removing errors and inconsistencies in the data.
To improve data quality, standardize data that contains the following types of values:
- Incorrect values
- Values with correct information in the wrong format
- Values from which you want to derive new information
Use the Standardizer transformation to search for these values in data. You can choose one of the following search operation types:
- Text. Search for custom strings that you enter. Remove these strings or replace them with custom text.
- Reference table. Search for strings contained in a reference table that you select. Remove these strings, or replace them with reference table entries or custom text.
For example, you can configure the Standardizer transformation to search address data for the custom strings "Street" and "St." and to use "ST." as the replacement string. The Standardizer transformation replaces each search term with "ST." and writes the result to a new data column.
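The text search operation described above can be approximated outside the tool to show the intended input and output. The following Python sketch is an illustration only, not the Standardizer transformation's implementation: the function name `standardize_text`, the column name `address_stdz`, the sample rows, and the whole-term matching rule are all assumptions.

```python
import re

def standardize_text(value, search_terms, replacement):
    """Replace each search term that appears as a whole term in value."""
    # Try longer terms first so "Street" is considered before "St.".
    terms = sorted(search_terms, key=len, reverse=True)
    # Lookarounds keep a term from matching inside a longer word (assumed rule).
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    # A function replacement inserts the string literally, with no escape handling.
    return re.sub(pattern, lambda match: replacement, value)

# Hypothetical sample rows; the real data comes from the All_Customers data object.
rows = [{"address": "12 Main Street"}, {"address": "48 Oak St."}]

for row in rows:
    # Write the result to a new column and leave the source column unchanged.
    row["address_stdz"] = standardize_text(row["address"], ["Street", "St."], "ST.")
```

After the loop, each row holds both the original `address` value and the standardized `address_stdz` value, which mirrors writing the result to a new data column.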
Story
HypoStores needs to standardize its customer address data so that all addresses use terms consistently. The address data in the All_Customers data object contains inconsistently formatted entries for common terms such as Street, Boulevard, Avenue, Drive, and Park.
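To make the goal concrete, a reference-table style lookup for these terms might be sketched as follows in Python. The table contents, the standardized forms (ST, BLVD, AVE, DR, PARK), and the function names are hypothetical and are not taken from the tutorial's reference data. The sketch also assumes that terms are separated by whitespace and that matching ignores case and trailing punctuation.

```python
# Hypothetical reference table: standardized term -> variants to search for.
REFERENCE_TABLE = {
    "ST": ["Street", "St", "Str"],
    "BLVD": ["Boulevard", "Blvd"],
    "AVE": ["Avenue", "Ave"],
    "DR": ["Drive", "Dr"],
    "PARK": ["Park", "Pk"],
}

def build_lookup(reference_table):
    """Map each lowercased variant to its standardized term."""
    lookup = {}
    for standard, variants in reference_table.items():
        for variant in variants:
            lookup[variant.lower()] = standard
    return lookup

def standardize_with_table(address, lookup):
    """Replace every whitespace-separated token found in the lookup."""
    result = []
    for token in address.split():
        # Ignore case and trailing punctuation when searching (assumed rule).
        key = token.rstrip(".,").lower()
        # Tokens that are not in the table pass through unchanged.
        result.append(lookup.get(key, token))
    return " ".join(result)

lookup = build_lookup(REFERENCE_TABLE)
standardized = standardize_with_table("101 Sunset Boulevard", lookup)
```

A dictionary keyed by variant keeps each lookup constant-time, and adding a new variant only requires a new table entry rather than a change to the matching code.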
Objectives
In this lesson, you complete the following tasks:
- Create and configure an All_Customers_Stdz_tgt data object to contain standardized data.
- Create a mapping to standardize the address terms Street, Boulevard, Avenue, Drive, and Park to a consistent format.
- Add the All_Customers data object to the mapping to connect to the source data.
- Add the All_Customers_Stdz_tgt data object to the mapping to create a target data object.
- Add a Standardizer transformation to the mapping and configure it to standardize the address terms.
- Run the mapping to generate standardized address data.
- Run the Data Viewer to view the mapping output.
Prerequisites
Before you start this lesson, verify the following prerequisite:
- You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 15 minutes to complete this lesson.