Parsing Data Overview
You parse data to identify one or more data elements in an input field and to write each element to a different output field.
Parsing gives you greater control over the information in each column. For example, consider a data field that contains a person's full name, "Bob Smith". You can use the Parser transformation to split the full name into separate data columns for the first name and the last name. After you parse the data into new columns, you can create custom data quality operations for each column.
You can configure the Parser transformation to use token sets to parse data columns into component strings. A token set identifies data elements such as words, ZIP codes, phone numbers, and Social Security numbers.
You can also use the Parser transformation to parse data that matches reference table entries or custom regular expressions that you enter.
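The idea of token-based parsing can be sketched outside the product. The following Python example is an illustration only and is not how the Parser transformation is implemented: the pattern names, the `classify` and `parse_full_name` functions, and the rule that the first word is the first name are all assumptions made for this sketch.

```python
import re

# Hypothetical token patterns for illustration; these are not the
# token set definitions that ship with the Parser transformation.
TOKEN_PATTERNS = {
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "word": re.compile(r"^[A-Za-z][A-Za-z'\-]*$"),
}


def classify(token):
    """Return the name of the first pattern the token matches, else 'unknown'."""
    for name, pattern in TOKEN_PATTERNS.items():
        if pattern.match(token):
            return name
    return "unknown"


def parse_full_name(full_name):
    """Split a full name into (first, last) using whitespace-delimited word tokens."""
    words = [t for t in full_name.split() if classify(t) == "word"]
    if not words:
        return ("", "")
    if len(words) == 1:
        return (words[0], "")
    # Assumption: the first word is the first name and the
    # remaining words together form the last name.
    return (words[0], " ".join(words[1:]))


print(parse_full_name("Bob Smith"))  # ('Bob', 'Smith')
```

Real name data is messier than this rule allows (middle names, suffixes, multi-word first names), which is one reason to review parsed output before you write it to a target.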
Story
HypoStores wants the format of customer data files from the Los Angeles office to match the format of the data files from the Boston office. The Los Angeles customer data stores the customer name in a single FullName column, while the Boston customer data stores the customer name in separate FirstName and LastName columns. HypoStores needs to parse the Los Angeles FullName column into first names and last names so that the Los Angeles data matches the Boston format.
Objectives
In this lesson, you complete the following tasks:
- Create and configure an LA_Customers_tgt data object to contain parsed data.
- Create a mapping to parse the FullName column into separate FirstName and LastName columns.
- Add the LA_Customers data object to the mapping to connect to the source data.
- Add the LA_Customers_tgt data object to the mapping to create a target data object.
- Add a Parser transformation to the mapping and configure it to use a token set to parse full names into first names and last names.
- Run a profile on the Parser transformation to review the data before you generate the target data source.
- Run the mapping to generate parsed names.
- Run the Data Viewer to view the mapping output.
Prerequisites
Before you start this lesson, verify the following prerequisite:
- You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 20 minutes to complete the tasks in this lesson.