A cleanse asset is composed of instances and steps. An instance identifies the input fields on which you can perform cleanse and merge operations. A step is a discrete cleanse operation that you define for an input.
You define one or more cleanse steps for the input fields that an instance identifies. The input fields that you add to an instance and the steps that you define for the input fields depend on your business requirements and the content of your data. You can add one or more input fields to an instance.
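The relationship between an asset, its instances, and their steps can be pictured with a small data model. The following is a minimal sketch for illustration only: the `Instance` and `CleanseAsset` classes and the `apply` methods are hypothetical names and do not correspond to any product API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a step is any function that takes and returns a string.
Step = Callable[[str], str]


@dataclass
class Instance:
    """Identifies input fields and the ordered steps to run on them."""
    name: str
    input_fields: List[str]
    steps: List[Step] = field(default_factory=list)

    def apply(self, record: Dict[str, str]) -> Dict[str, str]:
        # Run every step, in order, on each input field of this instance.
        cleansed = dict(record)
        for field_name in self.input_fields:
            value = cleansed[field_name]
            for step in self.steps:
                value = step(value)
            cleansed[field_name] = value
        return cleansed


@dataclass
class CleanseAsset:
    """Groups one or more instances."""
    instances: List[Instance]

    def apply(self, record: Dict[str, str]) -> Dict[str, str]:
        for instance in self.instances:
            record = instance.apply(record)
        return record


# One instance with one input field and two steps.
asset = CleanseAsset([Instance("NameCase", ["Surname"], [str.strip, str.title])])
result = asset.apply({"Surname": " smith ", "Country": "usa"})
print(result)
```

Only the fields that an instance identifies are changed. In this sketch the Country value passes through untouched because no instance lists it as an input.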
Cleanse process flow
To cleanse your data, you configure and run assets in Data Quality and in Data Integration.
The following image shows the steps involved in the cleansing process:
The cleansing process includes the following steps:
1. Analyze the content of the source data, so that you can identify the fields that require cleanup. Work with a developer or data steward to understand the data sets. You will create inputs that represent the source data columns in the cleanse asset instances.
During your analysis, perform the following steps:
   a. Verify your business requirements.
   b. Verify the content and structure of your data.
   c. Determine the sequence of the cleanse operations to apply to the data.
2. In Data Quality, configure a cleanse asset to translate your business requirements into one or more cleanse instances. The input fields within an instance are processed through the logic that you define in that instance.
To configure the asset, perform the following steps:
   a. Add one or more instances for the input fields that need cleansing in the source data.
   b. Configure one or more steps for the input fields that each instance identifies.
   c. Optionally, configure the merge operation for the input fields.
3. In Data Integration, define a mapping that can run the cleansing operation:
   a. Add the cleanse asset to a Cleanse transformation.
   b. Connect the cleanse asset input and output fields to the upstream and downstream objects in the mapping.
4. Run the mapping.
Note: The Cleanse transformation does not identify the instance from which each asset input originates. If you define multiple instances on the asset in Data Quality, make a record of the instance to which each input belongs. Use the record as a guide when you connect the asset inputs to the transformation input fields.
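One way to keep the record that the note describes is a simple lookup table maintained alongside the mapping. The following is a minimal sketch: the instance names are hypothetical, and the input names are borrowed from the customer data example in this section.

```python
# Hypothetical record of which inputs belong to which instance.
instance_inputs = {
    "RemoveSalutation": ["FirstName"],
    "TitleCase": ["MiddleName", "Surname"],
    "StandardizeGender": ["Gender"],
    "UppercaseAndSpaces": ["Address", "Country"],
}

# Invert the record so that each input name looks up its instance.
input_to_instance = {
    input_name: instance_name
    for instance_name, input_names in instance_inputs.items()
    for input_name in input_names
}

for input_name, instance_name in sorted(input_to_instance.items()):
    print(f"{input_name} -> {instance_name}")
```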
Example: Customer data cleanup
A customer data set might include multiple fields for customer contact data, including name, gender, and address fields. You can configure a single cleanse asset to perform cleanse operations on the different fields.
The customer data records in your organization might contain the following information:
| Title | FirstName | MiddleName | Surname | Gender | Address | Country |
|---|---|---|---|---|---|---|
| Dr. | John | William | smith | Male | 2101 massachusetts ave nw washington dc 20008-2811 | usa |
| Mr. | Mr. Frances | folsom | cleveland | Man | 18 broomfield ridge midleton co. cork p25 kn66 | IRE |
| Miss | Miss. Abigail | powers | Fillmore | Female | shop 7 208 adelaide st brisbane city qld 4000 | aus |
You might want to perform the following operations on the records:
•Remove salutation data from the FirstName field.
•Change the character case to title case in the MiddleName and Surname fields.
•Replace Man with Male in the Gender field.
•Remove extra character spaces and change the character case to uppercase in the Address and Country fields.
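The first and third operations in the list can be sketched in plain Python to show the intended effect on the sample values. This is an illustration only, not the logic that the cleanse asset runs: the `remove_salutation` and `standardize_gender` functions, the salutation list, and the regular expression are all assumptions.

```python
import re

# Assumed salutation list; a real data set might need more entries.
SALUTATION = re.compile(r"^(?:dr|mr|mrs|ms|miss)\.?\s+", re.IGNORECASE)


def remove_salutation(first_name: str) -> str:
    """Strip a leading salutation such as 'Mr.' from a first name."""
    return SALUTATION.sub("", first_name.strip())


def standardize_gender(gender: str) -> str:
    """Replace the value 'Man' with 'Male'; leave other values unchanged."""
    return {"Man": "Male"}.get(gender, gender)


first_names = [remove_salutation(v) for v in ["John", "Mr. Frances", "Miss. Abigail"]]
genders = [standardize_gender(v) for v in ["Male", "Man", "Female"]]
print(first_names)
print(genders)
```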
Configure a cleanse asset with four instances, based on the types of cleanse operation that you want to perform on the data. Add one or more input fields to each instance and define the steps that you want to apply to the input fields in each case.
For example, to change the character case to title case in the MiddleName and Surname fields, create an instance in the asset and add two input fields to it. You can specify MiddleName and Surname as the input field names. Configure a step to perform the Convert Case operation on the fields that the instance identifies.
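The expected result of the title case step on the sample data can be approximated in Python. This sketch assumes that Python's built-in `str.title()` is a reasonable stand-in for the Convert Case operation on simple one-word names; the product operation may treat punctuation and mixed-case values differently.

```python
def to_title_case(value: str) -> str:
    """Approximate a title case conversion with str.title()."""
    return value.title()


# MiddleName and Surname values from the sample records.
records = [
    {"MiddleName": "William", "Surname": "smith"},
    {"MiddleName": "folsom", "Surname": "cleveland"},
    {"MiddleName": "powers", "Surname": "Fillmore"},
]

# One instance, two input fields, one step applied to both.
title_cased = [
    {name: to_title_case(value) for name, value in record.items()}
    for record in records
]
for record in title_cased:
    print(record)
```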
Likewise, you can configure an instance to change the character case of the address and country abbreviation data. Add a step to convert the case of the data to uppercase. Additionally, add a step to replace multiple character spaces with a single space. Provide suitable names for the inputs, such as Address and Country.
Note: The Country input does not contain any character spaces, and so the step logic to remove spaces does not alter the data in the associated input field.
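The two steps in this instance, and the behavior that the note describes, can be illustrated with a short Python sketch. The `collapse_spaces` and `to_upper` functions are hypothetical stand-ins for the configured steps, and the address value with extra spaces is made up for the illustration.

```python
import re


def collapse_spaces(value: str) -> str:
    """Replace runs of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", value)


def to_upper(value: str) -> str:
    """Convert the value to uppercase."""
    return value.upper()


def run_steps(value: str) -> str:
    # Apply the steps in the order in which the instance defines them.
    for step in (to_upper, collapse_spaces):
        value = step(value)
    return value


# Illustrative address value with extra spaces between tokens.
address = run_steps("shop 7  208 adelaide st   brisbane city qld 4000")
# The Country value has no spaces, so only the case changes.
country = run_steps("aus")
print(address)
print(country)
```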
Add the asset that you create to a Cleanse transformation in a mapping. When a mapping runs, the transformation applies the standardization operations that you define in the asset instances to the fields that you select.
Additionally, if you want to merge the cleansed data from the FirstName, MiddleName, and Surname fields into a single field, configure the asset to perform a merge operation on the input fields.
Create a merge record and add the fields that you want to merge. You can specify the merged field name as FullName.
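The effect of the merge can be sketched in Python as follows. The `merge_fields` function is hypothetical, and the single-space separator and the skipping of empty values are assumptions rather than documented behavior of the merge operation.

```python
from typing import Dict, List


def merge_fields(record: Dict[str, str], fields: List[str], separator: str = " ") -> str:
    """Join the non-empty values of the named fields, in order."""
    return separator.join(record[name] for name in fields if record[name])


# A sample record after the name fields have been cleansed.
cleansed = {"FirstName": "John", "MiddleName": "William", "Surname": "Smith"}

# Add the merged value under the field name FullName.
cleansed["FullName"] = merge_fields(cleansed, ["FirstName", "MiddleName", "Surname"])
print(cleansed["FullName"])
```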