
Parsing with regular expressions

You can use a regular expression to find values that match a given character structure in an input field. Create a regular expression that matches the structure of the values that you want to find, or select one of the built-in regular expressions that the asset provides.
Use a regular expression in place of a dictionary when you cannot predict the content of every value or when the range of values that you will search for is too great to add to a dictionary.
At run time, the Parse transformation applies the regular expression logic to the values in the input field. When the transformation finds a value with a structure that matches the expression logic, the transformation writes the value to the output field that the step specifies.
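The following Python sketch contrasts dictionary lookup with regular expression matching. It assumes that a value matches only when its whole text fits the expression (Python's `re.fullmatch`). The dictionary contents, the simplified pattern, the sample input, and the function names are hypothetical and are not part of the parse asset.

```python
import re

# Hypothetical dictionary: it recognizes only the exact values that it lists.
PHONE_DICTIONARY = {"212-555-1234", "212-555-3287"}

# Hypothetical, simplified expression: any value with the structure NNN-NNN-NNNN.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def parse_with_dictionary(values, dictionary):
    """Return the input values that appear verbatim in the dictionary."""
    return [value for value in values if value in dictionary]


def parse_with_regex(values, pattern):
    """Return the input values whose entire text matches the pattern."""
    return [value for value in values if pattern.fullmatch(value)]


input_field = ["212-555-1234", "718-555-2907", "910-22-5555"]
print(parse_with_dictionary(input_field, PHONE_DICTIONARY))  # ['212-555-1234']
print(parse_with_regex(input_field, PHONE_PATTERN))  # ['212-555-1234', '718-555-2907']
```

The dictionary misses 718-555-2907 because that exact value is not listed, while the expression finds it because it has the expected structure.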

Example: United States telephone numbers and Social Security numbers

A customer data set might include a column for telephone numbers. Over time, users have incorrectly entered Social Security numbers into the column. You can configure a parse asset to find values that match both formats.
The following table lists sample values from the column and the format of each value:
| Value | Format |
|---|---|
| 212-555-1234 | Telephone number |
| 910-22-5555 | Social Security number |
| (518)555-8466 | Telephone number |
| (718) 555-2907 | Telephone number |
| 2125550987 | Telephone number |
| 922-823-5746 | Telephone number |
| 974-43-0202 | Social Security number |
| 212-555-3287 | Telephone number |
Create a step for each data format, and add a regular expression to each step.
For example, the parse asset contains the following built-in regular expression for United States telephone numbers:
1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}?(?:EXT|ext|Ext|X|x|#|\.| |,)*[0-9]{3,5}|1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}
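To check which sample values the telephone number expression accepts, the following Python sketch applies it to each value. It assumes that the whole value must match (Python's `re.fullmatch`); the matching behavior of the Parse transformation itself may differ. The names `PHONE_REGEX`, `SAMPLE_VALUES`, and `matches_phone` are illustrative.

```python
import re

# Built-in United States telephone number expression, copied from the parse asset
# and split across adjacent raw strings only for readability.
PHONE_REGEX = re.compile(
    r"1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}?"
    r"(?:EXT|ext|Ext|X|x|#|\.| |,)*[0-9]{3,5}"  # first alternative: 3- to 5-digit extension
    r"|1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}"  # second alternative: no extension
)

SAMPLE_VALUES = [
    "212-555-1234", "910-22-5555", "(518)555-8466", "(718) 555-2907",
    "2125550987", "922-823-5746", "974-43-0202", "212-555-3287",
]


def matches_phone(value):
    """Assumption: the whole value must match, not just a substring."""
    return PHONE_REGEX.fullmatch(value) is not None


for value in SAMPLE_VALUES:
    print(f"{value:16} {matches_phone(value)}")
print(matches_phone("212-555-1234 ext 101"))  # True
```

Under this assumption, every sample value matches except 910-22-5555 and 974-43-0202. The first alternative also accepts a number followed by an extension, such as 212-555-1234 ext 101.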
The asset contains the following built-in regular expression for United States Social Security numbers:
(.*)([0-9]{3}[- ]?[0-9]{2}[- ]?[0-9]{4})(.*)
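The following Python sketch applies the Social Security number expression to the same sample values and returns the text that the middle capture group finds. It again assumes whole-value matching (Python's `re.fullmatch`), and the names `SSN_REGEX`, `SAMPLE_VALUES`, and `find_ssn` are illustrative.

```python
import re

# Built-in United States Social Security number expression, copied from the parse asset.
# Groups 1 and 3 accept any surrounding text; group 2 holds the nine digits and any separators.
SSN_REGEX = re.compile(r"(.*)([0-9]{3}[- ]?[0-9]{2}[- ]?[0-9]{4})(.*)")

SAMPLE_VALUES = [
    "212-555-1234", "910-22-5555", "(518)555-8466", "(718) 555-2907",
    "2125550987", "922-823-5746", "974-43-0202", "212-555-3287",
]


def find_ssn(value):
    """Return group 2 if the whole value matches, else None (whole-value matching is assumed)."""
    match = SSN_REGEX.fullmatch(value)
    return match.group(2) if match else None


for value in SAMPLE_VALUES:
    print(f"{value:16} {find_ssn(value)}")
```

Because the `(.*)` groups accept any surrounding text, the expression matches any nine consecutive digits. In this sketch, the ten-digit telephone number 2125550987 therefore matches as well, and Python's greedy matching places 125550987 in group 2.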
Add a single output to each step: one for telephone numbers and one for Social Security numbers.
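The following Python sketch models the two steps, each with one expression and one output field, and writes each matching value to the output field of its step. The `ParseStep` class, the output field names `telephone_number` and `ssn`, and the rule that the first matching step claims a value are hypothetical choices for illustration, not documented behavior of the Parse transformation. Whole-value matching (`re.fullmatch`) is also assumed.

```python
import re
from dataclasses import dataclass

# Built-in expressions, copied from the parse asset.
PHONE_REGEX = re.compile(
    r"1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}?"
    r"(?:EXT|ext|Ext|X|x|#|\.| |,)*[0-9]{3,5}"
    r"|1?[-. ]?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}"
)
SSN_REGEX = re.compile(r"(.*)([0-9]{3}[- ]?[0-9]{2}[- ]?[0-9]{4})(.*)")


@dataclass
class ParseStep:
    """Hypothetical model of a parse step: one expression, one output field."""
    output_field: str
    pattern: re.Pattern


STEPS = [
    ParseStep("telephone_number", PHONE_REGEX),
    ParseStep("ssn", SSN_REGEX),
]

SAMPLE_VALUES = [
    "212-555-1234", "910-22-5555", "(518)555-8466", "(718) 555-2907",
    "2125550987", "922-823-5746", "974-43-0202", "212-555-3287",
]


def run_parse(values, steps):
    """Route each value to the output field of the first step whose expression matches it."""
    outputs = {step.output_field: [] for step in steps}
    unparsed = []
    for value in values:
        for step in steps:
            if step.pattern.fullmatch(value):
                outputs[step.output_field].append(value)
                break  # assumption: the first matching step claims the value
        else:
            unparsed.append(value)  # no step matched this value
    return outputs, unparsed


outputs, unparsed = run_parse(SAMPLE_VALUES, STEPS)
print(outputs["telephone_number"])
print(outputs["ssn"])  # ['910-22-5555', '974-43-0202']
```

Because 2125550987 matches both expressions, step order matters in this sketch: with the Social Security number step first, that value would move to the `ssn` output.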