Mass Ingestion Guide > Create > Mass Transformations
  

Mass Transformations

Configure mass transformations to define the parameters to apply across all tables ingested from the source database.
To set parameters for specific tables, configure the transformation override.
The following image shows the Mass Transformations page of the mass ingestion specification:
This screenshot shows the Mass Transformations page of the mass ingestion specification. On the Mass Transformations page, you can configure the following parameters: Filter By, Drop Columns, Trim, Conver to Upper Case, Convert to Lower Case, Replace Columns, and Replace Criteria. In the top-right corner, you have the option Next to go to the next page, or the button X to discard the specification.
The following table describes the parameters that you can configure:
Property
Description
Filter By
Filters rows in the target table based on criteria for a column in the table. Enter a filter clause to determine the criteria, such as STATE='California'. You can use any transformation language functions in the filter clause. The filter clause must evaluate to TRUE or FALSE.
To set criteria for multiple columns, use the operators AND and OR. For example, use the operator AND to specify a filter clause such as STATE='California' AND STATUS='Single'.
Drop Columns
Drops columns in the target table. Enter each column as a string and separate column names with a comma, or specify a regular expression.
For example, if you enter COL1, the column COL1 will be dropped in the target table.
If you enter a regular expression such as .*SSN.*, columns that have SSN in the column name will be dropped.
Trim
Trims column values in the target table to remove spaces before and after the values. Enter each column as a string and separate column names with a comma, or specify a regular expression.
For example, if you enter COL1, values in the column COL1 will be trimmed in the target table.
If you enter a regular expression such as .*SSN.*, values in columns that have SSN in the column name will be trimmed.
Convert to Uppercase
Converts column values in the target table to uppercase. Enter each column as a string and separate column names with a comma, or specify a regular expression.
For example, if you enter COL1, values in the column COL1 will be converted to upper case in the target table.
If you enter a regular expression such as .*SSN.*, values in columns that have SSN in the column name will be converted to upper case.
Convert to Lowercase
Converts column values in the target table to lowercase. Enter each column as a string and separate column names with a comma, or specify a regular expression.
For example, if you enter COL1, values in the column COL1 will be converted to lower case in the target table.
If you enter a regular expression such as .*SSN.*, values in columns that have SSN in the column name will be converted to lower case.
Replace Columns
Replaces column values in the target table. Enter each column as a string and separate column names with a comma, or specify a regular expression.
For example, if you enter COL1, values in the column COL1 will be replaced in the target table.
If you enter a regular expression such as .*SSN.*, values in columns that have SSN in the column name will be replaced.
If you specify columns to replace, you must specify the replace criteria.
Replace Criteria
Required if you specify columns to replace. Determines how to replace column values in the target table. You can select Pattern or Entire String.
If you select Entire String, all values in the columns that you specify are replaced by the value that you configure.
If you select Pattern, enter the pattern to be replaced. Then enter the value to replace the pattern.
Pattern
Required if you configure the replace criteria to be a pattern. Determines the pattern to replace. The pattern must be a regular expression. For example, if you want to replace the values in columns that contain Social Security numbers, you can enter the pattern:
^\d(3)-?\d(2)-?\d(4)$
The pattern replaces all values that correspond to the pattern. In the column that contains Social Security numbers, the pattern replaces the entire Social Security number.
To mask only the first five digits of the Social Security number, you can enter the pattern:
^\d(3)-?\d(2)$
Value
Required if you configure columns to replace. Replaces the pattern or the entire string according to the configured criteria.
If you select Entire String for the replace criteria, the value that you enter replaces the values in all of the columns that you specify to replace. For example, if you replace COL1 and COL2 and enter the value XXX, all values in columns COL1 and COL2 are replaced with the value XXX.
If you select Pattern for the replace criteria, the value that you enter replaces all values that correspond to the pattern.
For example, you want to mask the entire Social Security number according to the pattern:
^\d(3)-?\d(2)-?\d(4)$
Enter the value XXX-XX-XXXX. All values in the SSN column will appear as XXX-XX-XXXX.
You might also want to mask only the first five digits of the Social Security number according to the pattern
^\d(3)-?\d(2)$
Enter the value XXX-XX. The first five digits of every Social Security number will appear as XXX-XX. For example, if the original Social Security number is 123-45-6789, the replaced value is XXX-XX-6789.
If you configure parameters for a table column that does not exist in all of the tables, the tables where the column does not exist will fail to be ingested. You must reconfigure the parameters for each table where the column does not exist when you configure the transformation override.

Filter Clauses

The filter clause determines how rows are filtered in the target table based on criteria for a column in the table.
To design a filter clause, you can specify the column names in the target table and any transformation language functions that use the column names as arguments. To set multiple criteria, use the operators AND and OR. The filter clause must evaluate to TRUE or FALSE.
For example, you ingest the following source table:
EMPLOYEEID
PHONENUMBER
607014
(630)4468851
620368
(904)3854084
698107
(549)5694371
621861
(904)9062721
You want to filter the rows that you ingest to the target. You can use the following filter clause to filter the rows by area code (904):
RTRIM(PHONENUMBER, REG_EXTRACT(PHONENUMBER, '.*([0-9]{7})$')) = '(904)'
The rows in the table are filtered by area code (904). The following table is ingested to the target:
EMPLOYEEID
PHONENUMBER
620368
(904)3854084
621861
(904)9062721
For more information on transformation language functions, see the "Functions" chapter in the Informatica Transformation Language Reference.

Regular Expressions

A regular expression describes a range or pattern of values.
You can use a regular expression specify the columns that you want to parameterize in a mass ingestion specification. Use a regular expression when the columns in different source tables have varying names but contain the same information. If you choose to replace columns, you also use a regular expression to specify the pattern in the replace criteria.
For example, you might want to drop the columns that contain Social Security numbers. All of the column names contain SSN, but the column names have different prefixes depending on the source table where a column appears. To specify all variations in the column names, you can use a regular expression such as .*SSN.
The following table describes the metacharacters that you can use in a regular expression:
Metacharacter
Description
.
Matches any single character.
[ ]
Indicates a character class. Matches any character inside the brackets. For example, [abc] matches “a,” “b,” and “c.”
^
If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets. For example, [^abc] matches all characters except “a,” “b,” and “c.”
If this metacharacter occurs at the beginning of the regular expression, it matches the beginning of the input. For example, ^[abc] matches the input that begins with “a,” “b,” or “c.”
-
Indicates a range of characters in a character class. For example, [0-9] matches any of the digits “0” through “9.”
?
Indicates that the preceding expression to this metacharacter is optional. It matches the preceding expression zero or one time. For example, [0-9][0-9]? matches “2” and “12.”
+
Indicates that the preceding expression matches one or more times. For example, [0-9]+ matches “1,” “13,” “666,” and similar combinations.
*
Indicates that the preceding expression matches zero or more times. For example, the input <abc*> matches <abc>, <abc123>, and similar combinations that contains <abc> as the preceding expression.
??, +?, *?
Modified versions of ?, +, and *. These match as little as possible, unlike the versions that match as much as possible. For example, the input “<abc><def>,” <.*?> matches “<abc>” and the input <.*> matches “<abc><def>.”
( )
Grouping operator. For example, (\d+,)*\d+ matches a list of numbers separated by commas such as “1” or “1,23,456.”
{ }
Indicates a match group.
\
An escape character, which interprets the next metacharacter literally. For example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character. Also used for abbreviations such as \a for any alphanumeric character.
If \ is followed by a number n, it matches the nth match group, starting from 0. For example, <{.*?}>.*?</\0> matches “<head>Contents</head>”.
In C++ string literals, two backslashes must be used: “\\+,” “\\a,” “<{.*?}>.*?</\\0>.”
$
At the end of a regular expression, this character matches the end of the input. For example, [0-9]$ matches a digit at the end of the input.
|
Alternation operator that separates two expressions, one of which matches. For example, T|the matches “The” or “the.”
!
Negation operator. The expression following ! does not match the input. For example, a!b matches “a” not followed by “b.”
The following table describes the abbreviations that you can use in the regular expressions:
Abbreviation
Definition
\a
Any alphanumeric character, ([a-zA-Z0-9]).
\b
White space (blank), ([ \\t]).
\c
Any alphabetic character, ([a-zA-Z]).
\d
Any decimal digit, ([0-9]).
\h
Any hexadecimal digit, ([0-9a-fA-F]).
\n
Newline, (\r|(\r?\n)).
\q
Quoted string, (\”[^\”]*\”)|(\’[^\’]*\’).
\w
Simple word, ([a-zA-Z]+).
\z
Integer, ([0-9+]).

Replace Criteria

The replace criteria determines how to replace column values in the target table.
Configure the replace criteria to replace values in columns according to the entire string or a pattern.

Replacing an Entire String

If you configure the replace criteria to replace an entire string, all values in the columns that you specify are replaced according to the new value that you configure. For example, you configure the criteria to replace values in columns COL1 and COL2 and configure the new value to be XXX. All values in columns COL1 and COL2 are replaced with the value XXX.
The following image shows the options to configure the replace criteria based on an entire string:
This image shows the Mass Transformations page when you create a mass ingestion specification. A column is specified for the property Replace Columns. The property Replace Criteria appears below Replace Columns. In the property Replace Columns, Entire String is selected. Next to Entire String, you have the option to specify a value.

Replacing a Pattern

If you configure the replace criteria to replace a pattern, you enter a pattern and a new value to replace values that match the pattern. The pattern that you enter must be a regular expression.
For example, you want to mask all values in a column that contains IP addresses. To match an IP address, you can enter the following pattern:
^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$
All values that match the pattern are replaced according to the new value that you configure. If you configure the new value XXX, all values that match the pattern are replaced with XXX.
The following image shows the options to configure the replace criteria based on a pattern:
This image shows the Mass Transformations page when you create a mass ingestion specification. A column is specified for the property Replace Columns. The property Replace Criteria appears below Replace Columns. In the property Replace Columns, Pattern is selected. Next to Pattern, you have the option to specify the previous string or pattern and a value.