Masking Techniques

The masking technique is the type of data masking to apply to the selected column.
You can select one of the following masking techniques for an input column:
Random
Produces random, non-repeatable results for the same source data and masking rules. You can mask date, numeric, and string datatypes. Random masking does not require a seed value. The results of random masking are non-deterministic.
Expression
Applies an expression to a source column to create or mask data. You can mask all datatypes.
Key
Replaces source data with repeatable values. The Data Masking transformation produces deterministic results for the same source data, masking rules, and seed value. You can mask date, numeric, and string datatypes.
Substitution
Replaces a column of data with similar but unrelated data from a dictionary. You can mask the string datatype.
Dependent
Replaces the values of one source column based on the values of another source column. You can mask the string datatype.
Tokenization
Replaces source data with data generated based on customized masking criteria. The Data Masking transformation applies rules specified in a customized algorithm. You can mask the string datatype.
Special Mask Formats
Masks credit card numbers, email addresses, IP addresses, phone numbers, SSNs, SINs, or URLs. The Data Masking transformation applies built-in rules to intelligently mask these common types of sensitive data.
No Masking
The Data Masking transformation does not change the source data.
Default is No Masking.

Random Masking

Random masking generates random, nondeterministic masked data. The Data Masking transformation returns different values when the same source value occurs in different rows. You can define masking rules that affect the format of data that the Data Masking transformation returns. Mask numeric, string, and date values with random masking.

Masking String Values

Configure random masking to generate random output for string columns. To configure limitations for each character in the output string, configure a mask format. Configure filter characters to define which source characters to mask and the characters to mask them with.
You can apply the following masking rules for a string port:
Range
Configure the minimum and maximum string length. The Data Masking transformation returns a string of random characters between the minimum and maximum string length.
Mask Format
Define the type of character to substitute for each character in the input data. You can limit each character to an alphabetic, numeric, or alphanumeric character type.
Source String Characters
Define the characters in the source string that you want to mask. For example, mask the number sign (#) character whenever it occurs in the input data. The Data Masking transformation masks all the input characters when Source String Characters is blank.
Result String Replacement Characters
Substitute the characters in the target string with the characters you define in Result String Replacement Characters. For example, enter the following characters to configure each mask to contain uppercase alphabetic characters A through Z:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
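The string rules above can be illustrated with a minimal Python sketch. It is not the Data Masking transformation's algorithm. The function name, the one-letter format codes (A, D, N), and the choice to let result characters take precedence over the mask format are all assumptions made for illustration.

```python
import random
import string

# Illustrative format codes (assumed for this sketch, not taken from the product):
# A = alphabetic, D = digit, N = alphanumeric.
FORMAT_POOLS = {
    "A": string.ascii_letters,
    "D": string.digits,
    "N": string.ascii_letters + string.digits,
}

def random_mask_string(value, mask_format="", source_chars="", result_chars="", rng=None):
    """Randomly mask a string. Repeated calls may return different output."""
    rng = rng or random.Random()
    masked = []
    for i, ch in enumerate(value):
        # A blank source_chars setting means every input character is masked.
        if source_chars and ch not in source_chars:
            masked.append(ch)
            continue
        if result_chars:
            pool = result_chars                    # assumption: result characters win
        elif i < len(mask_format):
            pool = FORMAT_POOLS[mask_format[i]]    # per-position character type
        else:
            pool = FORMAT_POOLS["N"]
        masked.append(rng.choice(pool))
    return "".join(masked)
```

For example, random_mask_string("AB#12", source_chars="#", result_chars="XYZ") leaves "AB" and "12" intact and replaces only the number sign with X, Y, or Z.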

Masking Numeric Values

When you mask numeric data, you can configure a range of output values for a column. The Data Masking transformation returns a value between the minimum and maximum values of the range depending on port precision. To define the range, configure the minimum and maximum values or configure a blurring range based on a variance from the original source value.
You can configure the following masking parameters for numeric data:
Range
Define a range of output values. The Data Masking transformation returns numeric data between the minimum and maximum values.
Blurring Range
Define a range of output values that are within a fixed variance or a percentage variance of the source data. The Data Masking transformation returns numeric data that is close to the value of the source data. You can configure a range and a blurring range.
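As a rough illustration of the two numeric parameters, the following Python sketch draws a random value from a range, a blurring range, or both. The function and parameter names are hypothetical. The source does not say how the transformation combines a range with a blurring range, so the sketch assumes the intersection of the two.

```python
import random

def random_mask_number(value, low=None, high=None, blur=None, blur_is_percent=False, rng=None):
    """Return a random number within the configured range and/or blurring range."""
    rng = rng or random.Random()
    lo_bound, hi_bound = low, high
    if blur is not None:
        # Fixed variance uses blur directly; percentage variance scales with the source value.
        delta = abs(value) * blur / 100.0 if blur_is_percent else blur
        blur_lo, blur_hi = value - delta, value + delta
        # Assumption: when both are configured, the output must satisfy both.
        lo_bound = blur_lo if lo_bound is None else max(lo_bound, blur_lo)
        hi_bound = blur_hi if hi_bound is None else min(hi_bound, blur_hi)
    if lo_bound is None or hi_bound is None:
        raise ValueError("configure a range, a blurring range, or both")
    if lo_bound > hi_bound:
        raise ValueError("range and blurring range do not overlap")
    return rng.uniform(lo_bound, hi_bound)
```

With a 10 percent blurring range, a source value of 100 yields a result between 90 and 110.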

Masking Date Values

To mask date values with random masking, either configure a range of output dates or choose a variance. When you configure a variance, choose a part of the date to blur. Choose the year, month, day, hour, minute, or second. The Data Masking transformation returns a date that is within the range you configure.
You can configure the following masking rules when you mask a datetime value:
Range
Sets the minimum and maximum values to return for the selected datetime value.
Blurring
Masks a date based on a variance that you apply to a unit of the date. The Data Masking transformation returns a date that is within the variance. You can blur the year, month, day, hour, minute, or second. Choose a low and high variance to apply.
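Date blurring can be sketched in Python as follows. The function name and signature are hypothetical. To stay short, the sketch handles only the day, hour, minute, and second units, which map directly onto timedelta. Year and month blurring would need calendar-aware arithmetic and are left out.

```python
import random
from datetime import datetime, timedelta

SUPPORTED_UNITS = ("day", "hour", "minute", "second")

def blur_date(value, unit, low_variance, high_variance, rng=None):
    """Return a random datetime between value - low_variance and value + high_variance units."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError("this sketch supports only: " + ", ".join(SUPPORTED_UNITS))
    rng = rng or random.Random()
    # timedelta accepts plural keyword names such as days= and hours=.
    earliest = value - timedelta(**{unit + "s": low_variance})
    latest = value + timedelta(**{unit + "s": high_variance})
    span_seconds = int((latest - earliest).total_seconds())
    return earliest + timedelta(seconds=rng.randint(0, span_seconds))
```

For example, blur_date(datetime(2020, 3, 15, 12, 0), "day", 2, 5) returns a value no earlier than March 13 and no later than March 20 of the same year.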

Expression Masking

Expression masking applies an expression to a port to change the data or create new data. When you configure expression masking, create an expression in the Expression Editor. Select input and output ports, functions, variables, and operators to build expressions.
You can concatenate data from multiple ports to create a value for another port. For example, you need to create a login name. The source has first name and last name columns. Mask the first and last name from lookup files. In the Data Masking transformation, create another port called Login. For the Login port, configure an expression to concatenate the first letter of the first name with the last name:
SUBSTR(FIRSTNM,1,1)||LASTNM
Select functions, ports, variables, and operators from the point-and-click interface to minimize errors when you build expressions.
The Expression Editor displays the output ports that are not configured for expression masking. You cannot use the output from an expression as input to another expression. If you manually add the output port name to the expression, you might get unexpected results.
When you create an expression, verify that the expression returns a value that matches the port datatype. The Data Masking transformation returns zero if the data type of the expression port is numeric and the data type of the expression is not the same. The Data Masking transformation returns null values if the data type of the expression port is a string and the data type of the expression is not the same.
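For readers who do not know the transformation language, here is a small Python sketch of what the Login expression computes and of the datatype-mismatch behavior described above. The function names and the port type labels are hypothetical. The sketch only mirrors the documented outcomes and does not reproduce the Expression Editor.

```python
def login_expression(first_name, last_name):
    # Python equivalent of SUBSTR(FIRSTNM,1,1)||LASTNM:
    # the first character of the first name followed by the last name.
    return first_name[:1] + last_name

def coerce_to_port(result, port_type):
    """Mirror the documented mismatch rule: 0 for numeric ports, None (null) for string ports."""
    if port_type == "numeric":
        is_number = isinstance(result, (int, float)) and not isinstance(result, bool)
        return result if is_number else 0
    if port_type == "string":
        return result if isinstance(result, str) else None
    raise ValueError("unknown port type: " + port_type)
```

Here login_expression("John", "Smith") returns "JSmith", and passing a string result to a numeric port through coerce_to_port returns 0.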

Repeatable Expression Masking

Configure repeatable expression masking when a source column occurs in more than one table and you need to mask the column from each table with the same value.
When you configure repeatable expression masking, the Data Masking transformation saves the results of an expression in a storage table. If the column occurs in another source table, the Data Masking transformation returns the masked value from the storage table instead of from the expression.

Dictionary Name

When you configure repeatable expression masking, you must enter a dictionary name. The dictionary name is a key that allows multiple Data Masking transformations to generate the same masked values from the same source values. Define the same dictionary name in each Data Masking transformation. The dictionary name can be any text.

Storage Table

The storage table contains the results of repeatable expression masking between sessions. A storage table row contains the source column and a masked value pair. The storage table for expression masking is a separate table from the storage table for substitution masking.
Each time the Data Masking transformation masks a value with a repeatable expression, it searches the storage table by dictionary name, locale, column name and input value. If it finds a row in the storage table, it returns the masked value from the storage table. If the Data Masking transformation does not find a row, it generates a masked value from the expression for the column.
You need to encrypt storage tables for expression masking when the storage table contains unencrypted data and you use the same dictionary name as the key.
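The lookup-then-generate behavior of the storage table can be sketched with an in-memory dictionary. This is an illustration only: the class name is hypothetical, a Python dict stands in for the relational storage table, and encryption is ignored.

```python
class ExpressionStorage:
    """In-memory stand-in for the expression masking storage table."""

    def __init__(self):
        self._rows = {}

    def mask(self, dictionary_name, locale, column_name, input_value, expression):
        # The search key matches the documented lookup fields.
        key = (dictionary_name, locale, column_name, input_value)
        if key not in self._rows:
            # No stored row: generate the masked value from the expression and save it.
            self._rows[key] = expression(input_value)
        return self._rows[key]
```

A second transformation that uses the same dictionary name gets the stored value back even when its own expression differs:

```python
storage = ExpressionStorage()
first = storage.mask("LOGIN_DICT", "en_US", "LoginID", "jsmith01", lambda v: "JDoe")
second = storage.mask("LOGIN_DICT", "en_US", "LoginID", "jsmith01", lambda v: "user_" + v)
# first and second are both "JDoe"
```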

Encrypting Storage Tables for Expression Masking

You can use transformation language encoding functions to encrypt storage tables. You need to encrypt storage tables when you have enabled storage encryption.
    1. Create a mapping with the IDM_EXPRESSION_STORAGE storage table as source.
    2. Create a Data Masking transformation.
    3. Apply the expression masking technique on the masked value ports.
    4. Use the following expression on the MASKEDVALUE port:
    Enc_Base64(AES_Encrypt(MASKEDVALUE, Key))
    5. Link the ports to the target.

Example

For example, an Employees table contains the following columns:
FirstName
LastName
LoginID
In the Data Masking transformation, mask LoginID with an expression that combines FirstName and LastName. Configure the expression mask to be repeatable. Enter a dictionary name as a key for repeatable masking.
The Computer_Users table contains a LoginID, but no FirstName or LastName columns:
Dept
LoginID
Password
To mask the LoginID in Computer_Users with the same LoginID as Employees, configure expression masking for the LoginID column. Enable repeatable masking and enter the same dictionary name that you defined for the LoginID column in the Employees table. The Integration Service retrieves the LoginID values from the storage table.
Create a default expression to use when the Integration Service cannot find a row in the storage table for LoginID. The Computer_Users table does not have the FirstName or LastName columns, so the expression creates a less meaningful LoginID.

Storage Table Scripts

Informatica provides scripts that you can run to create the storage table. The scripts are in the following location:
<PowerCenter installation directory>\client\bin\Extensions\DataMasking
The directory contains a script for Sybase, Microsoft SQL Server, IBM DB2, and Oracle databases. Each script is named Expression_<database type>.

Rules and Guidelines for Expression Masking

Use the following rules and guidelines for expression masking:

Key Masking

A column configured for key masking returns deterministic masked data each time the source value and seed value are the same. The Data Masking transformation returns unique values for the column.
When you configure a column for key masking, the Data Masking transformation creates a seed value for the column. You can change the seed value to produce repeatable data between different Data Masking transformations. For example, configure key masking to enforce referential integrity. Use the same seed value to mask a primary key in a table and the foreign key value in another table.
You can define masking rules that affect the format of data that the Data Masking transformation returns. Mask string, numeric, and datetime values with key masking.
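The determinism that key masking relies on can be illustrated with a keyed hash. The Python sketch below is an assumption-laden stand-in. The source does not describe the product's algorithm, and unlike the Data Masking transformation, this sketch does not guarantee unique output values. The function name and the default result character set are hypothetical.

```python
import hashlib
import string

def key_mask_string(value, seed, result_chars=string.ascii_uppercase):
    """Return the same masked string whenever value and seed are the same."""
    if not 1 <= seed <= 1000:
        raise ValueError("seed must be between 1 and 1,000")
    masked = []
    for position in range(len(value)):
        # Hash the seed, the position, and the whole source value so that
        # each output character depends only on those inputs.
        digest = hashlib.sha256(f"{seed}:{position}:{value}".encode("utf-8")).digest()
        masked.append(result_chars[digest[0] % len(result_chars)])
    return "".join(masked)
```

Because the output depends only on the source value and the seed, masking a primary key column and a foreign key column with the same seed produces matching values in both tables.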

Masking String Values

You can configure key masking to generate repeatable output for strings. Configure a mask format to define limitations for each character in the output string. Configure source string characters that define which source characters to mask. Configure result string replacement characters to limit the masked data to certain characters.
You can configure the following masking rules for key masking strings:
Seed
Apply a seed value to generate deterministic masked data for a column. You can enter a number between 1 and 1,000.
Mask Format
Define the type of character to substitute for each character in the input data. You can limit each character to an alphabetic, numeric, or alphanumeric character type.
Source String Characters
Define the characters in the source string that you want to mask. For example, mask the number sign (#) character whenever it occurs in the input data. The Data Masking transformation masks all the input characters when Source String Characters is blank. The Data Masking transformation does not always return unique data if the number of source string characters is less than the number of result string characters.
Result String Characters
Substitute the characters in the target string with the characters you define in Result String Characters. For example, enter the following characters to configure each mask to contain all uppercase alphabetic characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Masking Numeric Values

Configure key masking for numeric source data to generate deterministic output. When you configure a column for numeric key masking, you assign a random seed value to the column. When the Data Masking transformation masks the source data, it applies a masking algorithm that requires the seed.
You can change the seed value for a column to produce repeatable results if the same source value occurs in a different column. For example, you want to maintain a primary-foreign key relationship between two tables. In each Data Masking transformation, enter the same seed value for the primary-key column as the seed value for the foreign-key column. The Data Masking transformation produces deterministic results for the same numeric values. The referential integrity is maintained between the tables.

Masking Datetime Values

When you configure key masking for datetime values, the Data Masking transformation requires a random number as a seed. You can change the seed to match the seed value for another column in order to return repeatable datetime values between the columns.
The Data Masking transformation can mask dates between 1753 and 2400 with key masking. If the source year is in a leap year, the Data Masking transformation returns a year that is also a leap year. If the source month contains 31 days, the Data Masking transformation returns a month that has 31 days. If the source month is February, the Data Masking transformation returns February.
The Data Masking transformation always generates valid dates.
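The datetime constraints above (leap years stay leap years, 31-day months stay 31-day months, February stays February) can be sketched in Python. This is one possible way to satisfy the stated constraints, not the product's algorithm. The function name, the use of SHA-256, and the decisions to mask the day and keep the time of day are assumptions.

```python
import calendar
import hashlib
from datetime import datetime

def key_mask_date(value, seed):
    """Deterministically mask a datetime while preserving leap-year status and month length."""
    if not 1753 <= value.year <= 2400:
        raise ValueError("key masking supports years 1753 through 2400")
    digest = hashlib.sha256(f"{seed}:{value.isoformat()}".encode("utf-8")).digest()
    # Candidate years share the leap-year status of the source year.
    leap = calendar.isleap(value.year)
    years = [y for y in range(1753, 2401) if calendar.isleap(y) == leap]
    year = years[int.from_bytes(digest[0:4], "big") % len(years)]
    # February maps to February; other months map to a month of the same length.
    source_days = calendar.monthrange(value.year, value.month)[1]
    if value.month == 2:
        months = [2]
    else:
        months = [m for m in range(1, 13)
                  if m != 2 and calendar.monthrange(year, m)[1] == source_days]
    month = months[digest[4] % len(months)]
    # The day is drawn within the target month, so the result is always a valid date.
    day = 1 + digest[5] % calendar.monthrange(year, month)[1]
    return value.replace(year=year, month=month, day=day)
```

Calling the function twice with the same source value and seed returns the same masked datetime.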

Substitution Masking

Substitution masking replaces a column of data with similar but unrelated data. Use substitution masking to replace production data with realistic test data. When you configure substitution masking, define the dictionary that contains the substitute values.
The Data Masking transformation performs a lookup on the dictionary that you configure. The Data Masking transformation replaces source data with data from the dictionary. Dictionary files can contain string data, datetime values, integers, and floating point numbers. Enter datetime values in the following format:
mm/dd/yyyy
You can substitute data with repeatable or non-repeatable values. When you choose repeatable values, the Data Masking transformation produces deterministic results for the same source data and seed value. You must configure a seed value to substitute data with deterministic results. The Integration Service maintains a storage table of source and masked values for repeatable masking.
You can substitute more than one column of data with masked values from the same dictionary row. Configure substitution masking for one input column. Configure dependent data masking for the other columns that receive masked data from the same dictionary row.

Dictionaries

A dictionary is a reference table that contains the substitute data and a serial number for each row in the table. Create a reference table for substitution masking from a flat file or relational table that you import into the Model repository.
The Data Masking transformation generates a number to retrieve a dictionary row by the serial number. The Data Masking transformation generates a hash key for repeatable substitution masking or a random number for non-repeatable masking. You can configure an additional lookup condition if you configure repeatable substitution masking.
You can configure a dictionary to mask more than one port in the Data Masking transformation.
When the Data Masking transformation retrieves substitution data from a dictionary, the transformation does not check if the substitute data value is the same as the original value. For example, the Data Masking transformation might substitute the name John with the same name (John) from a dictionary file.
The following example shows a dictionary table that contains first name and gender:
SNO   GENDER   FIRSTNAME
1     M        Adam
2     M        Adeel
3     M        Adil
4     F        Alice
5     F        Alison
In this dictionary, the first field in the row is the serial number, and the second field is gender. The Integration Service always looks up a dictionary record by serial number. You can add gender as a lookup condition if you configure repeatable masking. The Integration Service retrieves a row from the dictionary using a hash key, and it finds a row with a gender that matches the gender in the source data.
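The dictionary lookup can be sketched in Python using the first name and gender dictionary from this section. The sketch is illustrative: the function name is hypothetical, SHA-256 stands in for the unspecified hash key, and the storage table is omitted.

```python
import hashlib
import random

DICTIONARY = [
    {"SNO": 1, "GENDER": "M", "FIRSTNAME": "Adam"},
    {"SNO": 2, "GENDER": "M", "FIRSTNAME": "Adeel"},
    {"SNO": 3, "GENDER": "M", "FIRSTNAME": "Adil"},
    {"SNO": 4, "GENDER": "F", "FIRSTNAME": "Alice"},
    {"SNO": 5, "GENDER": "F", "FIRSTNAME": "Alison"},
]

def substitute(value, seed=None, lookup=None, rng=None):
    """Return a substitute FIRSTNAME: repeatable when a seed is given, random otherwise."""
    if seed is None:
        if lookup:
            raise ValueError("a lookup condition requires repeatable masking")
        # Non-repeatable masking: pick a dictionary row at random.
        return (rng or random).choice(DICTIONARY)["FIRSTNAME"]
    # Repeatable masking: restrict rows by the lookup condition, then pick by hash key.
    rows = [r for r in DICTIONARY
            if not lookup or all(r[k] == v for k, v in lookup.items())]
    if not rows:
        raise LookupError("no dictionary row matches the lookup condition")
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return rows[int.from_bytes(digest[:4], "big") % len(rows)]["FIRSTNAME"]
```

For example, substitute("Mary", seed=7, lookup={"GENDER": "F"}) always returns the same name, and that name is either Alice or Alison. As noted earlier, nothing prevents the substitute from equalling the source value.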
Use the following rules and guidelines when you create a reference table:
If you use a flat file table to create the reference table, use the following rules and guidelines:

Storage Tables

The Data Masking transformation maintains storage tables for repeatable substitution between sessions. A storage table row contains the source column and a masked value pair. Each time the Data Masking transformation masks a value with a repeatable substitute value, it searches the storage table by dictionary name, locale, column name, input value, and seed. If it finds a row, it returns the masked value from the storage table. If the Data Masking transformation does not find a row, it retrieves a row from the dictionary with a hash key.
The dictionary name format in the storage table is different for a flat file dictionary and a relational dictionary. A flat file dictionary name is identified by the file name. The relational dictionary name has the following syntax:
<Connection object>_<dictionary table name>
Informatica provides scripts that you can run to create a relational storage table. The scripts are in the following location:
<PowerCenter Client installation directory>\client\bin\Extensions\DataMasking
The directory contains a script for Sybase, Microsoft SQL Server, IBM DB2, and Oracle databases. Each script is named Substitution_<database type>. You can create a table in a different database if you configure the SQL statements and the primary key constraints.
You need to encrypt storage tables for substitution masking when the storage table contains unencrypted data and you use the same seed value and dictionary to mask the same columns.

Encrypting Storage Tables for Substitution Masking

You can use transformation language encoding functions to encrypt storage tables. You need to encrypt storage tables when you have enabled storage encryption.
    1. Create a mapping with the IDM_SUBSTITUTION_STORAGE storage table as source.
    2. Create a Data Masking transformation.
    3. Apply the substitution masking technique on the input value and the masked value ports.
    4. Use the following expression on the INPUTVALUE port:
    Enc_Base64(AES_Encrypt(INPUTVALUE, Key))
    5. Use the following expression on the MASKEDVALUE port:
    Enc_Base64(AES_Encrypt(MASKEDVALUE, Key))
    6. Link the ports to the target.

Substitution Masking Properties

You can configure the following masking rules for substitution masking:

Rules and Guidelines for Substitution Masking

Use the following rules and guidelines for substitution masking:

Dependent Masking

Dependent masking substitutes multiple columns of source data with data from the same dictionary row.
When the Data Masking transformation performs substitution masking for multiple columns, the masked data might contain unrealistic combinations of fields. You can configure dependent masking to substitute data for multiple input columns from the same dictionary row. The masked data receives valid combinations such as "New York, New York" or "Chicago, Illinois."
When you configure dependent masking, you first configure an input column for substitution masking. Configure other input columns to be dependent on that substitution column. For example, you choose the ZIP code column for substitution masking and choose city and state columns to be dependent on the ZIP code column. Dependent masking ensures that the substituted city and state values are valid for the substituted ZIP code value.
Note: You cannot configure a column for dependent masking without first configuring a column for substitution masking.
Configure the following masking rules when you configure a column for dependent masking:
Dependent column
The name of the input column that you configured for substitution masking. The Data Masking transformation retrieves substitute data from a dictionary using the masking rules for that column. The column you configure for substitution masking becomes the key column for retrieving masked data from the dictionary.
Output column
The name of the dictionary column that contains the value for the column you are configuring with dependent masking.

Dependent Masking Example

A data masking dictionary might contain address rows with the following values:
SNO   STREET               CITY         STATE   ZIP     COUNTRY
1     32 Apple Lane        Chicago      IL      61523   US
2     776 Ash Street       Dallas       TX      75240   US
3     2229 Big Square      Atleeville   TN      38057   US
4     6698 Cowboy Street   Houston      TX      77001   US
You need to mask source data with valid combinations of the city, state, and ZIP code from the Address dictionary.
Configure the ZIP port for substitution masking. Enter the following masking rules for the ZIP port:
Rule                   Value
Dictionary Name        Address
Serial Number Column   SNO
Output Column          ZIP
Configure the City port for dependent masking. Enter the following masking rules for the City port:
Rule               Value
Dependent Column   ZIP
Output Column      City
Configure the State port for dependent masking. Enter the following masking rules for the State port:
Rule               Value
Dependent Column   ZIP
Output Column      State
When the Data Masking transformation masks the ZIP code, it returns the correct city and state for the ZIP code from the dictionary row.
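The example can be sketched in Python to show why dependent columns stay consistent: the ZIP column selects one dictionary row, and the City and State columns read their values from that same row. The function name is hypothetical and SHA-256 stands in for the unspecified row-selection hash.

```python
import hashlib

ADDRESS = [
    {"SNO": 1, "STREET": "32 Apple Lane", "CITY": "Chicago", "STATE": "IL", "ZIP": "61523", "COUNTRY": "US"},
    {"SNO": 2, "STREET": "776 Ash Street", "CITY": "Dallas", "STATE": "TX", "ZIP": "75240", "COUNTRY": "US"},
    {"SNO": 3, "STREET": "2229 Big Square", "CITY": "Atleeville", "STATE": "TN", "ZIP": "38057", "COUNTRY": "US"},
    {"SNO": 4, "STREET": "6698 Cowboy Street", "CITY": "Houston", "STATE": "TX", "ZIP": "77001", "COUNTRY": "US"},
]

def mask_address(source_zip, seed):
    """Substitute the ZIP, then fill City and State from the same dictionary row."""
    digest = hashlib.sha256(f"{seed}:{source_zip}".encode("utf-8")).digest()
    row = ADDRESS[int.from_bytes(digest[:4], "big") % len(ADDRESS)]
    return {
        "ZIP": row["ZIP"],      # substitution column (the key column)
        "City": row["CITY"],    # dependent on ZIP, output column CITY
        "State": row["STATE"],  # dependent on ZIP, output column STATE
    }
```

Whatever row is selected, the returned ZIP, City, and State always belong together, for example 75240, Dallas, and TX.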

Tokenization Masking

Use the tokenization masking technique to mask source string data based on criteria that you specify in an algorithm. For example, you can create an algorithm that contains a fake email address to replace field entries in the source data.
You can configure the format of the masked data using Tokenization masking. You must assign a tokenizer name to the masking algorithm before you can use it. The tokenizer name references the masking algorithm (JAR) used. Specify the tokenizer name when you apply the tokenization masking technique.

Configuring Tokenization Masking

Perform the following tasks before you use the tokenization masking technique:
    1. Browse to the tokenprovider directory in the path: <Informatica_home>\services\shared.
    2. Open the following XML file: com.informatica.products.ilm.tx-tokenizerprovider.xml.
    3. Add the tokenizer name and the fully qualified name of the class file for each tokenizer you want to use. Implement the tokenizer class within the com.informatica.products.ilm.tx-tokenprovider-<Build-Number>.jar file in the tokenprovider directory. For each tokenizer, enter the information in the XML file as in the following example:
    <TokenizerProvider>
        <Tokenizer Name="CCTokenizer"
                   ClassName="com.informatica.tokenprovider.CCTokenizer"/>
    </TokenizerProvider>
    Where:
After configuration, you can use the Tokenization masking technique. Enter the tokenizer name to specify the algorithm to use when you create a mapping.
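To check which tokenizer names a configuration file registers, you could parse it with a short script. The Python sketch below is a convenience for inspection only and is not part of the product. It assumes the file follows the TokenizerProvider structure shown in the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample content that follows the structure of the example above.
CONFIG = """<TokenizerProvider>
    <Tokenizer Name="CCTokenizer"
               ClassName="com.informatica.tokenprovider.CCTokenizer"/>
</TokenizerProvider>"""

def tokenizer_classes(xml_text):
    """Map each tokenizer name to its fully qualified class name."""
    root = ET.fromstring(xml_text)
    return {t.get("Name"): t.get("ClassName") for t in root.iter("Tokenizer")}
```

For the sample content, tokenizer_classes(CONFIG) maps CCTokenizer to com.informatica.tokenprovider.CCTokenizer. The keys of the result are the tokenizer names that you can enter when you apply the tokenization masking technique.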