Substitution Masking

Substitution masking replaces a column of data with similar but unrelated data from a dictionary. You can apply substitution masking to date, numeric, and string datatypes.
Use substitution masking when the masked output must look realistic. For example, to mask address data, specify a dictionary file that contains addresses.
Substitution is an effective way to replace production data with realistic test data. When you configure substitution masking, select the relational or flat file dictionary that contains the substitute values. The PowerCenter Integration Service performs a lookup on the dictionary and replaces source data with data from the dictionary. You can use a relational dictionary to mask Hadoop data.
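The following Python sketch illustrates the general idea of dictionary substitution. It is not the PowerCenter Integration Service implementation. The in-memory list ADDRESS_DICTIONARY and the function substitute are hypothetical names used only for illustration; a real dictionary is a flat file or relational table.

```python
import random

# Hypothetical in-memory dictionary. A real dictionary would be a
# flat file or relational table selected in the masking rule.
ADDRESS_DICTIONARY = [
    "12 Elm Street",
    "48 Harbor Road",
    "7 Birch Lane",
    "301 Mill Avenue",
]


def substitute(values, dictionary, rng):
    """Replace each source value with a randomly chosen dictionary value."""
    return [rng.choice(dictionary) for _ in values]


source = ["1600 Real Avenue", "22 Actual Street", "9 Genuine Court"]
masked = substitute(source, ADDRESS_DICTIONARY, random.Random())
```

Each masked value comes from the dictionary, so the output looks like address data but has no relationship to the source values.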
When you assign a substitution masking rule to a column, you can specify the rule assignment parameters.
The following table describes the rule assignment parameters that you can configure:
| Parameter | Description |
| --- | --- |
| Lookup Condition | Optional. The name of the source table column to match against a column in the dictionary. |
| Unique Substitution Column | Optional. The name of the source table column to substitute with unique data. |
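As a rough illustration of a lookup condition, the Python sketch below restricts the substitute values to dictionary rows whose lookup column matches the source row. It assumes, for simplicity, that the source and the dictionary use the same column name for the match. NAME_DICTIONARY, mask_with_lookup, and the column names are hypothetical and do not come from the product.

```python
import random

# Hypothetical dictionary rows with a lookup column (gender)
# and a masked value column (first_name).
NAME_DICTIONARY = [
    {"gender": "F", "first_name": "Alice"},
    {"gender": "F", "first_name": "Beth"},
    {"gender": "M", "first_name": "Carl"},
    {"gender": "M", "first_name": "Dan"},
]


def mask_with_lookup(rows, dictionary, lookup_col, masked_col, rng):
    """Replace masked_col in each row with a dictionary value whose
    lookup_col matches the row's lookup_col."""
    masked_rows = []
    for row in rows:
        candidates = [
            entry[masked_col]
            for entry in dictionary
            if entry[lookup_col] == row[lookup_col]
        ]
        if not candidates:
            raise LookupError(f"no dictionary row matches {row[lookup_col]!r}")
        masked_rows.append({**row, masked_col: rng.choice(candidates)})
    return masked_rows


source_rows = [
    {"gender": "F", "first_name": "Maria"},
    {"gender": "M", "first_name": "John"},
]
masked_rows = mask_with_lookup(
    source_rows, NAME_DICTIONARY, "gender", "first_name", random.Random()
)
```

Matching on a column such as gender keeps the masked value plausible for the row it replaces.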
You can substitute data with repeatable or non-repeatable values. When you choose repeatable values, the PowerCenter Integration Service produces deterministic results for the same source data and seed value. You must configure a seed value to substitute data with deterministic results. The PowerCenter Integration Service maintains a storage table of source and masked values for repeatable masking. You can specify the storage table you want to use when you generate a workflow.
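One way to picture repeatable substitution is a function that derives the dictionary position from the seed and the source value, and records each mapping in a storage table. The Python sketch below makes that assumption; it uses a SHA-256 hash to pick the position, which is an illustrative choice and not the algorithm that the PowerCenter Integration Service uses. The names repeatable_substitute, SURNAME_DICTIONARY, and storage are hypothetical.

```python
import hashlib

# Hypothetical dictionary of substitute values.
SURNAME_DICTIONARY = ["Archer", "Bennett", "Calloway", "Dunmore", "Ellery"]


def repeatable_substitute(value, dictionary, seed, storage):
    """Return the same dictionary value for the same source value and seed.

    storage stands in for the storage table of source and masked values.
    """
    if value in storage:
        return storage[value]
    # Derive a stable index from the seed and the source value (assumed scheme).
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(dictionary)
    storage[value] = dictionary[index]
    return storage[value]


storage_table = {}
first_run = repeatable_substitute("Smith", SURNAME_DICTIONARY, 42, storage_table)
# A later session with an empty storage table but the same seed gives the same result.
second_run = repeatable_substitute("Smith", SURNAME_DICTIONARY, 42, {})
```

In this sketch, changing the seed can change the masked value for a given source value, while reusing the seed keeps the mapping stable between runs.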
You cannot use flat file dictionaries or unique substitution masking to mask Hadoop data.

Substitution Masking Parameters

You can configure the following substitution masking parameters:
| Parameter | Description |
| --- | --- |
| Repeatable Output | Returns deterministic results between sessions. The PowerCenter Integration Service saves masked values in the storage table. |
| Seed | A start number that the PowerCenter Integration Service uses to return deterministic data. |
| Unique Substitution Data | Replaces the target column with unique masked values for every unique source column value. If the source contains more unique values than the dictionary file, the data masking operation fails because the dictionary file does not contain enough unique values to substitute the data. For security, the default is nonunique substitution. |
| Dictionary Information | Required. Configuration of the flat file or relational table that contains the substitute data values. |

Configure the following Dictionary Information parameters:
  • Dictionary. Displays the flat file or relational table name that you select.
  • Masked Value. The column returned to the masking rule.
  • Lookup Column. The source data column to use in the lookup.
  • Serial Number Column. The column in the dictionary that contains the serial number.
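The Python sketch below illustrates why unique substitution fails when the dictionary is too small. It assigns each distinct source value its own dictionary row, ordered by serial number, and raises an error when the rows run out. The function unique_substitute and the column names serial_no and masked_value are hypothetical, the sketch assumes that the masked values in the dictionary are themselves unique, and the sequential assignment is a simplification rather than the product's behavior.

```python
# Hypothetical dictionary rows with a serial number column
# and a masked value column.
CITY_DICTIONARY = [
    {"serial_no": 1, "masked_value": "Ashford"},
    {"serial_no": 2, "masked_value": "Brookvale"},
    {"serial_no": 3, "masked_value": "Cedarton"},
]


def unique_substitute(values, dictionary):
    """Map every distinct source value to a distinct dictionary value."""
    distinct_sources = list(dict.fromkeys(values))  # keeps first-seen order
    if len(distinct_sources) > len(dictionary):
        raise ValueError(
            "dictionary does not contain enough unique values: "
            f"{len(distinct_sources)} needed, {len(dictionary)} available"
        )
    ordered_rows = sorted(dictionary, key=lambda row: row["serial_no"])
    mapping = {
        source: row["masked_value"]
        for source, row in zip(distinct_sources, ordered_rows)
    }
    return [mapping[value] for value in values]


masked_cities = unique_substitute(["Lyon", "Oslo", "Lyon"], CITY_DICTIONARY)

try:
    unique_substitute(["Lyon", "Oslo", "Riga", "Bern"], CITY_DICTIONARY)
    too_many_failed = False
except ValueError:
    too_many_failed = True
```

Repeated source values receive the same masked value, and two different source values never share one.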