Configuration properties for masking techniques

You can define how a masking technique works by configuring its properties. The Data Masking transformation masks data based on the masking technique that you select and the configuration that you set.

The configuration properties that appear depend on the masking technique and the data type. For example, you cannot blur string data. You cannot select a seed value when you use the Random masking technique.

Repeatable output

Repeatable output returns deterministic values. For example, you configure repeatable output for a column of first names. The Data Masking transformation returns the same masked value every time the same name appears in the workflow.

You can configure repeatable masking when you use the Random masking technique, Substitution masking technique, or the special mask formats for string data type. Select Repeatable and enter the seed value to configure repeatable masking.

You cannot configure repeatable output for the Key masking technique.
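The idea of repeatable output can be illustrated with a minimal sketch. It is not how the Data Masking transformation is implemented. It assumes a hash of the seed and the source value is used to pick a dictionary entry; the function name mask_repeatable and the FIRST_NAMES dictionary are hypothetical.

```python
import hashlib
from typing import Sequence

# Hypothetical substitution dictionary, used only for illustration.
FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley"]


def mask_repeatable(value: str, seed: int, dictionary: Sequence[str]) -> str:
    """Return the same dictionary entry every time for the same (seed, value) pair."""
    # Hash the seed together with the source value so the choice is deterministic.
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(dictionary)
    return dictionary[index]


names = ["John", "Mary", "John"]
masked = [mask_repeatable(name, seed=190, dictionary=FIRST_NAMES) for name in names]
# Both occurrences of "John" receive the same masked value.
```

Because the result depends only on the seed and the source value, a column with the same seed in another table would produce the same masked values in this sketch.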

Optimize dictionary usage

The Optimize Dictionary Output option increases the use of dictionary values for masking and reduces duplicate dictionary values in the target.

If you perform substitution masking or custom substitution masking, you can choose to optimize the dictionary usage. The workflow uses some values from the selected dictionary to mask source data. These dictionary values might be used for multiple entries so that all source data is masked in the target. Optimizing dictionary usage reduces the chance that duplicate dictionary values are used. To optimize dictionary output, you must configure the masking rule for repeatable output.

Seed

The Data Masking transformation creates a default seed value that is a random number from 1 through 999. You can enter a different seed value. Apply the same seed value to a column to return the same masked data values across different source data. For example, if the same Cust_ID column appears in four tables and you want all four to output the same masked values, set all four columns to the same seed value.

You can enter the seed value as a parameter. Seed value parameter names must begin with $$. You can include an underscore (_) in the name but you cannot include other special characters. Add the required parameter and value to the parameter file and specify the parameter file name at run time.

Note: If you enter the seed value as a parameter, you must run the mapping in a mapping task. If you run a mapping that includes a seed value parameter directly, outside a mapping task, the mapping uses an incorrect seed value because it cannot read the parameter value.
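The naming rule for seed value parameters can be checked with a short sketch like the following. It assumes that letters, digits, and underscores are the only characters allowed after the $$ prefix; the source states only that underscores are allowed and other special characters are not. The function name is hypothetical.

```python
import re

# Assumption: letters, digits, and underscores may follow the "$$" prefix.
SEED_PARAM_PATTERN = re.compile(r"\$\$[A-Za-z0-9_]+")


def is_valid_seed_parameter(name: str) -> bool:
    """Check that a parameter name begins with $$ and has no other special characters."""
    return SEED_PARAM_PATTERN.fullmatch(name) is not None


print(is_valid_seed_parameter("$$Cust_Seed"))   # valid: $$ prefix, underscore allowed
print(is_valid_seed_parameter("$$Cust-Seed"))   # invalid: hyphen is a special character
```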

Unique substitution

Unique substitution masking ensures that each unique source value uses a unique dictionary value.

To mask a source value with a unique dictionary value, you can configure unique substitution masking. If a source value is masked with a specific dictionary value, then no other source value is masked with this dictionary value.

For example, the Name column in the source data contains multiple entries of John. If you configure repeatable masking, every entry of John takes the same dictionary value, such as Xyza. However, other source values might also be masked with the same dictionary value. A source entry Jack can also use the dictionary value Xyza. As a result, all entries of John and Jack use the same dictionary value. When you configure unique substitution masking, if all source values of John use the Xyza dictionary value, then no other source value uses the same dictionary value.

Unique substitution masking requires a storage connection for the storage tables. Storage tables contain the source-to-dictionary value mapping information required for unique substitution masking.

Note: If the source data contains more unique values than the dictionary, the masking fails because there are not enough unique dictionary values to mask all the source data.
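The behavior described above can be sketched as follows. This is an illustration only: an in-memory Python dict stands in for the storage tables, and the class name UniqueSubstitution is hypothetical.

```python
class UniqueSubstitution:
    """Assign each distinct source value its own dictionary value."""

    def __init__(self, dictionary):
        self._available = list(dictionary)  # dictionary values not yet assigned
        self._storage = {}  # source value -> dictionary value (stand-in for storage tables)

    def mask(self, value):
        # Reuse the stored mapping so repeated source values stay consistent.
        if value in self._storage:
            return self._storage[value]
        if not self._available:
            raise ValueError("not enough unique dictionary values to mask all source data")
        replacement = self._available.pop(0)
        self._storage[value] = replacement
        return replacement


masker = UniqueSubstitution(["Xyza", "Qrst"])
print(masker.mask("John"))  # Xyza
print(masker.mask("Jack"))  # Qrst
print(masker.mask("John"))  # Xyza
```

A third distinct source value, such as Jill, would raise an error in this sketch because the two-entry dictionary is exhausted, which mirrors the failure described in the note.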

Mask format

When you configure key or random masking for string data type, configure a mask format to limit each character in the output column to an alphabetic, numeric, or alphanumeric character.

If you do not define a mask format, the Data Masking transformation replaces each source character with any character. If the mask format is longer than the input string, the Data Masking transformation ignores the extra characters in the mask format. If the mask format is shorter than the source string, the Data Masking transformation does not mask the characters at the end of the source string.

When you configure a mask format, configure the source filter characters or target filter characters that you want to use the mask format with.

The mask format contains uppercase characters. When you enter a lowercase mask character, the Data Masking transformation converts the character to uppercase.

The following table describes mask format characters:

Character	Description
A	Alphabetical characters. For example, ASCII characters a to z and A to Z.
D	Digits. From 0 through 9.
N	Alphanumeric characters. For example, ASCII characters a to z, A to Z, and 0-9.
X	Any character. For example, alphanumeric or symbol.
+	No masking.
R	Remaining characters. R specifies that the remaining characters in the string can be any character type. R must appear as the last character of the mask.

For example, a department name has the following format:

nnn-<department_name>

You can configure a mask to force the first three characters to be numeric, the department name to be alphabetic, and the dash to remain in the output. Configure the following mask format:

DDD+AAAAAAAAAAAAAAAA

The Data Masking transformation replaces the first three characters with numeric characters. It does not replace the fourth character. The Data Masking transformation replaces the remaining characters with alphabetic characters.
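The mask format rules can be illustrated with a minimal sketch. It is not the transformation's implementation. It assumes that "any character" means ASCII letters, digits, and punctuation, and that replacement characters are chosen at random; the function name apply_mask_format is hypothetical.

```python
import random
import string

ALPHA = string.ascii_letters
ALNUM = ALPHA + string.digits
ANY = ALNUM + string.punctuation  # assumption: the pool for "any character"
POOLS = {"A": ALPHA, "D": string.digits, "N": ALNUM, "X": ANY, "R": ANY}


def apply_mask_format(source, mask_format, rng=None):
    """Mask each source character according to the mask format character at its position."""
    rng = rng or random.Random()
    # Lowercase mask characters are converted to uppercase.
    # With no mask format, every character can be replaced with any character.
    mask = mask_format.upper() or "R"
    if set(mask) - set("ADNX+R"):
        raise ValueError("unknown mask format character")
    if "R" in mask[:-1]:
        raise ValueError("R must be the last character of the mask format")
    out = []
    for i, ch in enumerate(source):
        if i < len(mask):
            code = mask[i]
        elif mask.endswith("R"):
            code = "R"  # remaining characters can be any character type
        else:
            code = "+"  # mask is shorter than the source: leave the rest unmasked
        out.append(ch if code == "+" else rng.choice(POOLS[code]))
    return "".join(out)


result = apply_mask_format("123-Engineering", "DDD+AAAAAAAAAAAAAAAA")
# Three digits, the original dash, then alphabetic characters.
# The extra A characters in the mask format are ignored.
```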

Source filter characters

When you configure key or random masking for string data type, configure source filter characters to choose the characters that you want to mask.

When you set a character as a source filter character, the character is masked every time it occurs in the source data. The position of the characters in the source string does not matter, and you can configure any number of characters. If you do not configure source filter characters, the masking replaces all the source characters in the column.

The source filter characters are case-sensitive. The Data Masking transformation does not always return unique data if the number of source string characters is fewer than the number of result string characters.

Target filter characters

When you configure key or random masking for string data type, configure target filter characters to limit the characters that appear in a target column.

The Data Masking transformation replaces characters in the target with the target filter characters. For example, enter the following characters to configure each mask to contain all uppercase alphabetic characters: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

To avoid generating the same output for different input values, configure a wide range of substitute characters or mask only a few source characters. The position of each character in the string does not matter.
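The interaction of source and target filter characters can be sketched as follows. This is an illustration under assumptions: replacement characters are chosen at random, and when no target filter is set the pool is ASCII letters and digits. The function name mask_with_filters is hypothetical.

```python
import random
import string


def mask_with_filters(source, source_filter="", target_filter="", rng=None):
    """Mask only source filter characters, drawing replacements from the target filter."""
    rng = rng or random.Random()
    # Assumption: with no target filter, any ASCII letter or digit may appear.
    pool = target_filter or (string.ascii_letters + string.digits)
    masked = []
    for ch in source:
        # An empty source filter means every character is masked.
        # The membership test is case-sensitive, like the source filter.
        if not source_filter or ch in source_filter:
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)
    return "".join(masked)


# Only uppercase A and B are masked; replacements come from X, Y, and Z.
partial = mask_with_filters("Ab-12-aB", source_filter="AB", target_filter="XYZ")

# Every character is masked with an uppercase letter.
upper = mask_with_filters("hello", target_filter=string.ascii_uppercase)
```

With a target filter as narrow as XYZ, different inputs can easily produce the same output, which is why a wider range of substitute characters is recommended.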

Range

Define a range for numeric or datetime data. When you define a range for numeric or date values, the Data Masking transformation masks the source data with a value between the minimum and maximum values.

Numeric Range

Set the minimum and maximum values for a numeric column. The maximum value must be less than or equal to the field precision. The default range is from one to the field precision length.

Date Range

Set minimum and maximum values for a datetime value. The minimum and maximum fields contain the default minimum and maximum dates. The default datetime format is MM/DD/YYYY HH24:MI:SS. The maximum datetime must be later than the minimum datetime.
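Range masking for numeric and datetime values can be sketched as follows. The sketch assumes that the bounds are inclusive and that values are drawn uniformly at random, neither of which the source states. The function names are hypothetical, and the format string is the Python equivalent of MM/DD/YYYY HH24:MI:SS.

```python
import random
from datetime import datetime, timedelta

DATETIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # Python equivalent of MM/DD/YYYY HH24:MI:SS


def mask_numeric_in_range(minimum, maximum, rng=None):
    """Return a random integer between the minimum and maximum values (inclusive)."""
    rng = rng or random.Random()
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return rng.randint(minimum, maximum)


def mask_datetime_in_range(minimum, maximum, rng=None):
    """Return a random datetime between two MM/DD/YYYY HH24:MI:SS strings."""
    rng = rng or random.Random()
    start = datetime.strptime(minimum, DATETIME_FORMAT)
    end = datetime.strptime(maximum, DATETIME_FORMAT)
    if end <= start:
        raise ValueError("maximum datetime must be later than minimum datetime")
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span_seconds))


masked_number = mask_numeric_in_range(1, 999)
masked_date = mask_datetime_in_range("01/01/2000 00:00:00", "12/31/2010 23:59:59")
```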

Blurring

Blurring creates an output value within a fixed or percent variance from the source data value. Configure blurring to return a random value that is close to the original value. You can blur numeric and date values.

Select a fixed or percent variance to blur a numeric source value. The low bound value is a variance below the source value. The high bound value is a variance above the source value. The low and high values must be greater than or equal to zero. When the Data Masking transformation returns masked data, the numeric data is within the range that you define.

You can mask a date as a variance of the source date by configuring blurring. Select a unit of the date to apply the variance to. You can select the year, month, day, hour, minute, or second. Enter the low and high bounds to define a variance above and below the unit in the source date. The Data Masking transformation applies the variance and returns a date that is within the variance.

For example, to restrict the masked date to a date within two years of the source date, select year as the unit. Enter two as the low and high bound. If a source date is February 2, 2006, the Data Masking transformation returns a date between February 2, 2004, and February 2, 2008.
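The blurring rules above can be sketched as follows. This is a minimal illustration, assuming values are drawn uniformly within the bounds and that only the year unit is handled for dates. The function names are hypothetical.

```python
import random
from datetime import datetime, timedelta


def blur_number(value, low, high, percent=False, rng=None):
    """Return a random value within a fixed or percent variance of the source value."""
    rng = rng or random.Random()
    if low < 0 or high < 0:
        raise ValueError("low and high bounds must be greater than or equal to zero")
    if percent:
        lower = value - abs(value) * low / 100
        upper = value + abs(value) * high / 100
    else:
        lower = value - low
        upper = value + high
    return rng.uniform(lower, upper)


def shift_years(moment, years):
    """Move a datetime by whole years, mapping February 29 to February 28 if needed."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        return moment.replace(year=moment.year + years, day=28)


def blur_date_by_year(value, low, high, rng=None):
    """Return a random datetime within the low and high year bounds of the source date."""
    rng = rng or random.Random()
    earliest = shift_years(value, -low)
    latest = shift_years(value, high)
    span_seconds = int((latest - earliest).total_seconds())
    return earliest + timedelta(seconds=rng.randint(0, span_seconds))


# A source date of February 2, 2006 with low and high bounds of two years
# yields a date between February 2, 2004, and February 2, 2008.
blurred = blur_date_by_year(datetime(2006, 2, 2), low=2, high=2)
```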

Connections in a Data Masking transformation

To use a custom dictionary connection or storage connection in a masking technique, you must add the connection to the Data Masking transformation.

You can use custom flat file or relational dictionaries. Unique substitution masking techniques also require a storage connection for the source-to-dictionary value mapping.

Add the connections on the Masking Rules tab of the Data Masking transformation. When you configure a masking technique for a column, you can use the dictionaries and storage connection that you specify on the Masking Rules tab.

The following connection fields appear on the Masking Rules tab:

If you export a mapping created before the April 2022 release, the Data Masking transformation in the mapping might not include the dictionary and storage connection information. When you import the mapping, the fields appear blank. To avoid this issue when you import the mapping into an environment with the April 2022 release or later, open and save the mapping before you export the mapping. The exported mapping displays the dictionary and storage connection information when imported. The connections also appear on the Uses tab of the Show Dependencies page.