Masking Rules
Masking rules are the options that you configure after you choose the masking technique.
When you choose the random or key masking technique, you can configure the mask format, the source string characters, and the result string characters. You can configure range or blurring with random masking.
The following table describes the masking rules that you can configure for each masking technique:
Masking Rule | Description | Masking Technique | Source Datatype |
---|---|---|---|
Mask Format | Mask that limits each character in an output string to an alphabetic, numeric, or alphanumeric character. | Random and Key | String |
Source String Characters | Set of source characters to mask or to exclude from masking. | Random and Key | String |
Result String Replacement Characters | A set of characters to include or exclude in a mask. | Random and Key | String |
Range | A range of output values. | Random | Numeric, String, Date/Time |
Blurring | Range of output values with a fixed or percent variance from the source data. The Data Masking transformation returns data that is close to the value of the source data. Datetime columns require a fixed variance. | Random | Numeric, Date/Time |
Mask Format
Configure a mask format to limit each character in the output column to an alphabetic, numeric, or alphanumeric character. Use the following characters to define a mask format:
A, D, N, X, +, R
The following table describes mask format characters:
Character | Description |
---|---|
A | Alphabetical characters. For example, ASCII characters a to z and A to Z. |
D | Digits 0 to 9. The Data Masking transformation returns an "X" for characters other than digits 0 to 9. |
N | Alphanumeric characters. For example, ASCII characters a to z, A to Z, and 0-9. |
X | Any character. For example, alphanumeric or symbol. |
+ | No masking. |
R | Remaining characters. R specifies that the remaining characters in the string can be any character type. R must appear as the last character of the mask. |
For example, a department name has the following format:
nnn-<department_name>
You can configure a mask to force the first three characters to be numeric, the department name to be alphabetic, and the dash to remain in the output. Configure the following mask format:
DDD+AAAAAAAAAAAAAAAA
The Data Masking transformation replaces the first three characters with numeric characters. It does not replace the fourth character. The Data Masking transformation replaces the remaining characters with alphabetical characters.
If you do not define a mask format, the Data Masking transformation replaces each source character with any character. If the mask format is longer than the input string, the Data Masking transformation ignores the extra characters in the mask format. If the mask format is shorter than the source string, the Data Masking transformation masks the remaining characters with format R.
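The mask format rules above can be sketched in a few lines of Python. This is an illustrative model only, not the Data Masking transformation's implementation: the function name `apply_mask_format`, the ASCII character pools, and the use of Python's `random` module are all assumptions. The note about the D character returning an "X" is not modeled here.

```python
import random
import string

# Hypothetical ASCII pools for each mask format character (assumption).
ANY_CHAR = string.ascii_letters + string.digits + string.punctuation
POOLS = {
    "A": string.ascii_letters,                  # alphabetic
    "D": string.digits,                         # digits 0 to 9
    "N": string.ascii_letters + string.digits,  # alphanumeric
    "X": ANY_CHAR,                              # any character
    "R": ANY_CHAR,                              # remaining characters: any type
}

def apply_mask_format(source, mask_format="", rng=None):
    """Mask each source character according to the mask format."""
    rng = rng or random.Random()
    out = []
    for i, ch in enumerate(source):
        # A mask shorter than the source (or no mask at all) falls back to R.
        # Mask characters beyond the end of the source are never read.
        code = mask_format[i] if i < len(mask_format) else "R"
        if code == "+":
            out.append(ch)  # "+" means no masking
        else:
            out.append(rng.choice(POOLS[code]))
    return "".join(out)

masked = apply_mask_format("123-Engineering", "DDD+AAAAAAAAAAAAAAAA")
```

For the department example, the result keeps the dash in the fourth position, starts with three digits, and ends with alphabetic characters.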
Source String Characters
Source string characters are source characters that you choose to mask or not mask. The position of the characters in the source string does not matter. The source characters are case sensitive.
You can configure any number of characters. When Characters is blank, the Data Masking transformation replaces all the source characters in the column.
Select one of the following options for source string characters:
- Mask Only
- The Data Masking transformation masks characters in the source that you configure as source string characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces A, B, or c with a different character when the character occurs in source data. A source character that is not an A, B, or c does not change. The mask is case sensitive.
- Mask All Except
- The Data Masking transformation masks all characters except the source string characters that you configure. For example, if you enter the source character “-” and select Mask All Except, the Data Masking transformation does not replace the “-” character when it occurs in the source data. The rest of the source characters change.
Source String Example
A source file has a column named Dependents. The Dependents column contains more than one name separated by commas. You need to mask the Dependents column and keep the comma in the test data to delimit the names.
For the Dependents column, select Source String Characters. Choose Mask All Except and enter “,” as the source character to skip. Do not enter the quotation marks.
The Data Masking transformation replaces all the characters in the source string except for the comma.
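As a rough sketch of the two source string options, the following Python function models Mask Only and Mask All Except. The function name `mask_source_chars`, the mode strings, and the alphanumeric replacement pool are hypothetical choices made for illustration, not part of the product.

```python
import random
import string

# Hypothetical replacement pool (assumption): ASCII letters and digits.
POOL = string.ascii_letters + string.digits

def mask_source_chars(source, chars="", mode="mask_only", rng=None):
    """Mask characters selected by the source string characters setting."""
    rng = rng or random.Random()
    out = []
    for ch in source:
        if not chars:
            masked = True               # blank setting: mask every character
        elif mode == "mask_only":
            masked = ch in chars        # case-sensitive membership test
        else:                           # "mask_all_except"
            masked = ch not in chars
        if masked:
            # Pick a replacement that differs from the original character.
            out.append(rng.choice([c for c in POOL if c != ch]))
        else:
            out.append(ch)
    return "".join(out)

# Dependents example: keep the commas, mask everything else.
masked = mask_source_chars("Ann,Bob,Cy", ",", "mask_all_except")
```

The commas stay in their original positions, so the masked value still delimits the same number of names.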
Result String Replacement Characters
Result string replacement characters are characters you choose as substitute characters in the masked data. When you configure result string replacement characters, the Data Masking transformation replaces characters in the source string with the result string replacement characters. To avoid generating the same output for different input values, configure a wide range of substitute characters, or mask only a few source characters. The position of each character in the string does not matter.
Select one of the following options for result string replacement characters:
- Use Only
- Mask the source with only the characters you define as result string replacement characters. For example, if you enter the characters A, B, and c, the Data Masking transformation replaces every character in the source column with an A, B, or c. The word “horse” might be replaced with “BAcBA.”
- Use All Except
- Mask the source with any characters except the characters you define as result string replacement characters. For example, if you enter A, B, and c as result string replacement characters, the masked data never has the characters A, B, or c.
Result String Replacement Characters Example
To replace all commas in the Dependents column with semicolons, complete the following tasks:
1. Configure the comma as a source string character and select Mask Only. The Data Masking transformation masks only the comma when it occurs in the Dependents column.
2. Configure the semicolon as a result string replacement character and select Use Only. The Data Masking transformation replaces each comma in the Dependents column with a semicolon.
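The comma-to-semicolon example can be modeled with a short Python sketch that combines Mask Only source characters with result string replacement characters. The function name `mask_with_result_chars`, the mode strings, and the default ASCII pool are assumptions introduced for illustration.

```python
import random
import string

# Hypothetical pool used for Use All Except (assumption): printable ASCII.
DEFAULT_POOL = string.ascii_letters + string.digits + string.punctuation

def mask_with_result_chars(source, source_chars, result_chars,
                           result_mode="use_only", rng=None):
    """Replace selected source characters using the result character set."""
    rng = rng or random.Random()
    if result_mode == "use_only":
        pool = result_chars
    else:  # "use_all_except"
        pool = "".join(c for c in DEFAULT_POOL if c not in result_chars)
    if not pool:
        raise ValueError("no replacement characters available")
    out = []
    for ch in source:
        # Mask Only semantics; a blank source_chars masks every character.
        masked = (not source_chars) or (ch in source_chars)
        out.append(rng.choice(pool) if masked else ch)
    return "".join(out)

# Mask only the comma, and use only the semicolon as the replacement.
print(mask_with_result_chars("Ann,Bob,Cy", ",", ";"))  # prints Ann;Bob;Cy
```

Because the replacement set holds a single character, the output is deterministic in this case. With a larger set such as A, B, and c, each masked position is drawn at random from the set.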
Range
Define a range for numeric, date, or string data. When you define a range for numeric or date values, the Data Masking transformation masks the source data with a value between the minimum and maximum values. When you configure a range for a string, you configure a range of string lengths.
String Range
When you configure random string masking, the Data Masking transformation generates strings whose length can differ from the length of the source string. Optionally, you can configure a minimum and maximum string width. The values you enter as the minimum or maximum width must be positive integers. Each width must be less than or equal to the port precision.
Numeric Range
Set the minimum and maximum values for a numeric column. The maximum value must be less than or equal to the port precision. The default range is from one to the port precision length.
Date Range
Set minimum and maximum values for a datetime value. The minimum and maximum fields contain the default minimum and maximum dates. The default datetime format is MM/DD/YYYY HH24:MI:SS. The maximum datetime must be later than the minimum datetime.
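As an illustration of a date range, the sketch below parses minimum and maximum values written in the default MM/DD/YYYY HH24:MI:SS format and draws a random datetime between them. The function name `random_datetime_in_range` and the one-second granularity are assumptions. The format string `%m/%d/%Y %H:%M:%S` is the Python `strptime` equivalent of the default format.

```python
import random
from datetime import datetime, timedelta

# Python strptime equivalent of MM/DD/YYYY HH24:MI:SS.
FMT = "%m/%d/%Y %H:%M:%S"

def random_datetime_in_range(min_text, max_text, rng=None):
    """Return a random datetime between the minimum and maximum, inclusive."""
    rng = rng or random.Random()
    lo = datetime.strptime(min_text, FMT)
    hi = datetime.strptime(max_text, FMT)
    if hi <= lo:
        raise ValueError("maximum datetime must be later than minimum datetime")
    # Assumption: pick a whole number of seconds inside the range.
    span_seconds = int((hi - lo).total_seconds())
    return lo + timedelta(seconds=rng.randint(0, span_seconds))

masked = random_datetime_in_range("01/01/2000 00:00:00", "12/31/2000 23:59:59")
```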
Blurring
Blurring creates an output value within a fixed or percent variance from the source data value. Configure blurring to return a random value that is close to the original value. You can blur numeric and date values.
Blurring Numeric Values
Select a fixed or percent variance to blur a numeric source value. The low blurring value is a variance below the source value. The high blurring value is a variance above the source value. The low and high values must be greater than or equal to zero. When the Data Masking transformation returns masked data, the numeric data is within the range that you define.
The following table describes the masking results for blurring range values when the input source value is 66:
Blurring Type | Low | High | Result |
---|---|---|---|
Fixed | 0 | 10 | Between 66 and 76 |
Fixed | 10 | 0 | Between 56 and 66 |
Fixed | 10 | 10 | Between 56 and 76 |
Percent | 0 | 50 | Between 66 and 99 |
Percent | 50 | 0 | Between 33 and 66 |
Percent | 50 | 50 | Between 33 and 99 |
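The arithmetic behind the table can be reproduced with a small Python sketch. The function names `blur_bounds` and `blur_numeric` are hypothetical, and the uniform random draw is an assumption about how a value inside the range might be chosen.

```python
import random

def blur_bounds(value, low, high, percent=False):
    """Return the (minimum, maximum) masked values for a blurring setting."""
    if low < 0 or high < 0:
        raise ValueError("low and high must be greater than or equal to zero")
    if percent:
        # Percent variance is taken relative to the source value.
        return (value - abs(value) * low / 100,
                value + abs(value) * high / 100)
    return (value - low, value + high)

def blur_numeric(value, low, high, percent=False, rng=None):
    """Draw a value inside the blurring range (uniform draw is an assumption)."""
    rng = rng or random.Random()
    lo, hi = blur_bounds(value, low, high, percent)
    return rng.uniform(lo, hi)

print(blur_bounds(66, 10, 10))                # prints (56, 76)
print(blur_bounds(66, 50, 50, percent=True))  # prints (33.0, 99.0)
```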
Blurring Date Values
Mask a date as a variance of the source date by configuring blurring. Select a unit of the date to apply the variance to. You can select the year, month, day, or hour. Enter the low and high bounds to define a variance above and below the unit in the source date. The Data Masking transformation applies the variance and returns a date that is within the variance.
For example, to restrict the masked date to a date within two years of the source date, select year as the unit. Enter 2 as both the low bound and the high bound. If a source date is 02/02/2006, the Data Masking transformation returns a date between 02/02/2004 and 02/02/2008.
By default, the blur unit is year.
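The date blurring bounds can be sketched in Python as follows. This assumes the variance is applied as a calendar shift in the selected unit, which the text does not state explicitly. The names `shift_months` and `date_blur_bounds` are hypothetical helpers.

```python
import calendar
from datetime import datetime, timedelta

def shift_months(dt, months):
    """Shift a datetime by whole months, clamping the day to the month length."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(total, 12)
    day = min(dt.day, calendar.monthrange(year, month0 + 1)[1])
    return dt.replace(year=year, month=month0 + 1, day=day)

def date_blur_bounds(source, low, high, unit="year"):
    """Return the (earliest, latest) masked datetimes for a blur setting."""
    if low < 0 or high < 0:
        raise ValueError("low and high bounds must be non-negative")
    if unit == "year":
        return shift_months(source, -12 * low), shift_months(source, 12 * high)
    if unit == "month":
        return shift_months(source, -low), shift_months(source, high)
    if unit == "day":
        return source - timedelta(days=low), source + timedelta(days=high)
    if unit == "hour":
        return source - timedelta(hours=low), source + timedelta(hours=high)
    raise ValueError("unit must be year, month, day, or hour")

# Source date 02/02/2006 with a low and high bound of 2 years.
earliest, latest = date_blur_bounds(datetime(2006, 2, 2), 2, 2, "year")
```

With these inputs, `earliest` is 02/02/2004 and `latest` is 02/02/2008, which matches the example above.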