
Shuffle Masking

Shuffle masking replaces the value in a column with a value from the same column in another row of the table. It switches all the values for a column in a file or database table. You can restrict which values to shuffle based on a lookup condition or a constraint. You can apply shuffle masking to date, numeric, and string datatypes.
For example, you might want to switch the first name values from one customer to another customer in a table. The table includes the following rows:
100 Tom Bender
101 Sue Slade
102 Bob Bold
103 Eli Jones
When you apply shuffle masking, the rows contain the following data:
100 Bob Bender
101 Eli Slade
102 Tom Bold
103 Sue Jones
You can configure shuffle masking to shuffle data randomly or to return repeatable results.
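The difference between random and repeatable shuffling can be illustrated with a minimal Python sketch. This is not the product's implementation. The function name shuffle_column, the dictionary row layout, and the use of a seeded random generator to get repeatable output are assumptions made for illustration only.

```python
import random


def shuffle_column(rows, column, seed=None):
    """Return copies of rows with the values of one column permuted across rows.

    Hypothetical helper: seed=None gives a different permutation on each run,
    while a fixed seed gives the same permutation every time.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Only the shuffled column changes; all other columns stay with their row.
    return [{**row, column: value} for row, value in zip(rows, values)]


customers = [
    {"id": 100, "first_name": "Tom", "last_name": "Bender"},
    {"id": 101, "first_name": "Sue", "last_name": "Slade"},
    {"id": 102, "first_name": "Bob", "last_name": "Bold"},
    {"id": 103, "first_name": "Eli", "last_name": "Jones"},
]

random_result = shuffle_column(customers, "first_name")             # varies per run
repeatable_a = shuffle_column(customers, "first_name", seed=1)      # fixed seed
repeatable_b = shuffle_column(customers, "first_name", seed=1)      # same as above
```

In this sketch a plain permutation can leave a value in its original row. Whether the product prevents that is not stated here.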
For Hadoop data sources, you can use shuffle masking only when the source is a relational database and the target is Hadoop.
Note: If the source file might have empty strings in the shuffle column, set the Null and Empty Spaces option to Treat as Value in the rule exception handling. When you set the option to Treat as Value, the PowerCenter Integration Service masks the space or the null value with a valid value. By default, the Integration Service skips masking when the column value is empty.
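The effect of the two settings can be sketched in Python as follows. This is an illustration under stated assumptions, not the Integration Service logic: the function names are hypothetical, and "Treat as Value" is modeled simply as letting null and blank cells take part in the shuffle like any other value, so that they can receive a value from another row.

```python
import random


def is_empty(value):
    """Treat None and blank or whitespace-only strings as empty (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def shuffle_with_empty_handling(values, treat_as_value=False, seed=None):
    """Shuffle a column, either skipping empty cells or treating them as values."""
    rng = random.Random(seed)
    if treat_as_value:
        # Every cell takes part in the shuffle, including empty ones.
        eligible = list(range(len(values)))
    else:
        # Default in this sketch: empty cells are skipped and stay where they are.
        eligible = [i for i, v in enumerate(values) if not is_empty(v)]
    pool = [values[i] for i in eligible]
    rng.shuffle(pool)
    result = list(values)
    for index, value in zip(eligible, pool):
        result[index] = value
    return result


names = ["Tom", "", "Bob", None, "Eli"]
skipped = shuffle_with_empty_handling(names, treat_as_value=False, seed=1)
treated = shuffle_with_empty_handling(names, treat_as_value=True, seed=1)
```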

Shuffle Masking Parameters

You can configure masking parameters to determine whether shuffle masking is repeatable, repeatable for one workflow run only, or random. You can also configure a lookup to ensure that replacement values originate from rows that contain specific values.
The following image shows Data Masking parameters that appear when you configure a Shuffle data masking rule:
The shuffle masking parameters are the random or representative shuffle type, seed number, and the constrained option.
The following table describes the parameters that you can configure for shuffle masking:
Parameter
  Description
Shuffle Type
  Select random or representative shuffling:
  - Random. Shuffle values from one row to another without checking if the target values are unique for each source value. For example, the Integration Service masks 12345 with 65432 in a row. The Integration Service can also replace 33333 with 12345 in another row.
  - Representative. All source rows with the same value receive the same shuffle value. When the Integration Service replaces 12345 with 65432, then it can use 65432 as a mask value for any row with a 12345 source value. Representative masking does not save values between workflow runs. Use repeatable masking to return the same values between workflow runs.
Seed
  Starting point for creating repeatable output. Enter a number between 1 and 999. Default is 1.
  Enabled when the Representative shuffle type is selected.
Constrained
  Restricts applying shuffle masking to rows that are constrained by another column. For example, shuffle employee names based on gender. Or, shuffle addresses within the same city. Choose the constraint column when you assign the rule to columns in a project.
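The representative and constrained behaviors described above can be sketched in Python. This is a hedged illustration, not the product's algorithm: the function names, the row layout, and the choice to build the representative mapping by permuting the distinct values with a seeded generator are all assumptions.

```python
import random
from collections import defaultdict


def representative_shuffle(values, seed=1):
    """Give every occurrence of a source value the same replacement value."""
    if not 1 <= seed <= 999:
        raise ValueError("seed must be between 1 and 999")
    distinct = list(dict.fromkeys(values))   # distinct values, first-seen order
    targets = distinct[:]
    random.Random(seed).shuffle(targets)
    mapping = dict(zip(distinct, targets))   # one replacement per source value
    return [mapping[value] for value in values]


def constrained_shuffle(rows, column, constraint, seed=None):
    """Shuffle a column only among rows that share the same constraint value."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[constraint]].append(index)
    result = [dict(row) for row in rows]
    for indices in groups.values():
        pool = [rows[i][column] for i in indices]
        rng.shuffle(pool)
        for index, value in zip(indices, pool):
            result[index][column] = value
    return result


postal_codes = ["12345", "33333", "12345", "65432"]
masked_codes = representative_shuffle(postal_codes, seed=1)

employees = [
    {"name": "Tom", "city": "Austin"},
    {"name": "Sue", "city": "Boston"},
    {"name": "Bob", "city": "Austin"},
    {"name": "Eli", "city": "Boston"},
]
masked_employees = constrained_shuffle(employees, "name", "city", seed=1)
```

In masked_codes, both rows that held 12345 carry the same replacement. In masked_employees, each name stays within its original city group.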