Tasks > Masking tasks > Data subset
  

Data subset

You can extract a subset of data from the source and move to the target in a masking task.
The masking task maintains primary and foreign key relationships in the subset data and reconciles the object relationships in the subset data.
Configure the following data subset options from the Data Filters page:
Data Filters
The data filter that you want to apply on the source. You can create a simple or an advanced data filter for an object. You can apply a filter to a single object in a task. You can apply multiple filters on the same object. You can add one advanced filter in a task. You can also use the filter values from a parameter file and mention the file name in the task.
Relationship Behavior
You can configure relationships when you select multiple source objects. When you perform a data subset operation, the masking task selects all the parent records of an object to maintain referential integrity. The task selects the child record if you configure to include the child objects. You can configure relationships of the child objects after you apply a data filter. You can select the child objects that you want to include in the data subset. You can view and download the schema graph that shows the graphical representation of the relationships between the source objects.
You can view the number of join operations that are required to compute a subset operation. View the sequence in which the task selects the records to create a data subset.
Subset Statistics
You can view the subset statistics such as total number of rows, the number of subset rows, and the subset size for each source object. The source contains large amount of data, but the target in which you want to create a subset might not contain enough space. To evaluate the target size, you can estimate the data subset. After you estimate the subset, you can view the estimated target size on all the masking task pages. If the estimated target size is large, you can update the task and estimate the subset again.
You can estimate the data subset size for multiple source objects.

Data subset options

Configure the data subset options on the Schedule page. You can view the data subset options if you select multiple source objects.
Configure the following data subset options:
Staging Connection
The connection that the task uses to run the data subset operation.
Source Lookup Batch Size
The number of records to retrieve from the Salesforce source in one SOQL query when the task writes to the target. Uses Salesforce standard API because the standard API limit is higher than the bulk API. Enter a number between 10 and 200 based on the SOQL character limit restriction for Salesforce.
Drop Staging Tables
Drops the staging tables even if there are error rows in the task.
When you configure data subset filters and run the masking task, the task runs through the staging, subset computation, target load, and staging drop phases. By default, if there are error rows, the task does not drop the staging tables. You can correct the errors and restart the task. The task resumes from the phase at which it failed. The staged data that is saved in the Secure Agent machine consumes some storage space, and you can choose to drop the staging tables even if there are errors rows present.

Automatic task recovery

In a masking task, you can estimate a subset. If you estimate the subset and then run the task, the task recovers from the previous stage and continues to the next stage.
When you configure data subset filters and run a masking task, the task runs through the staging, subset estimation, target load, and staging drop stages.
You can estimate the subset to evaluate the target subset size before you run the task. When you click Estimate, the task stages the records and estimates the subset. After you estimate the subset, if you click Run, the task resumes to load the target and then drops the staging tables.
If you save and run a task without estimating the subset, the task runs through all the stages and drops the staging tables at the end.
Every task has an associated staging schema. After the task runs through all the stages, the task drops staging tables if there are no errors. If you did not choose to drop the staging tables and if there are error rows present, the task does not drop the staging tables. If you run the same task after a few days, the task performs the data subset operation on the old data. To run the task with updated data, you must first reset the task. When you reset the task, the task status returns to the start stage. You can then estimate the subset and then run the task or directly run the task. In both cases, the task runs through staging and estimation stages and then loads the tables into the target.

Parameter files in data filters

In a masking task, you can use user-defined parameters in simple and advanced data filters.
When you use a parameter in a filter, start the filter with the parameter. Use two dollar signs to name the parameter in the following format: $$<parameter>
Save the parameter file local to the following directory:
<Secure Agent installation directory>/apps/Data_Integration_Server/data/userparameters
You can specify the parameter file name on the Schedule page of the task wizard. The parameter values are applied when the task runs.
Example 1. Simple Filter Example
Consider that you apply a filter on the Account object. Configure the filter condition that the Created Date must be equal to $$param. Then create a parameter file with the following content:
$$param=('1991-10-03')
The following image shows a simple filter with the use of a parameter:
The Data Filter dialog box shows a simple filter that is applied on the Account object. The filter applied is that the Created Date must be equal to $$param.
Example 2. Advanced Filter Examples
Consider that you apply a filter on the Account object. In the Advanced Data Filter dialog box, you can specify the filter expression that the Account Name field must pick all the values from $$param. Then create a parameter file with the following content:
$$param=('Apple' , 'Microsoft')
The following image shows an advanced filter with the use of parameter as a value:
The Advanced Data Filter dialog box shows that the filtered object is Account, the list of fields, and a filter expression. The following filter expression is entered: Account.Name IN $$param
You can also specify a filter expression as a parameter. Enter $$param as the filter expression. Then create a parameter file with the following content:
$$param=Name IN ('Apple' , 'Microsoft')
The following image shows an advanced filter with the use of a parameter as an expression:
The Advanced Data Filter dialog box shows that the filtered object is Account, the list of fields, and a filter expression. The following filter expression is entered: $$param