Data Subset
You can extract a subset of data from the source and move to the target in a Data Masking task.
The Data Masking task maintains primary and foreign key relationships in the subset data and reconciles the object relationships in the subset data.
Configure the following data subset options from the Data Subset page:
- Data Filters
- The data filter that you want to apply on the source. You can create a simple or an advanced data filter for an object. You can apply filter to a single object in a task. You can apply multiple filters on the same object. You can add one advanced filter in a task. You can also use the filter values from a parameter file and mention the file name in the task.
- Relationship Behavior
- You can configure relationships when you select multiple source objects. When you perform a data subset operation, the Data Masking task selects all the parent records of an object to maintain referential integrity. The task selects the child record if you configure to include the child objects. You can configure relationships of the child objects after you apply data filter. You can select the child objects that you want to include in the data subset. You can view and download the schema graph that shows the graphical representation of the relationships between the source objects.
You can view the number of join operations that are required to compute a subset operation. View the sequence in which the task selects the records to create a data subset.
- Subset Statistics
- You can view the subset statistics such as total number of rows, the number of subset rows, and the subset size for each source object. The source contains large amount of data, but the target in which you want to create a subset might not contain enough space. To estimate the target size, you can compute the data subset. After you save and compute the subset, you can view the estimated target size on all the Data Masking task pages. If the estimated target size is large, you can update the task and compute the subset again.
You can compute the data subset size for multiple source objects.
Data Subset Options
Configure the data subset options on the Schedule page. You can view the data subset options if you select multiple source objects.
Configure the following data subset options:
- Staging Connection
- The connection that the task uses to run the data subset operation.
- Source Lookup Batch Size
- The number of records to retrieve from the Salesforce source in one SOQL query when the task writes to the target. Uses Salesforce standard API because the standard API limit is higher than the bulk API. Enter a number between 10 and 200 based on the SOQL character limit restriction for Salesforce.
- Drop Staging Tables
- Drops the staging tables even if there are error rows in the task.
- When you configure data subset filters and run the Data Masking task, the task runs through the staging, subset computation, target load, and staging drop phases. By default, if there are error rows, the task does not drop the staging tables. You can correct the errors and restart the task. The task resumes from the phase at which it failed. The staged data that is saved in the Secure Agent machine consumes some storage space, and you can choose to drop the staging tables even if there are errors rows present.
Automatic Task Recovery
In a Data Masking task, you can save and compute a subset. When you run the task after subset computation, the task recovers from the previous stage and continues to the next stage.
When you configure data subset filters and run a Data Masking task, the task runs through the staging, subset computation, target load, and staging drop stages.
You can save and compute the subset to estimate the target subset size. When you click Save and Compute Subset, the task stages the records and computes the subset. After you compute the subset, if you click Save and Run, the task resumes to load the target and then drops the staging tables.
If you save and run a task without computing the subset, the task runs through all the stages and drops the staging tables at the end.
Parameter Files in Data Filters
In a Data Masking task, you can use user-defined parameters in simple and advanced data filters.
When you use a parameter in a filter, start the filter with the parameter. Use two dollar signs to name the parameter in the following format: $$<parameter>
Save the parameter file local to the following directory:
<Secure Agent installation directory>/apps/Data_Integration_Server/data/userparameters
You can specify the parameter file name on the Schedule page of the task wizard. The parameter values are applied when the task runs.
Example 1. Simple Filter Example
Consider that you apply filter on the Account object. Configure the filter condition that the Created Date must be equal to $$param. Then create a parameter file with the following content:
$$param=('1991-10-03')
The following image shows a simple filter with the use of a parameter:
Example 2. Advanced Filter Examples
Consider that you apply filter on the Account object. In the Advanced Data Filter dialog box, you can specify the filter expression that the Account Name field must pick all the values from $$param. Then create a parameter file with the following content:
$$param=('Apple' , 'Microsoft')
The following image shows an advanced filter with the use of parameter as a value:
You can also specify a filter expression as a parameter. Enter $$param as the filter expression. Then create a parameter file with the following content:
$$param=Name IN ('Apple' , 'Microsoft')
The following image shows an advanced filter with the use of parameter as an expression: