Data Domain Discovery
Data domain discovery finds the source columns that contain similar data. The profile assigns the same data domain name to each column that contains similar data. You can assign the same data masking rules to all the columns in a data domain at the same time.
Create a data domain to describe the columns you need to mask with the same data masking rules. When you create a data domain, you configure regular expressions that define patterns in the data or patterns in the column names.
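As an illustration only, the idea of a data domain as a pair of regular expressions can be sketched in Python. The `DataDomain` class, its field names, and the sample SSN patterns below are hypothetical and are not part of Test Data Manager; the sketch shows only how one pattern can target column values while another targets column names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataDomain:
    """Hypothetical stand-in for a data domain: a name plus two optional patterns."""
    name: str
    data_pattern: Optional[str] = None         # regex applied to column values
    column_name_pattern: Optional[str] = None  # regex applied to column names

    def matches_value(self, value: str) -> bool:
        # A value conforms only if the whole value matches the data pattern.
        if self.data_pattern is None:
            return False
        return re.fullmatch(self.data_pattern, value) is not None

    def matches_column_name(self, column_name: str) -> bool:
        # A column name conforms if the pattern occurs anywhere in it, ignoring case.
        if self.column_name_pattern is None:
            return False
        return re.search(self.column_name_pattern, column_name, re.IGNORECASE) is not None


# Example patterns are assumptions chosen for illustration.
ssn = DataDomain(
    name="SSN",
    data_pattern=r"\d{3}-\d{2}-\d{4}",
    column_name_pattern=r"ssn|social",
)
```

With these assumed patterns, `ssn.matches_value("123-45-6789")` and `ssn.matches_column_name("CUST_SSN")` both return `True`, while a column named `ADDRESS` does not match.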
Run the data domain discovery profile to find the columns that match the criteria in the data domain regular expressions. When you configure a profile for data domain discovery, select the tables to search in the data domain discovery operation. Select which data domains to search for in the tables. You can select policies that contain data domains instead of selecting each data domain to search for.
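Selecting a policy is a shortcut for selecting every data domain that the policy contains. A minimal sketch of that expansion follows; the function name, the dictionary layout, and the policy and domain names are all hypothetical and do not reflect how Test Data Manager stores policies.

```python
def domains_to_search(selected_domains, selected_policies, policy_contents):
    """Return the sorted, de-duplicated data domain names a profile would search for.

    policy_contents maps a policy name to the data domain names it contains
    (an assumed structure, used here only for illustration).
    """
    names = set(selected_domains)
    for policy in selected_policies:
        names.update(policy_contents[policy])
    return sorted(names)


# Hypothetical policies and data domains.
policy_contents = {
    "PII": ["SSN", "Email"],
    "PCI": ["CreditCard"],
}

print(domains_to_search(["Email"], ["PII"], policy_contents))  # ['Email', 'SSN']
```

A data domain that is selected directly and also belongs to a selected policy appears once in the result.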
After you run the profile for data domain discovery, you can view the profile results. The profile results assign source columns to data domains. You can choose which profile results to use for data masking.
Data Domain Profiling on Hive and HDFS Sources
You can run data domain profiles on Hive and HDFS data sources to identify sensitive data.
On Hive data sources, you cannot run primary key profiles or entity profiles.
You can run a profile on an HDFS source from the Developer tool. You can then import the profile results to the TDM repository with Test Data Manager. After you import the profile results, you must run the profiles to view the data domain profile results in Test Data Manager. You cannot view primary key profile and entity profile results for HDFS sources in Test Data Manager.
Data Domain Profile Sampling Options
When you run a profile for data domain discovery, configure sampling options to limit the number of rows to search or limit the regular expressions to search with.
The following table describes the sampling options that you can select in a profile for data domain discovery:
Option | Description |
---|---|
Data | Search for patterns in the data only. |
Column Name | Search for patterns in the column name only. |
Data and Column Name | Search for patterns in the data and in the column name. |
Maximum Rows to Profile | Limit the number of rows to profile. Default is 1000. |
Minimum Conformance Percent | Minimum percentage of rows where the column data or metadata matches the data domain. |
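The way these options interact can be sketched in Python under stated assumptions. The function names and the `search` keywords are hypothetical, the row limit is modeled as taking the first rows of the column, and the "Data and Column Name" option is treated as matching when either the data or the column name matches; the actual product behavior may differ. Only the default of 1000 rows comes from the table above.

```python
import re
from itertools import islice


def conformance_percent(values, data_regex, max_rows=1000):
    """Percentage of sampled values that fully match data_regex."""
    sample = list(islice(values, max_rows))  # profile at most max_rows rows
    if not sample:
        return 0.0
    hits = sum(1 for v in sample if re.fullmatch(data_regex, str(v)))
    return 100.0 * hits / len(sample)


def column_matches(column_name, values, data_regex, name_regex,
                   search, min_conformance_percent, max_rows=1000):
    """Decide whether a column belongs to a data domain.

    search is one of "data", "column_name", or "data_and_column_name"
    (hypothetical keywords that mirror the options in the table).
    """
    name_hit = re.search(name_regex, column_name, re.IGNORECASE) is not None
    if search == "column_name":
        return name_hit
    percent = conformance_percent(values, data_regex, max_rows)
    data_hit = percent >= min_conformance_percent
    if search == "data":
        return data_hit
    if search == "data_and_column_name":
        # Assumption: a match on either the data or the name is sufficient.
        return data_hit or name_hit
    raise ValueError(f"unknown search option: {search}")


# Nine conforming values and one non-conforming value: 90 percent conformance.
values = ["123-45-6789"] * 9 + ["n/a"]
ssn_regex = r"\d{3}-\d{2}-\d{4}"
print(column_matches("ID", values, ssn_regex, r"ssn", "data", 80.0))  # True
print(column_matches("ID", values, ssn_regex, r"ssn", "data", 95.0))  # False
```

Lowering the row limit makes the profile faster but bases the conformance percentage on fewer rows, so a column can pass or fail depending on which rows fall inside the sample.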
Assigning a Data Domain to Multiple Columns
You can manually assign a data domain to multiple columns at a time. You can also remove the data domain assignment from multiple columns at a time.
1. Open a project.
2. Navigate to the Discover | Columns view.
A list of all the columns in the project appears.
3. Select the columns that you want to assign the data domain to.
4. Click Actions > Edit Assignments.
The Edit Data Domain Assignment dialog box appears.
5. Choose the data domain to assign to the columns.
You can choose a blank data domain to remove the previous data domain assignment.
6. Click Save.
The data domain assignments appear in the Columns view.
Manually Updating the Column Data Domain
You can manually update a data domain for a column. When you add a data domain to a column, Test Data Manager marks the column as a sensitive column.
1. Open the project and click Discover | Columns.
2. Click the Domain column for the column you want to update.
A list of data domains appears.
3. Select the data domain to add to the column.
Note: You can run a profile for data domain discovery to update the data domain for columns.