
Data Domains

A data domain is an object that represents the functional meaning of a column based on the column data or the column name. Configure data domains to group data source columns for data masking. When you assign a masking rule to a data domain, all the columns in the data domain are masked with the same rule. Likewise, when you add a generation rule to a data domain, TDM generates data for the columns in the data domain with that rule.
Create data domains to describe the columns you need to mask with the same masking rules. Assign at least one masking rule to each data domain.
For example, you might need to mask all the instances of Social Security number with the same masking rule. You can create a data domain that describes the Social Security data that occurs in the different columns. A database might have a Social Security number in a column called SSN. The database also has a column called SOCIAL_SECURITY in a different table. A Social Security number might also appear in a COMMENTS column.
When you create the data domain, you create a data expression that describes the data format for Social Security numbers. A Social Security number has this format: 999-99-9999. You can also create multiple metadata expressions that describe possible column names for Social Security numbers. Social Security column names might include SSN or Social.
You can add data generation rules to a data domain. TDM lists the preferred data generation rules for a data domain. You can edit the list or add another generation rule.
After you define a data domain, you can add the data domain to a policy. You can run profiles for data discovery against data sources in a project. Run profiles to find the columns for data domains. For example, the profile job can find all the Social Security numbers in the source data based on how you defined the data domain. The profile assigns data domains to columns.
Note: If you do not have Data Discovery, you can still use data domains to aggregate data. However, you must manually associate source columns with the data domains.

Apply Masking Rules to a Data Domain

You can assign one or more data masking rules to the data domain. When you assign a masking rule to a data domain, the columns in the domain receive the data masking rule when you configure data masking.
When you assign data masking rules to the data domain, the rules are called preferred rules. If you assign multiple rules to the data domain, you enable one of the rules to be the default rule. The default rule is applied to all columns in the data domain. You can manually change the masking rule for a column to a different preferred rule. You can also apply more than one masking rule to a column.
For example, an organization has a data domain called Last_Name. The Last_Name data domain describes columns that contain last names in company databases. The company can use a shuffle masking rule to mask the last names of customers in a database. The shuffle masking rule is the default rule. The organization applies a substitution masking technique to mask the last names of customers in a different table. The substitution masking rule is a different preferred masking rule in the data domain.
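The relationship between preferred rules, the default rule, and a manual change for one column can be pictured with a short Python sketch. This is an illustration only and not a TDM interface: the DataDomain class, its methods, the rule names, and the column names are all hypothetical. The sketch also simplifies by keeping a single rule per column.

```python
from dataclasses import dataclass, field


@dataclass
class DataDomain:
    """Hypothetical model of a data domain and its preferred masking rules."""
    name: str
    preferred_rules: list
    default_rule: str
    # Manual per-column choices; each one must be a preferred rule.
    overrides: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.default_rule not in self.preferred_rules:
            raise ValueError("the default rule must be a preferred rule")

    def override(self, column, rule):
        """Manually switch one column to a different preferred rule."""
        if rule not in self.preferred_rules:
            raise ValueError(f"{rule!r} is not a preferred rule of {self.name}")
        self.overrides[column] = rule

    def rule_for(self, column):
        """Columns without a manual choice receive the default rule."""
        return self.overrides.get(column, self.default_rule)


last_name = DataDomain(
    name="Last_Name",
    preferred_rules=["shuffle", "substitution"],
    default_rule="shuffle",
)
# Hypothetical column in a different table that uses the other preferred rule.
last_name.override("ORDERS.CUST_LAST_NAME", "substitution")

print(last_name.rule_for("CUSTOMERS.LAST_NAME"))
print(last_name.rule_for("ORDERS.CUST_LAST_NAME"))
```

In this model, a column keeps the default rule until someone picks another preferred rule for it, which mirrors the Last_Name example.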

Apply Generation Rules to a Data Domain

You can assign one or more data generation rules to a data domain. When you assign a generation rule to a data domain, the columns in the data domain receive the data generation rule when you configure data generation.
TDM lists the preferred data generation rules for a data domain. You can edit the list or add another generation rule.
When you add data generation rules to a data domain, you can enable one of the rules to be the default rule. The default rule applies to all columns in the data domain. To change the default rule, edit the data domain and enable the rule that you want as the default. You can also apply more than one generation rule to a data domain.
For example, an organization has a data domain called Date_of_Birth. The Date_of_Birth data domain describes the columns that contain birth dates of the customers. The organization can use the date random generation rule to generate birth dates. The date random generation rule is the default rule. The organization applies a date sequence generation technique to generate the birth dates of customers in a different table. The date sequence generation rule is a different preferred generation rule in the data domain.
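To make the difference between the two techniques in this example concrete, the following Python sketch generates birth dates in a random and in a sequential way. It is not how TDM implements its generation rules. The function names, the date range, the step size, and the seed are assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def date_random(start, end, rng):
    """Return a random date between start and end, inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def date_sequence(start, step_days, count):
    """Return count dates that begin at start and are step_days apart."""
    return [start + timedelta(days=i * step_days) for i in range(count)]


rng = random.Random(42)  # fixed seed so that the sketch is repeatable
range_start, range_end = date(1950, 1, 1), date(2005, 12, 31)

random_birth_dates = [date_random(range_start, range_end, rng) for _ in range(3)]
sequential_birth_dates = date_sequence(date(1980, 1, 1), step_days=30, count=3)

print(random_birth_dates)
print(sequential_birth_dates)
```

Random dates suit columns where values only need to look plausible. A sequence is predictable, which can be useful when test data must be reproducible or ordered.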

Metadata and Data Patterns for Data Domains

A data pattern and a metadata pattern are regular expressions that you configure to group columns into a data domain. Use regular expressions to find sensitive data such as IDs, telephone numbers, postal codes, and Social Security numbers in the source data.
A regular expression is a text string that describes a search pattern. A regular expression provides a way to match strings of text or patterns of characters in the source data.
A data domain expression can contain data expressions and metadata expressions. A data expression identifies data values in a source. A metadata expression identifies column names in a source. When a data domain contains multiple expressions, any column name or column value that matches an expression in the pattern appears in the search results.
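The "any expression matches" behavior can be sketched in Python as follows. This is a simplified model, not the TDM profiling logic. The in_domain function, the postal code expressions, and the column names are hypothetical. The sketch also assumes that an expression must match the whole column name or the whole value, and that one matching value is enough.

```python
import re


def in_domain(column_name, values, data_exprs, metadata_exprs):
    """Return True if the name matches a metadata expression
    or at least one value matches a data expression."""
    name_hit = any(re.fullmatch(e, column_name) for e in metadata_exprs)
    value_hit = any(re.fullmatch(e, v) for e in data_exprs for v in values)
    return name_hit or value_hit


# Hypothetical postal code domain.
data_exprs = ["[0-9]{5}"]
metadata_exprs = [".*ZIP.*", ".*POSTAL.*"]

columns = {
    "ZIP": ["n/a"],               # matched by column name only
    "CODE": ["30301", "94105"],   # matched by column values only
    "NOTES": ["call back"],       # matched by neither
}

hits = [name for name, values in columns.items()
        if in_domain(name, values, data_exprs, metadata_exprs)]
print(hits)
```

Either kind of expression is sufficient on its own, so ZIP and CODE both land in the domain even though they match for different reasons.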

Regular Expression Syntax

A regular expression contains characters that represent source character types, source character sets, and string or word boundaries in the source columns. A regular expression can also contain quantifiers that determine how many times characters can occur in the source data. Regular expressions are case sensitive.
The following special characters are examples of characters that you can include in a regular expression:
Any character except [\^$.|?*+()
All characters except the listed special characters match a single instance of themselves. For example, abc always matches abc.
\ (backslash) followed by any of the following special characters: [\^$.|?*+(){}
A backslash escapes any special character in a regular expression, so the character loses the special meaning.
* (asterisk)
Matches the preceding token zero or more times.
[ (left bracket)
Marks the beginning of specifications for one character that you want to match.
- (hyphen)
Specifies a range of characters. For example, [a-zA-Z0-9] matches any letter or digit.
] (right bracket)
Marks the end of the specifications for one character.
? (question mark)
Makes the preceding item optional.
{n}, where n is an integer >= 1
Repeats the previous item n times.
For information about creating regular expressions, see tutorials and documentation for regular expressions on the internet, such as http://www.regular-expressions.info/tutorial.html.
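The constructs listed above can be tried out in any regular expression engine before you enter them in a data domain. The following sketch uses the Python re module, which supports all of these constructs. The engine that TDM uses might differ in details, and the sample patterns and strings are made up for illustration.

```python
import re

# (pattern, candidate text, whether the whole text is expected to match)
cases = [
    ("abc", "abc", True),          # ordinary characters match themselves
    ("abc", "ABC", False),         # matching is case sensitive
    (r"1\+1", "1+1", True),        # backslash removes the special meaning of +
    ("ab*", "a", True),            # * allows zero occurrences of b
    ("ab*", "abbb", True),         # or several occurrences
    ("[a-zA-Z0-9]", "q", True),    # brackets describe one character, with ranges
    ("[a-zA-Z0-9]", "_", False),   # underscore is outside the ranges
    ("colou?r", "color", True),    # ? makes the u optional
    ("colou?r", "colour", True),
    ("[0-9]{3}", "123", True),     # {3} repeats the digit class three times
    ("[0-9]{3}", "12", False),
]

# Evaluate every case against the whole candidate text.
results = [re.fullmatch(pattern, text) is not None for pattern, text, _ in cases]

for (pattern, text, _), matched in zip(cases, results):
    print(f"{pattern!r:>16} vs {text!r:<9} -> {matched}")
```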

Data Patterns

Data patterns are regular expressions that describe the format of source data in a data domain.
A data pattern can contain multiple data expressions. If any of the expressions matches the data in a column, the column belongs in the data domain. You can configure detailed regular expressions to identify data in columns.
For example, a Social Security number contains numbers in the following pattern:
999-99-9999
The following regular expression shows a data pattern that describes the format of a Social Security number:
[0-9]{3}-[0-9]{2}-[0-9]{4}
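You can check how this expression behaves against sample values with a short Python sketch. The sample values are invented. The sketch shows two ways of applying the expression, because this section does not state whether an expression must cover the whole value or can match part of it, as in a COMMENTS column.

```python
import re

SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

values = [
    "123-45-6789",            # well-formed
    "123456789",              # hyphens missing
    "12-345-6789",            # digits grouped incorrectly
    "Call re: 123-45-6789",   # number embedded in free text
]

# The expression must describe the whole value.
whole_value = [bool(SSN_PATTERN.fullmatch(v)) for v in values]
# The expression can match anywhere inside the value.
anywhere = [bool(SSN_PATTERN.search(v)) for v in values]

for value, w, a in zip(values, whole_value, anywhere):
    print(f"{value!r:<24} whole={w} anywhere={a}")
```

Only the last value behaves differently in the two modes, which is the case to test if you expect sensitive values inside free-text columns.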

Metadata Patterns

A metadata pattern is a regular expression that identifies column names in a source. A metadata pattern can contain multiple metadata expressions.
A metadata expression can be a column name or part of a column name. For example, if you configure .*Name* as a metadata expression, column names such as Name, Employee_Name, and Organization_Name in the source appear in the search result.
A column name that matches any metadata expression in the pattern appears in the search results.
A Social Security number might have different column names. The following regular expressions are metadata expressions that find Social Security numbers by column name:
.*SSN*
.*SOCIAL*
.*SECURITY*
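Before you rely on expressions like these, it can help to test them against a list of column names. The following Python sketch does that. The column names and the matching_columns function are hypothetical, and the sketch tries both whole-name matching and match-anywhere behavior because the behavior of TDM is an assumption here. Note that in standard regular expression syntax the trailing asterisk applies only to the last character, so .*SSN* requires SS followed by zero or more N characters.

```python
import re

metadata_expressions = [".*SSN*", ".*SOCIAL*", ".*SECURITY*"]
column_names = ["SSN", "CUST_SSN", "SOCIAL_SECURITY", "SSN_ID", "ADDRESS", "LAST_NAME"]


def matching_columns(expressions, names, whole_name=True):
    """Return the names that at least one expression matches."""
    match = re.fullmatch if whole_name else re.search
    return [n for n in names if any(match(e, n) for e in expressions)]


whole = matching_columns(metadata_expressions, column_names, whole_name=True)
anywhere = matching_columns(metadata_expressions, column_names, whole_name=False)

print("whole name:", whole)
print("anywhere:  ", anywhere)
```

In this sketch, ADDRESS is matched in both modes because it contains SS, and SSN_ID is matched only when a partial match is allowed. An expression such as .*SSN.* would state "contains SSN" explicitly.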

Data Domain Options

When you create a data domain, you configure the following options that describe the data domain:
Name
Data domain name.
Sensitivity level
The sensitivity level for all columns in the data domain. The Administrator defines the sensitivity levels that you can choose from when you apply the sensitivity level option.
Description
Description of the data domain.
Status
Data domain status is enabled or disabled. When the data domain is enabled, a profile for data discovery includes the data domain. Default is enabled.

Creating a Data Domain

When you create a data domain, you can enter regular expressions that describe the data that you want to include in the data domain. You can also enter regular expressions that describe the names of database columns to include.
    1. To access the policies, click Policies.
    The Policies view shows a list of the policies, data domains, and rules in the TDM repository.
    2. Click Actions > New > Data Domain.
    3. Enter the name, sensitivity level, and description for the data domain. Click Next.
    4. Click Next.
    5. Optionally, enter a regular expression to filter columns by data pattern.
    6. To add more expressions for data patterns, click the + icon.
    7. To add regular expressions that filter columns by column name, click Next. Or, click Finish to skip entering any more data domain information.
    You can add multiple expressions.
    8. Enter regular expressions to filter columns by column name.
    9. Click Next if you want to apply preferred masking or generation rules to the data domain. Or, click Finish to finish configuring the data domain.
    10. To add preferred masking and generation rules to the data domain, click Add Rules.
    The Add Rules dialog box appears.
    11. Select the data masking rules and data generation rules that you want to add.
    12. Click OK.
    13. Enable a default masking rule and a default generation rule.
    14. Click Finish.

Copying a Data Domain

You can create a data domain by copying an existing data domain.
    1. To access the policies, click Policies.
    2. Click a data domain description to select the data domain.
    Do not open the data domain.
    3. Click Actions > Duplicate.
    The Copy <Data Domain Name> dialog box appears.
    4. Change the name and description of the data domain. Click Save.

Editing a Data Domain

You can edit a data domain to update the rules, data patterns, and metadata patterns.
    1. To access the policies, click Policies.
    The Policies view shows a list of the policies, data domains, and rules in the TDM repository.
    2. Click the name of the data domain that you want to edit.
    The data domain opens in a tab.
    3. Click Actions > Edit.
    The Edit dialog box appears.
    4. To add or edit expressions for data patterns, click the Data Patterns tab.
    5. To add or edit expressions for metadata patterns, click the Metadata Patterns tab.
    6. To add or edit masking or generation rules, click the Preferred Rules tab. Click Save.
    If you delete a rule that contains assignments, the Impacted Objects dialog box appears with the list of affected columns and plans.
    7. To download the list of affected columns and plans, click Export, and save the .csv file.
    8. To save the changes, click Continue.
    To update the changes in a plan, you must generate and run the plan again.

Deleting a Data Domain

When you delete a data domain, you delete the data domain assignments, but you do not delete the rules that you added to the data domain.
    1. To access the policies, click Policies.
    The Policies view shows a list of the policies, data domains, and rules in the TDM repository.
    2. Select the name of the data domain that you want to delete.
    3. Click Actions > Delete.
    The Delete Data Domain dialog box appears. If you delete a data domain that contains assignments, the Impacted Objects dialog box appears with the list of affected columns and plans.
    4. To delete the data domain that has no assignments, click OK.
    5. To delete the data domain that contains assignments, click Continue. To download the list of affected objects, click Export, and save the .csv file.
    To update the changes in a plan, you must generate and run the plan again.