Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of data. Classifying data can help your organization manage risks, compliance, and data security.
In Metadata Command Center, enable the data classification capability on a catalog source and create data classification rules to identify and classify data into relevant categories. For example, you can create data classification rules to identify and tag sensitive information such as credit card numbers or customer addresses contained in columns or tables. You can then run the data classification rule against the catalog source and view the classification of columns and tables in Data Governance and Catalog.
For more information about creating data classifications, see the Administration help module in Metadata Command Center.
Types of data classifications
Depending on the type of data that you want to identify and classify, you can create and view three types of data classifications in your organization.
Data element classification
This is the smallest unit of data classification. It refers to the classification of columns or fields of tables or files. Data element classification labels and categorizes information contained in data elements based on the metadata extracted from source systems and the facts collected as the result of data profiling. For example, you can use data element classification to find sensitive information in columns, such as credit card numbers, Social Security numbers, or driver's license numbers. You can then take actions to secure access to sensitive data and set standards for data privacy in your organization.
Data entity classification
A data entity is a collection of data elements and is derived based on an inclusion scope. For example, if 'Full Name', 'Gender', 'Date of Birth', 'Email', or 'Phone #' are identified in one or more columns of a table, then that table is classified as a 'person' entity. You can use entity classifications to identify important characteristics of data and group them together as entities. This classification identifies data entities such as purchase order, invoice, customer, location, person, or address contained in a data set.
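The following Python sketch illustrates the idea of deriving an entity classification from an inclusion scope. It is only a conceptual illustration and not how Metadata Command Center implements entity classification. The function name classify_entity, the min_matches threshold, and the sample column names are hypothetical.

```python
# Conceptual sketch only. These names are hypothetical and are not part of
# Metadata Command Center or Data Governance and Catalog.

# Inclusion scope for a 'person' entity: the data element classifications
# listed in the example above.
PERSON_ELEMENTS = {"Full Name", "Gender", "Date of Birth", "Email", "Phone #"}


def classify_entity(column_classifications, inclusion_scope, min_matches=1):
    """Return True when at least min_matches distinct element
    classifications from the inclusion scope appear in the table's columns.

    column_classifications maps a column name to the list of data element
    classifications found on that column. min_matches is an assumed
    threshold for illustration.
    """
    found = set()
    for labels in column_classifications.values():
        found |= inclusion_scope & set(labels)
    return len(found) >= min_matches


# A table whose columns carry two of the 'person' element classifications.
table = {
    "cust_nm": ["Full Name"],
    "dob": ["Date of Birth"],
    "order_total": [],
}

is_person = classify_entity(table, PERSON_ELEMENTS, min_matches=2)
```

In this sketch, the sample table qualifies as a 'person' entity because two element classifications from the inclusion scope are present in its columns.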
CLAIRE-generated data classification
This classification is powered by CLAIRE. When you select it, the system automatically generates data classifications for the data elements extracted from a source system. CLAIRE can identify, analyze, and classify the data contained in the data elements without any human input.
When you use rule-based data classifications, you choose from predefined or custom data classifications. When you use CLAIRE-generated data classifications, CLAIRE uses the nomenclature of technical data assets to generate classifications. To generate potential classification labels based on asset names, CLAIRE uses an embedded dictionary.
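To illustrate the general idea of generating labels from asset names with a dictionary, consider the following Python sketch. The dictionary entries, the tokenization logic, and the function names are assumptions made for illustration. The sketch does not describe the dictionary that CLAIRE uses or how CLAIRE matches names.

```python
import re

# Hypothetical dictionary that maps name tokens to classification labels.
# The real embedded dictionary is not documented here.
NAME_DICTIONARY = {
    ("credit", "card"): "Credit Card Number",
    ("ssn",): "Social Security Number",
    ("email",): "Email",
    ("dob",): "Date of Birth",
}


def tokenize(asset_name):
    """Split a technical asset name into lowercase tokens.

    Handles camelCase boundaries and non-alphanumeric separators
    such as underscores.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", asset_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def suggest_labels(asset_name):
    """Return the labels whose token sequence occurs in the asset name."""
    tokens = tokenize(asset_name)
    labels = []
    for key, label in NAME_DICTIONARY.items():
        n = len(key)
        # Look for the key as a contiguous run of tokens.
        if any(tuple(tokens[i:i + n]) == key
               for i in range(len(tokens) - n + 1)):
            labels.append(label)
    return labels
```

For example, under these assumptions a column named CUST_CREDIT_CARD_NO yields the label Credit Card Number, and a column named order_total yields no label.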
Generated data classification powered by CLAIRE has the following advantages:
•You can generate data classifications without creating data classification rules. You don't have to know how to create inclusion rules in Metadata Command Center.
•You can generate data classifications even if you don't know which of the existing predefined rule-based data classifications you need to select. CLAIRE automatically generates data classifications for the data elements.
•You can either promote a generated classification to a data element classification, or reject the generated classification.
•Once you promote the generated classification, the system remembers and automatically accepts it in future scans. When you use data classification rules, the rules take precedence.
•If you use generated data classification, you don't have to perform profiling. If you use rule-based classifications with Statistics attributes, you need to run data profiling.
For more information about working with generated data classifications, see the Asset Management help.
View data classifications for technical assets
You can view data element classifications, data entity classifications, and generated data classifications associated with a technical asset on the asset page.
To view data classifications for technical assets, ensure that the organization administrator grants you the Read permission on data classification assets through access policies in Metadata Command Center. For more information about how administrators manage access to assets, see the Asset Management help module.
After you run the data classification capability for a catalog source in Metadata Command Center, you can curate the data classifications in Data Governance and Catalog. For more information about curating data classifications for technical assets, see Curate data classifications for technical assets.
The following image shows the data element classifications:
Curate data classifications for technical assets
As a stakeholder who wants to label or classify data into categories that are critical to your organization, you can curate data classifications in Data Governance and Catalog.
After you add the data classification rules to a catalog source and run the data classification job for the catalog source in Metadata Command Center, you can curate data classifications for the extracted technical assets in Data Governance and Catalog. When you curate data classifications, you either accept the inferred data classifications to associate them to technical assets, or decline the inferred data classification associations. You can also manually associate technical assets with one or more data classification rules that you have created. For more information about creating data classifications, see the Administration help module in Metadata Command Center.
The following guidelines apply if you want to curate data classifications:
•The data classifications that you want to curate should be in the Published Lifecycle stage.
•The organization administrator must grant you the Read permission on data classification assets and the Update permission on technical assets through access policies in Metadata Command Center.
•You can associate data classifications that have been configured with or without inclusion rules in Metadata Command Center.
Accept or decline inferred data classifications
When you add data classification rules to a catalog source in Metadata Command Center, the system identifies the columns and tables that match the rules and displays one or more matched data classifications on the column or table asset pages in Data Governance and Catalog. If your role has the required privileges, you can see the inferred data classifications in the Accepted section of the Data Element Classifications or Data Entity Classifications panel.
The following image shows a column asset page with the inferred data element classifications that match the column data and metadata. All accepted data classification assets appear with an orange border.
If the data classifications inferred for the columns or tables are not suitable, you can decline them from the Accepted section. The declined classifications move to the Declined section of the Data Element Classifications or Data Entity Classifications panel, and their border changes from orange to grey. In the above example, three inferred data element classifications were declined for the column.
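The movement of classifications between the Accepted and Declined sections can be pictured with the following Python sketch. It is a simplified mental model, not the product's data model or API. The class name ClassificationPanel and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationPanel:
    """Hypothetical model of a classifications panel for one asset."""

    accepted: set = field(default_factory=set)
    declined: set = field(default_factory=set)

    def infer(self, name):
        # Inferred classifications appear in the Accepted section.
        # Assumption: a classification that is already declined stays declined.
        if name not in self.declined:
            self.accepted.add(name)

    def decline(self, name):
        # Declining moves a classification from Accepted to Declined.
        if name in self.accepted:
            self.accepted.remove(name)
            self.declined.add(name)

    def accept(self, name):
        # Accepting moves a declined classification back to Accepted.
        if name in self.declined:
            self.declined.remove(name)
            self.accepted.add(name)


panel = ClassificationPanel()
panel.infer("Email")
panel.infer("Phone Number")
panel.decline("Phone Number")
```

After these calls, the sketch holds Email in the accepted set and Phone Number in the declined set, which mirrors the two sections of the panel.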
Manually associate data classifications with technical assets
If you did not add data classification rules while configuring the catalog source in Metadata Command Center, you can manually add data classifications to technical assets in Data Governance and Catalog. To manually associate data classifications with technical assets, perform one of the following actions:
•On the Data Element Classifications or Data Entity Classifications panel, click the add icon, and select one or more data classifications to associate with the data element or the data set.
The associated data classifications appear in the Accepted section of the Data Element Classifications panel for a data element or in the Data Entity Classifications panel for a data set.
The following image shows the add icon and the manually associated classifications in the Data Element Classifications panel on the column page.
•On the Contains tab, use the Action menu next to the data element for which you want to curate data classifications to open the Curate Glossaries and Data Classifications dialog box. In this dialog box, you can manually associate data classifications with specific data elements.
The following image shows the Contains tab of a table:
Bulk accept or decline inferred and manually associated data classifications
You can also bulk accept or decline all inferred and manually associated data classifications for a column or a table. To bulk accept or decline data classifications for a technical asset, click the action menu and select the appropriate option.
The following image shows the options for bulk accepting or declining data classifications:
Remove declined data classifications
You can remove the data element classifications and data entity classifications that are declined for a technical asset.
Optionally, you can bulk remove all of the declined data classifications. To bulk remove all the declined classifications for a technical asset, click the action menu and select Remove All Declined Classifications.
The following image shows the option for bulk removing declined data classifications:
Data sensitivity levels for assets
Sensitivity labels in Data Governance and Catalog help you classify and protect important data in your organization. They help your organization understand the value of its data, evaluate whether the data is at risk, apply control measures to mitigate risks, and comply with relevant industry-specific regulatory mandates.
While creating a data element classification in Metadata Command Center, you can select the sensitivity level to indicate whether an asset is sensitive or non-sensitive. For more information on creating a data element classification, see the Administration help in Metadata Command Center.
In Data Governance and Catalog, you can view the sensitivity labels for the data elements that the classification is related to. You can search and find assets based on the sensitivity level by using appropriate search queries. For more information about search query examples, see the Asset Discovery help.
View sensitivity of a technical asset
To view the sensitivity of a technical asset in Data Governance and Catalog, the organization administrator must assign the following permissions to you through access policies in Metadata Command Center:
•Read permission on technical assets for which you want to view the sensitivity
•Read permission on the data classification assets
By default, the sensitivity attribute has the following three levels:
•High. Use for data that is sensitive for your organization. Typically, these can be confidential information, such as credit card details, customer emails and contact numbers, intellectual property, or financial records.
•Medium. Use for data that is moderately sensitive for your organization. Typically, these can be internally confidential information, such as employee IDs, technical information, or emails.
•Low. Use for data that is not sensitive for your organization. Typically, these can be public information, such as names, designations, qualifications, or public website content. The low sensitivity level is not specifically shown on the Data Governance and Catalog page.
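As a rough illustration of the three default levels and of the fact that the low level is not specifically shown, consider the following Python sketch. The Sensitivity enumeration and the display_label function are hypothetical names and do not correspond to any product API.

```python
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    """The three default sensitivity levels described above."""

    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


def display_label(level: Sensitivity) -> Optional[str]:
    """Return the label to show for a level, or None when nothing is shown.

    Models the behavior that the low sensitivity level is not
    specifically shown on the page.
    """
    if level is Sensitivity.LOW:
        return None
    return level.value
```

In this sketch, display_label returns the text High or Medium for the corresponding levels and returns None for the low level.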
Apart from the predefined sensitivity levels that are listed in this help, you might see other sensitivity levels that your organization administrator can customize exclusively for your organization in Metadata Command Center. For information about how your administrator can customize sensitivity levels, see the Administration help in Metadata Command Center.
You can view the sensitivity for data elements on various pages in Data Governance and Catalog. The following image displays the sensitivity of an asset on the asset page: