
Data Quality

Use Data Quality to create data quality assets. Add the assets to transformations in a mapping in Data Integration.
When you select Data Quality from the My Services page, the Home page appears.
The following image shows the Data Quality Home page:
[Image: Data Quality Home page]
The Home page includes a summary of the asset types that you created and a list of the most recent assets that you worked on.
The Home page displays the following panels:
You can access the following pages from the navigation bar for Data Quality:
When you switch from Data Quality to another service, the panels and the options in the navigation bar change to suit the service.

Data quality life cycle

The assets that you configure for your data quality projects constitute a set of operations that you can perform across Informatica Intelligent Cloud Services.
To understand and improve the quality of your data, you can move the data through the following stages:
  1. Discover. Analyze the content and structure of your source data.
     To analyze the content and structure, create a profile in Data Profiling.
     Note: You can open and run profiles from the Explore page in both Data Profiling and Data Quality.
  2. Design. Create assets to address the issues that you find in the source data.
     Create the assets in Data Quality.
  3. Apply. Add the assets to one or more mappings, and run the mappings on the data.
     Design and run the mappings in Data Integration.
  4. Measure. Run profiles to review the results of the mappings.
     Optionally, update the assets that you created in Data Quality and run the mappings again to optimize the quality of the data.

Data Quality dimensions

Your organization might target a range of objectives when you build data quality initiatives into your data systems. For example, you might need to eliminate duplicate records to comply with regulatory standards. Or, you might find that the accuracy of postal addresses is poor across your records. Or, you might decide to extract additional information and value from your current data.
Every organization has unique needs, but the data quality issues in your data typically fall into a set of common categories. Data Quality assets identify these categories as dimensions.
You can set the Dimension option in data quality assets that correspond to passive transformations in Data Integration. Set the option to specify the data quality issue that you want the asset to address. You can set the Dimension option in a cleanse, labeler, parse, rule specification, and verifier asset. A scorecard can read the dimension that you set on a rule specification asset.
The following dimensions are built into Data Quality:
Accuracy
Select Accuracy when the asset logic is primarily concerned with establishing the accuracy of data values. Data is accurate when it matches a known data fact that the asset can verify.
For example, a business rule might require that each employee in an organization has the correct data security clearance for their role. The organization maintains a set of personnel records that includes the security clearance level and job title of each employee. You can configure an asset to compare the security clearance data to the job title data in each record and to verify that the values correspond.
You might use dictionaries that contain the job titles and the security clearance levels to verify that the respective data values are correct.
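The dictionary-based accuracy check described above can be sketched in Python. This is a minimal illustration, not how Data Quality evaluates an asset: it assumes a hypothetical reference dictionary, `REQUIRED_CLEARANCE`, that maps each job title to the clearance level the title requires, and a hypothetical helper, `is_clearance_accurate`.

```python
# Hypothetical reference dictionary: job title -> required security clearance.
REQUIRED_CLEARANCE = {
    "Analyst": "Confidential",
    "Engineer": "Secret",
    "Director": "Top Secret",
}


def is_clearance_accurate(record: dict) -> bool:
    """Return True when the record's clearance matches the known fact for its job title."""
    expected = REQUIRED_CLEARANCE.get(record.get("job_title"))
    # A job title that is not in the dictionary cannot be verified, so treat it as inaccurate.
    return expected is not None and record.get("clearance") == expected


records = [
    {"employee_id": 1, "job_title": "Engineer", "clearance": "Secret"},
    {"employee_id": 2, "job_title": "Analyst", "clearance": "Top Secret"},
    {"employee_id": 3, "job_title": "Intern", "clearance": "Confidential"},
]

results = {r["employee_id"]: is_clearance_accurate(r) for r in records}
print(results)  # {1: True, 2: False, 3: False}
```

Treating an unknown job title as inaccurate is a design choice for the sketch; an organization might instead route such records to a separate "unverified" status.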
Validity
Select Validity when the asset logic is primarily concerned with establishing the validity of the data. Data is valid when it meets the formal and structural requirements of a business rule that your organization defines. For example, valid data might use the data type and conform to the character length that the business rule expects.
Note: Validity and consistency are similar dimensions. However, data values can be consistent but not valid. Consistency is a measure of the similarity in form between the data values in a column. Validity is a measure of the correspondence between the formal aspects of the column data and the format that your organization requires.
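The difference between validity and consistency can be sketched in Python. In this minimal example, a hypothetical rule, `ValidityRule`, expects a value of a given type with an exact character length. The column values are consistent with one another because each is a 5-character string, but none of them is valid under a rule that expects 6 characters. The rule and the product code values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityRule:
    """Hypothetical business rule: an expected data type and an exact character length."""
    expected_type: type
    length: int

    def is_valid(self, value: object) -> bool:
        # Valid data has the expected type and the expected character length.
        return isinstance(value, self.expected_type) and len(str(value)) == self.length


# Hypothetical rule: a product code is a 6-character string.
product_code_rule = ValidityRule(expected_type=str, length=6)

# Consistent with each other (all 5-character strings), but not valid under the rule.
column = ["AB123", "CD456", "EF789"]
print([product_code_rule.is_valid(v) for v in column])  # [False, False, False]
print(product_code_rule.is_valid("AB1234"))  # True
print(product_code_rule.is_valid(123456))  # False, because the data type is wrong
```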
Completeness
Select Completeness when the asset logic is primarily concerned with establishing the completeness of the data.
For example, a business rule in your organization might require that one or more data columns do not contain null data. You can configure a rule specification with one or more rule statements that search the relevant columns for null data.
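The logic of a rule statement that searches columns for null data can be sketched in Python. This is a minimal illustration that assumes hypothetical required columns (`customer_id`, `email`, `postal_code`) and a hypothetical helper, `incomplete_columns`. It treats a missing key the same as a null value and does not treat empty strings as null.

```python
from typing import Any, Iterable

# Hypothetical columns that the business rule requires to be non-null.
REQUIRED_COLUMNS = ("customer_id", "email", "postal_code")


def incomplete_columns(record: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required columns that are null (None) or absent in the record."""
    # Empty strings are not treated as null in this sketch.
    return [col for col in required if record.get(col) is None]


rows = [
    {"customer_id": 101, "email": "a@example.com", "postal_code": "10001"},
    {"customer_id": 102, "email": None, "postal_code": "94105"},
    {"customer_id": 103, "email": "c@example.com"},
]

for row in rows:
    missing = incomplete_columns(row, REQUIRED_COLUMNS)
    status = "Complete" if not missing else f"Incomplete: {', '.join(missing)}"
    print(row["customer_id"], status)
```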
Consistency
Select Consistency when the asset logic is primarily concerned with establishing the consistency of the data within one or more columns. The data in a column is consistent when the column values conform to a uniform character format. Additionally, column data can be consistent in the use of an agreed set of terms for different pieces of information. For example, you might configure a cleanse asset to standardize street descriptors such as Street and Road to ST and RD.
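The street descriptor standardization above can be sketched in Python. This is a minimal illustration, not the cleanse asset's implementation: it assumes a hypothetical reference table, `STREET_DESCRIPTORS`, and replaces only whole words, case-insensitively, so that a word such as "Streetly" is left unchanged.

```python
import re

# Hypothetical reference table of street descriptors and their standard forms.
STREET_DESCRIPTORS = {"street": "ST", "road": "RD"}

# Match a descriptor only as a whole word, in any letter case.
_PATTERN = re.compile(r"\b(" + "|".join(STREET_DESCRIPTORS) + r")\b", re.IGNORECASE)


def standardize_address(address: str) -> str:
    """Replace whole-word street descriptors with their standard abbreviations."""
    return _PATTERN.sub(lambda m: STREET_DESCRIPTORS[m.group(1).lower()], address)


print(standardize_address("12 Main Street"))    # 12 Main ST
print(standardize_address("7 Mill road"))       # 7 Mill RD
print(standardize_address("3 Streetly Close"))  # 3 Streetly Close
```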
Uniqueness
Select Uniqueness when the asset logic is primarily concerned with establishing that a data set does not contain duplicate records. Two or more records are duplicates of each other when they refer to the same data entity with substantially the same data. To report on the uniqueness of the records, use a deduplicate asset.
A deduplicate asset applies a threshold score to the results of the comparisons that it makes between pairs of records in a data set. You can feed the output from a Deduplicate transformation to a Rule Specification transformation in a mapping, and you can configure the Rule Specification transformation to apply a status value to records according to their threshold scores. You can assign the Uniqueness dimension to the rule specification asset in the Rule Specification transformation.
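The combination of pairwise scores and a threshold-based status can be sketched in Python. This is a minimal illustration under stated assumptions. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity score, which is not the matching algorithm that a deduplicate asset uses. The threshold of 0.9 and the status values "Duplicate" and "Unique" are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.9  # hypothetical threshold score


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise score in [0, 1]; not the algorithm of a deduplicate asset."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def status(score: float) -> str:
    """Assign a status from the score, as a rule specification might."""
    return "Duplicate" if score >= THRESHOLD else "Unique"


records = {
    1: "John Smith, 12 Main ST",
    2: "Jon Smith, 12 Main ST",
    3: "Maria Garcia, 7 Mill RD",
}

# Compare every pair of records once and report the score and status.
for (id_a, a), (id_b, b) in combinations(records.items(), 2):
    score = similarity(a, b)
    print(id_a, id_b, round(score, 2), status(score))
```

Keeping the scoring step and the status step in separate functions mirrors the split that the text describes between the Deduplicate transformation and the Rule Specification transformation.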
Timeliness
Select Timeliness when the primary purpose of the asset is to verify that the record data is current. Current data represents the most recent version of a data fact.
For example, a retail organization might require that warehouse inventory records are updated every day. You can define a rule specification to check that the date stamp on each inventory record matches the current date.
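The inventory date check above can be sketched in Python. This minimal example assumes a hypothetical helper, `is_timely`, that compares a record's date stamp to the current date. The optional `today` parameter is an addition for the sketch so that the check can be tested against a fixed date.

```python
from datetime import date, timedelta
from typing import Optional


def is_timely(date_stamp: date, today: Optional[date] = None) -> bool:
    """Return True when an inventory record's date stamp matches the current date."""
    if today is None:
        today = date.today()
    return date_stamp == today


# Hypothetical inventory records.
inventory = [
    {"sku": "A-100", "updated": date.today()},
    {"sku": "B-200", "updated": date.today() - timedelta(days=2)},
]

for item in inventory:
    print(item["sku"], "Current" if is_timely(item["updated"]) else "Stale")
```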
In addition to the built-in dimensions, you might see custom dimensions. Custom dimensions that you create in Metadata Command Center also appear in Data Quality.

Rules and guidelines for dimensions

Consider the following rules and guidelines when you add a dimension to an asset: