Automating Data Quality Rules Overview
In Axon, you can automate the process of running data quality rules on Enterprise Data Catalog fields instead of manually generating the mappings and producing data quality scores. You can use Enterprise Data Catalog resources to scan for data objects, send the values of the data sources to Informatica Data Quality, and run rules on thousands of fields instantly. Axon retrieves the scores of the data quality rules from Informatica Data Quality and displays the scores for each field.
In Axon, you can associate Glossary objects with data domains and fields in Enterprise Data Catalog. This association automatically creates Axon attributes for Enterprise Data Catalog fields. You can also create standard rules in Axon and link them with rules in Informatica Data Quality. When you make these associations, Axon automatically runs Informatica Data Quality rules on fields that are scanned by Enterprise Data Catalog. Informatica Data Quality generates scores for the rules that are run on the fields. Axon retrieves these scores and displays them in the Axon interface.
Example
You have created 10 glossary objects in Axon, and each object has 100 attributes that are associated with Enterprise Data Catalog fields. For each attribute, you want to run two data quality rules, one rule for consistency and one rule for completeness.
If you run the two data quality rules manually on the 1000 attributes, you must create one local rule for consistency and one local rule for completeness for each attribute, that is, 1000 local rules for consistency and 1000 local rules for completeness. As a result, you have to create 2000 local data quality rules manually in Axon. You must also create several profiles and scorecards in Informatica Data Quality and associate them with the local data quality rules as required.
If you choose to automate the rules, you can create 10 standard data quality rules for consistency and 10 standard data quality rules for completeness. When you link the standard rules in Axon to rules in Informatica Data Quality, Axon automatically creates 2000 local data quality rules for the attributes, sends the mapping requests to Informatica Data Quality to run the rules, and retrieves the scores and displays them in the Axon interface.
In this example, instead of creating 2000 local rules, you created only 20 standard rules and mapped them to the relevant rules in Informatica Data Quality. Axon automated the process of running the rules in Informatica Data Quality and displayed the data quality scores to you. Rule automation significantly reduces the effort required to run data quality rules and increases productivity.
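The arithmetic in this example can be checked with a short sketch. The variable names are illustrative only and do not correspond to any Axon API.

```python
# Counts taken from the example above; names are illustrative only.
glossary_objects = 10
attributes_per_object = 100
rule_types = ["consistency", "completeness"]

total_attributes = glossary_objects * attributes_per_object

# Manual approach: one local rule per attribute per rule type.
manual_local_rules = total_attributes * len(rule_types)

# Automated approach: one standard rule per glossary object per rule type.
standard_rules = glossary_objects * len(rule_types)

print(total_attributes)    # 1000
print(manual_local_rules)  # 2000
print(standard_rules)      # 20
```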
Understanding Terms
To understand how Axon runs data quality rules on fields scanned by Enterprise Data Catalog, you must be familiar with the following terms:
- Resource
- A resource is an Enterprise Data Catalog object that represents an external data source or metadata repository from which scanners extract metadata. Scanners in Enterprise Data Catalog fetch the metadata from resources. Axon displays Enterprise Data Catalog resources for relational sources, file systems such as Amazon S3, Native, and HDFS, and the Salesforce application. You can also view Business Intelligence (BI) resources, such as Google BigQuery, Tableau, and Cognos.
- For more information about resources, see Enterprise Data Catalog Objects.
- Field
- A field represents a source column or source field in an Enterprise Data Catalog resource. You can view the following types of Enterprise Data Catalog fields in Axon:
- - Columns in relational tables
- - Columns in relational views
- - Fields in CSV, JSON, and XML files
- - Fields in Salesforce objects
- For more information about fields, see Enterprise Data Catalog Objects.
- Data Domain
- A data domain is a predefined or user-defined object that enables you to discover the functional meaning of data. Examples of data domains include Social Security number, credit card number, and account status. Data domain discovery is the process of discovering the functional meaning of data in the data sources based on the semantics of data.
- For more information about data domains, refer to the Column and Field Assets and Data Domain Assets topics in the Informatica 10.2.2 Enterprise Data Catalog User Guide.
- Data Quality Rule
- A data quality rule is a rule in Informatica Data Quality that checks the quality of the data in your systems. Data quality rules are stored in Informatica Data Quality as rule specifications and rule mapplets. A rule specification is the design of a rule in Informatica Data Quality. It refers to the requirements of a business rule that you can design in the Informatica Analyst Tool. A rule mapplet is an executable rule in Informatica Data Quality. Rule specifications are read-only in the Informatica Developer tool, but rule mapplets are editable templates that you can run on specific data objects.
For more information about rules, rule specifications, and rule mapplets, see the Rule Specification Guide.
- Standard Data Quality Rule
- A standard data quality rule in Axon is a textual description of a rule that must be run on some data.
For example, if your organization processes automobile insurance claims, you would have predetermined criteria to determine whether the insurance claim of a particular customer qualifies. You might check the age of the car, distance driven, and premium amount paid, and then use a formula to determine whether a specific claim must be processed. In Axon, you can create a Glossary object called “Eligibility for Insurance Claim”, and enter a brief summary of the rule in the Description field. You can then go to the Data Quality tab of the Glossary object to view the standard data quality rule associated with the Glossary object.
- Local Data Quality Rule
- A local data quality rule is an instance of a standard rule that you run on a specific data object. Standard data quality rules are textual descriptions of rules. Based on the Enterprise Data Catalog technical metadata associated with the Axon attributes, Axon automatically creates local data quality rules for the standard rule.
For example, the standard rule “Eligibility for Insurance Claim” helps you understand the eligibility to process a car insurance claim. When a specific customer makes an actual insurance claim for a car, Axon applies the standard rule for this particular claim, and creates a local rule for this specific instance.
A standard rule refers to the textual description of a rule, and a local rule refers to the instance of running the rule on actual data.
- Input Parameter
- An input parameter refers to a field on which a rule is run. In Axon, you can run rules that have single or multiple input parameters.
Some rules require a single input parameter. For example, if a policy requires that the first name of a person is mandatory, the rule must specify that the First Name field cannot be a null value. In this case, the rule has a single input parameter because it evaluates only the First Name field, whose values can be either valid or null.
Some rules require several input parameters. For example, if a policy requires that a customer's full name be concatenated from the customer's first and last name, the rule must specify that the Customer Name field is a combination of the First Name and Last Name fields. In this case, the rule has multiple input parameters because the values of the Customer Name field are generated from two other fields.
For more information about input parameters, refer to the Inputs topic in the Rule Specification Guide.
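The relationship between standard rules, local rules, and input parameters can be modeled with a minimal sketch. This is not how Axon or Informatica Data Quality store rules. The classes, the `check` function (which stands in for the linked Informatica Data Quality rule), and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class StandardRule:
    """Textual description of a rule, plus an illustrative check function."""
    description: str
    check: Callable[..., bool]  # stand-in for the linked executable rule


@dataclass(frozen=True)
class LocalRule:
    """Instance of a standard rule bound to specific fields (input parameters)."""
    standard_rule: StandardRule
    input_fields: Tuple[str, ...]

    def run(self, record: dict) -> bool:
        # Look up each input parameter in the record and apply the check.
        values = [record.get(name) for name in self.input_fields]
        return self.standard_rule.check(*values)


# Single input parameter: First Name must not be null.
first_name_required = StandardRule(
    description="First name is mandatory.",
    check=lambda first: first is not None,
)

# Multiple input parameters: Customer Name is First Name plus Last Name.
# Joining with a single space is an assumption made for this sketch.
full_name_consistent = StandardRule(
    description="Customer name is the first and last name combined.",
    check=lambda full, first, last: full == f"{first} {last}",
)

# Hypothetical field names standing in for Enterprise Data Catalog fields.
rule_a = LocalRule(first_name_required, ("FIRST_NAME",))
rule_b = LocalRule(full_name_consistent, ("CUSTOMER_NAME", "FIRST_NAME", "LAST_NAME"))

record = {"CUSTOMER_NAME": "Ada Lovelace", "FIRST_NAME": "Ada", "LAST_NAME": "Lovelace"}
print(rule_a.run(record))  # True
print(rule_b.run(record))  # True
```

One standard rule can back many local rules: each `LocalRule` reuses the same description and check but names different input fields.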
Automating Data Quality Rules Process
The following image shows the steps required to set up Axon with Enterprise Data Catalog and Informatica Data Quality so that Axon runs the rules automatically on data store fields:
The following steps describe how to automate data quality rules:
- 1. Set up Axon, Enterprise Data Catalog, and Informatica Data Quality.
- 2. Link objects as required between Axon and Enterprise Data Catalog. Link objects as required between Axon and Informatica Data Quality.
- 3. Onboard objects from Enterprise Data Catalog to Axon. Optionally, manually create links between Enterprise Data Catalog fields and Axon attributes.
- 4. Run rules on the objects onboarded from Enterprise Data Catalog in Informatica Data Quality.
Note: If the administrator has enabled the Data Marketplace option in the Admin Panel, make sure that you are in Data Governance view to set up rule automation and perform tasks in Axon.
Step 1. Set Up Axon, Enterprise Data Catalog, and Informatica Data Quality
The first step to automate rules is to set up Axon, Enterprise Data Catalog, and Informatica Data Quality by creating the required objects.
Perform the following steps to create the objects:
- 1. In Enterprise Data Catalog, create a resource to scan data sources, and configure the resource to discover data profiles and data domains.
- 2. In Axon, create a System object that maps to the resource you created in Enterprise Data Catalog. Link the System object to at least one Enterprise Data Catalog resource to enable onboarding objects from Enterprise Data Catalog to Axon.
- 3. In Axon, create a Glossary object.
- 4. Create an Axon resource type in Enterprise Data Catalog. Use the Axon resource type to scan the Axon glossaries.
- 5. In Axon, create a standard data quality rule for the glossary objects. The standard data quality rule is a textual definition of the rule that you want to run on the fields.
Step 2. Link Objects Between Axon, Enterprise Data Catalog, and Informatica Data Quality
The second step is to link the objects in Axon and Enterprise Data Catalog to onboard the fields, and link the objects in Axon and Informatica Data Quality to connect the rules.
Perform the following steps to link the objects:
- 1. In Axon, associate the Glossary object with the Enterprise Data Catalog data domains. In Enterprise Data Catalog, you can associate data domains and fields with the Axon glossary.
- 2. In Axon, link the standard data quality rule to an existing rule in Informatica Data Quality. This step associates a textual rule definition in Axon with a logical rule in Informatica Data Quality. Alternatively, Informatica CLAIRE® can recommend a rule and automatically create the rule in Informatica Data Quality.
Note: To recommend and create a rule automatically, the Axon Administrator must enable this option in the Admin Panel. For more information, see the Configure Data Quality Rule Automation topic in the Axon Data Governance 7.0 Administrator Guide.
- 3. In Axon, choose the option to automatically create local data quality rules for the standard data quality rule. This step enables data quality rule automation.
Step 3. Onboard Objects from Enterprise Data Catalog to Axon
The third step is to display the objects from Enterprise Data Catalog in Axon. You can manually link the attributes in Axon with the fields in Enterprise Data Catalog. You can also choose to automatically onboard the objects from Enterprise Data Catalog to Axon. When you choose to onboard objects, the data sets and attributes are automatically created and onboarded to Axon.
Perform the following steps to onboard the objects:
- 1. In Axon, choose the option to automatically onboard the objects from Enterprise Data Catalog.
- 2. After the key elements are identified, the data sets and attributes are automatically created and onboarded to Axon. Axon is now ready to send the fields to Informatica Data Quality to run the rules.
- 3. In Axon, you can choose to accept or reject the Attribute objects created automatically.
Step 4. Run Informatica Data Quality Rules on Onboarded Objects
The fourth step is to run the rules automatically. Axon automatically runs Informatica Data Quality rules on Enterprise Data Catalog fields and displays the scores.
Axon performs the following steps to display the scores:
- 1. Axon creates a local data quality rule for the attributes that apply to each data set. In this step, Axon creates an instance of a rule for each field scanned by Enterprise Data Catalog. Axon then sends the rule mapping requests for the attributes to Informatica Data Quality.
- 2. Informatica Data Quality receives the mapping requests and runs the rule on the fields according to the schedule that you defined.
- 3. Informatica Data Quality generates data quality scores for each field.
- 4. Axon retrieves the data quality scores for each table column and displays the scores for each local rule.
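The four steps above can be pictured with a small simulation. This sketch does not call Axon or Informatica Data Quality. It assumes that a score is the percentage of values that pass a rule, which the steps above do not specify, and the field names and sample values are hypothetical.

```python
def score(values, rule):
    """Percentage of values that pass the rule (assumed score definition)."""
    if not values:
        return 0.0
    passed = sum(1 for v in values if rule(v))
    return 100.0 * passed / len(values)


def not_null(value):
    """Illustrative completeness rule: the value must be present."""
    return value is not None


# Hypothetical scanned fields and sample values.
fields = {
    "CUSTOMER.FIRST_NAME": ["Ada", None, "Grace", "Alan"],
    "CUSTOMER.LAST_NAME": ["Lovelace", "Hopper", "Turing", "Liskov"],
}

# Step 1: one local rule per field. Steps 2 and 3: run the rule and score each field.
scores = {name: score(values, not_null) for name, values in fields.items()}

# Step 4: display the score for each local rule.
for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```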
Prerequisites
To run local data quality rules automatically, you must be familiar with resources and scanners in Enterprise Data Catalog, and rules and mappings in Informatica Data Quality.
Enterprise Data Catalog
To automate data quality rules, make sure that you verify the following prerequisites in Enterprise Data Catalog:
- 1. You have installed Enterprise Data Catalog. See the Product Availability Matrix for the supported versions.
- 2. You have configured the Enterprise Data Catalog parameters in the Admin Panel in Axon. You have also configured the following parameters for automated onboarding:
- - Enable Automatic Onboarding
- - Confidence Score Threshold
- - Axon Super Admin Email
- For more information, see Configure Access to Enterprise Data Catalog.
- 3. The source database that you want Enterprise Data Catalog to scan is Oracle, SQL Server, IBM DB2, or Teradata.
- 4. The file sources for which you want to run rules are CSV flat files in native file systems.
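The source-type prerequisites in items 3 and 4 can be expressed as a simple pre-flight check. This is a hypothetical helper, not an Axon or Enterprise Data Catalog function. The `kind` and `name` arguments are assumptions made for this sketch.

```python
# Source types listed in the prerequisites above.
SUPPORTED_DATABASES = {"Oracle", "SQL Server", "IBM DB2", "Teradata"}
SUPPORTED_FILE_FORMATS = {"CSV"}  # flat files in native file systems


def is_supported_source(kind, name):
    """Return True if the source type is listed in the prerequisites."""
    if kind == "database":
        return name in SUPPORTED_DATABASES
    if kind == "file":
        return name in SUPPORTED_FILE_FORMATS
    return False


print(is_supported_source("database", "Oracle"))  # True
print(is_supported_source("file", "JSON"))        # False
```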
Informatica Data Quality
To automate data quality rules, make sure that you verify the following prerequisites in Informatica Data Quality:
- 1. You have installed Informatica Data Quality. See the Product Availability Matrix for the supported versions.
- 2. The following services are running on Informatica Data Quality:
- - Model Repository Service with monitoring enabled
- - Data Integration Service
- - Content Management Service
- - Scheduling service configured to the Model Repository Service
- 3. You can connect Axon to Informatica Data Quality. For more information, refer to the Configure Access to Informatica Data Quality topic in the Informatica Axon Data Governance 7.0 Administrator Guide.
- 4. Axon connects to the Axon Agent using the HTTP protocol.
- 5. The database where Informatica Data Quality stores the data quality scores is a relational database. The Axon Agent connects to this database to retrieve and display data quality scores in Axon.
- 6. To run data quality rules on fields, Informatica Data Quality must connect to the sources scanned by Enterprise Data Catalog. Make sure that all critical connection parameters are configured in Informatica Data Quality, such as the password and the Support Mixed-Case Identifier option for the resource.
- 7. You have enabled rule automation in the Axon Admin Panel. To automate data quality rules, see the Automate Data Quality Rules topic in the Informatica Axon Data Governance 7.0 Administrator Guide.