You can define a data quality rule that you want to run on one or more data elements. The rule is run in the data quality application that is integrated with Data Governance and Catalog. After the rule is run and the results are ready, Data Governance and Catalog retrieves the rule scores and displays the results in graphical formats. You can also specify target and threshold values to categorize rule results as Good, Acceptable and Not Acceptable. For the catalog sources that you configure in Metadata Command Center, you can retrieve the data quality scores for the technical assets associated with the catalog source. You can disable the running of data quality rules for a catalog source in Metadata Command Center.
Prerequisites for running data quality rules
Depending on the data quality application that you use, you must configure the system and the users so that the data quality rules can run properly. Before you run data quality rules, make sure that you enable the Data Quality option in Metadata Command Center for each catalog source.
Running data quality rules in Informatica Cloud Data Quality
If you want to define data quality rules in Data Governance and Catalog and run the rules in Informatica Cloud Data Quality, the following prerequisites apply:
•Data Governance and Catalog must be integrated with Informatica Cloud Data Quality.
•You must be able to access Informatica Cloud Data Quality using the same login credentials that you use for Data Governance and Catalog.
•To associate an existing rule in Data Quality with a Glossary asset in Data Governance and Catalog, you must have view permission for the rule specification in Data Quality.
•To create a rule in Data Quality and associate it with a Glossary asset in Data Governance and Catalog, you must have view and execute permissions for rules and profiles in Data Quality.
The organization administrator can assign you the required permissions and privileges in the Administrator service. For more information about assigning asset permissions and feature privileges to users, see Organization Administration in the Cloud Common Services help.
Running data quality rules in another data quality application
If you want to define data quality rules in Data Governance and Catalog and run the rules in another data quality application, the following prerequisites apply:
•Data Governance and Catalog must be integrated with the data quality application.
•You must have appropriate permissions and privileges to view, create and execute rules and profiles in the data quality application.
Supported assets for automated data quality rules
You can define and run automated data quality rules for the following business and technical asset types.
Business assets
The following list shows the supported business assets:
- Data Set
- Data Quality Rule Template
- Glossary Business Term
- Glossary Domain
- Glossary Subdomain
- Glossary Metric
- Process
- Policy
- System
Technical assets
The following table shows the supported technical assets and the associated data types:
Technical assets
Supported data types
Amazon Athena
The following list shows the supported data types for the asset:
- bigint
- boolean
- char
- date
- decimal
- double
- float
- int
- smallint
- string
- timestamp
- tinyint
- varchar
Amazon Redshift
The following list shows the supported data types for the asset:
- smallint
- integer
- bigint
- decimal
- real
- double precision
- boolean
- char
- varchar
- date
- timestamp
- timestamptz
- geometry
- geography
- hllsketch
- super
- time
- timetz
- varbyte
Amazon S3
- The string data type is supported for the delimited CSV technical asset.
- The following list shows the supported data types for Avro technical asset:
- boolean
- int
- long
- double
- string
- record
- array
- The following list shows the supported data types for Parquet technical asset:
- boolean
- int32
- int64
- int96
- float
- double
- date
- decimal
- string
- time
- map
- struct
Google BigQuery
The following list shows the supported data types for the asset:
- boolean
- date
- datetime
- float
- integer
- numeric
- string
- time
- timestamp
Google Cloud Storage
- The following list shows the supported data types for Avro technical asset:
- boolean
- int
- long
- double
- string
- record
- array
- The following list shows the supported data types for Parquet technical asset:
- boolean
- int32
- int64
- int96
- float
- double
- date
- decimal
- string
- time
- map
- struct
JDBC
The following list shows the supported data types for the asset:
- bigint
- char(L)
- date
- decimal (p,s)
- float
- graphic
- integer
- numeric (p,s)
- smallint
- time
- timestamp
- varchar
- vargraphic
Microsoft Azure Data Lake Storage Gen2
- The following list shows the supported data types for Avro technical asset:
- boolean
- int
- long
- double
- string
- record
- array
- The following list shows the supported data types for Parquet technical asset:
- boolean
- int32
- int64
- int96
- float
- double
- date
- decimal
- string
- time
- map
- struct
Microsoft Azure SQL Server
The following list shows the supported data types for the asset:
- bigint
- numeric
- bit
- decimal
- int
- money
- smallint
- smallmoney
- tinyint
- float
- real
- date
- datetime2
- smalldatetime
- datetime
- time
- char
- varchar
- text
- nchar
- nvarchar
- ntext
Microsoft Azure Synapse
The following list shows the supported data types for the asset:
- datetime2
- datetime
- date
- time
- float
- real
- decimal
- numeric
- money
- smallmoney
- bigint
- int
- smallint
- tinyint
- bit
- nvarchar
- nchar
- varchar
- char
Microsoft SQL Server
The following list shows the supported data types for the asset:
- bigint
- numeric
- bit
- decimal
- int
- money
- smallint
- smallmoney
- tinyint
- float
- real
- date
- datetime2
- smalldatetime
- datetime
- time
- char
- varchar
- text
- nchar
- nvarchar
- ntext
Oracle
The following list shows the supported data types for Oracle and Oracle RDS assets:
- char
- date
- number
- number (p,s)
- timestamp
- varchar
- varchar2
- nchar
- nvarchar2
- double precision / float (126)
- timestamp with timezone
- timestamp with local timezone
- long
- interval day to second
- interval year to month
- binary float
- binary double
- float
- urowid
- rowid
SAP ERP
The following list shows the supported data types for SAP ERP assets:
- char
- clnt
- cuky
- curr
- dats
- dec
- int1
- int2
- lang
- lchr
- numc
- quan
- sstrg
- strg
- tims
- unit
Salesforce
The following list shows the supported data types for the asset:
- id
- reference
- boolean
- string
- datetime
- date
- double
- url
- percent
- currency
- phone
- textarea
- email
Snowflake
The following list shows the supported data types for the asset:
- number
- float/double
- varchar
- boolean
- date
- time
- timestamp_ltz
- timestamp_ntz
- timestamp_tz
- object
- array
- variant
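As a rough illustration of how the lists above might be used, the following sketch checks whether a column data type appears in the supported list for a technical asset. Only three of the assets are copied here, and the dictionary layout, the function name `is_supported`, and the case-insensitive comparison are assumptions made for this example, not part of the product.

```python
# Subset of the supported data types listed above, keyed by technical asset.
# The dictionary layout and the function name are illustrative assumptions.
SUPPORTED_TYPES = {
    "Amazon Athena": {
        "bigint", "boolean", "char", "date", "decimal", "double", "float",
        "int", "smallint", "string", "timestamp", "tinyint", "varchar",
    },
    "Google BigQuery": {
        "boolean", "date", "datetime", "float", "integer", "numeric",
        "string", "time", "timestamp",
    },
    "Salesforce": {
        "id", "reference", "boolean", "string", "datetime", "date", "double",
        "url", "percent", "currency", "phone", "textarea", "email",
    },
}


def is_supported(asset: str, data_type: str) -> bool:
    """Return True if data_type appears in the list for the given asset.

    The comparison ignores case and surrounding spaces (an assumption;
    the lists above are written in lowercase). Assets that are not in
    the dictionary return False.
    """
    return data_type.strip().lower() in SUPPORTED_TYPES.get(asset, set())


print(is_supported("Amazon Athena", "VARCHAR"))
print(is_supported("Google BigQuery", "bigint"))
```

A lookup like this can help you decide, before you define a rule, whether a data element is likely to be eligible for automated data quality rules.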
Defining data quality rules for business assets
Define a data quality rule to be run on a Glossary asset in Data Governance and Catalog. To define the rule to run on glossary assets, create a data quality rule template, specify the properties of the rule template, and associate the rule template with the glossary asset on which you want to run the rule. The rule that you define is created in the data quality application that is integrated with Data Governance and Catalog. The data quality application runs the rule on the technical assets that are linked to the Glossary asset. After the rule is run, Data Governance and Catalog retrieves and displays the data quality scores of the rule.
Before you create a data quality rule template, make sure that the Glossary asset is linked to the technical assets on which you want to run the rules. For more information about linking Glossary assets to technical assets, see Curate glossaries for technical assets.
To manually create a data quality rule template that runs on the data elements linked to the glossary asset, perform the following steps:
1. Open the glossary asset on which you want to run the rule, and click the Action menu on the right header of the asset page.
2. From the Action menu, select Create Data Quality Rule Template.
3. Alternatively, click New > Business Rules and select Data Quality Rule Template.
4. Enter the following properties of the rule template:
Field
Description
Name
Name of the rule template.
Description
Description of the rule template.
Reference ID
Unique reference identifier for the rule template.
Dimension
Data quality dimension that applies to the data quality rule that is defined by the rule template.
Measuring Method
Method by which the data quality rule for the rule template is evaluated. Select one of the following values:
- Business Extract. The rule is measured on data that is exported for a particular business case.
- System Function. The rule is measured by a data quality system.
- Technical Script. The rule is measured by a script that is manually run by an analyst.
- Informatica Cloud Data Quality. The rule is measured by Informatica Cloud Data Quality.
Primary Glossary
Glossary asset that is linked to the technical assets on which you want to run the data quality rule.
Secondary Glossary
Additional glossary assets to which the data quality rule in the rule template applies.
Technical Rule Reference
Create a reference to a rule specification in the data quality application. You can create a new rule specification, or select from an existing rule specification.
- To create a new rule specification in the data quality application, click Create a new rule. Enter a description for the rule using natural language construction, and click View Recommendations. CLAIRE® reads the description that you enter and intelligently recommends a rule that it can create in the data quality application.
For more information about creating rules that CLAIRE® can interpret, see Guidelines for entering rule descriptions.
- To select from an existing rule specification in the data quality application, click Pick an existing rule.
After you select a rule option, click OK to go back to the rule template creation page.
Criticality
Criticality of the rule template. Select one of the following values:
- High
- Medium
- Low
Automation
Select the check box to indicate that you want the data quality rule to run automatically on the data elements that are linked to the Primary Glossary. If you select the check box, Data Governance and Catalog automatically runs the rule according to the schedule you specify in the Frequency field.
Target
Minimum acceptable data quality value for the asset to be considered "Good."
The target value is higher than the threshold value. For example, you can set the threshold value to 50 and the target value to 85.
Threshold
Minimum acceptable data quality value for the asset to be considered "Acceptable."
The target value is higher than the threshold value. For example, you can set the threshold value to 50 and the target value to 85.
Frequency
Frequency of running the data quality rule that is defined by the rule template.
- Select Daily to run the rule once in a day.
- Select Weekly to run the rule once in a week.
- Select Monthly to run the rule once in a month.
The rule is run when you click Create.
5. When all the details of the rule template are ready, click Create.
When you click Create, the data quality application runs the rule on the data elements linked to the Glossary assets. If you selected the Automation check box, the data quality application runs the rule according to the schedule you specified in the Frequency field. You can now view the data quality scores when you open the rule template, the corresponding rule occurrences, the Glossary asset, and the linked technical assets. If a rule occurrence is created from an automated rule template, the term Auto- is prefixed to the name of the rule occurrence.
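The Target and Threshold fields can be read as two cut-off points on the data quality score. The following sketch shows one way to express that categorization. The function name `categorize_score` is hypothetical, and treating a score that is exactly equal to the target or threshold as meeting it is an assumption based on the word "minimum" in the field descriptions.

```python
def categorize_score(score: float, threshold: float, target: float) -> str:
    """Categorize a data quality score as Good, Acceptable, or Not Acceptable.

    Assumption: a score equal to the target counts as Good, and a score
    equal to the threshold counts as Acceptable.
    """
    if target <= threshold:
        # The field descriptions state that the target is higher than the threshold.
        raise ValueError("The target value must be higher than the threshold value.")
    if score >= target:
        return "Good"
    if score >= threshold:
        return "Acceptable"
    return "Not Acceptable"


# Example values from the field descriptions: threshold 50, target 85.
for score in (92, 85, 60, 49.5):
    print(score, categorize_score(score, threshold=50, target=85))
```

With the example values from the field descriptions, a score of 60 falls between the threshold and the target, so it is categorized as Acceptable.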
Defining data quality rules for data elements
Define a data quality rule to be run on a data element that is part of the metadata extracted by Metadata Command Center. To define the rule, create a data quality rule occurrence, specify the properties of the rule occurrence, and associate the rule occurrence with the data element on which you want to run the rule.
Before you run the data quality rules, enable the Data Quality capability for the catalog source in Metadata Command Center.
The rule you define is created in the data quality application that is integrated with Data Governance and Catalog. The data quality application runs the rule on the data element. After the rule is run, Data Governance and Catalog retrieves and displays the data quality scores of the data element in the rule occurrence.
Note: The following guidelines apply for associating rule occurrences to data elements:
•The rule occurrences can be manually created, automatically generated, or can be associated with a data quality score card. For automatically generated rule occurrences, you can view the individual data quality scores of the data elements on which the automated data quality rules have been run. You can also manually run a data quality rule to generate manually created rule occurrences.
•In data profiling, when you run a profile that has a rule associated with it, you can view the data quality score card in Data Governance and Catalog after a successful profile run. Rule occurrences related to the score card are generated in Data Governance and Catalog. You can associate a data element to a rule occurrence that is generated from a score card.
•If you manually associate rule occurrences with data elements and then you purge the catalog source that the data element belongs to, the associated rule occurrences are also deleted.
To manually create a rule occurrence that runs on a data element, perform the following steps:
1. Open the data element on which you want to run the rule, and click the Action menu on the right header of the data element page.
2. From the Action menu, select Create Data Quality Rule Occurrence.
3. Alternatively, click New > Business Rules and select Data Quality Rule Occurrence.
4. Enter the following properties of the rule occurrence:
Field
Description
Rule Template
Data quality rule template that defines the parameters of the rule that you want to run on the data element.
Sync with Rule Template
Select to sync the rule occurrence with the rule template that you have specified.
If the rule occurrence is synced, the parameters defined in the rule template are applied for the data quality rule. The rule occurrence score inherits the technical rule reference, target and threshold values that you have specified in the rule template. The rule occurrence scores are updated according to the rule automation schedule that you have specified in the rule template.
If the rule occurrence is not synced, the rule occurrence is an independent asset that is not affected by the rule template parameters.
Name
Name of the rule occurrence.
Description
Description of the rule occurrence.
Reference ID
Unique identifier for the rule occurrence.
Dimension
Data quality dimension for which the data quality rule is run.
Measuring Method
Method by which the data quality rule for the rule occurrence is evaluated. This field can have one of the following values:
- Business Extract. The rule is measured on data that is exported for a particular business case.
- System Function. The rule is measured by a data quality system.
- Technical Script. The rule is measured by a script that is manually run by an analyst.
- Informatica Cloud Data Quality. The rule is measured by Informatica Cloud Data Quality.
Note: If you select Informatica Cloud Data Quality as the measuring method, the Technical Rule Reference field is mandatory. This field is not mandatory for other measuring methods.
Primary Data Element
Data element on which the data quality rule is run.
This data element is the input port for the rule in the integrated data quality system. The data quality score is always generated for the primary data element.
To add a primary data element, select a system asset or a catalog source. Next, select a technical data set or a business data set depending on whether you selected a catalog source or a system. Finally, select data elements from the selected data set.
If you do not enable the Data Quality option in Metadata Command Center for a catalog source, the options in this field are disabled. When you manually create a data quality rule occurrence, the primary data element that you enter is the primary input port for the data quality rule. You cannot change it after the rule occurrence is created.
When you create a data quality score card in Data Profiling, a rule occurrence is generated from the data quality score card. You can then associate this rule occurrence with the primary data element. If you do not add the primary data element, no data quality score is displayed. For this reason, the Primary Data Element option is editable only for rule occurrences that are associated with data quality score cards.
Secondary Data Element
Secondary input port for the data quality rule that is run on the Primary Data Element.
Technical Rule Reference
Create a reference to a rule specification in the data quality application. You can create a new rule specification, or select from an existing rule specification.
- To create a new rule specification in the data quality application, click Create a new rule. Enter a description for the rule using natural language construction, and click View Recommendations. CLAIRE® reads the description that you enter and intelligently recommends a rule that it can create in the data quality application.
- To select from an existing rule specification in the data quality application, click Pick an existing rule.
After you select a rule option, click OK to go back to the rule occurrence creation page.
Note: If you select Informatica Cloud Data Quality as the measuring method, the Technical Rule Reference field is mandatory. This field is not mandatory for other measuring methods.
When you manually create a rule occurrence, you can also specify the input parameters of the primary and secondary data elements gathered from multiple input parameters in the integrated data quality system. This helps you to evaluate the quality of an asset based on the inputs from multiple fields as defined in the data quality rule. For example, if you want to create a rule that validates the hire date of a candidate, you can use the hire date of the candidate as one input parameter and the date of birth of the candidate as another input parameter. You can then use the data of the two input parameters to view the data quality scores and validate the hire date of the candidate.
Input parameter mapping is mandatory for a data quality rule occurrence if you are using a multi input parameter rule from the integrated data quality system. Data quality scores are always reported on the primary data element.
Map each rule input parameter to a unique data element. Make sure that at least one rule input parameter is mapped to a primary data element.
Criticality
Criticality of the rule occurrence. The value can be High, Medium, or Low.
Target
Minimum acceptable data quality value for the asset to be considered "Good."
The target value is higher than the threshold value. For example, you can set the threshold value to 50 and the target value to 85.
Threshold
Minimum acceptable data quality value for the asset to be considered "Acceptable."
The target value is higher than the threshold value. For example, you can set the threshold value to 50 and the target value to 85.
Frequency
Frequency of running the data quality rule that is defined by the rule template.
- Select Daily to run the rule once in a day.
- Select Weekly to run the rule once in a week.
- Select Monthly to run the rule once in a month.
The rule is run when you click Create.
The following image shows the dialog box to select the primary data element:
The following example image shows a rule description and its CLAIRE® recommendation:
The following image shows the addition of rule reference for a single input parameter from an existing rule specification:
The following image shows the addition of a rule with multiple input parameters and the Show Preview icon to view the rule:
The following image shows how to map multiple input parameters to data elements:
The following image shows the input parameters mapped to the primary and secondary data elements:
5. When all the details of the rule occurrence are ready, click Create.
When you click Create, the data quality application runs the rule on the data element. If you selected a schedule in the Frequency field, the data quality application runs the rule on the data element according to the rule schedule you specified in the Frequency field. When the data quality score is ready, Data Governance and Catalog displays the score in the rule occurrence.
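The input parameter mapping guidelines in the Technical Rule Reference field can be expressed as two checks: each rule input parameter maps to a unique data element, and at least one parameter maps to the primary data element. The following sketch applies those checks to the hire date example. The function name `validate_mapping`, the parameter names, and the data element names are all hypothetical.

```python
def validate_mapping(mapping: dict[str, str], primary_element: str) -> list[str]:
    """Check a mapping of rule input parameter name -> data element name.

    Returns a list of problems. An empty list means that the mapping
    satisfies both guidelines.
    """
    problems = []
    elements = list(mapping.values())
    if len(set(elements)) != len(elements):
        problems.append("Each input parameter must map to a unique data element.")
    if primary_element not in elements:
        problems.append(
            "At least one input parameter must map to the primary data element."
        )
    return problems


# Hypothetical names for the hire date example.
mapping = {
    "hire_date_in": "HIRE_DATE",
    "birth_date_in": "DATE_OF_BIRTH",
}
print(validate_mapping(mapping, primary_element="HIRE_DATE"))
```

In this example, HIRE_DATE is the primary data element, so the data quality score is reported on it, and DATE_OF_BIRTH acts as the secondary data element.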
Technical hierarchy of a rule occurrence
For an automatically generated rule occurrence, the catalog source is the highest unit and the data elements are the smallest units in the technical hierarchy.
Note: If you bulk import the assets, the rule occurrences do not have the mentioned technical hierarchy. As a result, the rule occurrences are not removed when you purge a catalog source.
Guidelines for entering rule descriptions
When you enter a description for a data quality rule template, CLAIRE® reads the description and, using Natural Language Processing (NLP) technology, recommends a rule that it can create in the data quality application. You can review the recommended rule and decide whether the rule meets your requirement. If you agree to the recommendation, Data Governance and Catalog creates the rule in the data quality application and associates the rule with the rule template.
Consider the following guidelines when you enter a rule description so that CLAIRE® can recommend appropriate rules:
•The description can be in any language.
•The description must be within 200 characters and 30 words.
•The description must be in one sentence, and can contain letters, numbers, spaces, and special characters.
•The description can contain UTF-8 characters, spaces, and the following symbols: comma (,), hyphen (-), semicolon (;), backslash (\), single quote ('), double quotes ("), angle brackets (< and >), equal sign (=), parentheses (( and )), braces ({ and }), square brackets ([ and ]), and period (.).
•Avoid spelling errors while creating a rule.
Note: If the business asset name for which you want to run a data quality rule contains an unsupported character, omit the character in the rule description. For example, if a business term is called "Customer_Name", do not enter the underscore in the rule description.
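If you want to check a description before you enter it, the length and character guidelines above can be approximated in a few lines. The following sketch is not how CLAIRE® validates input. The function name `check_description` is hypothetical, the limits are treated as inclusive, and the single-sentence and spelling guidelines are not checked.

```python
# Symbols that the guidelines list as allowed in a rule description.
ALLOWED_SYMBOLS = set(",-;\\'\"<>=(){}[].")
MAX_CHARS = 200
MAX_WORDS = 30


def check_description(text: str) -> list[str]:
    """Return a list of guideline violations found in a rule description.

    Assumptions: words are separated by whitespace, and a character is
    acceptable if it is a letter or digit in any script, a space, or one
    of the listed symbols.
    """
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(
            f"description has {len(text)} characters; the limit is {MAX_CHARS}"
        )
    word_count = len(text.split())
    if word_count > MAX_WORDS:
        problems.append(f"description has {word_count} words; the limit is {MAX_WORDS}")
    unsupported = sorted(
        {c for c in text if not (c.isalnum() or c == " " or c in ALLOWED_SYMBOLS)}
    )
    if unsupported:
        problems.append("unsupported characters: " + " ".join(unsupported))
    return problems


print(check_description("Customer Name cannot be blank or empty"))
print(check_description("Customer_Name cannot be blank or empty"))
```

The second call reports the underscore, which matches the note about omitting unsupported characters from business asset names.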
NLP texts to consider
You can use the following types of NLP rule descriptions. If the text you enter is insufficient or unclear, CLAIRE® might recommend that you rephrase the rule description.
Empty Value Rules
The following examples show descriptions of null value rules that CLAIRE® NLP technology can read:
•Sum may be NULL
•keep the section field empty
•Box value should be nullified
•Title should be Nothing
•Blank values are equal to null
Non-Empty Value Rules
The following examples show descriptions of non-null value rules that CLAIRE® NLP technology can read:
•Age is always NOT NULL
•Name field cannot be blank or empty
•RFID values is not Null
•Radius must not be blank or empty
•doc number is not NULL
Comparison Rules
The following examples show descriptions of comparison rules that CLAIRE® NLP technology can read:
•Down payment number is greater or equal to thousand
•Down payment is going to either be 1000 or be above 1000
•Value cannot be less than 1,000
•Intensity is lesser than 1,000 lumes
•Bill number is larger or equal to zero
Range Rules
The following examples show descriptions of range rules that CLAIRE® NLP technology can read:
•Diameter exceeds 0 and doesn't exceed 5
•Diameter is bigger than zero and also Diameter is smaller than five
•Diameter is between one and five, but it cannot be one or five
•Diameter is at least 1 or higher, but not as high as 10
•Diameter can be equal to 1, or be between 1 and 10
Length Rules
The following examples show descriptions of length rules that CLAIRE® NLP technology can read:
•Zip code must not be longer than 6 characters
•Description should not be greater than 180 letters
•house rent should not exceed beyond 3000
•Only one sentence
•Oil Rig ID is 9 characters long
List Rules
To define a list, enter markups or delimiters, such as curly braces ({}) and semicolon (;). To facilitate naturally written sentences, you can include conjunctions in the delimiters. Enclose each item of a list within double quotes ("").
The following examples show descriptions of list rules that CLAIRE® NLP technology can read:
•The value of Index is "0", "1" or "2"
•Index shall be equivalent to "yes", "no" or "NA"
•Index will be one of these: "yes", "no", "NA"
•Index should have values within the limit of "yes", "no" and "NA"
•"Yes", "no" or "na" can only be the values that Index can hold
•The values "NA", "Nil", or "None" shouldn't be present in an email ID
•Employee ID should not have values within the limit of "0", "1", or "2"
Note: Use double quotes to indicate lists. If at least one double quote is found in the sentence, CLAIRE® NLP technology attempts to read the description as a list rule.
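To see which items a list rule description contains, you can extract the text between double quotes. The following sketch mirrors the note above by treating any description that contains a double quote as a candidate list rule. It only illustrates the quoting convention and is not CLAIRE® NLP technology; both function names are hypothetical.

```python
import re


def looks_like_list_rule(description: str) -> bool:
    """Return True if the description contains at least one double quote."""
    return '"' in description


def quoted_items(description: str) -> list[str]:
    """Return the items enclosed in double quotes, in order of appearance."""
    return re.findall(r'"([^"]*)"', description)


description = 'Index shall be equivalent to "yes", "no" or "NA"'
print(looks_like_list_rule(description))
print(quoted_items(description))
```

Because a single double quote is enough to trigger list handling, avoid double quotes in descriptions that are not meant to be list rules.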
Date Rules
The following examples show descriptions of date value rules that CLAIRE® NLP technology can read:
•Expiration Date must be a date
•Birthday must be a date
•DOB can have values of type date only
•Joining Date should contain the value of date
•SSN is not of type date
Number Rules
The following examples show descriptions of number value rules that CLAIRE® NLP technology can read:
•Last name isn't a number
•Number of Seats must be a number
•Salary is numeric
•Phone Number is of type Numeric
NLP texts to avoid
Avoid the following types of NLP rule descriptions. If the text you enter is insufficient or unclear, CLAIRE® might recommend that you rephrase the rule description.
Double Negation Sentences
The following examples show descriptions of double negation sentences that CLAIRE® NLP technology cannot read:
•It is not defined city name to be not null
•Distance is not equal to or lesser than NOT NULL
•Date of birth cannot be not null
•It is not acceptable for Phone Number to be not NULL
Complex Sentences
The following examples show descriptions of complex sentences that CLAIRE® NLP technology cannot read:
•It’s not acceptable for date to equal NULL
•Nothing should be populated in name field
•Joining Date should not contain the value of date but contains numbers
•Product is not equal to 2 or not larger than 1
Grammatically Incorrect Sentences and Incorrect Spellings
The following examples show descriptions of grammatically incorrect sentences and incorrect spellings that CLAIRE® NLP technology cannot read:
•First name should be $NULL
•Serial cannot contain numerics
•Last_name is not a number and alphasnumeric
•Product is nt equal to or nt larger then 1
Run an automated data quality rule
When you define an automated data quality rule, Data Governance and Catalog runs the rule automatically according to the schedule that you specified in the Frequency field of the rule template or rule occurrence. You don’t need to do anything. You can open the rule occurrence to see the individual data quality scores of the data elements on which the rule has been run.
Consider the following scenarios while you run a data quality rule automation process:
1. The Data Quality and Data Quality Rule Automation options are enabled in Metadata Command Center for a catalog source. The data quality rule occurrences are created automatically for the catalog source and you can manually run them on the data elements associated with the catalog source.
2. The Data Quality option is enabled and the Data Quality Rule Automation option is disabled in Metadata Command Center for a catalog source. Automatic rule occurrences are not created. However, you can still run the existing data quality rule occurrences on the data elements associated with the catalog source. You can manually create and run the data quality rule occurrences for the catalog source.
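The two scenarios above can be summarized as a small decision function over the two catalog source options. The following sketch is only a summary aid. The function name and the dictionary keys are hypothetical, and the case where the Data Quality option is disabled is not one of the listed scenarios, so its result here is an assumption.

```python
def automation_behavior(data_quality: bool, rule_automation: bool) -> dict[str, bool]:
    """Summarize what is possible for a catalog source, given its options."""
    if not data_quality:
        # Assumption: with the Data Quality option disabled, no data quality
        # rules run for the catalog source.
        return {"auto_create_occurrences": False, "manual_run_allowed": False}
    return {
        # Occurrences are created automatically only when both options are enabled.
        "auto_create_occurrences": rule_automation,
        # In both listed scenarios, you can run rule occurrences manually.
        "manual_run_allowed": True,
    }


print(automation_behavior(data_quality=True, rule_automation=True))
print(automation_behavior(data_quality=True, rule_automation=False))
```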
Automated rules for governance assets
To run automated rules on governance assets, make sure that you have done the following tasks when you define the rule:
•Select the Automation check box.
•Specify the rule schedule in the Frequency field.
When you click Create, Data Governance and Catalog runs the rule once immediately, and then runs all subsequent rules according to your specified schedule.
To run automated rules on data elements, make sure that you select a schedule in the Frequency field when you define the rule.
When you click Create, Data Governance and Catalog runs the rule once immediately, and then runs all subsequent rules according to your specified schedule.
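To picture the sequence of runs, the following sketch lists a first run at creation time followed by runs at the selected frequency. The exact time at which Data Governance and Catalog runs a scheduled rule is determined by the product and is not documented here, so the date arithmetic below, including the function names and the handling of short months, is purely an assumption for illustration.

```python
import calendar
from datetime import datetime, timedelta


def next_run(previous: datetime, frequency: str) -> datetime:
    """Return an assumed next run time for Daily, Weekly, or Monthly."""
    if frequency == "Daily":
        return previous + timedelta(days=1)
    if frequency == "Weekly":
        return previous + timedelta(weeks=1)
    if frequency == "Monthly":
        # Move to the next calendar month and clamp the day to the month length.
        year = previous.year + previous.month // 12
        month = previous.month % 12 + 1
        day = min(previous.day, calendar.monthrange(year, month)[1])
        return previous.replace(year=year, month=month, day=day)
    raise ValueError(f"Unknown frequency: {frequency}")


def planned_runs(created: datetime, frequency: str, count: int) -> list[datetime]:
    """Return the first `count` runs: one at creation, then by schedule."""
    runs = [created]
    while len(runs) < count:
        runs.append(next_run(runs[-1], frequency))
    return runs[:count]


for run in planned_runs(datetime(2024, 3, 1, 9, 0), "Weekly", 3):
    print(run.isoformat())
```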
Before you run a data quality rule automation process, make sure that you enable the Data Quality and Data Quality Rule Automation options in Metadata Command Center for the catalog source. To define an automated data quality rule that you want to run on data elements, make sure that the assets are correctly associated with each other in Data Governance and Catalog.
Before you define an automated rule
Before you define an automated data quality rule, you must link the technical assets that represent the data elements to the corresponding Glossary business assets.
What is a technical asset? When you define a data quality rule, the rule is run on the data that is located in the source systems in your organization. Metadata Command Center extracts the metadata from the source systems and sends the metadata to Data Governance and Catalog. This metadata appears as technical assets in Data Governance and Catalog. Therefore, technical assets in Data Governance and Catalog are metadata representations of the data in your organization. When you open a technical asset, you can see the constituent data elements. Technical assets also display catalog metadata such as profiling and classification information for the data. For more information about technical assets, see the Asset Details help.
What is a business asset? Business assets are semantic representations of technical assets. You can enrich the metadata of technical assets by adding governance metadata to the business assets. For example, in a business asset, you can specify a meaningful name, enter a description, and denote the stakeholders of the data assets. For more information about business assets, see the Asset Details help.
How are technical and business assets related? Technical assets appear in Data Governance and Catalog because Metadata Command Center periodically extracts and sends the metadata. As a user, you create business assets in Data Governance and Catalog. The business assets that you create are independent assets. This is why you must link business assets with their corresponding technical assets. When the linking is complete, a business asset and a technical asset represent the same data set that resides in your organization. You can define a data quality rule only for a business asset of the Glossary type. The rule, however, runs on the data elements that are represented by the Glossary asset or the corresponding technical asset.
The following diagram depicts the link that you must create between a technical asset and a Glossary asset:
For example, you might have five tables in your organization that contain employee names and personal information. These five tables appear as five technical assets in Data Governance and Catalog. To govern these five tables, you can create a single business asset called "Employee Record" of the Glossary Business Term type, and link the business term to the five technical assets. If you now run a mobile number check rule for the business term, the rule is run on all five tables.
For more information about linking a technical asset to a Glossary asset, see Manually associating glossary assets with technical .
Defining an automated rule
Define an automated data quality rule as a data quality rule template in Data Governance and Catalog, and associate a Glossary asset with the rule template.
What is a data quality rule template? A data quality rule template is the definition for a rule. It contains a textual description of the rule and other essential rule parameters such as dimension, threshold value, target value, and criticality. In the rule template, you must provide a reference to a rule specification in your data quality application. You can either provide a reference to an existing rule specification in your data quality application, or you can create a new rule specification in your data quality application from the rule template page in Data Governance and Catalog. When you provide this reference, a link between the rule template in Data Governance and Catalog and the rule specification in the data quality application is created. When this link is created, the rule template becomes the business representation of the data quality rule in your data quality application. To understand the fields of a rule template asset, see Data Quality Rule Template in the Asset Details help.
How are a rule template and a rule specification related? When you create a data quality rule template, you must specify a Glossary asset that you want to associate with the rule template. This association specifies the data elements on which the data quality rule must run. The parameters that you define in the rule template are applicable to the data quality rule that is run on the data elements that are represented by the Glossary asset. If you do not associate the rule template with a Glossary asset, the rule template remains an independent asset, and there are no data elements on which the specified rule must run.
The following diagram depicts the association that you must create between a data quality rule template and a Glossary asset:
For example, you can create a data quality rule template called "Mobile Number Check" in Data Governance and Catalog and link the rule template to the Default/mob_phno_check rule specification in your data quality application.
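To make the parts of a rule template concrete, the following Python sketch models a template as a simple record. It is a hypothetical model, not the product's data model or API: the class, the field names, and the sample values for dimension, threshold, target, and criticality are assumptions. Only the template name and the Default/mob_phno_check reference come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleTemplate:
    """Hypothetical model of a data quality rule template."""
    name: str
    description: str
    dimension: str
    threshold: float
    target: float
    criticality: str
    rule_specification: str               # reference into the data quality application
    glossary_asset: Optional[str] = None  # None means no association yet

    def has_data_elements(self) -> bool:
        # Without a Glossary asset association the template is an independent
        # asset and there are no data elements for the rule to run on.
        return self.glossary_asset is not None


mobile_check = RuleTemplate(
    name="Mobile Number Check",
    description="Mobile numbers must be valid.",
    dimension="Validity",   # assumed value
    threshold=80.0,         # assumed value
    target=95.0,            # assumed value
    criticality="High",     # assumed value
    rule_specification="Default/mob_phno_check",
)
before = mobile_check.has_data_elements()  # no Glossary asset yet

mobile_check.glossary_asset = "Employee Record"
after = mobile_check.has_data_elements()
print(before, after)
```

The sketch separates the two links a template carries: the reference to the rule specification says which rule to run, and the Glossary asset association says which data elements to run it on.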
For more information about creating a data quality rule template, see Defining data quality rules for governance assets.
After you define an automated rule
When you create a rule template, the data quality rule is automatically run on the data elements represented by the Glossary asset, and Data Governance and Catalog displays the rule scores as rule occurrences.
What happens when you create a rule template? After you create a rule template, Data Governance and Catalog sends the data elements of the Glossary asset to the data quality application. The rule specification that you specified in the rule template is run on the data elements, and the data quality application generates a rule score for each data element. The data quality application sends the rule scores to Data Governance and Catalog, where they appear as data quality rule occurrences.
What is a data quality rule occurrence? A data quality rule occurrence is the representation of the data quality score of a single data element. While the data quality rule template is the definition of a rule, a data quality rule occurrence is the instance of the rule that is run on a single data element. To understand the fields of a rule occurrence asset, see Data Quality Rule Occurrence in the Asset Details help.
The following diagram depicts the rule occurrences that are created when a data quality rule is run on data elements:
For more information about viewing data quality scores, see View data quality scores in assets.
Putting it all together
The following diagram depicts the entire process of data quality rule automation:
1. Link the technical assets that represent the data elements to the corresponding Glossary asset.
2. Create a data quality rule template.
   a. In the rule template, specify the Glossary asset that is linked to the technical assets on which you want to run the data quality rule.
   b. Create a reference between the rule template in Data Governance and Catalog and a rule specification in your data quality application.
3. When you save the data quality rule template, Data Governance and Catalog sends the data elements to the data quality application.
4. The data quality application runs the rule on the data elements and generates data quality scores.
   a. The data quality scores appear in Data Governance and Catalog as rule occurrences.
   b. Each rule occurrence corresponds to a data element on which the data quality rule was run.
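The steps above can be sketched as a small Python simulation. Everything in it is hypothetical: the `RuleOccurrence` class, the `run_rule` function, the sample data element names, and the sample values are invented for illustration. The source does not describe how the data quality application computes a score, so the sketch assumes that a score is the percentage of values that pass the rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RuleOccurrence:
    """Score of one rule run on one data element (hypothetical model)."""
    rule_template: str
    data_element: str
    score: float


def run_rule(
    rule_template: str,
    data_elements: Dict[str, List[str]],
    rule_specification: Callable[[str], bool],
) -> List[RuleOccurrence]:
    """Score every data element and return one occurrence per element."""
    occurrences = []
    for element_name, values in data_elements.items():
        passed = sum(1 for value in values if rule_specification(value))
        # Assumption: the score is the percentage of values that pass the rule.
        score = 100.0 * passed / len(values) if values else 0.0
        occurrences.append(RuleOccurrence(rule_template, element_name, score))
    return occurrences


def is_ten_digit_number(value: str) -> bool:
    # Stand-in for the rule specification in the data quality application.
    return value.isdigit() and len(value) == 10


# Step 1: data elements reached through the Glossary asset (invented sample data).
elements = {
    "hr.employees.mobile": ["9876543210", "12345", "5551234567", "5550001111"],
    "payroll.staff.phone": ["0123456789", "abc"],
}

# Steps 2 to 4: the template references the rule; running it yields occurrences.
occurrences = run_rule("Mobile Number Check", elements, is_ten_digit_number)
for occurrence in occurrences:
    print(occurrence.data_element, occurrence.score)
```

The point of the sketch is the shape of the result: one rule template produces as many rule occurrences as there are data elements, and each occurrence carries the score for exactly one element.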
Run a data quality rule manually
In addition to any automated data quality rule that you might have configured, you can manually run a data quality rule on governance assets at any time.
You might want to manually run a data quality rule for a business asset in the following situations:
•When you defined the rule template, you did not select the Automation check box.
•When you defined the rule template, you selected the Automation check box and specified the schedule in the Frequency field.
•You want to run the rule at a time other than the schedule you specified in the Frequency field of the rule template. For example, if you specified a weekly schedule, you want to run the rule before one week has elapsed.
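The situations above come down to whether an automatic run will happen before you need a fresh score. The following Python sketch shows that reasoning under stated assumptions: the function names and parameters are hypothetical, the Frequency field is modeled as a plain time interval, and the dates are invented. It does not call or describe any product API.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_scheduled_run(
    automation: bool,
    frequency: Optional[timedelta],
    last_run: datetime,
) -> Optional[datetime]:
    """Return the next automatic run, or None if the rule is not automated."""
    if not automation or frequency is None:
        return None
    return last_run + frequency


def needs_manual_run(
    automation: bool,
    frequency: Optional[timedelta],
    last_run: datetime,
    score_needed_by: datetime,
) -> bool:
    """True when no automatic run will happen by the time a score is needed."""
    upcoming = next_scheduled_run(automation, frequency, last_run)
    return upcoming is None or upcoming > score_needed_by


last_run = datetime(2024, 1, 1, 9, 0)
weekly = timedelta(weeks=1)

# Weekly schedule, but a fresh score is wanted three days after the last run.
early = needs_manual_run(True, weekly, last_run, last_run + timedelta(days=3))
# Automation check box not selected: only a manual run produces a score.
unscheduled = needs_manual_run(False, None, last_run, last_run + timedelta(days=30))
# Weekly schedule and the score can wait ten days: the schedule covers it.
scheduled_in_time = needs_manual_run(True, weekly, last_run, last_run + timedelta(days=10))
print(early, unscheduled, scheduled_in_time)
```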
Before you run a data quality rule manually, the following prerequisites apply:
•Enable the Data Quality option in Metadata Command Center for the catalog source.
•Your organization administrator has granted you the Execute permission on data quality rule occurrence assets through access policies in Metadata Command Center.
To run a data quality rule manually, go to the Score tab of the rule occurrence page and click Run Now in the top-right corner of the page. This triggers a job that retrieves the latest data quality score for the rule occurrence. After the job is triggered successfully, you can monitor the job in Metadata Command Center.
The following image shows the Score tab of a data quality rule occurrence: