Configure which rules are affected by dictionary synchronization to improve the performance of IDQ rule execution

Situation/motivation

  • For each IDQ rule, PIM creates executor objects and stores them in a pool, because creating them is expensive. Executors are used to execute IDQ rules in parallel and are created lazily on demand. In addition, the IDQ SDK internally synchronizes executor creation, which means that creation requests are queued.

  • Changes to dictionaries (also known as reference tables) are only picked up by a rule once its executor has been disposed and re-created.

  • Currently, the PIM side does not know which reference tables are used by which IDQ rule. Hence, whenever a dictionary is synchronized, all executors are disposed and re-created lazily on demand.

  • As a result, performance degrades drastically whenever a dictionary is synchronized AND many new executors are requested for IDQ rule execution.
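
The situation described above can be illustrated with a small model. The following is a minimal Python sketch, not PIM or IDQ SDK code: the class name ExecutorPool, its methods, and the creation counter are all hypothetical, and it keeps a single executor per rule for brevity, whereas PIM pools several executors per rule. It only shows why disposing every executor forces the expensive, serialized creation to happen again.

```python
import threading


class ExecutorPool:
    """Toy model of a per-rule executor pool (hypothetical, not PIM code)."""

    def __init__(self, create_executor):
        self._create_executor = create_executor  # expensive factory, called lazily
        self._executors = {}                     # rule identifier -> executor
        self._creation_lock = threading.Lock()   # creation is serialized (queued)
        self.creations = 0                       # number of expensive creations

    def get(self, rule_id):
        # Re-use a pooled executor if one exists, otherwise create it on demand.
        with self._creation_lock:
            if rule_id not in self._executors:
                self._executors[rule_id] = self._create_executor(rule_id)
                self.creations += 1
            return self._executors[rule_id]

    def dispose_all(self):
        # Models a dictionary synchronization without any rule configuration.
        with self._creation_lock:
            self._executors.clear()


pool = ExecutorPool(lambda rule_id: object())
pool.get("Check_Profanity")
pool.get("MyCustomRule1")
pool.get("Check_Profanity")   # served from the pool, no new creation
pool.dispose_all()            # dictionary synchronized: every executor is dropped
pool.get("Check_Profanity")   # has to be created again
print(pool.creations)         # prints 3
```

In this model, every rule requested after dispose_all() pays the creation cost again, and all of those creations wait on the same lock.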

Solution

  • It is now possible to configure which IDQ rules are affected by which dictionary (also known as reference table). For this purpose, the DictionaryRuleConfiguration.xml file is used, which is stored in the PIM server's folder /configuration/HPM/dataquality and looks like this:

DictionaryRuleConfiguration.xml
<?xml version="1.0" encoding="UTF-8"?>
<dictionaryToRuleConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="DictionaryRuleConfiguration.xsd">
  <isActive>true</isActive>
  <dictionary>
    <dictionaryIdentifier>Informatica_DQ_Content\Dictionaries\General\profanity_infa.dic</dictionaryIdentifier>
    <affectedRules>
      <ruleIdentifier>Informatica_PIM_Content/PreDefined_Rules/Check_Profanity</ruleIdentifier>
      <ruleIdentifier>Informatica_PIM_Content/Custom_Rules/MyCustomRule1</ruleIdentifier>
    </affectedRules>
  </dictionary>
</dictionaryToRuleConfiguration>
  • This configuration is only active if the XML file is placed in the folder mentioned above AND <isActive> is set to true. Otherwise, the behavior is the same as before: with every dictionary synchronization, ALL rule executor objects are disposed and re-created lazily on demand.

  • The <dictionary> element defines which rules (identified by the <ruleIdentifier> elements within the <affectedRules> element) are affected by a specific dictionary (specified by the <dictionaryIdentifier>). You can define as many <dictionary> elements as you wish.

  • The <dictionaryIdentifier> has to be copied from the dictionary view (part of the Dictionary perspective) in the PIM Desktop client.

  • The <ruleIdentifier> is composed of the folders and subfolders of the IDQ rule and the name of the IDQ rule itself. This information can be found in the Data quality perspective of the PIM Desktop client.

  • When a dictionary that is configured here is synchronized, ONLY the executor objects of the configured affected rules are disposed and re-created lazily on demand. All other rule executor objects are left untouched.

  • When a dictionary that is NOT configured here is synchronized, the behavior is the same as before: ALL rule executor objects are disposed and re-created lazily on demand.
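
The rules in this list can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the PIM server's implementation: the function names load_mapping and rules_to_dispose are hypothetical, the second <dictionary> entry (hypothetical_example.dic, MyCustomRule2) is invented to show that several entries are allowed, and the XML declaration and schema attributes are left out of the sample for brevity. Comparing <isActive> with the literal string "true" is also an assumption.

```python
import xml.etree.ElementTree as ET

# Sample configuration; the second <dictionary> entry is hypothetical.
CONFIG = r"""<dictionaryToRuleConfiguration>
  <isActive>true</isActive>
  <dictionary>
    <dictionaryIdentifier>Informatica_DQ_Content\Dictionaries\General\profanity_infa.dic</dictionaryIdentifier>
    <affectedRules>
      <ruleIdentifier>Informatica_PIM_Content/PreDefined_Rules/Check_Profanity</ruleIdentifier>
      <ruleIdentifier>Informatica_PIM_Content/Custom_Rules/MyCustomRule1</ruleIdentifier>
    </affectedRules>
  </dictionary>
  <dictionary>
    <dictionaryIdentifier>Informatica_DQ_Content\Dictionaries\General\hypothetical_example.dic</dictionaryIdentifier>
    <affectedRules>
      <ruleIdentifier>Informatica_PIM_Content/Custom_Rules/MyCustomRule2</ruleIdentifier>
    </affectedRules>
  </dictionary>
</dictionaryToRuleConfiguration>"""


def load_mapping(xml_text):
    """Return {dictionaryIdentifier: set of ruleIdentifiers}, or None if inactive."""
    root = ET.fromstring(xml_text)
    if (root.findtext("isActive") or "").strip() != "true":
        return None
    mapping = {}
    for dictionary in root.findall("dictionary"):
        dict_id = dictionary.findtext("dictionaryIdentifier").strip()
        rules = {r.text.strip() for r in dictionary.findall("affectedRules/ruleIdentifier")}
        mapping.setdefault(dict_id, set()).update(rules)
    return mapping


def rules_to_dispose(mapping, synchronized_dictionary, pooled_rules):
    """Return the rules whose executors are disposed for one synchronization."""
    if mapping is None or synchronized_dictionary not in mapping:
        return set(pooled_rules)  # same as before: dispose ALL executors
    # Configured dictionary: only the listed rules lose their executors.
    return set(pooled_rules) & mapping[synchronized_dictionary]


PROFANITY = r"Informatica_DQ_Content\Dictionaries\General\profanity_infa.dic"
pooled = {
    "Informatica_PIM_Content/PreDefined_Rules/Check_Profanity",
    "Informatica_PIM_Content/Custom_Rules/MyCustomRule1",
    "Informatica_PIM_Content/Custom_Rules/MyCustomRule2",
}
mapping = load_mapping(CONFIG)
print(sorted(rules_to_dispose(mapping, PROFANITY, pooled)))
print(sorted(rules_to_dispose(mapping, "some_unconfigured.dic", pooled)))
```

The first call returns only the two rules listed for profanity_infa.dic, while the second call, for a dictionary that is not configured, returns every pooled rule.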

Be aware that as soon as a specific dictionary is configured in the DictionaryRuleConfiguration.xml file, synchronization of that dictionary NEVER leads to the disposal of executor objects of rules that are not listed as affected. If such "not affected" rules nevertheless depend on the configured dictionary, their executors may keep working with an outdated state of this dictionary, which can lead to inconsistent behavior when executing IDQ rules.

The DictionaryRuleConfiguration.xml configuration is loaded only once, during server startup. This means that after changing the file, the PIM server has to be restarted for the changes to take effect.

If something goes wrong while loading this configuration file (e.g. the corresponding DictionaryRuleConfiguration.xsd does not exist in the /configuration/HPM/dataquality folder, or there is a syntax error in the DictionaryRuleConfiguration.xml file), an ERROR is logged in the server's log file. The server still starts up, and the behavior is the same as before: with every dictionary synchronization, ALL rule executor objects are disposed and re-created lazily on demand.
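
The startup behavior can be pictured with the following Python sketch. It is only an illustration and makes several assumptions: the function load_configuration and the logger name "dataquality" are hypothetical, real validation against the XSD is omitted (the sketch only checks that the XSD file exists), and a missing XML file is treated as "not configured" without logging an error. Returning None stands for the fallback in which ALL executors are disposed on every synchronization.

```python
import logging
import os
import tempfile
import xml.etree.ElementTree as ET

log = logging.getLogger("dataquality")  # hypothetical logger name


def load_configuration(folder):
    """Return the parsed root element, or None to signal the fallback behavior."""
    xml_path = os.path.join(folder, "DictionaryRuleConfiguration.xml")
    xsd_path = os.path.join(folder, "DictionaryRuleConfiguration.xsd")
    if not os.path.isfile(xml_path):
        return None  # assumption: no file simply means no configuration
    try:
        if not os.path.isfile(xsd_path):
            raise FileNotFoundError(xsd_path)
        return ET.parse(xml_path).getroot()
    except (OSError, ET.ParseError) as error:
        # Startup continues; only an ERROR entry is written.
        log.error("Could not load %s: %s", xml_path, error)
        return None


with tempfile.TemporaryDirectory() as folder:
    xml_file = os.path.join(folder, "DictionaryRuleConfiguration.xml")
    xsd_file = os.path.join(folder, "DictionaryRuleConfiguration.xsd")
    open(xsd_file, "w").close()  # empty placeholder, never validated here

    with open(xml_file, "w") as f:  # syntax error: root element is not closed
        f.write("<dictionaryToRuleConfiguration><isActive>true</isActive>")
    broken = load_configuration(folder)

    with open(xml_file, "w") as f:
        f.write("<dictionaryToRuleConfiguration><isActive>true</isActive>"
                "</dictionaryToRuleConfiguration>")
    valid = load_configuration(folder)

    os.remove(xsd_file)  # schema file missing
    without_xsd = load_configuration(folder)
```

Here broken and without_xsd are None, which corresponds to the fallback, while valid holds the parsed configuration.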