DQ Preprocessors

DQ preprocessors are part of a data quality configuration and can be used to extend the initially selected data set that the data quality engine processes.

Introduction

Sometimes a data quality rule needs additional PIM data to run a DQ check successfully. In that case, the static list of entity items, or the item set defined by an entity report query, has to be enriched with more objects before DQ execution.

For instance, take a DQ rule which checks that the GTIN of an item is unique within its entire GDSN packaging hierarchy. To do this, all other items of the hierarchy need to be provided to the DQ engine as well. Because this logic sits on top of the DQ input data, which is defined e.g. by an entity report, you can implement it in a DQ preprocessor and then add that preprocessor to your DQ configuration.
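
To illustrate why the rest of the hierarchy has to be available, here is a minimal Java sketch of such a uniqueness check. HierarchyItem and GtinUniquenessSketch are hypothetical stand-ins invented for this illustration, not product classes, and the GTIN values are made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an item of a packaging hierarchy; not a product class.
final class HierarchyItem {
    final String id;
    final String gtin;

    HierarchyItem(String id, String gtin) {
        this.id = id;
        this.gtin = gtin;
    }
}

final class GtinUniquenessSketch {

    // Returns the GTINs that occur on more than one item of the given list.
    static Set<String> duplicateGtins(List<HierarchyItem> items) {
        Map<String, Integer> counts = new HashMap<>();
        for (HierarchyItem item : items) {
            counts.merge(item.gtin, 1, Integer::sum);
        }
        Set<String> duplicates = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Only the case was selected as DQ input (made-up sample values).
        List<HierarchyItem> selected = new ArrayList<>();
        selected.add(new HierarchyItem("case-1", "04012345000016"));

        // The same input enriched with the rest of the hierarchy;
        // the pallet reuses the GTIN of the case.
        List<HierarchyItem> enriched = new ArrayList<>(selected);
        enriched.add(new HierarchyItem("each-1", "04012345000009"));
        enriched.add(new HierarchyItem("pallet-1", "04012345000016"));

        System.out.println("selection only: " + duplicateGtins(selected));
        System.out.println("with hierarchy: " + duplicateGtins(enriched));
    }
}
```

Run on the selection alone, the check finds no duplicate. The conflict only becomes visible once the other hierarchy members are part of the input, which is the gap a preprocessor closes.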

How to contribute a custom preprocessor

In order to contribute a custom preprocessor implementation you need to perform the following steps:

  1. Build the business logic for obtaining the additional entity items in a class which implements the interface com.heiler.ppm.dataquality.core.preprocessor.DataQualityPreProcessor. Its only method hands over an EntityItemList, which is the initial input data to DQ. Based on this input data, additional entity items can be determined, e.g. by calling another entity report which returns all items in a given item list's hierarchy.

    Be aware of the following:

    - The resulting EntityItemList must have the same root entity as the preprocessor has been contributed for.

    - The preprocessor might be called several times during one DQ execution run, as the initial data set is processed in packages of 1000 objects, so special care must be taken with respect to performance.

  2. Add a new contribution to extension point com.heiler.ppm.dataquality.core.dqPreProcessors with the following elements:

    1. identifier: unique identifier of the contribution

    2. class: Class implementing interface com.heiler.ppm.dataquality.core.preprocessor.DataQualityPreProcessor, containing the actual logic for obtaining the additional entity item set provided to DQ

    3. entity: Identifier of the root entity this preprocessor implementation obtains objects for

    4. name: Optional name, to be displayed in the UI
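
The business logic from step 1 could be sketched as follows. This is not the product API: SketchItemList and SketchPreProcessor are simplified stand-ins for EntityItemList and DataQualityPreProcessor, the method name preProcess is an assumption, and HierarchyLookup is a hypothetical helper standing in for an entity report call. Take the real method signature from the interface itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for EntityItemList: a root entity identifier plus item ids.
final class SketchItemList {
    final String rootEntity;
    final List<String> itemIds;

    SketchItemList(String rootEntity, List<String> itemIds) {
        this.rootEntity = rootEntity;
        this.itemIds = new ArrayList<>(itemIds);
    }
}

// Simplified stand-in for DataQualityPreProcessor; the method name is an assumption.
interface SketchPreProcessor {
    SketchItemList preProcess(SketchItemList input);
}

// Hypothetical lookup, e.g. backed by an entity report returning all hierarchy members.
interface HierarchyLookup {
    List<String> membersOfHierarchyContaining(String itemId);
}

final class HierarchyPreProcessorSketch implements SketchPreProcessor {
    private final HierarchyLookup lookup;

    // Remembers lookups between calls, since the preprocessor may be invoked once
    // per package. That the same instance serves all packages is an assumption.
    private final Map<String, List<String>> cache = new HashMap<>();

    HierarchyPreProcessorSketch(HierarchyLookup lookup) {
        this.lookup = lookup;
    }

    @Override
    public SketchItemList preProcess(SketchItemList input) {
        // Start with the original items, then add hierarchy members without duplicates.
        Set<String> result = new LinkedHashSet<>(input.itemIds);
        for (String id : input.itemIds) {
            List<String> members = cache.get(id);
            if (members == null) {
                members = lookup.membersOfHierarchyContaining(id);
                cache.put(id, members);
                // Register the result for every member so later packages skip the lookup.
                for (String member : members) {
                    cache.put(member, members);
                }
            }
            result.addAll(members);
        }
        // Keep the root entity of the input so the result matches the contribution.
        return new SketchItemList(input.rootEntity, new ArrayList<>(result));
    }

    public static void main(String[] args) {
        List<String> hierarchy = Arrays.asList("each-1", "case-1", "pallet-1");
        HierarchyLookup lookup = itemId -> hierarchy;
        SketchPreProcessor preProcessor = new HierarchyPreProcessorSketch(lookup);
        SketchItemList out = preProcessor.preProcess(
                new SketchItemList("Item", Arrays.asList("case-1")));
        System.out.println(out.rootEntity + " " + out.itemIds);
    }
}
```

The sketch reflects both caveats from step 1: the result keeps the root entity of the input, and repeated hierarchy lookups are avoided across packages. In a real contribution, the fully qualified name of the implementing class goes into the class attribute from step 2.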

How to include a preprocessor in the DQ configuration

In order to configure a preprocessor to always be executed before a DQ run, select it from the combo box at the bottom of the Data quality configuration view:

(Screenshot: preprocessor combo box at the bottom of the Data quality configuration view)

Note: The preprocessors displayed depend on the input data type selected for the configuration. For example, if the Item data type is selected, only preprocessors contributed for the entity "Item" are selectable.
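
The filtering described in the note can be pictured with a small Java sketch. PreProcessorContribution and PreProcessorSelectionSketch are hypothetical classes invented for this illustration. The entity identifiers, the com.example identifiers, and the fallback to the identifier when no name is set are assumptions as well, not confirmed product behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical view of one contribution: identifier, entity and optional name.
final class PreProcessorContribution {
    final String identifier;
    final String entity;
    final String name; // optional, may be null

    PreProcessorContribution(String identifier, String entity, String name) {
        this.identifier = identifier;
        this.entity = entity;
        this.name = name;
    }

    // Assumed display rule: use the name if present, otherwise the identifier.
    String label() {
        return name != null ? name : identifier;
    }
}

final class PreProcessorSelectionSketch {

    // Returns the labels of all contributions registered for the selected entity.
    static List<String> selectableLabels(List<PreProcessorContribution> all,
                                         String selectedEntity) {
        List<String> labels = new ArrayList<>();
        for (PreProcessorContribution contribution : all) {
            if (contribution.entity.equals(selectedEntity)) {
                labels.add(contribution.label());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<PreProcessorContribution> all = Arrays.asList(
                new PreProcessorContribution("com.example.dq.hierarchy", "Item", "Hierarchy items"),
                new PreProcessorContribution("com.example.dq.productTree", "Product", "Product tree"),
                new PreProcessorContribution("com.example.dq.itemOnly", "Item", null));
        System.out.println(selectableLabels(all, "Item"));
    }
}
```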