Configuration of CLAIRE product classification service

A configuration file (config.ini) is provided in the server.ai folder of the CLAIRE accelerator package. In the 'DEFAULT' section of this file you can set parameters such as authentication information, server paths, and the port number for the CLAIRE recommendation service.

config.ini

[DEFAULT]
# Username and password for basic authentication for ai server. Please ensure that you configured Product360 CLAIRE accelerator accordingly. 
user = admin
password = admin
 
# Folders path where to store the data and models
upload_folder = c:/informatica/ai/data/product_classification
model_path = c:/informatica/ai/model/product_classification
 
# Python logging level. Possible values are: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL
logging_level = DEBUG
 
# Port number which the ai server uses
port_number = 5000

All parameters are mandatory. They are described here:

Parameter name	Description	Valid values	Default
user	Username for basic authentication against the server	<user_name>	admin
password	Password for basic authentication against the server	<user_password>	admin
upload_folder	Path to the folder where data (.csv files) used for the training process will be saved	any valid path to an existing folder	c:/informatica/ai/data/product_classification
model_path	Path to the folder where trained models used for the classification process will be stored	any valid path to an existing folder	c:/informatica/ai/model/product_classification
logging_level	Python logging level	NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL	DEBUG
port_number	Port number used by the ai server for communication	<port_number>	5000
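The parameter table above can be checked mechanically before the service is started. The following is a minimal sketch using Python's standard configparser module; the function name load_settings and the specific checks are assumptions for illustration and are not part of the accelerator.

```python
import configparser

REQUIRED_KEYS = ("user", "password", "upload_folder",
                 "model_path", "logging_level", "port_number")
LOGGING_LEVELS = ("NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def load_settings(ini_text):
    """Parse config.ini content and return the validated DEFAULT section."""
    # interpolation=None keeps a literal '%' in a password from being misread
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["DEFAULT"]

    # Every parameter is mandatory, so empty or absent values are rejected
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))

    level = section["logging_level"].upper()
    if level not in LOGGING_LEVELS:
        raise ValueError("unknown logging_level: " + level)

    port = section.getint("port_number")  # raises ValueError if not an integer
    if not 1 <= port <= 65535:
        raise ValueError("port_number must be between 1 and 65535")

    settings = dict(section)
    settings["logging_level"] = level
    settings["port_number"] = port
    return settings
```

The sketch does not verify that upload_folder and model_path exist on disk; such a check could be added with pathlib if needed.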

Product360 configuration

The connection to the CLAIRE recommendation server is configured in the file claire.properties, which is located in the conf directory of the Product360 Server:

claire.properties

################################################################################
###                                                                          ###
### Claire server settings                                                   ###
### =========================                                                ###
### These settings contain all configurations needed for CLAIRE features     ###
### like auto-classification or translation.                                 ###
###                                                                          ###
################################################################################
 
################################################################################
# General Claire server settings
# 
# These settings describe the connection with the claire server
#
claire.server.url = http://localhost:5000
claire.server.user = admin
claire.server.password = admin
 
################################################################################
# Classification
# 
# Connection settings for claire server used for classification training and auto-classification
# If these values are empty, the classification feature is considered to be inactive.
#
claire.classification.server.url = ${claire.server.url}
claire.classification.server.user = ${claire.server.user}
claire.classification.server.password = ${claire.server.password}
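In the listing above, the classification settings reference the general server settings through ${...} placeholders. The following sketch illustrates that inheritance, assuming placeholders are resolved by plain text substitution; the helper names parse_properties and resolve are hypothetical, and the parser handles only simple "key = value" lines rather than the full Java properties syntax.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Read 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(props):
    """Expand ${other.key} references; unknown keys are left untouched."""
    def expand(value, depth=0):
        if depth > 10:  # guards against circular references
            raise ValueError("placeholder nesting too deep")
        return PLACEHOLDER.sub(
            lambda m: expand(props[m.group(1)], depth + 1)
            if m.group(1) in props else m.group(0),
            value,
        )
    return {key: expand(value) for key, value in props.items()}
```

With the default file, the resolved value of claire.classification.server.url equals claire.server.url, so changing the general settings is enough unless classification should use a different server.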

AI Training for Auto Classification

The classification of product data is a supervised learning task, so the first step is to train a model on existing data. To generate a model, the CLAIRE accelerator leverages the power of Product 360's export. Training requires an export template that defines the training data. The template usually consists of the assignment to a structure group and one or more text values per product record; example templates ship with the accelerator. In addition, the export template must configure the post processing step "Classification training". This post processing step sends the created file to the CLAIRE recommendation service, which then trains a model based on the data.

images/download/attachments/463605025/Architecture.png

Supported content languages for model generation

Machine learning models for auto classification can be sensitive to the language of their training data. Each individual model must be trained on records that all use the same content language. We tested English and German extensively during development and obtained very good results with both.

In addition to these two, the following is the full list of languages that we expect to work well with the accelerator:

Dutch
English
Finnish
French
German
Italian
Norwegian
Portuguese
Spanish
Swedish

Note that any content language not listed above is unlikely to work with the accelerator. Also note that we have not run full test cycles for each of the languages listed above.

Standard export templates for AI training

The CLAIRE accelerator package provides three pre-built export templates under Resources/Export Templates. They can be used as is or as a template for your own training exports.

AI Classification Training Items.ext
AI Classification Training Products.ext
AI Classification Training Variants.ext

Training data

Please ensure that there is enough training data to create your machine learning model. Training based on only a few thousand records might not produce the best results. Also, the labels should be meaningful and more than just a short description like "Blue T-Shirt". Generally, the prediction results depend heavily on the quantity and quality of the data you feed into the training.
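One way to judge the quantity of training data is to count the exported records per structure group. The sketch below assumes the training export is a CSV file with a "category" column and a semicolon delimiter; the delimiter, the function names, and the min_records cut-off are illustrative assumptions, not values prescribed by the accelerator.

```python
import csv
import io
from collections import Counter


def records_per_category(csv_text, delimiter=";"):
    """Count training records per value of the 'category' column."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    category_index = header.index("category")
    # Blank lines are returned as empty rows by csv.reader and are skipped
    return Counter(row[category_index] for row in reader if row)


def sparse_categories(counts, min_records):
    """Return the categories that have fewer than min_records records."""
    return sorted(name for name, count in counts.items() if count < min_records)
```

Categories reported by sparse_categories are candidates for collecting more data before a training is started.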

Create your own export templates for AI classification training

It is possible to create custom export templates for AI training, e.g., if the data you want to train on is in custom fields. Some preconditions to keep in mind are:

The exported file must be a CSV file
The purpose of the export template must be "AI Training"
There must be a column having the header title "category". This column needs to contain the structure group assignment within the training structure.
All label fields (can be multiple) must have the column header "label"
The export template must have the post processing step "Classification training" attached
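The column header preconditions above can be verified on a sample export before a long training run is started. This is a minimal sketch, assuming a semicolon-delimited file; the delimiter and the function name check_training_header are assumptions for illustration.

```python
import csv
import io


def check_training_header(csv_text, delimiter=";"):
    """Return a list of problems found in the header row of a training export."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = [column.strip() for column in next(reader, [])]
    problems = []
    if "category" not in header:
        problems.append("missing 'category' column")
    if "label" not in header:
        problems.append("missing 'label' column")
    return problems
```

An empty result means the header satisfies both column rules; several "label" columns are accepted because multiple label fields are allowed.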

Create AI classification training export profiles

After creating an export template or importing the examples, you can create AI classification trainings in the context selection view. Although the "AI Classification Trainings" are basically exports, we separated them from the "regular" export profiles for a better user experience.

images/download/attachments/463605025/image2022-6-10_14-2-15.png

Also, in the process overview the AI trainings are separated from the "regular" export job executions for better process traceability.

As we are using the export functionality of Product 360, you have the full power of configuring and scheduling export profiles to manage your AI trainings.

Please note that we currently don't support delta training. This means that every training has to export all of your data and retrain the model.

Start AI classification training

To start an AI classification training, follow the steps below:

Right-click on the export profile
Select Start training...
The export parameter configuration dialog will open up. At this stage you set the export parameters for executing the training.
Click on Finish and wait until the training has been successfully completed.

Export variables

Variable name	Description
Training ML approach	The ML approach used for the training process. Possible values are "Deep learning" or "Natural language processing". See section Natural language processing and deep learning explained for details.
Training duration (only deep learning)	The training duration (in hours) for deep learning that should be used to train the model. By default, it's set to 10 hours which should give reasonable results according to our tests. In case of natural language processing trainings this parameter won't be taken into consideration and the generation of a model will be a lot faster.
Training language	The language of the text fields exported to train the model.
Training structure	The structure the training should be based on.

Display available models

There is an entry in the "Management" menu of the Desktop UI called "Show CLAIRE models" where you can display all trained classification models, the algorithm they are based on, and their accuracy based on macro F1 (see chapter Best Practices and Recommendations under section Model accuracy measurement for details).
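For orientation, macro F1 is the unweighted mean of the per-class F1 scores, so small categories count as much as large ones. The sketch below shows the standard definition in plain Python; which records the accelerator evaluates the score on is not stated here, and the function name macro_f1 is chosen for illustration.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = []
    for cls in classes:
        true_pos = sum(t == cls and p == cls for t, p in pairs)
        false_pos = sum(t != cls and p == cls for t, p in pairs)
        false_neg = sum(t == cls and p != cls for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denominator = 2 * true_pos + false_pos + false_neg
        scores.append(2 * true_pos / denominator if denominator else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels A, A, B, C and predictions A, B, B, C the per-class scores are 2/3, 2/3 and 1, which gives a macro F1 of 7/9.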

images/download/attachments/463605025/image2022-7-4_16-23-43.png

Natural language processing and deep learning explained

Deep learning:

This approach uses a convolutional neural network specifically tailored for text classification tasks. Such models are based on decades of research into both AI and biology and are the technology of choice for many modern applications, including autonomous driving and speech recognition. Such networks consist of a large number of computational units that vaguely resemble some properties of neurons in the human brain. Ordinarily, anyone who trains such models needs to know a lot of low-level details to get the best performance. For this accelerator, however, everything is handled automatically "under the hood" for your convenience. Tests have shown that around 10 hours of training are needed to get a deep learning model of good accuracy.

Natural language processing:

This approach produces a model rooted in classic techniques such as bag-of-words and "traditional" classifiers, such as a support vector machine or a random forest. These models are simpler than deep learning models but train much faster (in our tests it took less than an hour). It might seem that deep learning is making such techniques obsolete, but in practice they are still useful, although they usually tend to produce models of lower accuracy.
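To make the bag-of-words idea concrete, the following sketch counts words per category and classifies new text from those counts. It is not the accelerator's implementation: to stay dependency-free it swaps in a multinomial naive Bayes classifier with add-one smoothing instead of a support vector machine or random forest, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Bag-of-words: lowercase the text and split on whitespace, ignoring order."""
    return text.lower().split()


def train(samples):
    """samples: iterable of (text, category) pairs. Returns the fitted model."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of records
    for text, category in samples:
        word_counts[category].update(tokenize(text))
        doc_counts[category] += 1
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the category with the highest naive Bayes log score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                count = word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

Training here is a single pass over the records, which illustrates why this family of models is generated much faster than a neural network.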

Batch Classification

In order to provide an auto classification in a batch process setup (for example after the import of data) the accelerator comes with the following capabilities:

images/download/attachments/463605025/BatchArchicture.png

After installation of the accelerator there is a new data quality category named "CLAIRE" which looks like this:

images/download/attachments/463605025/image2020-11-2_14-5-50.png

How to use the auto classification rule configuration

It is highly recommended to clone one of the existing auto classification rule configurations and then modify the input ports according to your needs. The "default" rule configuration is just a template and should not be used as is. Even if you delete this rule configuration, category or group, it will be recreated on the next server start.

After the data quality rule configuration has been cloned and the parameters adjusted, it can be used for triggers or direct execution just like any other data quality rule configuration. The parameters are mandatory and described here:

Parameter	Description
Source Field	The value which will be taken as input for the prediction.
Threshold	Only recommendations with a confidence equal to or above this value (in %) will be considered.
Structure identifier	Only recommendations for this structure will be considered.
Mapping mode	Set "MOVE" to replace existing mapping(s) or "ADD" to add a mapping.
ML approach	Set "DL" for using the deep learning ML approach or "NLP" for using natural language processing. See section Natural language processing and deep learning explained for details.
Retain existing assignments	If true, objects which already have an assignment to a group of the defined structure system will be skipped.
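The interplay of Threshold, Mapping mode and Retain existing assignments can be pictured as follows. This is only an illustrative sketch of the semantics described in the table, assuming the recommendations are already restricted to the configured structure; the function name and the behaviour when no recommendation passes the threshold are assumptions, not the rule's actual implementation.

```python
def apply_recommendations(existing, recommendations, threshold,
                          mapping_mode="ADD", retain_existing=False):
    """existing: assigned group ids; recommendations: (group id, confidence %) pairs."""
    if mapping_mode not in ("ADD", "MOVE"):
        raise ValueError("mapping_mode must be 'ADD' or 'MOVE'")
    if retain_existing and existing:
        return list(existing)  # object is already classified, so it is skipped
    # Confidence equal to or above the threshold is considered
    accepted = [group for group, confidence in recommendations
                if confidence >= threshold]
    if not accepted:
        return list(existing)  # assumption: nothing changes without a match
    if mapping_mode == "MOVE":
        return accepted        # replaces the existing mapping(s)
    return list(existing) + [group for group in accepted if group not in existing]
```

For an object assigned to G1 and recommendations G2 (90 %) and G3 (50 %) with a threshold of 80, "ADD" yields G1 and G2, while "MOVE" yields only G2.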

Optionally, a direct link can be configured in the data quality status detail tab of the Web UI. It allows a user not only to spot a failed batch execution at item level but also to open a flex UI to fix the classification problem right away. This requires editing DQNavigationDefinition.xml, which can be found in the webdefinitions folder of the Product 360 server. The following lines have to be added to it to link an erroneous status with a flex UI:

DQNavigationDefinition.xml

 <!-- AutoClassification via CLAIRE -->
 <ruleNavigation ruleName="Auto Classification Item" flexTemplateName="Classification with Claire" rootEntity="Article" />

Note that the ruleName property needs to be set to the actual rule name and the flexTemplateName to the name of the flex UI template you want to associate with the rule (e.g., a version of the Claire UI as seen below).

For details on the expected performance of batch executions for auto classification, please refer to the chapter Best Practices and Recommendations under the section Recommendations for best results.

Configuration of CLAIRE Flex UI

images/download/attachments/463605025/FlexUIArchitecture.png

The CLAIRE panel can be integrated into any flex UI with a component for classification. Below you see an example of how to add a CLAIRE panel to your flex UI definition.

<group identifier="Claire info">
	<layoutData>
		<parameter key="colSpan" value="1" />
		<parameter key="rowSpan" value="7" />
	</layoutData>
	<component identifier="claire full" type="claire" i18NKey="Claire" >
		<layoutData>
			<parameter key="collapsible" value="true"/>
			<parameter key="collapsed" value="false"/>
		</layoutData>
		<parameter key="context" value="classification"/>
		<parameter key="sourceField" value="ArticleLang.DescriptionLong(en)"/>
		<parameter key="algorithm" value="deeplearning"/>
		<parameter key="threshold" value="80"/>
		<parameter key="selectionThreshold" value="80"/>
	</component>
</group>

Parameter name	Description	Valid values	Default
context	The context or use case the CLAIRE panel will be used for. Only recommendations for this use case will be shown. If empty or not set all recommendations will be shown.	classification	<empty>
sourceField	The source field which will be used for prediction. The field has to be fully qualified and formatted according to the REST API syntax.	ArticleLang.DescriptionLong(de)	ArticleLang.DescriptionLong(en)
algorithm	The algorithm whose classification predictions will be shown. If not configured, all panels for classification will be shown. See section Natural language processing and deep learning explained for details.	deeplearning, nlp	<all>
threshold	CLAIRE predictions with a confidence below or equal to this threshold will not be shown in the flex UI component.	0-100	80
selectionThreshold	CLAIRE predictions with a confidence above or equal to this threshold will be automatically selected via the checkbox in the CLAIRE flex UI component.	0-100	80
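The two thresholds in the table act at different stages: threshold decides which predictions are displayed, and selectionThreshold decides which of the displayed ones are pre-selected. The sketch below mirrors the wording of the table under that reading; the function name visible_predictions is hypothetical and the code is not taken from the flex UI component.

```python
def visible_predictions(predictions, threshold=80, selection_threshold=80):
    """predictions: (group, confidence %) pairs.

    Returns (group, confidence, preselected) for every prediction that is shown.
    """
    return [
        (group, confidence, confidence >= selection_threshold)
        for group, confidence in predictions
        if confidence > threshold  # below or equal to the threshold is hidden
    ]
```

With the defaults of 80 for both parameters, a prediction of exactly 80 % is hidden and every visible prediction is pre-selected; lowering threshold while keeping selectionThreshold high shows more candidates without ticking them automatically.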

images/download/attachments/463605025/image2020-11-19_14-52-17.png