Status Cache Initialization

Overview

The Status Cache is responsible for providing the result of every Data Quality execution for each corresponding entity object.
Instead of loading each result from the database on demand, all results are held in memory.

The initialization of the Status Cache takes place during server startup because vital functionality depends on it (e.g. the Data Quality Dashboard).

Schema Changes

The StatusRevision table has a new column called StatusEntryJSON, which contains a binary compressed JSON object holding all StatusEntry records.
During server startup, the application server checks whether that column exists; if it does not, the server creates and populates it.
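
To illustrate what "binary compressed JSON" can look like in practice, here is a minimal sketch of a round trip between a JSON string and a compressed byte array. The class name StatusEntryJsonCodec is hypothetical, and GZIP is only a stand-in: this document does not state which codec the StatusEntryJSON column actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper. GZIP is an assumption; the real column codec is not documented here.
final class StatusEntryJsonCodec {

    private StatusEntryJsonCodec() {}

    // Compresses a JSON string into bytes suitable for a binary column.
    static byte[] compress(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the JSON string from the compressed bytes.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing all StatusEntry records of a revision as one compressed value means a single column read per revision instead of one row per entry, which is presumably the motivation for the schema change.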

Initialization

The StatusCache initialization supports multiple strategies for populating the cache:

  • Initialization from Database

  • Initialization over Network

  • Initialization from Local Storage

Both the Network and Local Storage strategies are optional and can be disabled; however, they offer superior performance.

Database

The initialization from database supports parallel processing. By default, parallel processing is enabled if the database contains more than 200,000 status objects.
The default parallel degree is determined by the db.available.cpu setting in the server.properties file.

Database-related properties
# Status cache initialization will utilize multiple threads once the threshold has been reached.
# Default: 200000
com.heiler.ppm.status.server/statusCache.parallelTreshold = 200000
 
# The maximum number of concurrent threads used to query the database.
# Default: empty (using "number of DB CPU cores" configured in server.properties)
com.heiler.ppm.status.server/statusCache.parallelDegree =
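
The interplay of the two properties above can be sketched as follows. This is a minimal illustration, not the server's actual code: the class DatabaseInitPlan and its method are hypothetical, and it assumes that "more than" the threshold is a strict comparison and that an empty parallelDegree is represented as null.

```java
// Hypothetical sketch of how the thread count for database initialization could be chosen.
final class DatabaseInitPlan {

    // Mirrors the documented default of statusCache.parallelTreshold.
    static final long DEFAULT_PARALLEL_THRESHOLD = 200_000L;

    private DatabaseInitPlan() {}

    /**
     * @param statusObjectCount number of status objects found in the database
     * @param parallelThreshold value of statusCache.parallelTreshold
     * @param configuredDegree  value of statusCache.parallelDegree, or null if left empty
     * @param dbAvailableCpu    value of db.available.cpu from server.properties
     * @return number of threads to use; 1 means sequential initialization
     */
    static int threadsFor(long statusObjectCount, long parallelThreshold,
                          Integer configuredDegree, int dbAvailableCpu) {
        // Assumption: parallel processing starts only above the threshold.
        if (statusObjectCount <= parallelThreshold) {
            return 1;
        }
        // An empty property falls back to the configured number of DB CPU cores.
        int degree = (configuredDegree != null) ? configuredDegree : dbAvailableCpu;
        return Math.max(1, degree);
    }
}
```

For example, with 500,000 status objects, an empty parallelDegree and db.available.cpu set to 8, this sketch would use eight threads; setting parallelDegree to 4 would cap it at four.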

Network

The initialization over network supports parallel processing and is enabled by default.
Since there is only a single socket connection between servers, the purpose of the parallel degree is to let one thread compress cache elements while another thread sends already compressed data to the other server.
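
The overlap of compression and sending can be sketched as a small pipeline: worker threads prepare payloads in parallel, while a single thread hands them to the connection one at a time. Everything in this sketch is an assumption made for illustration: the class NetworkTransferSketch, the use of Deflater as the compressor, batches modeled as strings, and the socket modeled as a Consumer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.zip.Deflater;

// Hypothetical sketch: parallel payload preparation, single-threaded sending.
final class NetworkTransferSketch {

    private NetworkTransferSketch() {}

    // CPU-bound step: compress one batch of cache elements (stand-in codec).
    static byte[] compress(String batch) {
        Deflater deflater = new Deflater();
        deflater.setInput(batch.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(chunk);
            out.write(chunk, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compresses batches on a worker pool and sends them in order; returns the number sent.
    static int transfer(List<String> batches, int parallelDegree, Consumer<byte[]> socket) {
        ExecutorService workers = Executors.newFixedThreadPool(parallelDegree);
        try {
            List<Future<byte[]>> payloads = new ArrayList<>();
            for (String batch : batches) {
                Callable<byte[]> task = () -> compress(batch);
                payloads.add(workers.submit(task));
            }
            int sent = 0;
            for (Future<byte[]> payload : payloads) {
                // Only the calling thread touches the single connection.
                socket.accept(payload.get());
                sent++;
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            workers.shutdown();
        }
    }
}
```

While the sender is busy writing one payload, the workers are already compressing the next ones, so the connection is less likely to sit idle waiting for data.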

Network-related properties
# Allows the status cache initialization over network.
# Default: true
com.heiler.ppm.status.server/statusCache.networkEnabled = true
 
# The number of status cache elements contained in every network request.
# Default: 500000
com.heiler.ppm.status.server/statusCache.networkBatchSize = 500000
 
# The maximum number of parallel threads used to transfer the status cache over the network.
# While there can be only one active socket at a time, the other threads will be preparing their payload.
# In case there are fewer CPU cores available, all CPU cores except one will be utilized.
# Default: 4
com.heiler.ppm.status.server/statusCache.networkParallelDegree = 4

Local Storage

The initialization from local storage supports parallel processing and is enabled by default. It is by far the fastest initialization strategy.
During shutdown, the application server saves its current StatusCache instance as a snapshot to the local storage, using an LZF-compressed binary format.

The application server uses all available CPU cores except one during the shutdown process, because saving the snapshot is CPU-bound due to the compression algorithm.
Likewise, the application server writes one data file per available CPU core.

Then, during startup, the server reads the timestamp of the snapshot and uses it to perform a clean refresh of all status objects that have changed in the database since the snapshot was taken.
Since the local storage has a hard I/O limit, more reader threads do not necessarily mean better performance; in fact, they can make it worse.
This is why the default uses only four CPU cores rather than all available CPU cores.
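
The startup step described above (load the snapshot, then apply whatever changed afterwards) can be sketched as a simple merge. This is a minimal illustration under stated assumptions: SnapshotRefreshSketch and Change are hypothetical names, the cache is modeled as a map from object ID to status, and the database changes are passed in as a list rather than queried.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of refreshing a loaded snapshot with newer database state.
final class SnapshotRefreshSketch {

    // One changed status object as it might be read from the database (illustrative shape).
    static final class Change {
        final String objectId;
        final String status;
        final Instant modifiedAt;

        Change(String objectId, String status, Instant modifiedAt) {
            this.objectId = objectId;
            this.status = status;
            this.modifiedAt = modifiedAt;
        }
    }

    private SnapshotRefreshSketch() {}

    // Returns the snapshot content overlaid with every change newer than the snapshot timestamp.
    static Map<String, String> refresh(Map<String, String> snapshot,
                                       Instant snapshotTimestamp,
                                       List<Change> databaseChanges) {
        Map<String, String> cache = new HashMap<>(snapshot);
        for (Change change : databaseChanges) {
            // Entries modified before or at the snapshot time are already in the snapshot.
            if (change.modifiedAt.isAfter(snapshotTimestamp)) {
                cache.put(change.objectId, change.status);
            }
        }
        return cache;
    }
}
```

In a real server the timestamp filter would more likely be part of the database query, so that only the changed rows are transferred at all.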

Local Storage-related properties
# Allows the status cache initialization from local storage.
# Default: true
com.heiler.ppm.status.server/statusCache.localStorageEnabled = true
 
# The local file path of the status cache snapshot
# Default: empty (using workspace)
com.heiler.ppm.status.server/statusCache.localStoragePath =
 
# The maximum number of parallel threads used to read the status cache from the local storage.
# This process is I/O bound, i.e. increasing the parallel degree could lead to worse performance.
# In case there are fewer CPU cores available, all CPU cores except one will be utilized.
# Default: 4
com.heiler.ppm.status.server/statusCache.localStorageParallelDegree = 4
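
Both networkParallelDegree and localStorageParallelDegree carry the same note about machines with few CPU cores. One literal reading of that note is sketched below; the class ParallelDegree is hypothetical, and the lower bound of one thread is an assumption added so the result is always usable.

```java
// Hypothetical sketch of the "fewer CPU cores" rule stated in the property comments.
final class ParallelDegree {

    private ParallelDegree() {}

    /**
     * @param configuredDegree value of the parallel degree property (default 4)
     * @param availableCores   CPU cores available to the application server
     * @return the number of threads that would actually be used
     */
    static int effective(int configuredDegree, int availableCores) {
        if (availableCores < configuredDegree) {
            // Fewer cores than configured: use all cores except one, but at least one.
            return Math.max(1, availableCores - 1);
        }
        return configuredDegree;
    }
}
```

A caller could pass Runtime.getRuntime().availableProcessors() as the second argument. With the default of 4, a two-core machine would end up with one thread and an eight-core machine with four.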