Parallel Export

How can exports be executed in parallel? What impact does it have on your export template?

Introduction

Data can be exported in parallel to make full use of your system's resources and to speed up the export process considerably. This chapter briefly describes how a data export is executed, how the user interface has been adjusted, and how to enable parallel export. It also covers the impact of these options on your export templates and on the logging of the export itself.

Parallel Export Execution

Depending on the configured package size, the objects to be exported are grouped into one or more packages (see Configuration guide). For each package, the required data is first retrieved from the database, and then the tokens inside the export template are replaced. This applies to both sequential and parallel execution. In a parallel execution, two thread pools process the packages in parallel: one for data loading and one for data processing. The data load thread pool uses a fixed number of threads, which depends on the number of available cores in your system, whereas the size of the export thread pool can be configured.

images/download/attachments/215123886/Multi-ThreadedExport.png

For an export to run in parallel, the data provider must support it by implementing the SupportsParallelExport interface, and there must be enough data objects to build more than one package. In a parallel execution, each package gets its own export thread if the configured thread pool allows it. Increasing the number of threads in the thread pool increases the number of packages that are processed in parallel, which in turn increases the amount of memory required to export the requested data. As soon as a package has finished, a temporary export file is written. At the end of the export process, these files are concatenated to create the final export file. Depending on the configuration of your export template, one or more export files are created.
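
The flow described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the product's implementation: the class and method names are hypothetical, the load pool size is simply set to the core count, in-memory strings stand in for the temporary export files, and the fragments are assumed to be joined in package order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: hypothetical names, not the product's implementation. */
final class ParallelExportSketch {

    /** Splits the object IDs into packages of at most packageSize entries. */
    static List<List<Integer>> split(List<Integer> ids, int packageSize) {
        List<List<Integer>> packages = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += packageSize) {
            packages.add(ids.subList(i, Math.min(i + packageSize, ids.size())));
        }
        return packages;
    }

    /** Stand-in for retrieving the data of one package from the database. */
    static List<String> load(List<Integer> pkg) {
        List<String> rows = new ArrayList<>();
        for (int id : pkg) {
            rows.add("item-" + id);
        }
        return rows;
    }

    /** Stand-in for replacing the template tokens; returns one output fragment. */
    static String render(List<String> rows) {
        return String.join("\n", rows) + "\n";
    }

    /** Loads and processes all packages, then joins the fragments in package order. */
    static String export(List<Integer> ids, int packageSize, int processorThreads) {
        // Load pool: fixed size derived from the core count (assumed formula).
        ExecutorService loadPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        // Processing pool: its size is the configurable part.
        ExecutorService processPool = Executors.newFixedThreadPool(processorThreads);
        try {
            List<CompletableFuture<String>> fragments = new ArrayList<>();
            for (List<Integer> pkg : split(ids, packageSize)) {
                fragments.add(CompletableFuture
                        .supplyAsync(() -> load(pkg), loadPool)
                        .thenApplyAsync(ParallelExportSketch::render, processPool));
            }
            StringBuilder result = new StringBuilder();
            for (CompletableFuture<String> fragment : fragments) {
                result.append(fragment.join()); // waits for this package to finish
            }
            return result.toString();
        } finally {
            loadPool.shutdown();
            processPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        // 10 objects with a package size of 4 yield 3 packages.
        System.out.print(export(ids, 4, 2));
    }
}
```

Because the fragments are collected in package order, the joined result in this sketch is the same no matter which package finishes first; only the time at which each fragment becomes available varies.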

Memory consumption

In a sequential export, one data set is processed at a time. For simplicity, let's assume this data set is an item.

The data fields used in the main module and all its sub-modules are loaded into memory.

In a parallel export, multiple items are processed at the same time, depending on the configuration of your system. Memory consumption is therefore higher, because the data of all these items is held in memory at the same time.

images/download/attachments/372377691/Parallel_Export_-_Memory_Consumption.png
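
As a rough back-of-the-envelope illustration of this effect, the following Java sketch multiplies the number of packages processed at the same time by the package size and an average per-item footprint. The names are hypothetical, and the 2 KB per item is a made-up placeholder rather than a measured value; the real footprint depends on the data fields used in your modules, and data held by the load threads is ignored here.

```java
/** Illustrative only: a rough estimate with placeholder numbers. */
final class ExportMemoryEstimate {

    /** Estimated bytes held in memory while the given number of packages is processed. */
    static long peakBytes(int packagesInFlight, int packageSize, long bytesPerItem) {
        long items = Math.multiplyExact((long) packagesInFlight, (long) packageSize);
        return Math.multiplyExact(items, bytesPerItem);
    }

    public static void main(String[] args) {
        long bytesPerItem = 2048; // assumption: placeholder, not a measured value
        System.out.println(peakBytes(1, 25_000, bytesPerItem)); // 51200000
        System.out.println(peakBytes(4, 25_000, bytesPerItem)); // 204800000
    }
}
```

Under these assumptions, four packages in flight need about four times the memory of a single package, which is why the thread count and the package size should be chosen together.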

Configuration

For parallel export execution, two configuration settings are important. They can be found in the plugin_customization.ini file.

The first one is the package size, export_maxDataPackageSize, which has been available for a long time. It specifies the maximum size of the data packages that are processed.

# The size of each data package to be exported by data providers which support splitted export
# Default value: 25000
# com.heiler.ppm.texttemplate.core/export_maxDataPackageSize=25000
# com.heiler.ppm.texttemplate.core/export_structureMaxDataPackageSize=10000
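
To get a feeling for how the package size relates to parallel execution, the following Java sketch computes the number of packages for a given number of objects. It assumes that each package is filled up to the configured maximum (ceiling division); the class and method names are hypothetical.

```java
/** Illustrative only: assumes packages are filled up to the configured maximum. */
final class PackageCount {

    /** Number of packages needed for objectCount objects. */
    static long packageCount(long objectCount, int maxDataPackageSize) {
        if (objectCount < 0 || maxDataPackageSize <= 0) {
            throw new IllegalArgumentException("invalid object count or package size");
        }
        return (objectCount + maxDataPackageSize - 1) / maxDataPackageSize;
    }

    /** A parallel run needs more than one package. */
    static boolean hasEnoughDataForParallelRun(long objectCount, int maxDataPackageSize) {
        return packageCount(objectCount, maxDataPackageSize) > 1;
    }

    public static void main(String[] args) {
        System.out.println(packageCount(100_000, 25_000));               // 4
        System.out.println(hasEnoughDataForParallelRun(25_000, 25_000)); // false
        System.out.println(hasEnoughDataForParallelRun(25_001, 25_000)); // true
    }
}
```

With the default of 25000, an export of 25,000 objects or fewer fits into a single package and therefore cannot benefit from parallel processing.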

The second setting, maxExportProcessorThreads, is new. It specifies how many packages can be processed simultaneously during an export.

# The count of threads processing export data
# This number should be set depending on the size of the system, see sizing guide
# Default is 4
# com.heiler.ppm.texttemplate.core/maxExportProcessorThreads=4

Both values can be adapted to your system, depending on the available memory.

User Interface

When creating an export template, the user can decide for each main module whether it should be executed in parallel. All sub-modules inherit that setting; a sub-module cannot be executed in parallel unless its main module is. By default, parallel export of a module is enabled. To change this option, select the main module you would like to adjust and select or clear the checkbox as shown below. Keep in mind that enabling parallel execution may require you to reconsider the use of some export functions, as they may change the end result of the export. For more information about this topic, take a look at the Export functions chapter.
All existing data providers support parallel export.

images/download/attachments/372377691/image2019-10-28_11-5-22.png images/download/attachments/372377691/image2019-10-28_11-6-26.png

Logging

To get more details about the current export status, especially for a parallel export, you can enable additional progress information in the export log. To enable this feature, navigate to the "Log" section of your export template and select the associated checkbox as shown below. When enabled, additional information about each loaded package is logged during the export. By default, this option is disabled.

images/download/attachments/372377691/image2019-10-28_11-17-26.png images/download/attachments/372377691/Logging.PNG

As you can see above, as soon as this feature is enabled the process view contains information about how many packages have been processed, in what sequence they have been worked on and how many items have been included in each package. This feature may be useful when errors in your export template require further attention.

Performance Analysis

When you select "Performance analysis" for an executed export in the process overview perspective, you can find additional thread-specific logging about the parallel processing of the data packages of each module. Selecting a module with parallel processing enabled shows all threads that were running for this module and gives detailed information about each export step taken during processing.

images/download/attachments/372377691/image2019-10-28_11-40-6.png

This information can be useful when comparing a parallel export to a sequential one or when estimating the duration of similar exports in the future.

Template Migration

When an old export template is opened, it is migrated automatically to the new version. However, parallel processing is disabled for each of its modules.

That means all existing export jobs will run in non-parallel mode and will work as before. If you want an export to run in parallel mode, you have to actively adjust the export template and re-schedule the corresponding export job.

What is migrated - and what is not?

Minor changes have been made to the data model for export templates, so older templates will automatically be migrated when loaded.

However, because data can now be processed in parallel during export, the behavior of some export functions has changed. All templates that use these functions must be checked and adjusted if necessary.

The affected functions are:

  • ValueGet/ValueSet

  • NumberSet/NumberGet/NumberIncrement

  • LoopCounter/DatasetCounter
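
All of these functions keep state across data sets. The following Java sketch is a generic illustration of why such state is sensitive to parallel processing; it does not show how the export functions are implemented or how exactly their behavior has changed (see the Export functions chapter for that). The names are hypothetical. A single shared counter numbers the items of all packages: with one worker the numbering follows the package order, with several workers it depends on thread scheduling.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: generic shared-state example, not the product's functions. */
final class SharedCounterSketch {

    /** Numbers all items with one shared counter while packages run on the given number of workers. */
    static Map<String, Integer> number(List<List<String>> packages, int threads) {
        AtomicInteger counter = new AtomicInteger();
        Map<String, Integer> numbers = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> pkg : packages) {
            pool.submit(() -> {
                for (String item : pkg) {
                    numbers.put(item, counter.incrementAndGet());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for packages");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<List<String>> packages = List.of(List.of("a", "b"), List.of("c", "d"));
        // One worker: packages run one after another, so "c" always gets number 3.
        System.out.println(number(packages, 1).get("c"));
        // Two workers: every number from 1 to 4 is used once, but which item
        // receives which number can differ from run to run.
        System.out.println(number(packages, 2));
    }
}
```

This is the kind of difference to look for when checking templates that use counters or stored values.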

You can test if an export template works in parallel mode and writes the same output file(s) as before by executing the same export in non-parallel and parallel mode with identical data. You should make sure that you have data for at least four parallel packages.
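
Such a comparison can be automated with a few lines of Java. The sketch below is one possible approach, not a tool shipped with the product: it compares two output files byte by byte using Files.mismatch (available since Java 12) and computes how many objects are needed for a given number of packages, assuming that packages are filled up to the configured maximum. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: helper for comparing a sequential and a parallel export run. */
final class ExportOutputCompare {

    /** Returns -1 if both files are byte-identical, otherwise the position of the first difference. */
    static long firstDifference(Path sequentialOutput, Path parallelOutput) {
        try {
            return Files.mismatch(sequentialOutput, parallelOutput);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Smallest object count that yields at least the given number of packages (assumes full packages). */
    static long minObjectsFor(int packages, int maxDataPackageSize) {
        return (long) (packages - 1) * maxDataPackageSize + 1;
    }

    public static void main(String[] args) {
        // Usage: java ExportOutputCompare <sequential output file> <parallel output file>
        long position = firstDifference(Path.of(args[0]), Path.of(args[1]));
        System.out.println(position == -1 ? "identical" : "first difference at byte " + position);
        System.out.println(minObjectsFor(4, 25_000)); // 75001
    }
}
```

With the default package size of 25000, this means at least 75,001 objects are needed to obtain four packages under the stated assumption.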

For further details, please refer to the chapters Compatibility and Export functions.