Use CLAIRE Tuning to tune a mapping task that runs on an advanced cluster.
CLAIRE, Informatica's AI engine, runs the mapping task several times and uses machine learning to assess the performance of each run. It uses the information to create a tuning recommendation for the set of Spark properties that optimizes task performance. CLAIRE Tuning considers parameters such as the complexity of the mapping, the size of the data, and the processing capacity on the advanced cluster.
You can run initial tuning or enable continuous tuning. When you run initial tuning, you can view the tuning recommendation to see a list of recommended Spark properties and their values. You can apply the recommendation to use the values in the mapping task. When you enable continuous tuning, CLAIRE silently monitors the mapping task and adjusts the Spark properties over time.
Continuous tuning is more effective if you run initial tuning first. During initial tuning, CLAIRE establishes an optimized set of Spark properties that it can use as a baseline for additional adjustments during continuous tuning.
If you run initial tuning on a mapping task that incrementally loads files, tuning runs on all of the source files. The recommended properties and values might not be optimal for future jobs that load and process only modified files.
Guidelines to get an accurate recommendation
Use the following guidelines to get an accurate recommendation during the tuning job:
•Use sample data that closely matches the actual volume of the data that the mapping task will process.
•The tuning job writes data to the target multiple times, so make sure that the mapping logic handles duplicate data in the target.
•Set resource limits on your cloud environment by configuring the appropriate Spark properties before you tune the mapping task. Your cloud service provider charges you for the resources that each run uses.
For example, if you know that you can allocate only 4 GB to the Spark driver, you can configure spark.driver.memory=4G in the mapping task. CLAIRE honors the predefined Spark property and creates a tuning recommendation for the other Spark properties.
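The following minimal sketch illustrates the idea of a pinned property constraining a recommendation. It is not how CLAIRE works internally. The helper names, the key=value parsing, and every recommended value other than spark.driver.memory=4G are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: a pinned Spark property is kept, and a recommendation
# only contributes values for the other properties. Not CLAIRE's implementation.


def parse_properties(text: str) -> dict[str, str]:
    """Parse 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        props[key.strip()] = value.strip()
    return props


def constrain_recommendation(pinned: dict[str, str], recommended: dict[str, str]) -> dict[str, str]:
    """Keep every pinned value; take recommended values only for unpinned properties."""
    merged = {k: v for k, v in recommended.items() if k not in pinned}
    merged.update(pinned)
    return merged


pinned = parse_properties("spark.driver.memory=4G")
recommended = {  # hypothetical recommendation values
    "spark.driver.memory": "8G",
    "spark.executor.memory": "6G",
    "spark.executor.cores": "4",
}
print(constrain_recommendation(pinned, recommended))
```

The pinned driver memory stays at 4G even though the hypothetical recommendation proposed 8G.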
Configuring tuning
Configure CLAIRE Tuning in the mapping task details.
The following image shows where you can configure tuning in the mapping task details:
Initial tuning
Run initial tuning to get a tuning recommendation with a list of recommended Spark properties and their values.
To configure initial tuning, set the number of times that CLAIRE runs the mapping task. The minimum is 10 runs. Click Tune to begin tuning. When tuning begins, Data Integration creates a tuning job with one subtask for each run of the mapping task. You must wait for all subtasks to complete before you can view the tuning results.
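The rules above (at least 10 runs, one subtask per run, results only after every subtask completes) can be modeled in a short sketch. The class, function, and status names are hypothetical and do not correspond to a Data Integration API.

```python
from dataclasses import dataclass, field
from enum import Enum

MIN_TUNING_RUNS = 10  # minimum number of runs for initial tuning


class SubtaskStatus(Enum):
    # Hypothetical statuses, not Data Integration's actual status values.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class TuningJob:
    runs: int
    subtasks: list[SubtaskStatus] = field(default_factory=list)

    def results_available(self) -> bool:
        # Tuning results can be viewed only after every subtask has completed.
        return all(s is SubtaskStatus.COMPLETED for s in self.subtasks)


def start_initial_tuning(runs: int) -> TuningJob:
    """Create a tuning job with one subtask for each run of the mapping task."""
    if runs < MIN_TUNING_RUNS:
        raise ValueError(f"initial tuning needs at least {MIN_TUNING_RUNS} runs, got {runs}")
    return TuningJob(runs, [SubtaskStatus.QUEUED] * runs)


job = start_initial_tuning(10)
print(job.results_available())  # False: no subtask has completed yet
job.subtasks = [SubtaskStatus.COMPLETED] * job.runs
print(job.results_available())  # True
```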
Each time that CLAIRE runs the mapping task, CLAIRE gathers task performance data to improve its recommendation for an optimal set of Spark properties.
Initial tuning results
When initial tuning is complete, you can view the tuning recommendation and the performance improvement. The improvement is measured as the reduction in the time that the mapping task takes to run with the recommended set of Spark properties.
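As a worked example of a time-based improvement, the sketch below computes the seconds saved and the percentage reduction from a baseline run time. The run times are hypothetical, and Data Integration might present the improvement in a different form.

```python
def tuning_improvement(baseline_seconds: float, tuned_seconds: float) -> tuple[float, float]:
    """Return (seconds saved, percent reduction) relative to the baseline run time."""
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    saved = baseline_seconds - tuned_seconds
    return saved, saved / baseline_seconds * 100.0


# Hypothetical run times: 10 minutes before tuning, 7.5 minutes after.
saved, percent = tuning_improvement(600.0, 450.0)
print(f"{saved:.0f} s saved ({percent:.1f}% faster)")  # 150 s saved (25.0% faster)
```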
The following image shows the tuning results for a particular mapping task:
You can apply the recommendation to use the Spark property values in the mapping task. You can also revert the Spark properties to their original values and apply the recommendation again.
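Applying and reverting a recommendation amounts to keeping a snapshot of the original values. The class below is a hypothetical model of that behavior for illustration only, not the product's implementation, and the property values are made up.

```python
class MappingTaskSparkProperties:
    """Hypothetical model of a mapping task's Spark properties with apply and revert."""

    def __init__(self, original: dict[str, str]) -> None:
        self._original = dict(original)  # snapshot of the values before any recommendation
        self.current = dict(original)

    def apply(self, recommendation: dict[str, str]) -> None:
        # Recommended values replace the original values; other properties are kept.
        self.current = {**self._original, **recommendation}

    def revert(self) -> None:
        # Restore the values that were set before the recommendation was applied.
        self.current = dict(self._original)


recommendation = {"spark.executor.memory": "6G", "spark.executor.cores": "4"}  # hypothetical
props = MappingTaskSparkProperties({"spark.executor.memory": "2G"})
props.apply(recommendation)
props.revert()
props.apply(recommendation)  # the recommendation can be applied again after a revert
print(props.current)
```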
Guidelines to apply a tuning recommendation
Use the following guidelines when you apply a tuning recommendation to make sure that job performance is optimal:
•Use the full set of Spark properties to achieve the performance improvement. Using a partial set of the recommended Spark properties might not be optimal.
•Do not edit the Spark properties in the mapping task between the time that you begin tuning and the time that you apply the tuning recommendation. If you make significant changes to the Spark properties, tune the mapping task again.
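Both guidelines can be checked by comparing dictionaries: one check finds recommended properties that the task does not use, and the other finds properties edited after tuning began. The helpers and values below are hypothetical; Data Integration does not expose these functions.

```python
def unapplied_properties(recommendation: dict[str, str], task_props: dict[str, str]) -> list[str]:
    """Recommended properties whose value in the mapping task differs from the recommendation."""
    return sorted(k for k, v in recommendation.items() if task_props.get(k) != v)


def changed_since_tuning(snapshot: dict[str, str], current: dict[str, str]) -> list[str]:
    """Properties that were added, removed, or changed after tuning began."""
    keys = snapshot.keys() | current.keys()
    return sorted(k for k in keys if snapshot.get(k) != current.get(k))


snapshot = {"spark.executor.memory": "2G"}  # values when tuning began (hypothetical)
current = {"spark.executor.memory": "3G"}   # edited during tuning (hypothetical)
recommendation = {"spark.executor.memory": "6G", "spark.executor.cores": "4"}

print(changed_since_tuning(snapshot, current))        # ['spark.executor.memory']
print(unapplied_properties(recommendation, current))  # ['spark.executor.cores', 'spark.executor.memory']
```

A non-empty result from either check suggests that the recommendation is only partially applied or that the task should be tuned again.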
Continuous tuning
Enable continuous tuning to silently monitor every run of the mapping task and adjust the Spark properties over time.
For example, you design a mapping task in your development environment and run initial tuning. When you migrate the mapping task to your production environment, you expect production loads to vary from day to day. Continuous tuning analyzes the varying parameters to adjust the Spark properties.
During continuous tuning, CLAIRE analyzes all runs of the mapping task. The adjusted Spark properties override the Spark property values that are set in the mapping task. You can view the values of the adjusted Spark properties in the Spark driver and agent job logs.
Note: When you copy or import a mapping task with continuous tuning enabled, continuous tuning restarts from the Spark properties that are set in the mapping task.
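The override behavior can be pictured as a dictionary overlay in which adjusted values win over the values set in the mapping task. The sketch is a hypothetical illustration with made-up values; it does not read real Spark driver or agent job logs.

```python
def effective_spark_properties(task_props: dict[str, str], adjusted: dict[str, str]) -> dict[str, str]:
    """Adjusted values from continuous tuning override the values set in the mapping task."""
    return {**task_props, **adjusted}


task_props = {"spark.executor.memory": "6G", "spark.executor.cores": "4"}
adjusted = {"spark.executor.memory": "8G"}  # hypothetical adjustment after a larger load

print(effective_spark_properties(task_props, adjusted))
# {'spark.executor.memory': '8G', 'spark.executor.cores': '4'}

# A copied or imported task starts over from its own properties, with no adjustments yet.
print(effective_spark_properties(task_props, {}) == task_props)  # True
```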