Data Integration job log files

Data Integration generates log files to help you monitor running, failed, and completed jobs. You can access some of the log files from the All Jobs, Running Jobs, and My Jobs pages, and from the job details.
Data Integration generates the following types of log files:
Error rows file
Data Integration generates error rows files for synchronization task and masking task instances. An error rows file shows the rows that failed and the reason why each row failed. The error rows file includes the first 50 fields of a source error row.
For example, the following error appears in the error rows file when the task tries to insert two records with the same external ID into a Salesforce target:
Error loading into target [HouseholdProduct__c] : Error received from salesforce.com. Fields [ExternalId__c]. Status code [DUPLICATE_VALUE]. Message [Duplicate external id specified: 1.0].
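The exact layout of an error rows file depends on the task and the connector, but each line describes a failed row and the reason for the failure. As a rough illustration only, the following Python sketch counts how often each Salesforce status code, such as DUPLICATE_VALUE, appears; the file name and the assumption that the status code appears in the error text are placeholders, not fixed by Data Integration.

import re
from collections import Counter

# Hypothetical error rows file name; the actual name and location depend on the task.
ERROR_ROWS_FILE = "synchronization_task.err"

# Salesforce-style errors include a status code, for example "Status code [DUPLICATE_VALUE]".
STATUS_CODE = re.compile(r"Status code \[([A-Z_]+)\]")

counts = Counter()
with open(ERROR_ROWS_FILE, encoding="utf-8", errors="replace") as f:
    for line in f:
        match = STATUS_CODE.search(line)
        if match:
            counts[match.group(1)] += 1

for code, total in counts.most_common():
    print(f"{code}: {total} failed rows")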
Session log file
Data Integration generates a session log file for each job. This log gives you a high-level view of the time spent on different operations.
The session log reports the mapping compilation time, translation time, simplification time, optimization time, the total time to create the LDTM, the Spark task submission time, the Spark task [InfaSpark0] execution start and end times, and the total time to perform the LDTM operation.
If a job fails, analyze the session log file first to help you troubleshoot the job.
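When you scan a session log for the timing metrics listed above, a short script can save time. The following Python sketch is only an illustration: the log file name and the assumption that the timing messages contain keywords such as LDTM and InfaSpark0 are not guaranteed by the log format.

# Hypothetical local copy of a session log downloaded from the job details.
SESSION_LOG = "session.log"

# Keywords taken from the timing metrics described above; assumed to appear in the log text.
KEYWORDS = ("compilation", "translation", "simplification", "optimization",
            "LDTM", "InfaSpark0", "submission")

with open(SESSION_LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        if any(keyword.lower() in line.lower() for keyword in KEYWORDS):
            print(line.rstrip())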
Reject file
Data Integration creates a reject file for each flat file and Oracle target in a mapping or mapping task that contains error rows. The reject file contains information about each rejected target row and the reason that the row was rejected. Data Integration saves the reject file to the following default folder:
$PMBadFileDir/<task federated ID>
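If you have access to the Secure Agent machine, you can list the reject files for a task directly. The Python sketch below is hypothetical: it assumes that $PMBadFileDir is available as an environment variable and that you know the task federated ID; substitute the configured directory and the actual ID.

import os
from pathlib import Path

# $PMBadFileDir is a Data Integration directory variable. This sketch assumes it is also
# exported as an environment variable on the Secure Agent machine; otherwise substitute
# the configured path. The task federated ID below is a placeholder.
bad_file_dir = os.environ.get("PMBadFileDir", "/path/to/badfiles")
task_federated_id = "aBcDeFgH123"

reject_dir = Path(bad_file_dir) / task_federated_id
for reject_file in sorted(reject_dir.glob("*")):
    print(reject_file, reject_file.stat().st_size, "bytes")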
Execution plan
Data Integration generates an execution plan that shows the Scala code that an advanced cluster uses to run the data logic in a mapping in advanced mode. You can use the Scala code to debug issues in the mapping.
Agent job log
Data Integration generates an agent job log that shows the logic that the Secure Agent uses to push the Spark execution workflow for a mapping in advanced mode to an advanced cluster for processing.
The agent job log contains information such as metering data, the time that the application was submitted to the cluster, and the time that the application completed. This log can help you troubleshoot delays in running the Spark task that appear in the session log, because it shows when the Spark task was processed on the Secure Agent.
Spark driver and Spark executor logs
An advanced cluster generates Spark driver and Spark executor logs to show the logic that the cluster uses to run a job. Use these logs to identify issues or errors with the Spark process. These logs also contain information about the executors that are created and the tasks that are starting or have completed.
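A quick way to use the driver and executor logs is to scan them for error markers. The following Python sketch uses placeholder file names for logs that you have already downloaded; the marker strings are common Spark log conventions, not a fixed format.

from pathlib import Path

# Placeholder names; download the actual Spark driver and executor logs from the job details.
LOG_FILES = [Path("spark_driver.log"), Path("spark_executor_0.log")]

MARKERS = ("ERROR", "Exception", "FAILED")

for log_file in LOG_FILES:
    if not log_file.exists():
        continue
    print(f"=== {log_file} ===")
    for line_no, line in enumerate(log_file.read_text(errors="replace").splitlines(), start=1):
        if any(marker in line for marker in MARKERS):
            print(f"{line_no}: {line}")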
Initialization script log
If an initialization script runs on an advanced cluster, the cluster generates an init script log to show the script output.
Cloud-init log
If an initialization script runs on the advanced cluster, the cluster generates a cloud-init log that contains information about how cluster nodes were initialized and bootstrapped. You can use the cloud-init log to check if any init scripts failed to run.
Note: You can view the cloud-init log only in an AWS environment.
Spark event log
An advanced cluster generates a Spark event log to stream runtime events for tasks that run on the cluster.
The Spark event log records different events in a JSON-encoded format while the application is running. This log contains the events associated with the Spark application, such as the different jobs in the application, different stages, individual tasks, and interaction between entities.
The Spark event log also contains events related to the software infrastructure like driver information, executor creation, memory usage by executors, environment configuration, and the logical and physical plans of the Spark application. Use this log to trace what happened during every step of the Spark application run.
To find the Spark event log, open the Spark driver log and search for SingleEventLogFileWriter. The result of the search shows the path of the Spark event log. For example:
23/01/09 04:38:35 INFO SingleEventLogFileWriter - Logging events to s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/spark-a7bea557ede14382b4807d35b5404b97.inprogress
When the application completes, download the Spark event log from the location s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/ as the file spark-a7bea557ede14382b4807d35b5404b97.
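These two steps, finding the event log path in the driver log and then downloading the file, can be scripted. The following Python sketch is only an illustration: it assumes a local copy of the driver log named spark_driver.log and uses boto3 to download the file from Amazon S3; the file name and the use of boto3 are assumptions, not part of Data Integration.

import re
import boto3

# Assumed local copy of the Spark driver log downloaded from the job details.
DRIVER_LOG = "spark_driver.log"

# Find the event log path reported by SingleEventLogFileWriter.
pattern = re.compile(r"SingleEventLogFileWriter - Logging events to (s3://\S+)")
event_log_uri = None
with open(DRIVER_LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            event_log_uri = match.group(1)

if event_log_uri:
    # Once the application completes, the event log no longer has the .inprogress suffix.
    event_log_uri = event_log_uri.removesuffix(".inprogress")
    bucket, key = event_log_uri[len("s3://"):].split("/", 1)
    local_name = key.rsplit("/", 1)[-1]
    boto3.client("s3").download_file(bucket, key, local_name)
    print(f"Downloaded {event_log_uri} to {local_name}")
else:
    print("No SingleEventLogFileWriter line found in the driver log.")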
To interpret the Spark event log, import it into a Spark history server and examine the log in the history server UI. Check tabs such as Jobs, Stages, Storage, Environment, Executors, and SQL.
For more information, refer to the Apache Spark documentation.
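If you want a quick overview before importing the file into a history server, you can tally the event types directly, because the event log is newline-delimited JSON. A minimal sketch, assuming the event log has been downloaded to the local file named below:

import json
from collections import Counter

# Local copy of the downloaded Spark event log (name taken from the example above).
EVENT_LOG = "spark-a7bea557ede14382b4807d35b5404b97"

event_counts = Counter()
with open(EVENT_LOG, encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Each record carries its type in the "Event" field,
        # for example SparkListenerJobStart or SparkListenerTaskEnd.
        event_counts[record.get("Event", "unknown")] += 1

for event_type, count in event_counts.most_common():
    print(f"{event_type}: {count}")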
Advanced logs
The Advanced Log Location contains the Spark executor logs in addition to the Spark driver and agent job logs. The executor logs can help you troubleshoot issues with individual executors.