The job failed but there are many logs I can view. Where do I start?
Troubleshoot the job by examining the logs in the following order:
1. Execution plan. Debug the Scala code for the job.
2. Session log. Debug the logic that compiles the job and generates the Spark execution workflow.
3. Agent job log. Debug how the Secure Agent pushes the Spark execution workflow to the advanced cluster for processing.
4. Spark driver and executor logs. Debug how the advanced cluster runs the job.
You can download the execution plan, session log, agent job log, and Spark driver log in Monitor.
To find the Spark executor log, copy the advanced log location for a specific Spark task that failed. Then, navigate to the log location on your cloud platform and download the log.
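If the advanced log location points to Amazon S3, you can also script the download instead of browsing the cloud console. The following is a minimal sketch, not a documented procedure: it assumes boto3 is installed, AWS credentials are configured, and the s3:// value is a placeholder for the location you copied from Monitor.

```python
# Hypothetical helper: download a Spark executor log from an advanced log
# location on Amazon S3. The location and local directory are placeholders;
# substitute the values you copied from Monitor.
import os
import boto3

def download_executor_log(log_location: str, local_dir: str = ".") -> str:
    """Download one executor log given an s3://bucket/key advanced log location."""
    assert log_location.startswith("s3://"), "expected an S3 advanced log location"
    bucket, _, key = log_location[len("s3://"):].partition("/")
    local_path = os.path.join(local_dir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path

# Example usage with a placeholder location:
# download_executor_log("s3://my-log-bucket/path/to/spark-executor-1.log")
```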
I can't find all of the log files for the job that failed. I've tried to download the logs from both Monitor and the log location on my cloud platform.
The logs that are available for the job depend on the step where the job failed during processing.
For example, if the job fails before it is pushed to the advanced cluster, the Spark driver and executor logs are never generated in the log location, and Monitor cannot query them from the cloud platform either.
You can recover some of the log files, but you might have to use other types of logs to troubleshoot the job.
I can't find the Spark driver and Spark executor logs. Can I recover them?
If you can't download the Spark driver log from the user interface, you can recover the log from the Spark driver Pod. Spark executor logs cannot be recovered.
When the Secure Agent pushes a job to an advanced cluster, the Secure Agent creates one Spark driver Pod and multiple Spark executor Pods to run the Spark tasks. The Spark driver Pod deletes the Spark executor Pods immediately after the job succeeds or fails, so their logs cannot be recovered. The Spark driver Pod itself remains available briefly, and you can use it to recover the Spark driver log.
Note: When a job succeeds or fails, the Spark driver Pod is deleted after 5 minutes by default. If you need to increase this time limit to assist with troubleshooting, contact Informatica Global Customer Support.
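Because the driver Pod is removed soon after the job completes, it can help to capture its log as soon as you know the Pod name (the steps below show where to find it). The following is a hedged sketch, assuming kubectl on the Secure Agent machine is already configured to reach the advanced cluster; the Pod name and namespace are placeholders.

```python
# Hypothetical sketch: save the Spark driver Pod log locally before the Pod
# is deleted. Assumes kubectl is installed and configured for the advanced
# cluster; the Pod name and namespace below are placeholders.
import subprocess

def save_driver_pod_log(pod_name: str, namespace: str = "default",
                        out_file: str = "spark-driver.log") -> None:
    """Write the driver Pod's log to a local file using kubectl."""
    result = subprocess.run(
        ["kubectl", "logs", pod_name, "-n", namespace],
        check=True, capture_output=True, text=True,
    )
    with open(out_file, "w") as f:
        f.write(result.stdout)

# save_driver_pod_log("spark-...-driver")  # placeholder Pod name from the agent job log
```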
To recover the Spark driver log, perform the following tasks:
1. Find the name of the Spark driver Pod in the agent job log. For example, see the name of the Spark driver Pod in the following message:
2019/04/09 11:10:15.511 : INFO :Spark driver pod [spark-passthroughparquetmapping-veryvery-longlongname-1234567789-infaspark02843891945120475434-driver] was successfully submitted to the cluster.
If you cannot download the agent job log in Monitor, the log is available in the following directory on the Secure Agent machine:
The file name of the agent job log uses the format AgentLog-<Spark job ID>.log. You can find the Spark job ID in the session log. For example, the Spark job ID is 0c2c5f47-5f0b-43af-a867-da011452c19dInfaSpark0 in the following message:
2019-05-09T03:07:52.129+00:00 <LdtmWorkflowTask-pool-1-thread-9> INFO: Registered job to status checker with Id 0c2c5f47-5f0b-43af-a867-da011452c19dInfaSpark0
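If you prefer to script this lookup, the following sketch pulls both identifiers out of the log text using patterns based on the example messages above. The patterns are assumptions and might need adjusting if your log wording differs.

```python
# Hypothetical helper: extract the Spark driver Pod name from the agent job
# log and the Spark job ID from the session log, based on the message
# formats shown above.
import re
from typing import Optional

def find_driver_pod_name(agent_job_log: str) -> Optional[str]:
    """Return the Pod name from a 'Spark driver pod [...] was successfully submitted' line."""
    m = re.search(r"Spark driver pod \[([^\]]+)\] was successfully submitted", agent_job_log)
    return m.group(1) if m else None

def find_spark_job_id(session_log: str) -> Optional[str]:
    """Return the ID from a 'Registered job to status checker with Id ...' line."""
    m = re.search(r"Registered job to status checker with Id (\S+)", session_log)
    return m.group(1) if m else None

# Example usage with placeholder file paths:
# pod_name = find_driver_pod_name(open("AgentLog-<Spark job ID>.log").read())
# job_id = find_spark_job_id(open("session.log").read())
```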
2. Confirm that the Spark driver Pod exists. If the driver Pod was deleted, you cannot retrieve the Spark driver log.
To confirm that the driver Pod exists, navigate to the following directory on the Secure Agent machine:
3. Find the cluster instance ID in one of the following ways:
- Locate the cluster instance ID in the session log. For example, you might see the following message:
2019/05/07 16:22:00.20 : INFO :[SPARK_2005] Uploading the local file in the path [/export/home/builds/ws/yxiao_hadoopvm_ML/Mercury/platformdiscale/main/components/cluster/hadoop-tests/cats/edtm/spark/./target/hadoop3a0b1db6-76ea-4317-8272-5b3a8dfd2171_InfaSpark0/log4j_infa_spark.properties] to the following shared storage location: [s3a://soki-k8s-local-state-store/k8s-infa/testcluster2.k8s.local/staging/sess4280021555102778947/log4j_infa_spark.properties].
Note the following cloud storage location that you see in the message: