High Availability Configuration
To configure high availability for a streaming mapping, configure a state store directory for the source and guaranteed processing of the messages that the source streams. Also configure the Spark execution parameters so that the mapping is resubmitted to Spark if an attempt fails.
Configure the following:
- State store configuration
  Configure a state store directory. Spark writes checkpoint information to the state store directory at regular intervals while the mapping runs. If a failure occurs, Spark restarts processing by reading from this directory.
- Execution parameters
  Configure the maximum number of attempts to submit the mapping to Spark for processing, so that a failed attempt does not stop the mapping. Set the spark.yarn.maxAppAttempts and yarn.resourcemanager.am.max-attempts execution parameters when you configure the mapping properties. The values that you specify for the two parameters must be equal to each other and less than the values configured in the CDH or Hortonworks configuration.
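The rule for the two execution parameters can be checked before a mapping is submitted. The following Python sketch is illustrative only and is not part of the product: the function name check_max_attempts and the two dictionaries are hypothetical, and it assumes that the cluster-side values are available under the same two parameter names.

```python
SPARK_KEY = "spark.yarn.maxAppAttempts"
YARN_KEY = "yarn.resourcemanager.am.max-attempts"


def check_max_attempts(mapping_params, cluster_params):
    """Return a list of problems; an empty list means the rule is satisfied.

    Hypothetical helper. Both arguments are assumed to be dictionaries
    keyed by the two parameter names, with integer-like values.
    """
    keys = (SPARK_KEY, YARN_KEY)

    # Both parameters must be present on the mapping and on the cluster.
    missing = [k for k in keys if k not in mapping_params or k not in cluster_params]
    if missing:
        return ["missing parameter: " + k for k in missing]

    problems = []
    spark_val = int(mapping_params[SPARK_KEY])
    yarn_val = int(mapping_params[YARN_KEY])

    # The two mapping-level values must be equal to each other.
    if spark_val != yarn_val:
        problems.append("mapping values differ: %d vs %d" % (spark_val, yarn_val))

    # Each mapping-level value must be strictly less than the cluster value.
    for key in keys:
        if int(mapping_params[key]) >= int(cluster_params[key]):
            problems.append(key + " is not less than the cluster value")

    return problems


# Example values are made up for illustration.
mapping = {SPARK_KEY: "3", YARN_KEY: "3"}
cluster = {SPARK_KEY: "5", YARN_KEY: "5"}
print(check_max_attempts(mapping, cluster))
```

An empty list means that both conditions hold. Any returned message names the condition that failed, so the mapping properties can be corrected before the mapping is submitted.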