Databricks Integration
The Data Integration Service can push mappings to the Databricks environment for processing. Databricks is a cloud analytics platform that you can use with Microsoft Azure cloud services or Amazon Web Services. It incorporates open-source Apache Spark cluster technologies and capabilities.
Informatica automatically installs the binaries required to integrate the Informatica domain with the Databricks environment.
The integration requires Informatica connection objects and cluster configurations. A cluster configuration is a domain object that contains configuration parameters that you import from the Databricks cluster. You then associate the cluster configuration with connections to access the Databricks environment.
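The relationship between a cluster configuration and the connections that use it can be pictured with a small model. The following is a minimal sketch only: the class names, the `import_cluster_configuration` and `resolve` helpers, and the sample parameter keys are hypothetical illustrations, not Informatica objects or APIs. The source does not state which parameters are imported from the Databricks cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfiguration:
    """Hypothetical stand-in for the domain object that holds imported cluster parameters."""
    name: str
    properties: dict


@dataclass
class Connection:
    """Hypothetical connection object that refers to a cluster configuration by name."""
    name: str
    cluster_configuration: str


def import_cluster_configuration(name, cluster_info):
    # Copy the parameters so that later changes to cluster_info
    # do not alter the stored cluster configuration.
    return ClusterConfiguration(name=name, properties=dict(cluster_info))


def resolve(connection, configurations):
    # Look up the cluster configuration associated with the connection.
    return configurations[connection.cluster_configuration]


# Example with made-up parameter values (assumptions for illustration only).
cco = import_cluster_configuration("dbx_cco", {"num_workers": 2})
conn = Connection(name="dbx_conn", cluster_configuration="dbx_cco")
params = resolve(conn, {cco.name: cco}).properties
```

Referring to the cluster configuration by name, rather than copying its parameters into each connection, mirrors the description above: the parameters are imported once and several connections can be associated with the same configuration.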
Run-time Process on the Databricks Spark Engine
When you run a job on the Databricks Spark engine, the Data Integration Service pushes the processing to the Databricks cluster, and the Databricks Spark engine runs the job.
The following image shows the components of the Informatica and Databricks environments:

1. The Logical Data Transformation Manager translates the mapping into a Scala program, packages it as an application, and sends it to the Databricks Engine Executor on the Data Integration Service machine.
2. The Databricks Engine Executor submits the application to the Databricks cluster through the REST API, requests to run the application, and stages files for access during run time.
3. The Databricks cluster passes the request to the Databricks Spark driver on the driver node.
4. The Databricks Spark driver distributes the job to one or more Databricks Spark executors that reside on worker nodes.
5. The executors run the job and stage run-time data to the Databricks File System (DBFS) of the workspace.
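To make the REST submission step more concrete, the following sketch builds (but does not send) a request that asks a Databricks cluster to run a packaged application. It is an illustration under stated assumptions, not the Databricks Engine Executor's actual implementation: the source does not say which endpoint Informatica calls, so the sketch assumes the public Databricks Jobs API `runs/submit` endpoint. The function name, workspace URL, token, cluster ID, JAR path, and main class are all hypothetical placeholders.

```python
import json
import urllib.request


def build_submit_request(workspace_url, token, cluster_id, jar_dbfs_path, main_class):
    """Build a one-time run request for an application JAR staged on DBFS."""
    payload = {
        "run_name": "mapping-application-sketch",  # hypothetical run name
        "tasks": [
            {
                "task_key": "mapping",
                # Run on an existing cluster instead of creating a new one.
                "existing_cluster_id": cluster_id,
                # Entry point of the packaged application.
                "spark_jar_task": {"main_class_name": main_class},
                # The application JAR, assumed to be staged on DBFS beforehand.
                "libraries": [{"jar": jar_dbfs_path}],
            }
        ],
    }
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values only; nothing is sent over the network here.
request = build_submit_request(
    "https://example.cloud.databricks.com",
    "EXAMPLE_TOKEN",
    "example-cluster-id",
    "dbfs:/tmp/example/app.jar",
    "com.example.MappingApp",
)
```

Passing the returned object to `urllib.request.urlopen` would send the request. From that point the cluster, not the caller, hands the work to the Spark driver and executors, which corresponds to steps 3 through 5.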