Create a Databricks Cluster Configuration
A Databricks cluster configuration is an object in the domain that contains configuration information about the Databricks cluster. The cluster configuration enables the Data Integration Service to push mapping logic to the Databricks environment.
Use the Administrator tool to create a cluster configuration by importing configuration properties. You can import the properties directly from the cluster or from a file that contains cluster properties. You can also choose to create a Databricks connection when you perform the import.
Note: Ensure that you integrate a Databricks cluster with only one Informatica domain.
Importing a Databricks Cluster Configuration from the Cluster
When you import the cluster configuration directly from the cluster, you provide information to connect to the cluster.
Before you import the cluster configuration, get cluster information from the Databricks administrator.
- 1. From the Connections tab, click the ClusterConfigurations node in the Domain Navigator.
- 2. From the Actions menu, select New > Cluster Configuration.
The Cluster Configuration wizard opens.
- 3. Configure the following properties:
| Property | Description |
|---|---|
| Cluster configuration name | Name of the cluster configuration. |
| Description | Optional description of the cluster configuration. |
| Distribution type | The distribution type. Choose Databricks. |
| Method to import the cluster configuration | Choose Import from cluster. |
| Databricks domain | Domain name of the Databricks deployment. |
| Databricks access token | The token ID, created within Databricks, that is required for authentication. Note: If the token has an expiration date, get a new token from the Databricks administrator before it expires. |
| Databricks cluster ID | The cluster ID of the Databricks cluster. |
| Create connection | Choose to create a Databricks connection. If you choose to create a connection, the Cluster Configuration wizard associates the cluster configuration with the Databricks connection. If you do not choose to create a connection, you must manually create one and associate the cluster configuration with it. |
- 4. Click Next to verify the information on the summary page.
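Before you enter the Databricks domain, access token, and cluster ID in the wizard, it can help to confirm that the three values work together. The following Python sketch is not part of the Administrator tool. It assumes that the Databricks REST API endpoint `GET /api/2.0/clusters/get` with bearer-token authentication is available in your deployment. The function names and the example host are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_cluster_get_request(databricks_domain, access_token, cluster_id):
    """Build (but do not send) a request that looks up one Databricks cluster."""
    # Assumption: the domain may be given with or without the https:// scheme.
    host = databricks_domain.strip().rstrip("/")
    if not host.startswith("https://"):
        host = "https://" + host
    query = urlencode({"cluster_id": cluster_id})
    url = f"{host}/api/2.0/clusters/get?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_cluster_state(request, timeout=30):
    """Send the request and return the cluster state reported by Databricks."""
    # An HTTP error here usually points to a wrong domain, token, or cluster ID.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("state")
```

To run the check, pass the request to `fetch_cluster_state`, for example `fetch_cluster_state(build_cluster_get_request("example.cloud.databricks.com", token, "0926-294544-bckt123"))`, where `token` holds the access token that you received from the Databricks administrator.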
Importing a Databricks Cluster Configuration from a File
You can import properties from an archive file to create a cluster configuration.
Complete the following tasks to import a Databricks cluster configuration from a file:
- 1. Get required cluster properties from the Databricks administrator.
- 2. Create an .xml file with the cluster properties, and compress it into a .zip or .tar file.
- 3. Log in to the Administrator tool and import the file.
Create the Import File
To import the cluster configuration from a file, you must create an archive file that contains an .xml file with the cluster properties.
To create the .xml file for import, you must get required information from the Databricks administrator. You can provide any name for the file and store it locally.
The following table describes the properties required to import the cluster information:
| Property Name | Description |
|---|---|
| cluster_name | Name of the Databricks cluster. |
| cluster_id | The cluster ID of the Databricks cluster. |
| baseURL | URL to access the Databricks cluster. |
| accesstoken | The token ID, created within Databricks, that is required for authentication. |
Optionally, you can include other properties specific to the Databricks environment.
When you complete the .xml file, compress it into a .zip or .tar file for import.
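If you prefer to script this step, the following Python sketch writes the properties in the `<configuration>`/`<property>` layout and compresses the result into a .zip archive. It is an illustration, not an Informatica utility. The function names and the default file name `cluster_properties.xml` are hypothetical, and the sketch assumes Python 3.9 or later.

```python
import zipfile
import xml.etree.ElementTree as ET


def build_import_xml(properties):
    """Return the import file content as UTF-8 bytes."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; available in Python 3.9 and later
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def write_import_archive(properties, target, xml_name="cluster_properties.xml"):
    """Write the .xml file into a .zip archive at a path or file-like target."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(xml_name, build_import_xml(properties))
    return target
```

For example, `write_import_archive({"cluster_name": "my_cluster", ...}, "cluster_config.zip")` produces an archive that you can select in the Administrator tool. Because the archive contains the access token, store it with the same care as any other credential.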
Sample Import File
The following text shows a sample import file with the required properties:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>cluster_name</name>
    <value>my_cluster</value>
  </property>
  <property>
    <name>cluster_id</name>
    <value>0926-294544-bckt123</value>
  </property>
  <property>
    <name>baseURL</name>
    <value>https://provide.adatabricks.net/</value>
  </property>
  <property>
    <name>accesstoken</name>
    <value>dapicf76c2d4567c6sldn654fe875936e778</value>
  </property>
</configuration>
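Before you compress and import a file like the sample above, you might want to check that it contains every required property. The following Python sketch is one way to do that. It is not an Informatica tool, the function name is hypothetical, and it assumes the property names exactly as they are spelled in the sample file.

```python
import xml.etree.ElementTree as ET

# Assumption: names are matched exactly as spelled in the sample import file.
REQUIRED_PROPERTIES = ("cluster_name", "cluster_id", "baseURL", "accesstoken")


def missing_required_properties(xml_text):
    """Return the required property names that are absent or have no value."""
    root = ET.fromstring(xml_text)
    present = {
        (prop.findtext("name") or "").strip()
        for prop in root.iter("property")
        if (prop.findtext("value") or "").strip()
    }
    return [name for name in REQUIRED_PROPERTIES if name not in present]
```

An empty list means that all four required properties are present with non-empty values. The check does not confirm that the values are correct for your cluster.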
Import the Cluster Configuration
After you create the .xml file with the cluster properties and compress it into an archive file, use the Administrator tool to import the archive into the domain and create the cluster configuration.
- 1. From the Connections tab, click the ClusterConfigurations node in the Domain Navigator.
- 2. From the Actions menu, select New > Cluster Configuration.
The Cluster Configuration wizard opens.
- 3. Configure the following properties:
| Property | Description |
|---|---|
| Cluster configuration name | Name of the cluster configuration. |
| Description | Optional description of the cluster configuration. |
| Distribution type | The distribution type. Choose Databricks. |
| Method to import the cluster configuration | Choose Import from file. |
| Upload configuration archive file | The full path and file name of the file. Click the Browse button to navigate to the file. |
| Create connection | Choose to create a Databricks connection. If you choose to create a connection, the Cluster Configuration wizard associates the cluster configuration with the Databricks connection. If you do not choose to create a connection, you must manually create one and associate the cluster configuration with it. |
- 4. Click Next to verify the information on the summary page.