
Before you begin

Before you install the INFACore Python SDK in the development environment, make sure you meet the following prerequisites:

Set up the Databricks cluster

Set up the Databricks cluster for use with Databricks Connect. Databricks Connect uses Spark APIs to run your jobs remotely on a Databricks cluster. The cluster must run Databricks Runtime 5.1 or later.
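As a quick pre-check, the following Python sketch compares a cluster's runtime version string against the 5.1 minimum stated above. It assumes the string has the form the Databricks Clusters API reports in a cluster's spark_version field, for example 9.1.x-scala2.12. The function name meets_minimum_runtime is hypothetical and is not part of INFACore or Databricks Connect.

```python
import re
from typing import Tuple

# Minimum runtime stated in the prerequisites above.
MINIMUM_RUNTIME: Tuple[int, int] = (5, 1)


def meets_minimum_runtime(spark_version: str, minimum: Tuple[int, int] = MINIMUM_RUNTIME) -> bool:
    """Return True if a runtime string such as '9.1.x-scala2.12' is at least `minimum`.

    Assumes the string starts with '<major>.<minor>', as the cluster's spark_version
    value does; any other format raises ValueError.
    """
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version string: {spark_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (10, 4) >= (5, 1) even though "10.4" < "5.1" as text.
    return version >= minimum


print(meets_minimum_runtime("9.1.x-scala2.12"))  # True
print(meets_minimum_runtime("5.0.x-scala2.11"))  # False
```

Comparing integer tuples rather than raw strings avoids the usual trap where a two-digit major version sorts before a one-digit one.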
  1. In the compute configuration of the Databricks cluster, go to the advanced options.
  2. Edit the Spark configuration section and enter the following code snippet:
    spark.databricks.service.server.enabled true
  3. Enter the following code snippet, based on whether you want to use AWS Databricks or Azure Databricks:
  4. Restart the cluster.
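The Spark configuration field takes one property per line, with the key and value separated by whitespace. The Python sketch below parses text in that form and confirms that the server setting from step 2 is present before you restart the cluster. It is an illustrative helper only; the names parse_spark_conf and server_enabled are hypothetical, and it does not check the AWS- or Azure-specific settings from step 3.

```python
from typing import Dict


def parse_spark_conf(text: str) -> Dict[str, str]:
    """Parse Spark config text with one 'key value' pair per line into a dict.

    Blank lines are skipped; the value is everything after the first run of whitespace.
    """
    conf: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return conf


def server_enabled(text: str) -> bool:
    """Return True if the Databricks Connect server flag is set to true."""
    conf = parse_spark_conf(text)
    return conf.get("spark.databricks.service.server.enabled", "").lower() == "true"


cluster_conf = """
spark.databricks.service.server.enabled true
"""
print(server_enabled(cluster_conf))  # True
```

You could paste the contents of the cluster's Spark configuration field into cluster_conf to check it before the restart in step 4.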

Install Databricks Connect in your development environment

Install the Databricks Connect library in your development environment.
  1. Create a development environment. Ensure that its Python version is compatible with the cluster version.
    For example, if you use Anaconda, run the following command to create an environment named databricks that is compatible with cluster version 3.x:
    conda create --name databricks python=3.8.10
  2. Activate the development environment, for example:
    conda activate databricks
  3. Install the Databricks Connect library:
    pip install -U "databricks-connect==<databricks_connect_version>.*"
    Replace <databricks_connect_version> with a Databricks Connect version that is compatible with the Databricks Runtime version of the cluster you are working with. Quote the requirement so that the shell does not expand the asterisk.
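To keep the environment and the library aligned with the cluster, you can derive the pip requirement from the cluster's runtime version and compare your local Python version with the cluster's. The Python sketch below is a hypothetical helper, not an INFACore API. It assumes the convention that Databricks documents for the legacy Databricks Connect client: the client's major.minor version matches the cluster's Databricks Runtime version (for example, runtime 10.4 pairs with databricks-connect==10.4.*), and the local Python minor version matches the cluster's. Confirm both for your cluster.

```python
import re
import sys
from typing import Tuple


def connect_requirement(runtime_version: str) -> str:
    """Build a pip requirement such as 'databricks-connect==10.4.*' from '10.4.x-scala2.12'."""
    match = re.match(r"(\d+)\.(\d+)", runtime_version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version string: {runtime_version!r}")
    # '.*' is pip's prefix match, so any 10.4.x release of the client satisfies it.
    return f"databricks-connect=={match.group(1)}.{match.group(2)}.*"


def python_matches(cluster_python: str, local: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the local Python major.minor equals the cluster's, e.g. '3.8' or '3.8.10'."""
    major, minor = (int(part) for part in cluster_python.split(".")[:2])
    return (major, minor) == tuple(local)


print(connect_requirement("10.4.x-scala2.12"))  # databricks-connect==10.4.*
print(python_matches("3.8", local=(3, 8)))      # True
```

The local parameter defaults to the running interpreter, so calling python_matches("3.8") inside the activated environment checks that environment directly.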

Configure Databricks Connect

Configure Databricks Connect and test the setup in your development environment before you use it with INFACore.
  1. From the command prompt, run the following command:
    databricks-connect configure
  2. When the command displays "Set new configuration values (leave input empty to accept default):", provide the required information at each prompt:
    * Databricks Host: <Databricks hostname>
    * Databricks Token: <Authentication token>
    * Cluster ID: <Databricks cluster ID>
    * Org ID: <Databricks ecosystem organization ID>
    * Port: <Port number configured in the cluster Spark configuration>
  3. Test the Databricks Connect setup by running the following command:
    databricks-connect test
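If you want to repeat these checks from Python, for example in an automated build, the sketch below validates the values that databricks-connect configure saved and then runs databricks-connect test as a subprocess. It assumes that the legacy client stores its settings as JSON in ~/.databricks-connect with the keys host, token, cluster_id, org_id, and port, and that databricks-connect test exits with a non-zero status when a check fails. Verify both assumptions against your Databricks Connect version. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence

# Assumption: file location and key names used by the legacy Databricks Connect client.
DEFAULT_CONFIG_PATH = Path.home() / ".databricks-connect"
REQUIRED_KEYS = ("host", "token", "cluster_id", "port")  # org_id may be left at its default


def config_problems(config: Dict[str, object]) -> List[str]:
    """Return a list of problems found in the saved settings; an empty list means none."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]
    host = str(config.get("host") or "")
    if host and not host.startswith("https://"):
        problems.append("host must start with https://")
    return problems


def load_connect_config(path: Path = DEFAULT_CONFIG_PATH) -> Dict[str, object]:
    """Read the saved JSON settings and raise ValueError if any problem is found."""
    config = json.loads(Path(path).read_text())
    problems = config_problems(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config


def run_connect_test(command: Sequence[str] = ("databricks-connect", "test")) -> bool:
    """Run the connectivity test and return True if the command exits with status 0."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        # The command is not installed or not on PATH in this environment.
        return False
    return result.returncode == 0
```

A typical sequence is to call load_connect_config() and, if it returns without an error, run_connect_test(). Keeping the checks in config_problems separate from file access lets you validate values from another source without touching the saved file.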