Before you begin

Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:

Verify permissions

To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.

Permissions to extract metadata

Ensure that you have the required permissions to enable metadata extraction.

Permissions to run data profiles

Ensure that you have the required permissions to run profiles.
Grant the following permissions:

Permissions to perform data classification

You don't need any additional permissions to run data classification.

Permissions to perform relationship discovery

You don't need any additional permissions to run relationship discovery.

Permissions to perform glossary association

You don't need any additional permissions to run glossary association.

Create a connection

Before you configure the Databricks catalog source, create a connection object in Administrator.
    1. In Administrator, select Connections.
    2. Click New Connection.
    3. Enter the following connection details:
    Property
    Description
    Connection Name
    Name of the Databricks connection. Must be unique within the organization.
    Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
    Maximum length is 255 characters.
    Description
    Optional description of the connection.
    Maximum length is 4000 characters.
    Type
    Type of connection. Ensure that the type is Databricks.
    4. Enter properties specific to the Databricks connection:
    Property
    Description
    Use Secret Vault
    Stores sensitive credentials for this connection in the secrets manager that is configured for your organization.
    This property appears only if secrets manager is set up for your organization.
    This property is not supported by Data Ingestion and Replication.
    When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured.
    Runtime Environment
    The name of the runtime environment where you want to run tasks.
    Select a Secure Agent, Hosted Agent, or serverless runtime environment.
    Hosted Agent is not applicable for mappings in advanced mode.
    You cannot run an application ingestion and replication, database ingestion and replication, or streaming ingestion and replication task on a Hosted Agent or serverless runtime environment.
    SQL Warehouse JDBC URL
    Databricks SQL Warehouse JDBC connection URL. Required to connect to a Databricks SQL warehouse. Also applies to Databricks clusters.
    Note: Databricks SQL Serverless is the recommended Databricks cluster type.
    To get the SQL Warehouse JDBC URL, go to the Databricks console and select the JDBC driver version from the JDBC URL menu.
    For JDBC URL version 2.6.22 or earlier, use the following syntax:
    jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
    For JDBC URL version 2.6.25 or later, use the following syntax:
    jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
    Application ingestion and replication and database ingestion and replication tasks can use JDBC URL version 2.6.25 or later or 2.6.22 or earlier. The URLs must begin with the prefix jdbc:databricks://, as follows:
    jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
    Ensure that you set the required environment variables in the Secure Agent. Also specify the correct JDBC Driver Class Name under advanced connection settings.
    Note: Specify the database name in the Database Name connection property. If you specify the database name in the JDBC URL, it is not considered. The Databricks Host, Organization ID, and Cluster ID properties are not considered if you configure the SQL warehouse JDBC URL property.
    Databricks Token
    Personal access token to access Databricks.
    Required for SQL warehouse and Databricks cluster.
    Ensure that you have permissions to attach to the cluster identified in the Cluster ID property.
    For mappings, you must have additional permissions to create Databricks clusters.
    Catalog Name
    If you use Unity Catalog, the name of an existing catalog in the metastore.
    Optional for SQL warehouse. Doesn't apply to Databricks cluster.
    You can also specify the catalog name at the end of the SQL warehouse JDBC URL.
    Note: The catalog name cannot contain special characters.
    For more information about Unity Catalog, see the Databricks documentation.
    Property
    Description
    JDBC Driver Class Name
    The name of the JDBC driver class. Optional for SQL warehouse and Databricks cluster.
    For JDBC URL versions 2.6.22 or earlier, specify the driver class name as com.simba.spark.jdbc.Driver.
    For JDBC URL versions 2.6.25 or later, specify the driver class name as com.databricks.client.jdbc.Driver.
    Staging Environment
    The cloud provider where the Databricks cluster is deployed.
    Required for SQL warehouse and Databricks cluster.
    Select one of the following options:
    - AWS
    - Azure
    - Personal Staging Location
    Default is Personal Staging Location.
    You can select the Personal Staging Location as the staging environment instead of Azure or AWS staging environments to stage data locally for mappings and tasks.
    If you select Personal Staging Location for a connection that Data Ingestion and Replication uses, the Parquet data files for application ingestion and replication jobs or database ingestion and replication jobs can be staged to a local personal storage location, which has a data retention period of 7 days. You must also specify a Database Host value. If you use Unity Catalog, note that a personal storage location is automatically provisioned.
    Personal staging location doesn't apply to Databricks cluster.
    You cannot use personal staging location with Databricks unmanaged tables.
    You cannot use personal staging location when you configure mappings in advanced mode.
    Note: You cannot switch between clusters once you establish a connection.
    Databricks Host
    The host name of the endpoint the Databricks account belongs to.
    Required for Databricks cluster. Doesn't apply to SQL warehouse.
    You can get the Databricks Host from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster.
    The following example shows the Databricks Host in the JDBC URL:
    jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
    In the JDBC URLs shown for Databricks Host, Organization ID, and Cluster ID, the value of PWD is always <personal-access-token>.
    Cluster ID
    The ID of the cluster.
    Required for Databricks cluster. Doesn't apply to SQL warehouse.
    You can get the cluster ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster.
    The following example shows the Cluster ID in JDBC URL:
    jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
    Organization ID
    The unique organization ID for the workspace in Databricks.
    Required for Databricks cluster. Doesn't apply to SQL warehouse.
    You can get the Organization ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster.
    The following example shows the Organization ID in JDBC URL:
    jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Organization ID>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
    5. Click Test Connection.
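
The URL syntaxes and driver class names listed in the connection properties above can be checked before you enter them. The following Python sketch is illustrative only and is not part of the product: the function names (`driver_settings`, `sql_warehouse_url`, `parse_cluster_url`) are hypothetical, and the host and ID values in the usage note are placeholders. It assumes that driver versions are plain dotted numbers. Because the properties above don't state a syntax for versions between 2.6.22 and 2.6.25, the sketch rejects them rather than guessing.

```python
import re

# Version thresholds taken from the property descriptions above.
LEGACY_MAX = (2, 6, 22)   # 2.6.22 or earlier -> jdbc:spark://
CURRENT_MIN = (2, 6, 25)  # 2.6.25 or later   -> jdbc:databricks://


def _parse_version(version: str) -> tuple:
    """Turn a dotted version such as '2.6.25' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def driver_settings(version: str) -> tuple:
    """Return (url_prefix, driver_class_name) for a JDBC driver version."""
    v = _parse_version(version)
    if v <= LEGACY_MAX:
        return "jdbc:spark://", "com.simba.spark.jdbc.Driver"
    if v >= CURRENT_MIN:
        return "jdbc:databricks://", "com.databricks.client.jdbc.Driver"
    # Versions between the two thresholds are not covered by the text.
    raise ValueError(f"no documented syntax for driver version {version}")


def sql_warehouse_url(host: str, endpoint_id: str, version: str) -> str:
    """Build a SQL warehouse JDBC URL in the documented syntax."""
    prefix, _ = driver_settings(version)
    return (
        f"{prefix}{host}:443/default;transportMode=http;ssl=1;AuthMech=3;"
        f"httpPath=/sql/1.0/endpoints/{endpoint_id};"
    )


# Pattern for the all-purpose cluster URL shown in the examples above.
CLUSTER_URL = re.compile(
    r"jdbc:spark://(?P<host>[^:/;]+):443/.*?"
    r"httpPath=sql/protocolv1/o/(?P<org_id>[^/;]+)/(?P<cluster_id>[^;]+)"
)


def parse_cluster_url(url: str) -> dict:
    """Extract the host, organization ID, and cluster ID from a cluster URL."""
    compact = "".join(url.split())  # drop stray whitespace from copy and paste
    match = CLUSTER_URL.match(compact)
    if match is None:
        raise ValueError("URL does not match the documented cluster syntax")
    return match.groupdict()
```

For example, `sql_warehouse_url("<Databricks Host>", "<SQL endpoint cluster ID>", "2.6.25")` returns a URL that begins with `jdbc:databricks://`, and `parse_cluster_url` returns a dictionary with the keys `host`, `org_id`, and `cluster_id`, which correspond to the Databricks Host, Organization ID, and Cluster ID properties.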

Import a relationship inference model

Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
    1. In Metadata Command Center, click Explore on the navigation panel.
    2. Expand the menu and select Relationship Inference Model.
    3. Select one of the following options:
    The imported models appear in the list of relationship inference models on the Relationship Discovery tab.