Property | Description |
---|---|
Connection Name | Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + - The maximum length is 255 characters. |
Description | Description of the connection. Maximum length is 4000 characters. |
Type | Databricks |
Use Secret Vault | Stores sensitive credentials for this connection in the secrets manager that is configured for your organization. This property appears only if a secrets manager is set up for your organization. This property is not supported by Data Ingestion and Replication. When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured. For information about how to configure and use a secrets manager, see Secrets manager configuration. |
Runtime Environment | The name of the runtime environment where you want to run tasks. Select a Secure Agent, Hosted Agent, or serverless runtime environment. Hosted Agent is not applicable for mappings in advanced mode. You cannot run an application ingestion and replication, database ingestion and replication, or streaming ingestion and replication task on a Hosted Agent or serverless runtime environment. |
SQL Warehouse JDBC URL | Databricks SQL Warehouse JDBC connection URL. This property is required only for SQL warehouse. Doesn't apply to all-purpose cluster and job cluster. To get the SQL Warehouse JDBC URL, go to the Databricks console and select the JDBC driver version from the JDBC URL menu. For JDBC driver version 2.6.25 or later (recommended), use the following syntax: jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>; For JDBC driver version 2.6.22 or earlier, use the following syntax: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>; Important: Effective in the October 2024 release, Simba JDBC driver versions 2.6.22 and earlier entered deprecation. While you can use the Simba driver in the current release, Informatica intends to drop support for Simba JDBC driver versions 2.6.22 and earlier in the April 2025 release. Informatica recommends that you use Databricks JDBC driver version 2.6.38. For more information about how to use the Databricks JDBC driver with Databricks Connector, see the Databricks JDBC driver Knowledge Base article. Application ingestion and replication and database ingestion and replication tasks can use a JDBC URL for driver version 2.6.25 or later or for version 2.6.22 or earlier. The URL must begin with the prefix jdbc:databricks://, as follows: jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>; Ensure that you set the required environment variables in the Secure Agent, and specify the correct JDBC Driver Class Name under the advanced connection settings. Note: Specify the database name in the Database Name connection property. If you specify the database name in the JDBC URL, it is ignored. |
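The two URL syntaxes above differ only in the scheme prefix. As a minimal sketch, the following helper assembles a SQL Warehouse JDBC URL from its parts; the host name, endpoint ID, and the function name itself are illustrative placeholders, not values from the product.

```python
# Sketch: assemble a Databricks SQL Warehouse JDBC URL from its parts.
# The host and endpoint ID used below are placeholders, not real values.
def build_sql_warehouse_url(host: str, endpoint_id: str, legacy: bool = False) -> str:
    """Build the JDBC URL for a Databricks SQL warehouse.

    Driver version 2.6.25 or later uses the jdbc:databricks:// prefix;
    version 2.6.22 or earlier (deprecated) uses jdbc:spark://.
    """
    prefix = "jdbc:spark" if legacy else "jdbc:databricks"
    return (
        f"{prefix}://{host}:443/default;"
        "transportMode=http;ssl=1;AuthMech=3;"
        f"httpPath=/sql/1.0/endpoints/{endpoint_id};"
    )

url = build_sql_warehouse_url("adb-123456789.0.azuredatabricks.net", "abc123def456")
print(url)
```

Copying the URL directly from the JDBC URL menu in the Databricks console, as the property description instructs, avoids assembling it by hand; the helper only makes the required parameter layout explicit.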
Property | Description |
---|---|
Databricks Token | Personal access token to access Databricks. This property is required for SQL warehouse, all-purpose cluster, and job cluster. |
Catalog Name | The name of an existing catalog in the metastore when you use Unity Catalog. This property is optional for SQL warehouse. Doesn't apply to all-purpose cluster and job cluster. The catalog name cannot contain special characters. For more information about Unity Catalog, see the Databricks documentation. |
Property | Description |
---|---|
Client ID | The client ID of the service principal. |
Client Secret | The client secret associated with the Client ID of the service principal. |
Catalog Name | The name of an existing catalog in the metastore when you use Unity Catalog. This property is optional for SQL warehouse. Doesn't apply to all-purpose cluster and job cluster. The catalog name cannot contain special characters. For more information about Unity Catalog, see the Databricks documentation. |
Property | Description |
---|---|
Database | The name of the schema in Databricks. The name can contain only alphanumeric characters and hyphen (-). This property is optional for SQL warehouse, all-purpose cluster, and job cluster. For Data Integration, if you do not specify a value, all databases available in the workspace are listed. The value you specify overrides the schema specified in the SQL Warehouse JDBC URL connection property. |
JDBC Driver Class Name | The name of the JDBC driver class. This property is optional for SQL warehouse, all-purpose cluster, and job cluster. Default is com.databricks.client.jdbc.Driver. |
Staging Environment | The staging environment where your data is temporarily stored before processing. This property is required for SQL warehouse, all-purpose cluster, and job cluster. Select one of the following options as the staging environment:
- Personal Staging Location. Doesn't apply to all-purpose cluster, job cluster, or mappings in advanced mode. Important: Effective in the October 2024 release, personal staging location entered deprecation. While you can use the functionality in the current release, Informatica intends to drop support for the functionality in a future release. Informatica recommends that you use a Volume to stage the data.
- Volume. Doesn't apply to all-purpose cluster, job cluster, or mappings in advanced mode. You can use a volume only on a Linux machine and with JDBC driver version 2.6.25 or later.
Default is Volume. If you select Personal Staging Location for a connection that Data Ingestion and Replication uses, the Parquet data files for application ingestion and replication jobs or database ingestion and replication jobs can be staged to a local personal storage location, which has a data retention period of 7 days. You must also specify a Database Host value. If you use Unity Catalog, note that a personal storage location is automatically provisioned. You cannot use personal staging location with Databricks unmanaged tables. Note: You cannot switch between clusters after you establish a connection. |
Volume Path | The absolute path to the files within a volume in Databricks. Specify the path in the following format: /Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path> |
Databricks Host | The host name of the Databricks endpoint that your Databricks account belongs to. This property is required only for all-purpose cluster and job cluster. Doesn't apply to SQL warehouse. You can get the Databricks Host from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster. The following example shows the Databricks Host in the JDBC URL: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> In the JDBC URL, the value of PWD is always <personal-access-token>. |
Cluster ID | The ID of the cluster. This property is required only for all-purpose cluster and job cluster. Doesn't apply to SQL warehouse. You can get the cluster ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster. The following example shows the Cluster ID in the JDBC URL: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> |
Organization ID | The unique organization ID for the workspace in Databricks. This property is required only for all-purpose cluster and job cluster. Doesn't apply to SQL warehouse. You can get the Organization ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks all-purpose cluster. The following example shows the Organization ID in the JDBC URL: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Organization ID>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> |
Min Workers1 | The minimum number of worker nodes to be used for the Spark job. Minimum value is 1. This property is required only for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. |
Max Workers1 | The maximum number of worker nodes to be used for the Spark job. If you don't want to autoscale, set Max Workers = Min Workers or don't set Max Workers. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. |
DB Runtime Version1 | The Databricks runtime version of the job cluster to spawn when you connect to a job cluster to process mappings. This property is required only for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. Select Databricks runtime version 9.1 LTS or 13.3 LTS. |
Worker Node Type1 | The worker node instance type that is used to run the Spark job. This property is required only for all-purpose cluster and job cluster. Doesn't apply to SQL warehouse. For example, the worker node type for AWS can be i3.2xlarge. The worker node type for Azure can be Standard_DS3_v2. |
Driver Node Type1 | The driver node instance type that is used to collect data from the Spark workers. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. For example, the driver node type for AWS can be i3.2xlarge. The driver node type for Azure can be Standard_DS3_v2. If you don't specify the driver node type, Databricks uses the value you specify in the worker node type field. |
Instance Pool ID1 | The instance pool ID used for the Spark cluster. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. If you specify the Instance Pool ID to run mappings, the following connection properties are ignored:
|
Elastic Disk1 | Enables the cluster to get additional disk space. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. Enable this option if the Spark workers are running low on disk space. |
Spark Configuration1 | The Spark configuration to use in the job cluster. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. The configuration must be in the following format: "key1"="value1";"key2"="value2";... For example, "spark.executor.userClassPathFirst"="False" Doesn't apply to Data Ingestion and Replication tasks. |
Spark Environment Variables1 | The environment variables to export before launching the Spark driver and workers. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. The variables must be in the following format: "key1"="value1";"key2"="value2";... For example, "MY_ENVIRONMENT_VARIABLE"="true" Doesn't apply to Data Ingestion and Replication tasks. |
1Doesn't apply to mappings in advanced mode. |
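The Spark Configuration and Spark Environment Variables properties above share the same "key1"="value1";"key2"="value2" pair syntax. As a minimal sketch of that format, the following helper splits such a string into a dictionary; the function name is ours, not part of the product, and the sample pairs reuse the examples from the table.

```python
# Sketch: parse the "key1"="value1";"key2"="value2" format used by the
# Spark Configuration and Spark Environment Variables properties.
# The function name and sample input are illustrative, not from the product.
def parse_kv_pairs(raw: str) -> dict:
    pairs = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        key, _, value = entry.partition("=")
        # Strip the surrounding double quotes from each key and value.
        pairs[key.strip().strip('"')] = value.strip().strip('"')
    return pairs

conf = parse_kv_pairs('"spark.executor.userClassPathFirst"="False";"MY_ENVIRONMENT_VARIABLE"="true"')
print(conf)
```

The semicolon is the pair separator and each key and value is double-quoted, so values themselves must not contain unescaped semicolons or double quotes.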
Property | Description |
---|---|
S3 Authentication Mode | The authentication mode to connect to Amazon S3. Select one of the following authentication modes:
This authentication mode applies only to SQL warehouse. |
S3 Access Key | The key to access the Amazon S3 bucket. |
S3 Secret Key | The secret key to access the Amazon S3 bucket. |
S3 Data Bucket | The existing S3 bucket to store the Databricks data. |
S3 Staging Bucket | The existing bucket to store the staging files. |
S3 VPC Endpoint Type1 | The type of Amazon Virtual Private Cloud endpoint for Amazon S3. You can use a VPC endpoint to enable private communication with Amazon S3. Select one of the following options:
|
Endpoint DNS Name for S31 | The DNS name for the Amazon S3 interface endpoint. Replace the asterisk symbol with the bucket keyword in the DNS name. Enter the DNS name in the following format: bucket.<DNS name of the interface endpoint> For example, bucket.vpce-s3.us-west-2.vpce.amazonaws.com |
IAM Role ARN1 | The Amazon Resource Name (ARN) of the IAM role assumed by the user to use the dynamically generated temporary security credentials. Set the value of this property if you want to use the temporary security credentials to access the Amazon S3 staging bucket. For more information about how to get the ARN of the IAM role, see the AWS documentation. |
Use EC2 Role to Assume Role1 | Optional. Select the check box to enable the EC2 role to assume another IAM role specified in the IAM Role ARN option. The EC2 role must have a policy attached with a permission to assume an IAM role from the same or different AWS account. |
STS VPC Endpoint Type1 | The type of Amazon Virtual Private Cloud endpoint for AWS Security Token Service. You can use a VPC endpoint to enable private communication with Amazon Security Token Service. Select one of the following options:
|
Endpoint DNS Name for AWS STS1 | The DNS name for the AWS STS interface endpoint. For example, vpce-01f22cc14558c241f-s8039x4c.sts.us-west-2.vpce.amazonaws.com |
S3 Service Regional Endpoint | The S3 regional endpoint when the S3 data bucket and the S3 staging bucket need to be accessed through a region-specific S3 regional endpoint. This property is optional for SQL warehouse. Doesn't apply to all-purpose cluster and job cluster. Default is s3.amazonaws.com. |
S3 Region Name1 | The AWS cluster region in which the bucket you want to access resides. Select a cluster region if you choose to provide a custom JDBC URL that does not contain a cluster region name in the JDBC URL connection property. |
Zone ID1 | The zone ID for the Databricks job cluster. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. Specify the Zone ID only if you want to create a Databricks job cluster in a particular zone at runtime. For example, us-west-2a. Note: The zone must be in the same region where your Databricks account resides. |
EBS Volume Type1 | The type of EBS volumes launched with the cluster. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. |
EBS Volume Count1 | The number of EBS volumes launched for each instance. You can choose up to 10 volumes. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. Note: In a Databricks connection, specify at least one EBS volume for node types with no instance store. Otherwise, cluster creation fails. |
EBS Volume Size1 | The size of a single EBS volume in GiB launched for an instance. This property is optional for job cluster. Doesn't apply to SQL warehouse and all-purpose cluster. |
1Doesn't apply to mappings in advanced mode. |
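The Endpoint DNS Name for S3 property asks you to replace the asterisk in the interface endpoint's generated wildcard DNS name with the bucket keyword. A minimal sketch of that substitution, reusing the example DNS name from the table (the function name is illustrative):

```python
# Sketch: derive the bucket-style DNS name for an S3 interface endpoint
# by replacing the leading asterisk with the "bucket" keyword.
# The function name is illustrative; the endpoint name reuses the
# example from the property description.
def bucket_dns_name(endpoint_dns: str) -> str:
    if not endpoint_dns.startswith("*."):
        raise ValueError("expected a wildcard DNS name such as *.vpce-....amazonaws.com")
    return "bucket." + endpoint_dns[2:]

print(bucket_dns_name("*.vpce-s3.us-west-2.vpce.amazonaws.com"))
```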
Property | Description |
---|---|
ADLS Storage Account Name | The name of the Microsoft Azure Data Lake Storage account. |
ADLS Client ID | The ID of your application to complete the OAuth Authentication in the Active Directory. |
ADLS Client Secret | The client secret key to complete the OAuth Authentication in the Active Directory. |
ADLS Tenant ID | The ID of the Microsoft Azure Data Lake Storage directory that you use to write data. |
ADLS Endpoint | The OAuth 2.0 token endpoint from where authentication based on the client ID and client secret is completed. |
ADLS Filesystem Name | The name of an existing file system to store the Databricks data. |
ADLS Staging Filesystem Name | The name of an existing file system to store the staging data. |