Operation | Support |
---|---|
Read | Yes |
Write | Yes |

The following table describes the Databricks Delta connection properties:

Property | Description |
---|---|
Connection Name | Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + - Maximum length is 255 characters. |
Databricks Host | The host name of the endpoint that the Databricks account belongs to. Use the following syntax: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> The host, organization ID, and cluster ID all come from this URL, as shown in the parsing sketch after this table. Note: You can get the URL from the JDBC or ODBC advanced options of the Databricks Delta analytics cluster or all-purpose cluster. In the URL that you use for the Databricks Host, Org Id, and Cluster ID properties, the value of PWD is always <personal-access-token>. |
Cluster ID | The ID of the Databricks analytics cluster. You can get the cluster ID from the JDBC URL. Use the following syntax: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> |
Organization Id | The unique organization ID for the workspace in Databricks. Use the following syntax: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token> |
Databricks Token | Personal access token to access Databricks. Ensure that you have permissions to attach to the cluster identified in the Cluster ID property. |
SQL Endpoint JDBC URL | The Databricks SQL endpoint JDBC connection URL. Use the following syntax: jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>; This field is required to connect to the Databricks SQL endpoint. Ensure that you set the required environment variables in the Secure Agent. Note: The Databricks Host, Organization ID, and Cluster ID properties are not considered if you configure the SQL Endpoint JDBC URL property. For more information on the Databricks Delta SQL endpoint, contact Informatica Global Customer Support. |
Database | The database in Databricks Delta that you want to connect to. By default, all databases available in the workspace are listed. |
JDBC Driver Class Name | The name of the JDBC driver class. Specify the driver class name as com.simba.spark.jdbc.Driver. |
Cluster Environment | The cloud provider where the Databricks cluster is deployed. Choose from the following options: AWS or Azure. Default is AWS. The connection attributes depend on the cluster environment you select. For more information, see the AWS cluster properties and Azure cluster properties sections. |
Min Workers | The minimum number of worker nodes to be used for the Spark job. Minimum value is 1. |
Max Workers | The maximum number of worker nodes to be used for the Spark job. If you don't want to autoscale, set Max Workers equal to Min Workers or leave Max Workers unset. |
DB Runtime Version | The Databricks runtime version. Select 7.3 LTS from the list. |
Worker Node Type | The worker node instance type that is used to run the Spark job. For example, the worker node type for AWS can be i3.2xlarge. The worker node type for Azure can be Standard_DS3_v2. |
Driver Node Type | The driver node instance type that is used to collect data from the Spark workers. For example, the driver node type for AWS can be i3.2xlarge. The driver node type for Azure can be Standard_DS3_v2. If you don't specify the driver node type, Databricks uses the value you specify in the worker node type field. |
Instance Pool ID | The instance pool ID used for the Spark cluster. If you specify the Instance Pool ID, the following connection properties are ignored: Worker Node Type, Driver Node Type, and Enable Elastic Disk. |
Enable Elastic Disk | Enables the cluster to get additional disk space. Enable this option if the Spark workers are running low on disk space. |
Spark Configuration | The Spark configuration to use in the Databricks cluster. The configuration must be in the following format: "key1"="value1";"key2"="value2";... For example: "spark.executor.userClassPathFirst"="False" |
Spark Environment Variables | The environment variables to export before launching the Spark driver and workers. The variables must be in the following format: "key1"="value1";"key2"="value2";... For example: "MY_ENVIRONMENT_VARIABLE"="true" A helper that builds this format is sketched after this table. |
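
The Databricks Host, Organization Id, and Cluster ID properties are all segments of the same JDBC URL. The following minimal sketch shows where each value sits in a URL of the syntax shown above; the host, organization ID, and cluster ID values are placeholders, not a real endpoint:

```python
import re

# Hypothetical JDBC URL copied from the cluster's JDBC/ODBC advanced options.
# All identifying values below are placeholders.
jdbc_url = (
    "jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcde123;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>"
)

# <Databricks Host> is the segment between jdbc:spark:// and :443.
host = re.search(r"jdbc:spark://([^:]+):443", jdbc_url).group(1)

# <Org Id> and <Cluster ID> are the last two segments of the httpPath value.
org_id, cluster_id = re.search(
    r"httpPath=sql/protocolv1/o/([^/]+)/([^;]+)", jdbc_url
).group(1, 2)

print("Databricks Host:", host)    # dbc-a1b2c3d4-e5f6.cloud.databricks.com
print("Organization Id:", org_id)  # 1234567890123456
print("Cluster ID:", cluster_id)   # 0123-456789-abcde123
```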
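
The Spark Configuration and Spark Environment Variables properties share the same "key"="value" pair format, with pairs separated by semicolons. As a minimal sketch, assuming no trailing semicolon is required, a dictionary can be serialized into that format like this:

```python
def to_kv_string(pairs: dict) -> str:
    """Serialize pairs into the "key1"="value1";"key2"="value2" format."""
    return ";".join(f'"{key}"="{value}"' for key, value in pairs.items())

# Example values from the table above.
print(to_kv_string({"spark.executor.userClassPathFirst": "False"}))
print(to_kv_string({"MY_ENVIRONMENT_VARIABLE": "true"}))
```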

The following table describes the advanced source properties that you can configure for a read operation:

Property | Description |
---|---|
Database Name | Overrides the database name provided in the connection and the database name provided during the metadata import. |
Table Name | Overrides the table name used in the metadata import with the table name that you specify. |
Pre SQL | The pre-SQL command to run on the Databricks Delta source table before the agent reads the data. For example, if you want to update records in the database before you read the records from the table, specify a pre-SQL statement. The query must include a fully qualified table name. You can specify multiple pre-SQL commands, each separated by a semicolon. See the example after this table. |
Post SQL | The post-SQL command to run on the Databricks Delta table after the agent completes the read operation. For example, if you want to delete some records after the latest records are loaded, specify a post-SQL statement. The query must include a fully qualified table name. You can specify multiple post-SQL commands, each separated by a semicolon. |
Staging Location | Relative directory path to store the staging files. Note: If the staging location is not specified for Unity Catalog, the mapping fails. |
Job Timeout | The maximum time in seconds allowed for the Spark job to complete processing. If the job is not completed within the specified time, the Databricks cluster terminates the job and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
Job Status Poll Interval | The interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when an error occurs due to a network connection or when the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
DB REST API Retry Interval | The time interval in seconds at which the Secure Agent retries the REST API call when an error occurs due to a network connection or when the REST endpoint returns a 5xx HTTP error code. This value does not apply to the job status REST API; for the job status REST API, use the Job Status Poll Interval value. Default is 30 seconds. The retry behavior is illustrated in the sketch after this table. |
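
As a sketch of the Pre SQL and Post SQL format described above, the following hypothetical property values use fully qualified table names and semicolons to separate multiple commands. The database, table, and column names are placeholders:

```python
# Hypothetical Pre SQL and Post SQL property values. Each command uses a fully
# qualified table name; multiple commands are separated by semicolons.
pre_sql = (
    "UPDATE demo_db.orders SET status = 'PROCESSING' WHERE status = 'NEW';"
    "DELETE FROM demo_db.orders_staging WHERE load_date < current_date()"
)
post_sql = "DELETE FROM demo_db.orders WHERE status = 'PROCESSED'"
```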
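
The DB REST API Timeout and DB REST API Retry Interval properties work together: calls are retried at the retry interval until the timeout elapses. The following sketch illustrates that behavior with the default values; it is not the Secure Agent's implementation, and call_rest_api is a placeholder for any REST call that raises an exception on a network error or a 5xx response:

```python
import time

def call_with_retries(call_rest_api, timeout=600, retry_interval=30):
    """Retry a failing REST call every retry_interval seconds until timeout."""
    deadline = time.monotonic() + timeout  # 600 seconds = the 10-minute default
    while True:
        try:
            return call_rest_api()
        except ConnectionError:
            # Network error or 5xx response, modeled here as ConnectionError.
            if time.monotonic() >= deadline:
                raise                      # timeout exhausted: surface the failure
            time.sleep(retry_interval)     # 30-second default retry interval
```

With the default values, the call is attempted roughly every 30 seconds for up to 10 minutes before the failure is surfaced.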

The following table describes the advanced target properties that you can configure for a write operation:

Advanced Property | Description |
---|---|
Target Database Name | Overrides the database name provided in the connection and the database selected in the metadata browser for existing targets. |
Target Table Name | Overrides the table name at runtime for existing targets. |
Pre SQL | The pre-SQL command to run before the agent writes to Databricks Delta. For example, if you want to assign a sequence object to a primary key field of the target table before you write data to the table, specify a pre-SQL statement. You can specify multiple pre-SQL commands, each separated by a semicolon. |
Post SQL | The post-SQL command to run after the agent completes the write operation. For example, if you want to alter the table created by using the Create Target option and assign constraints to the table, specify a post-SQL statement. You can specify multiple post-SQL commands, each separated by a semicolon. |
Write Disposition | Overwrites or adds data to the existing data in a table. You can select from the following options: Append, which adds data to the existing data in the table, or Truncate, which overwrites the existing data in the table. |
Staging Location | Relative directory path to store the staging files. Note: The staging location is mandatory if you want to create a new target at runtime when the source query is set for Unity Catalog. |
Job Timeout | The maximum time in seconds allowed for the Spark job to complete processing. If the job is not completed within the specified time, the Databricks cluster terminates the job and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
Job Status Poll Interval | The interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when an error occurs due to a network connection or when the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
DB REST API Retry Interval | The time interval in seconds at which the Secure Agent retries the REST API call when an error occurs due to a network connection or when the REST endpoint returns a 5xx HTTP error code. This value does not apply to the job status REST API; for the job status REST API, use the Job Status Poll Interval value. Default is 30 seconds. |
Update Mode | Defines how rows are updated in the target tables. Select from the following options: Update as Update, which updates rows flagged for update if they exist in the target table, or Update else Insert, which updates rows flagged for update if they exist in the target table and inserts the remaining rows. The sketch after this table illustrates the equivalent Delta Lake MERGE semantics. |
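
As an illustration of the two update modes, the following sketch shows equivalent Delta Lake MERGE statements; the table names, aliases, and key column are placeholders, and this is not the exact SQL that the connector generates:

```python
# Illustrative Delta Lake MERGE statements for the two update modes.
# demo_db.target, demo_db.updates, and the id key column are placeholders.
update_as_update = """
MERGE INTO demo_db.target AS t
USING demo_db.updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
"""

update_else_insert = """
MERGE INTO demo_db.target AS t
USING demo_db.updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
```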