
Targets for Databricks

Add a Target transformation to write data to a target.
When you add a Target transformation to a mapping, you define the target connection, target objects, and target properties related to the Databricks connection type.
The following table lists the target properties that are supported by SQL warehouse, all-purpose cluster, and job cluster:
Property | SQL warehouse (1) | All-purpose cluster (2) | Job cluster (3)
Target Type - Single Object | Yes | Yes | Yes
Target Type - Parameter | Yes | Yes | Yes
Object - Existing | Yes | Yes | Yes
Object - Create New at Runtime | Yes | Yes | Yes
Create target - Object Name, Table Location, Database Name | Yes | Yes | Yes
Create target - Table Properties | Yes | No | No
Operation - Insert, Update, Upsert, Delete | Yes | No | Yes
Operation - Data Driven | Yes | No | No
Target Database Name | Yes | No | Yes
Target Table Name | Yes | No | Yes
Update Override Query | Yes | No | No
Write Disposition - Append, Truncate | Yes | No | Yes
Write Disposition - Truncate Always | Yes | No | No
Update Mode - Update as update, Update else insert | Yes | No | Yes
Staging Location | Yes | No | Yes
Pre SQL | Yes | No | No
Post SQL | Yes | No | No
DTM Staging File Size | Yes | No | No
Job Timeout | No | No | Yes
Job Status Poll Interval | No | No | Yes
DB REST API Timeout | No | No | Yes
DB REST API Retry Interval | No | No | Yes
Forward Rejected Rows | Yes | No | Yes

(1) The Secure Agent connects to the SQL warehouse at design time and at runtime.
(2) The Secure Agent connects to the all-purpose cluster to import the metadata at design time.
(3) The Secure Agent connects to the job cluster to run the mappings.

Target properties for Databricks

In a mapping, you can configure a Target transformation to represent a Databricks object.
The following table describes the Databricks properties that you can configure in a Target transformation:
Property
Description
Connection
Name of the target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
Target Type
Target type. Select one of the following types:
  • - Single Object.
  • - Parameter. Select Parameter to define the target type when you configure the task.
Object
Name of the target object.
Create Target
Creates a target.
Enter a name for the target object and select the source fields that you want to use. By default, all source fields are used. You can select an existing target object or create a new target object at runtime.
You cannot parameterize the target at runtime.
Operation
Defines the type of operation to be performed on the target table.
Select from the following list of operations:
  • - Insert (Default)
  • - Update
  • - Upsert
  • - Delete
  • - Data Driven (1)
When you use the upsert operation, you must set the Update Mode in the target details to Update Else Insert.
If the key column receives a null value from the source, the following actions take place for the different operations:
  • - Update. Skips the operation and does not update the row.
  • - Delete. Skips the operation and does not delete the row.
  • - Upsert. Inserts a new row instead of updating the existing row.
Update Columns
The fields to use as temporary primary key columns when you update, upsert, or delete data on the Databricks Delta target tables. When you select more than one update column, the mapping task uses the AND operator with the update columns to identify matching rows.
Applies to update, upsert, delete, and data driven operations.
Data Driven Condition (1)
Flags rows for an insert, update, delete, or reject operation based on the expressions that you define.
For example, the following IIF statement flags a row for reject if the ID field is null. Otherwise, it flags the row for update:
IIF(ISNULL(ID), DD_REJECT, DD_UPDATE)
Required if you select the data driven operation.
(1) Applies only to mappings in advanced mode.
The following table describes the Databricks advanced properties that you can configure in a Target transformation:
Advanced Property
Description
Target Database Name (1)
Overrides the database name provided in the connection and the database selected in the metadata browser for existing targets.
Note: You cannot override the database name when you create a new target at runtime.
Target Table Name (1)
Overrides the table name at runtime for existing targets.
Update Override Query
Overrides the default update query that the agent generates for the update operation with the query that you specify in this field.
Use the merge command for the update operation.
Write Disposition
Overwrites or adds data to the existing data in a table. You can select from the following options:
  • - Append. Appends data to the existing data in the table even if the table is empty.
  • - Truncate. Overwrites the existing data in the table. Applies only to the insert operation and non-empty sources.
  • - Truncate Always. Overwrites the existing data in the table. Applies to insert, update, upsert, and delete target operations for empty and non-empty sources.
Note: You cannot use Truncate Always if you do not specify the database name in the connection properties and when you create a new target at runtime.
Update Mode (1)
Defines how rows are updated in the target tables. Select from the following options:
  • - Update As Update: Rows matching the selected update columns are updated in the target.
  • - Update Else Insert: Rows matching the selected update columns are updated in the target. Rows that don't match are appended to the target.
Staging Location
Relative directory path to store the staging files.
  • - If Databricks is hosted on the AWS platform, use the path relative to the Amazon S3 staging bucket.
  • - If Databricks is hosted on the Azure platform, use the path relative to the Azure Data Lake Storage Gen2 staging filesystem name.
Note: When you use Unity Catalog, you must provide a pre-existing location on your cloud storage in the Staging Location field.
Pre SQL
The pre-SQL command to run before the agent writes to Databricks.
For example, if you want to assign a sequence object to a primary key field of the target table before you write data to the table, specify a pre-SQL statement.
You can specify multiple pre-SQL commands, each separated by a semicolon.
Post SQL
The post-SQL command to run after the agent completes the write operation.
For example, if you want to alter the table created by using the create target option and assign constraints to the table after you write data to the table, specify a post-SQL statement.
You can specify multiple post-SQL commands, each separated by a semicolon. For a sample set of pre-SQL and post-SQL commands, see the example that follows this table.
DTM Staging File Size
The size of the flat file that Data Integration creates locally in a temporary folder to stage the data before writing to Databricks.
Default is 50 MB.
Job Timeout (1)
The maximum time in seconds for the Spark job to complete processing.
If the job is not completed within the time specified, the job terminates and the mapping fails.
If the job timeout is not specified, the mapping shows success or failure based on the job completion.
Job Status Poll Interval (1)
The interval in seconds at which the Secure Agent checks the job completion status.
Default is 30 seconds.
DB REST API Timeout (1)
The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is a network connection error or the REST endpoint returns a 5xx HTTP error code.
Default is 10 minutes.
DB REST API Retry Interval (1)
The time interval in seconds at which the Secure Agent retries the REST API call when there is a network connection error or the REST endpoint returns a 5xx HTTP error code.
This value does not apply to the job status REST API. Use the Job Status Poll Interval value for the job status REST API.
Default is 30 seconds.
Forward Rejected Rows
Determines whether the transformation passes rejected rows to the next transformation or drops rejected rows. By default, the agent forwards rejected rows to the next transformation.
(1) Doesn't apply to mappings in advanced mode.
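The following commands are a minimal sketch of how you can chain multiple statements with semicolons in the Pre SQL and Post SQL fields. The sales.orders table, its columns, and the retention logic are hypothetical; replace them with objects that exist in your Databricks workspace.
Pre SQL:
DELETE FROM sales.orders WHERE load_date < date_sub(current_date(), 7); ALTER TABLE sales.orders SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true')
Post SQL:
ALTER TABLE sales.orders ALTER COLUMN order_id SET NOT NULL; OPTIMIZE sales.orders
In this sketch, the pre-SQL statements remove rows older than seven days and enable optimized writes before the load, and the post-SQL statements add a NOT NULL constraint and compact the table after the write operation completes.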

Create a target table at runtime

You can use an existing target or create a target to hold the results of a mapping. If you choose to create the target, the agent creates the target when you run the task if it does not already exist.
You can create both managed and external tables for mappings and mappings in advanced mode.
To specify the target properties, perform the following tasks:
  1. Select the Target transformation in the mapping.
  2. To specify the target, click the Target tab.
  3. Select the target connection.
  4. For the target type, choose Single Object or Parameter.
  5. Specify the target object or parameter.
  6. To specify a target object, perform the following tasks:
     a. Click Select and choose a target object. You can select an existing target object or create a new target object at runtime.
     b. To create a target object at runtime, select Create New at Runtime.
     c. In the Object Name field, enter the name of the target table that you want to create. Specify the table name in lowercase.
     d. In the Table Location field, enter the location of the target table data.
        The table location is relative to the data bucket or data filesystem name specified in the connection. An external table is created if you specify the table location.
        For Unity Catalog, specify a pre-existing location to create a new target at runtime.
        When you use a personal staging location, specify a pre-existing table location with an absolute path to the directory to create a new target at runtime.
     e. In the Database Name field, specify the Databricks database name.
        The database name that you specify in the connection properties takes precedence.
        Note: If you do not specify the Database Name when you create a new target or in the Database connection attribute, the default database is used to create the new target, irrespective of the database name specified in the SQL Warehouse JDBC URL.
     f. Specify the Table Properties to optimize the table configuration settings. You can use table properties to tag tables with information that is not tracked by SQL queries. For the list of table properties and table options, see the Databricks documentation. The sketch after these steps shows how these settings relate to a Databricks table definition.
     g. Click OK.
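For illustration, the following Databricks SQL statement is a rough sketch of the kind of external table that these settings correspond to. It is not the exact statement that the agent generates, and the database name, table name, columns, location path, and table property shown here are hypothetical.
CREATE TABLE my_database.customer_target (
  customer_id INT,
  customer_name STRING,
  city STRING
)
USING DELTA
LOCATION 's3://<data bucket>/customer_target'  -- hypothetical path; Table Location is resolved relative to the bucket or filesystem in the connection
TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'true');
Here, my_database corresponds to the Database Name field, customer_target to the Object Name field, the LOCATION clause to the Table Location field, and the TBLPROPERTIES clause to the Table Properties field.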

Rules and guidelines for create target at runtime

When you configure a mapping with the Create New at Runtime option, consider the following rules:

Override the update operation

You can specify an update override to override the update query that the Secure Agent generates for the update operation.
When you configure an update override, the Secure Agent stages the data in files, loads the staged data into a temporary table, and then uses the merge query that you specify to load the data from the temporary table into the Databricks target table. The syntax that you specify for the update query must be supported by Databricks.
Specify the update override query in the following format:
MERGE INTO <Target table name> AS A
USING :TU AS B
ON A.<Column1> = B.<Column1>
WHEN MATCHED THEN UPDATE SET A.<Column2> = B.<Column2>, A.<Column3> = B.<Column3>, ... A.<ColumnN> = B.<ColumnN>
where :TU represents the incoming data source for the Target transformation. The Secure Agent replaces :TU with a temporary table name at run time and does not validate the update query.
Note: When you specify the name of the target table, you must provide a fully qualified name in the format <Catalog_Name>.<Database_Name>.<Target_TableName>.
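For example, the following query, which assumes a hypothetical main.default.customer target table and hypothetical column names, updates the customer name and city of the rows whose customer_id matches the incoming data:
MERGE INTO main.default.customer AS A
USING :TU AS B
ON A.customer_id = B.customer_id
WHEN MATCHED THEN UPDATE SET A.customer_name = B.customer_name, A.city = B.city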
When you configure an update override in a mapping to write to Databricks, consider the following rules and guidelines:

Passthrough partitioning

You can configure partitioning to optimize the mapping performance at run time when you write data to Databricks targets.
The partition type controls how the agent distributes data among partitions at partition points. You can define the partition type as passthrough partitioning. The agent distributes rows of target data based on the number of threads that you define for the partition.

Determine the order of processing for multiple targets

When you configure a mapping to write to multiple targets in a single pipeline, you can configure each target for any of the write operations. The order of the target operations is not deterministic.
However, if you want the target operations to be processed in a specific order, such as delete, update, and insert, you need to set certain properties either in the Secure Agent or in the task properties.

Set -DEnableSingleCommit=true in the Secure Agent properties

Perform the following tasks to set the property for the Secure Agent:
  1. Open Administrator and select Runtime Environments.
  2. Select the Secure Agent for which you want to set the property.
  3. On the upper-right corner of the page, click Edit.
  4. In the System Configuration Details section, select the Type as DTM for the Data Integration Service.
  5. Edit the JVM options and set the property to -DEnableSingleCommit=true.

Set the EnableSingleCommit property in the task properties

Perform the following tasks to set the property in the task:
  1. On the Runtime Options tab in the mapping task properties, navigate to the Advanced Session Properties section.
  2. From the Session Property Name list, select Custom Properties, and set the Session Property Value to Yes.

Optimize the staging performance of a mapping

Data Integration, by default, creates a flat file locally in a temporary folder to stage the data before writing to Databricks. You can set Data Integration to optimize the staging performance of the write operation on a Linux machine.
If you do not set the staging property, Data Integration performs staging without the optimized settings, which might impact the performance of the task.
You can optimize the staging performance only when you use the SQL warehouse to run the mappings.

Setting the staging property

You can optimize the mapping performance of the write operation on Linux.
You need to set the following staging property in the agent properties:
INFA_DTM_STAGING_ENABLED_CONNECTORS
Perform the following tasks to set the staging property for Tomcat in the Secure Agent properties:
  1. In Administrator, click Runtime Environments.
  2. Edit the Secure Agent for which you want to set the property.
  3. In the System Configuration Details section, select the Service as Data Integration Server and the type as Tomcat.
  4. Set the value of the INFA_DTM_STAGING_ENABLED_CONNECTORS property to the plugin ID of Databricks Connector.
     You can find the plugin ID in the manifest file located in the following directory: <Secure Agent installation directory>/downloads/<Databricks Delta package>/CCIManifest
When you run the mapping, a flat file is created in the following directory on your Linux machine:
/tmp/DELTA_stage
You can check the session log to verify the staging. If the data is staged through the flat file successfully, Data Integration logs the following message in the session log:
The INFA_DTM_STAGING is successfully enabled to use the flat file to create local staging files.

Rules and guidelines

Consider the following guidelines when you enable the staging property for the write operation: