Lookups for Databricks

Add a Lookup transformation to retrieve data based on a specified lookup condition. When you add a Lookup transformation to a mapping, you define the lookup connection, lookup objects, and lookup properties related to Databricks.
In the Lookup transformation, select the lookup connection and object. Then, define the lookup condition and the outcome for multiple matches.
The mapping queries the lookup source based on the lookup fields and the defined lookup condition. The lookup operation returns the results to the Lookup transformation, which then passes them downstream.
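Conceptually, for each input row the lookup behaves like a filtered query against the lookup object. As a rough sketch, a lookup condition that matches an incoming CUSTOMER_ID field against the ID column of a hypothetical sales.customers table corresponds to a Databricks SQL query similar to the following. The table and column names are examples only; the Secure Agent generates the actual query.

  -- Illustrative sketch of the query that a lookup condition implies
  SELECT id, name, region
  FROM sales.customers
  WHERE id = <value of CUSTOMER_ID for the current input row>;

If the query returns more than one row, the Multiple Matches setting determines which rows the transformation passes downstream.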
You can configure the following lookups:
Note: The Lookup transformation doesn't apply to mappings in advanced mode.
For more information about the Lookup transformation, see Transformations in the Data Integration documentation.

Lookup properties for Databricks

The following table describes the Databricks lookup object properties that you can configure in a Lookup transformation:
Connection
Name of the lookup connection.
You can select an existing connection, create a new connection, or define parameter values for the lookup connection property.
If you want to overwrite the lookup connection properties at runtime, select the Allow parameter to be overridden at run time option.
Source Type
Type of the source object. Select Single Object, Query, or Parameter.
Parameter
A parameter file where you define values that you want to update without having to edit the task. Select an existing parameter for the lookup object or click New Parameter to define a new parameter for the lookup object.
The Parameter property appears only if you select Parameter as the source type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option.
Lookup Object
Name of the lookup object for the mapping.
Multiple Matches
Behavior when the lookup condition returns multiple matches. You can select from the following options to determine the behavior:
- Return first row
- Return last row
- Return any row
- Return all rows
- Report error
The following table describes the Databricks lookup object advanced properties that you can configure in a Lookup transformation:
Database Name
Overrides the database specified in the connection.
Table Name
Overrides the table specified in the connection.
Staging Location
Relative directory path to store the staging files.
- If Databricks is hosted on the AWS platform, use the path relative to the Amazon S3 staging bucket.
- If Databricks is hosted on the Azure platform, use the path relative to the Azure Data Lake Storage Gen2 staging filesystem name.
SQL Override
Overrides the default SQL query used to read data from the Databricks custom query source. For illustrative sample values, see the example after this table.
Pre-SQL
SQL statement that you want to run before reading data from the source.
Post-SQL
SQL statement that you want to run after reading data from the source.
Job Timeout
Maximum time in seconds allowed for the Spark job to complete processing. If the job does not complete within the specified time, the Databricks cluster terminates the job and the mapping fails.
If you do not specify a job timeout, the mapping succeeds or fails based on the job completion.
Job Status Poll Interval
Poll interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds.
DB REST API Timeout
Maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is a network connection error or when the REST endpoint returns a 5xx HTTP error code.
Default is 10 minutes.
DB REST API Retry Interval
Time interval in seconds at which the Secure Agent retries the REST API call when there is a network connection error or when the REST endpoint returns a 5xx HTTP error code.
This value does not apply to the job status REST API. Use the Job Status Poll Interval value for the job status REST API.
Default is 30 seconds.
Tracing Level
Sets the amount of detail that appears in the log file. You can choose terse, normal, verbose initialization, or verbose data.
Default is normal.
Lookup Data Filter¹
Limits the number of lookups that the mapping performs on the cache of the lookup source table based on the value you specify in the filter condition.
This property is applicable when you enable lookup cache on the Advanced tab for the Lookup transformation.
Maximum length is 32768 characters.
For more information about this property, see Lookup source filter in Transformations.
¹ Doesn't apply to mappings in advanced mode.
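For example, you might configure the advanced properties with values similar to the following. The database, table, path, and filter values are illustrative only and are not part of the connector defaults.

  Database Name:       sales
  Table Name:          customers
  Staging Location:    databricks/staging/lookups
  SQL Override:        SELECT id, name, region FROM sales.customers WHERE active = true
  Pre-SQL:             REFRESH TABLE sales.customers
  Post-SQL:            INSERT INTO audit.lookup_runs VALUES (current_timestamp())
  Lookup Data Filter:  region = 'EMEA'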

Multiple match options in lookups

When you configure a lookup, you define the behavior when a lookup condition returns more than one match. You can return all rows, any row, the first row, the last row, or an error.
The following configurations have multiple match policy restrictions:

Enable lookup caching

When you configure a Lookup transformation in a mapping, you can cache the lookup data during the runtime session.
When you select Lookup Caching Enabled, Data Integration queries the lookup source once and caches the values for use during the session, which can improve performance. You can configure dynamic lookup caching and can specify the directory to store the cached lookup.
Note: You cannot perform insert and update operations to the same target at runtime when you dynamically cache the lookup objects in a mapping.
For information about lookup caching, see the chapter "Lookup transformations" in Transformations in the Data Integration documentation.