Use a Lookup transformation to retrieve data based on a specified lookup condition.
Use a Microsoft Azure Synapse SQL Lookup transformation to look up data in a Microsoft Azure Synapse SQL object. For example, the source table includes the customer code, but you want to include the customer name in the target table to make summary data easy to read.
You can use the Microsoft Azure Synapse SQL Lookup transformation to look up the customer name in another Microsoft Azure Synapse SQL object.
You can also set the default column value for the return field in a cached Lookup transformation in a Microsoft Azure Synapse SQL mapping. However, in advanced mode, you can't set the default column value for the return field.
You can create the following lookups when you configure field mappings in a mapping task:
•Connected with cached or uncached
•Unconnected lookup
•Connected lookup with dynamic lookup cache
You cannot configure a connected or unconnected uncached lookup for mappings in advanced mode.
The following table describes the Microsoft Azure Synapse SQL lookup object properties that you can configure in a Lookup transformation:
Property
Description
Connection
Name of the lookup connection.
You can select an existing connection, create a new connection, or define parameter values for the lookup connection property.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
You can switch between a non-parameterized and a parameterized Microsoft Azure Synapse SQL connection. When you switch between the connections, the advanced property values are retained.
Source Type
Type of the source object.
Select Single Object, Query, or Parameter.
You cannot use custom query as the source type in cached and dynamic cache lookups.
Lookup Object
Name of the lookup object for the mapping.
Parameter
A parameter file where you define values that you want to update without having to edit the task.
Select an existing parameter for the lookup object or click New Parameter to define a new parameter for the lookup object.
The Parameter property appears only if you select parameter as the source type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Define Query
If the source type is a query, displays the Edit Custom Query dialog box. Enter a valid custom query and click OK.
Multiple Matches
The behavior when the lookup condition returns multiple matches.
Select one of the following options:
- Return first row1
- Return last row1
- Return any row
- Return all rows
- Report error
Note: If the lookup table has an IDENTITY column, the data for the first row and last row is returned based on the sorting of the IDENTITY column. If the matched rows have duplicates then the values returned are random.
1Doesn't apply to mappings in advanced mode.
You cannot use Binary, Varbinary, and Datetimeoffset datatypes columns as lookup conditions.
When you use a cached lookup with Microsoft Azure Synapse SQL connection and if the lookup condition contains a NULL value, the lookup condition is ignored.
The following table describes the Microsoft Azure Synapse SQL lookup advanced properties that you can configure in a Lookup transformation:
Property
Description
Azure Blob Container Name
Required if you select Azure Blob storage in the connection properties.
The name of the container in Microsoft Azure Blob Storage.
The container name must not contain special characters.
ADLS FileSystem Name
Required if you select ADLS Gen2 storage in the connection properties.
The name of the file system in Microsoft Azure Data Lake Storage Gen2.
The file system name must not contain special characters.
You can also specify the path of the directory under the file system. Use only a forward slash to specify the directory path.
Schema Name Override
Overrides the schema specified in the connection.
Table Name Override
Overrides the table name of the imported Microsoft Azure Synapse SQL source table.
Field Delimiter
Character used to separate fields in the file. Default is 0x1e. You can specify 'TAB' or 0-256 single-char printable and non-printable ASCII characters. Non-printable characters must be specified in hexadecimal.
Note: Multi-char ASCII characters except TAB are not applicable. You cannot use the following non-printable characters:
00x0, 0x0, 0x0A , 0x1B, 0x0D, and 0x1F
Number of Concurrent Connections to Blob Store1
Number of concurrent connections to extract data from the Microsoft Azure Blob Storage. When reading a large-size blob, you can spawn multiple threads to process data. Configure Blob Part Size to partition a large-size blob into smaller parts.
Default is 4. Maximum is 10.
Blob Part Size1
Partitions a blob into smaller parts each of specified part size. When reading a large-size blob, consider partitioning the blob into smaller parts and configure concurrent connections to spawn required number of threads to process data in parallel.
Default is 8 MB.
Pre-SQL
Pre-SQL command that must be run before reading data from the source.
Post-SQL
Post-SQL command that must be run after reading data from the source.
SQL Override
When you read data from a Microsoft Azure Synapse SQL object, you can configure SQL overrides and define constraints.
On Pre-Post SQL Error
Determines the behavior when a task that includes pre-SQL or post-SQL commands encounters errors. You can select any of the following options:
- Continue. The task continues regardless of errors.
- Stop. The task stops when errors occur while executing pre-SQL or post-SQL commands.
Quote Character
The Secure Agent skips the specified character when you read data from Microsoft Azure Synapse SQL.
Default is 0x1f.
Interim Directory1
Optional. Path to the staging directory in the Secure Agent machine.
Specify the staging directory where you want to stage the files when you read data from Microsoft Azure Synapse SQL. Ensure that the directory has sufficient space and you have write permissions to the directory.
Default staging directory is /tmp.
You cannot specify an interim directory when you use the Hosted Agent.
Tracing Level1
Sets the amount of detail that appears in the log file. You can choose terse, normal, verbose initialization, or verbose data. Default is normal.
Lookup Data Filter1
Limits the number of lookups that the mapping performs on the cache of the lookup source table based on the value you specify in the filter condition.
This property is applicable when you select object as the source type and enable lookup cache on the Advanced tab for the Lookup transformation.
Maximum length is 32768 characters.
1Doesn't apply to mappings in advanced mode.
Enabling lookup caching
When you configure a Lookup transformation in a mapping, you can cache the lookup data during the runtime session.
When you select Lookup Caching Enabled, Data Integration queries the lookup source once and caches the values for use during the session, which can improve performance. You can specify the directory to store the cached lookup.
Lookup Cache Persistent
Use lookup cache persistent to save the lookup cache file to reuse it the next time Data Integration processes a Lookup transformation configured to use the cache.
You can specify the file name prefix to use with persistent lookup cache files in the Cache File Name Prefix field.
If the lookup table changes occasionally, you can enable the Re-cache from Lookup Source property to rebuild the lookup cache.
Dynamic Lookup Cache
Use a dynamic lookup cache to keep the lookup cache synchronized with the target. By default, the dynamic lookup cache is disabled and represents static cache.
If the cache is static, the data in the lookup cache does not change as the mapping task runs.
If the task uses the cache multiple times, the task uses the same data. If the cache is dynamic, the task updates the cache based on the actions in the task, so if the task uses the lookup multiple times, downstream transformations can use the updated data.
Pre-build Lookup Cache
Use a pre-build lookup cache to build the lookup cache before the Lookup transformation receives data. Multiple lookup cache files can be built at the same time to improve performance.
For information about lookup caching, see Transformations in the Data Integration documentation.
Optimize the staging performance for a mapping
Data Integration, by default, creates a flat file locally in a temporary folder to stage the data after reading from or before writing to Microsoft Azure Synapse SQL. You can set Data Integration to optimize the staging performance.
If you do not set the staging property, Data Integration performs staging without the optimized settings, which might impact the performance of the task.
Consider the following rules when you enable the staging property:
•When you read multiple objects, you cannot optimize the staging performance.
•If you run a mapping enabled for SQL ELT optimization, the mapping runs without SQL ELT optimization.
•If the data contains timestamp data types with time zone, the job runs without staging the data in the local flat file.
•If you configure a mapping in advanced mode, the mapping runs without staging the data in the local flat file.
•When you use a custom query or SQL override with the Order By clause and the AzureSynapseDisableStagingForSort property is set to true, the mapping runs without staging the data in the local flat file.
•When you read source data with decimal values and write to a Microsoft Azure Synapse SQL target and enable the staging property to optimize the staging performance, the decimal values in the target get truncated and rounded off to the nearest integer.
To avoid this issue, in the system configuration details section, select the Type as DTM for the Data Integration service, and set the value for DisableInfaDoubleHandlingForStaging as yes.
•When you enable the staging optimization property for lookups configured to use persistent cache and define the cache file name prefix, the mapping fails at runtime for consecutive runs.
•When you use Parquet as the staging file format, you cannot use the staging property to optimize the staging performance and the mapping fails with validation errors.
Enabling the Microsoft Azure Synapse SQL Connector to optimize the staging performance
Perform the following tasks to enable the Microsoft Azure Synapse SQL Connector to optimize the staging performance:
1In Administrator, click Runtime Environments.
2Edit the Secure Agent for which you want to set the property.
3In the System Configuration Details section, select the Service as Data Integration Server and the type as Tomcat.
4Set the value of the Tomcat property to the plugin ID of the Microsoft Azure Synapse SQL connector.
You can find the plugin ID in the manifest file located in the following directory: <Secure Agent installation directory>/downloads/<Microsoft Azure Synapse SQL package>/CCIManifest
The following image shows the property set for the Secure Agent for the read, write, and lookup operations:
To set the property for multiple connectors, specify the plugin ID for each connector separated by a comma.
When you run the mapping, the flat file is created in the following directory in your machine: C:\Windows\Temp\AzureSynapse\stage\<AzureSynapse_Target.txt>
You can check the session logs. If the flat file is created successfully, Data Integration logs the following message in the session log:
The INFA_DTM_STAGING is successfully enabled to use the flat file to create local staging files.