
Google BigQuery V2 lookups in mappings

You can create lookups for objects in a Google BigQuery V2 mapping. You can retrieve data from a Google BigQuery V2 lookup object based on the specified lookup condition.
When you configure a lookup in a Google BigQuery V2 mapping, you select the lookup connection and lookup object. You also define the behavior when a lookup condition returns more than one match.
You can use the = (Equal to) and != (Not equal to) operators in a lookup condition.
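For illustration, the two supported operators behave like the following sketch (plain Python, not connector code; the rows and field names are hypothetical):

```python
# Hypothetical cached lookup rows; the field names are illustrative only.
ROWS = [
    {"EMP_ID": 1, "DEPT": "SALES"},
    {"EMP_ID": 2, "DEPT": "HR"},
    {"EMP_ID": 3, "DEPT": "HR"},
]

def lookup(rows, field, op, value):
    """Apply a single lookup condition using the supported operators."""
    if op == "=":
        return [r for r in rows if r[field] == value]
    if op == "!=":
        return [r for r in rows if r[field] != value]
    raise ValueError("Only = and != are supported in a lookup condition")
```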
You can add cached and uncached lookups to a Google BigQuery object when you configure field mappings in a mapping task.
You can configure only a cached lookup for mappings in advanced mode.
Note: You cannot configure an uncached lookup for a lookup object that uses a Google BigQuery V2 connection in complex connection mode.
The following table describes the Google BigQuery V2 lookup object properties that you can configure in a Lookup transformation:
Property
Description
Connection
Name of the lookup connection.
Source Type
Type of the source object. Select Single Object or Parameter.
Lookup Object
Name of the Google BigQuery lookup object for the mapping.
Multiple Matches
Behavior when the lookup condition returns multiple matches. You can return all rows, any row, the first row, the last row, or an error.
You can select from the following options in the lookup object properties to determine the behavior:
  • Return first row¹
  • Return last row¹
  • Return any row
  • Return all rows
  • Report error
Only Return any row, Return all rows, and Report error are applicable for mappings in advanced mode.
¹ Doesn't apply to mappings in advanced mode.
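Conceptually, the Multiple Matches policies resolve a list of matching rows as in the following sketch (illustrative Python, not connector code):

```python
def resolve_matches(matches, policy):
    """Resolve a list of matching rows according to a Multiple Matches policy."""
    if policy == "Return all rows":
        return matches
    if policy == "Report error" and len(matches) > 1:
        raise ValueError("Lookup condition returned more than one match")
    if policy == "Return last row":
        return matches[-1] if matches else None
    # Covers Return first row; Return any row may return any match, and
    # returning the first one is a valid choice in this sketch.
    return matches[0] if matches else None
```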
The following table describes the Google BigQuery V2 lookup object advanced properties that you can configure in a Lookup transformation with caching enabled:
Property
Description
Source Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the connection.
Source Table Name
Optional. Overrides the Google BigQuery table name that you specified in the Lookup transformation.
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Allow Large Results¹
Determines whether Google BigQuery V2 Connector creates arbitrarily large result tables to query large source tables.
If you select this option, you must specify a destination table to store the query results.
Query Results Table Name¹
Required if you select the Allow Large Results option.
Specifies the destination table name to store the query results. If the table is not available in the dataset, Google BigQuery V2 Connector creates the destination table with the name that you specify.
Job Poll Interval In Seconds¹
The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
  • Direct. In direct mode, Google BigQuery V2 Connector reads data directly from the Google BigQuery source table.
  Note: When you use hybrid or complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
  • Staging¹. In staging mode, Google BigQuery V2 Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Direct mode.
Number of Threads for Downloading Staging Files¹
Specifies the number of files that Google BigQuery V2 Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Data format of the staging file¹
Specifies the data format of the staging file. You can select one of the following data formats:
  • Avro
  • JSON (Newline Delimited). Supports flat data and record data with nested and repeated fields.
  • Parquet
  • CSV. Supports flat data.
  Note: In a .csv file, columns of the Timestamp data type are represented as floating point numbers, which can cause the milliseconds value to differ.
This property applies to staging mode.
Local Stage File Directory¹
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Staging File Name¹
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Enable Staging File Compression¹
Indicates whether to compress the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Persist Destination Table¹
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
pre SQL¹
SQL statement that you want to run before reading data from the source.
For example, if you want to select records in the database before you read the records from the table, specify the following pre SQL statement:
SELECT * FROM [api-project-80697026669:EMPLOYEE.DEPARTMENT] LIMIT 1000;
post SQL¹
SQL statement that you want to run after reading data from the source.
For example, if you want to update records in a table after you read the records from a source table, specify the following post SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number=1000011, phoneNumber.areaCode=100 where fullname='John Doe'
pre SQL Configuration¹
Specify a pre SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
post SQL Configuration¹
Specify a post SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
SQL Override Query¹
Overrides the default SQL query used to read data from the Google BigQuery source.
When you specify an SQL override query, you must specify a dataset name in the Source Dataset ID advanced source property.
Ensure that the selected columns, their data types, and their order in the query match the columns, data types, and order in which they appear in the source object.
Ensure that you map all the columns in the SQL override query to the target.
Does not apply when you enable partitioning.
Use Legacy SQL For SQL Override¹
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
Label¹
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
For more information about labels and their usage requirements, see Assign a label to the transformations.
Billing Project ID¹
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs.
If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Retry Options¹
Comma-separated list to specify the following retry options:
  • Retry Count. The number of retry attempts to read data from Google BigQuery.
  • Retry Interval. The time in seconds to wait between retry attempts.
  • Retry Exceptions. The list of exceptions, separated by the pipe (|) character, for which retries are made.
Use the following format to specify the retry options:
RetryCount:5,RetryInterval:1,RetryExceptions:java.net.ConnectException|java.io.IOException
Note: The retry options are available for preview. Preview functionality is supported for evaluation purposes but is not warranted and is not production-ready. Informatica recommends that you use it in non-production environments only. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to in accordance with changing market or technical circumstances. For more information, contact Informatica Global Customer Support. To use the functionality, your organization must have the appropriate licenses.
Number of Spark Partitions²
Specifies the maximum number of partitions that the Spark engine splits the data into.
Default is 1.
Lookup Data Filter¹
Limits the number of lookups that the mapping performs on the cache of the lookup source table based on the value you specify in the filter condition.
This property is applicable when you select object as the source type and enable lookup cache on the Advanced tab for the Lookup transformation.
Maximum length is 32768 characters.
For more information about this property, see Lookup source filter in Transformations.
¹ Doesn't apply to mappings in advanced mode.
² Applies only to mappings in advanced mode.
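The pre SQL and post SQL configurations above use a comma-separated Key:Value format. A minimal parser sketch (illustrative Python, not part of the connector):

```python
def parse_sql_config(config):
    """Parse a pre/post SQL configuration string such as
    DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,... into a dict."""
    pairs = (item.split(":", 1) for item in config.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}

cfg = parse_sql_config(
    "DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,"
    "FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False"
)
```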
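The legacy and standard SQL formats for the SQL Override Query property differ mainly in how they quote the table reference. The following sketch (illustrative Python; the regular expression is an assumption about well-formed references) converts a legacy [project:dataset.table] reference to the standard `project.dataset.table` form:

```python
import re

def legacy_to_standard(table_ref):
    """Convert a legacy SQL table reference [project:dataset.table]
    to the standard SQL form `project.dataset.table`."""
    match = re.fullmatch(r"\[([^:\]]+):([^.\]]+)\.([^\]]+)\]", table_ref)
    if not match:
        raise ValueError("not a legacy table reference: " + table_ref)
    project, dataset, table = match.groups()
    return "`{}.{}.{}`".format(project, dataset, table)
```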
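The retry semantics that the Retry Options property describes can be sketched as follows (illustrative Python, not connector code; the real option lists fully qualified Java exception class names, matched here by simple class name):

```python
import time

def parse_retry_options(options):
    """Split RetryCount:...,RetryInterval:...,RetryExceptions:a|b into parts."""
    parts = dict(item.split(":", 1) for item in options.split(","))
    return (int(parts["RetryCount"]),
            float(parts["RetryInterval"]),
            parts["RetryExceptions"].split("|"))

def read_with_retries(read_fn, options, sleep=time.sleep):
    count, interval, retryable = parse_retry_options(options)
    for attempt in range(count + 1):
        try:
            return read_fn()
        except Exception as exc:
            # Retry only the listed exception types, up to RetryCount attempts,
            # waiting RetryInterval seconds between attempts.
            if attempt == count or type(exc).__name__ not in retryable:
                raise
            sleep(interval)
```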

Unconnected lookup transformation

You can configure an unconnected Lookup transformation for the Google BigQuery source in a mapping. Use the Lookup transformation to retrieve data from Google BigQuery based on a specified lookup condition.
An unconnected Lookup transformation is a Lookup transformation that is not connected to any source, target, or transformation in the pipeline.
An unconnected Lookup transformation receives input values from the result of a :LKP expression in another transformation. The Integration Service queries the lookup source based on the lookup ports and condition in the Lookup transformation and passes the returned value to the port that contains the :LKP expression. The :LKP expression can pass lookup results to an expression in another transformation.
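For illustration, a :LKP expression in another transformation might look like the following (the Lookup transformation name and field names are hypothetical; the arguments correspond to the incoming fields used in the lookup condition):

```
:LKP.lkp_EmployeeLookup(EMP_ID, DEPT_ID)
```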
Note: You cannot configure an uncached unconnected Lookup transformation for the Google BigQuery source in a mapping.
For more information about the Lookup transformation, see Transformations.

Configuring an unconnected lookup transformation

To configure an unconnected Lookup transformation, select the Unconnected Lookup option, add incoming fields, configure the lookup condition, and designate a return value. Then configure a lookup expression in a different transformation.
    1. Add a Lookup transformation in a mapping.
    2. On the General tab of the Lookup transformation, enable the Unconnected Lookup option.
    3. On the Incoming Fields tab of the Lookup transformation, create an incoming field for each argument in the :LKP expression.
    For each lookup condition that you create, add an incoming field to the Lookup transformation.
    4. On the Lookup Object tab, import the lookup object.
    The Return all rows option for the Multiple Matches property is not supported in an unconnected lookup.
    5. Designate a return value.
    You can pass multiple input values into a Lookup transformation and return one column of data. Data Integration returns one value from the lookup query. Use the return field to specify the return value.
    6. Configure a lookup expression in another transformation.
    Provide input values for the unconnected Lookup transformation from a :LKP expression in an Expression transformation. The arguments are local input fields that match the Lookup transformation input fields used in the lookup condition.
    7. Map the fields with the target.

Enabling lookup caching

When you configure a Lookup transformation in a mapping, you can cache the lookup data during the runtime session.
When you select Lookup Caching Enabled, Data Integration queries the lookup source once and caches the values for use during the session, which can improve performance. You can specify the directory to store the cached lookup.
Lookup Cache Persistent
Use a persistent lookup cache to save the lookup cache file and reuse it the next time Data Integration processes a Lookup transformation that is configured to use the cache.
You can specify the file name prefix to use with persistent lookup cache files in the Cache File Name Prefix field.
If the lookup table changes occasionally, you can enable the Re-cache from Lookup Source property to rebuild the lookup cache.
When you enable lookup cache persistent, the mappings don't show messages related to staging optimization.
Dynamic Lookup Cache
Use a dynamic lookup cache to keep the lookup cache synchronized with the target. By default, the dynamic lookup cache is disabled, and the cache is static.
If the cache is static, the data in the lookup cache does not change as the mapping task runs.
If the task uses the cache multiple times, the task uses the same data. If the cache is dynamic, the task updates the cache based on the actions in the task, so if the task uses the lookup multiple times, downstream transformations can use the updated data.
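The difference between static and dynamic caching can be sketched as follows (illustrative Python, not connector code):

```python
class LookupCache:
    """Minimal sketch of a lookup cache. With dynamic=True the cache stays
    synchronized with rows written to the target; a static cache keeps the
    snapshot taken at session start."""
    def __init__(self, source_rows, dynamic=False):
        self.cache = dict(source_rows)  # snapshot of the lookup source
        self.dynamic = dynamic

    def lookup(self, key):
        return self.cache.get(key)

    def on_target_write(self, key, value):
        if self.dynamic:
            self.cache[key] = value  # a static cache ignores target changes

static = LookupCache([("100", "old address")])
dynamic = LookupCache([("100", "old address")], dynamic=True)
for cache in (static, dynamic):
    cache.on_target_write("100", "new address")
```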
For information about lookup caching, see Transformations in the Data Integration documentation.

Optimize lookup performance in staging mode

You can configure Data Integration to optimize the staging performance of a lookup operation.
You can enhance lookup performance by setting a staging property, INFA_DTM_LKP_STAGING_ENABLED_CONNECTORS, for the Secure Agent. Data Integration first copies the data from the Google BigQuery source into a flat file in the local staging file directory. When the staging file contains all of the data, Data Integration reads the data from the staging file.
Consider the following rules when you enable the staging property:

Enabling Google BigQuery V2 Connector to optimize the lookup performance

Perform the following steps to set the staging property for the Tomcat service in the Secure Agent properties:
  1. In Administrator, click Runtime Environments. The Runtime Environments page appears.
  2. Select the Secure Agent for which you want to set the custom configuration property.
  3. In the Actions column, click the Edit Secure Agent icon for the Secure Agent that you want to edit. The Edit Secure Agent page appears.
  4. In the System Configuration Details section, select Data Integration Server as the service and Tomcat as the type.
  5. Set the value of the Tomcat property INFA_DTM_LKP_STAGING_ENABLED_CONNECTORS to the plugin ID of Google BigQuery V2 Connector.
  You can find the plugin ID in the manifest file located in the following directory:
    <Secure Agent installation directory>/downloads/<GoogleBigQueryV2 package>/CCIManifest
  6. Click Save.
  7. Restart the Data Integration Service.
  8. In the Google BigQuery V2 connection, set the UseRFC4180CSVParser:true custom property in the Provide Optional Properties connection property.
You can check the session logs. If the flat file is created successfully, Data Integration logs the following message in the session log:
The reader is configured to run in [DTM_STAGING_CSV] mode.
In the Google BigQuery advanced source properties, set the Read Mode property to Staging and set the Data format of the staging file property to CSV.
When you enable the staging mode to look up source data, you can see the following message in the logs:
LKPDP_1:READER_1_1> SDKS_38636 [2023-04-21 15:07:23.020] Plug-in #601601: DTM Staging is enabled for connector for Source Instance [Source].
When you disable the staging mode to look up source data, you can see the following message in the logs:
LKPDP_1:READER_1_1> SDKS_38637 [2023-04-21 15:42:45.538] Plug-in #601601: DTM Staging is disabled for connector for Source Instance [Source].

Setting default column value for the lookup and output ports

In a cached and connected Lookup transformation, you can set the default column value for the lookup and output ports. When no matching field is found for the lookup query, the default column value is inserted into the output table.
To set the default column value, select the Use Lookup Field Default Value option in the Lookup advanced properties.
Consider the following rules and guidelines when you set the default column value:

Rules and guidelines for Lookup transformation

Certain rules and guidelines apply when you configure a Lookup transformation.
When you configure a Google BigQuery lookup, adhere to the following guidelines: