
SQL ELT optimization for mapping tasks

You can use SQL ELT optimization to push the transformation logic to the Google BigQuery database.

SQL ELT optimization

When you run a task configured for SQL ELT optimization, Data Integration converts the transformation logic to an SQL query. Data Integration sends the query to the database, and the database runs the query. The amount of transformation logic that Data Integration pushes to the database depends on the database, the transformation logic, and the mapping configuration. Data Integration processes all transformation logic that it cannot push to a database.
Configure SQL ELT optimization for a mapping in the task properties. You cannot configure SQL ELT optimization for a mapping in advanced mode.
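For illustration only, the following sketch shows the general idea of pushdown: instead of fetching rows and transforming them in the integration engine, the transformation logic (here, a filter plus an aggregation) is rendered as a single SQL statement that the database executes itself. The table and column names are invented for the example and are not part of the product.

```python
# Hypothetical sketch of SQL ELT optimization: filter + aggregate logic
# collapsed into one SQL statement that the database runs. Names are invented.
def build_pushdown_query(table: str, filter_cond: str,
                         group_col: str, agg_expr: str) -> str:
    """Render a filter and an aggregation as a single pushdown query."""
    return (
        f"SELECT {group_col}, {agg_expr} "
        f"FROM {table} "
        f"WHERE {filter_cond} "
        f"GROUP BY {group_col}"
    )

query = build_pushdown_query(
    table="`project.dataset.orders`",
    filter_cond="order_status = 'SHIPPED'",
    group_col="region",
    agg_expr="SUM(amount) AS total_amount",
)
```

Any logic that cannot be rendered this way stays in the integration engine, which is why the amount of pushdown depends on the transformations in the mapping.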

SQL ELT optimization types

When you apply SQL ELT optimization, the task pushes transformation logic to the source or target database based on the optimization type you specify in the task properties. Data Integration translates the transformation logic into SQL queries or Google BigQuery commands and sends them to the Google BigQuery database. The database runs the SQL queries or Google BigQuery commands to process the transformations.
You can configure the following types of SQL ELT optimization in a mapping:
Full
Data Integration first pushes as much of the transformation logic as possible to process in the target database. If the target database cannot process some of the transformation logic, it pushes that logic for processing to the source database. Data Integration processes all the remaining transformation logic that it cannot push to the target or source database. This is applicable for mappings that read from or write to Google BigQuery.
When you select full SQL ELT optimization for mappings that read from Google Cloud Storage and write to Google BigQuery, Data Integration pushes as much of the transformation logic as possible to process in the target database. Data Integration processes all the transformation logic that it cannot push to the target database.
Source
Data Integration pushes down as much of the transformation logic as possible to process in the source database.
When you select source SQL ELT optimization, Data Integration pushes the transformation logic for all the supported transformations downstream in the mapping, but excludes the target transformation.

SQL ELT optimization scenarios

You can configure SQL ELT optimization for the following scenarios in mappings:
Important: To configure SQL ELT optimization using the Google BigQuery V2 Connector, verify that your organization has the Mappings-Advanced SQL ELT Optimization license. To get the license, contact Global Customer Support.
Google BigQuery source and Google BigQuery target
  Scenario: Reads from and writes to Google BigQuery using the Google BigQuery V2 connection.
  SQL ELT optimization type: Full, Source
Google Cloud Storage source and Google BigQuery target
  Scenario: Reads from Google Cloud Storage using a Google Cloud Storage V2 connection and writes to Google BigQuery using a Google BigQuery V2 connection.
  SQL ELT optimization type: Full
Amazon S3 source and Google BigQuery target
  Scenario: Reads from Amazon S3 using an Amazon S3 V2 connection and writes to Google BigQuery using a Google BigQuery V2 connection.
  SQL ELT optimization type: Full
Note: You can use the Secure Agent or the Hosted Agent to run mappings enabled with SQL ELT optimization.

SQL ELT optimization preview

Before you run a mapping task configured for SQL ELT optimization, you can preview whether SQL ELT optimization is possible when you create the mapping. You can preview SQL ELT optimization from the SQL ELT Optimization panel in the Mapping Designer.
After you select the required SQL ELT optimization options and run the preview, Data Integration creates and runs a temporary SQL ELT optimization preview mapping task. When the job completes, Data Integration displays the SQL queries to be executed and any warnings in the SQL ELT Optimization panel. The warning messages help you understand which transformations in the configured mapping are not applicable for SQL ELT optimization. If SQL ELT optimization fails, Data Integration lists any queries generated up to the point of failure. You can edit the mapping and fix the required transformations before you run the mapping for SQL ELT optimization.
You can also view the temporary job created under My Jobs and download the session log to view the queries generated.
For more information about how to preview SQL ELT optimization, see the topic "SQL ELT optimization preview" in Mappings in the Data Integration help.

Configuring SQL ELT optimization

To optimize a mapping, add the mapping to a task, and then configure SQL ELT optimization in the mapping task.
    1. Create a mapping task.
    2. In the SQL ELT Optimization section on the Runtime Options tab, set the SQL ELT optimization value to Full or To Source.
    3. If full SQL ELT optimization is not available, select how Data Integration handles SQL ELT optimization in the SQL ELT Optimization Fallback Option menu:
    Note: The fallback options are not applicable to mappings in advanced mode.
When you run the mapping task, the transformation logic is pushed to the Google BigQuery database.

Configuring a custom query or an SQL override for the Google BigQuery source object

You can push down a custom query or an SQL override to Google BigQuery.
Before you run a task that contains a custom query as the source object or you configure an SQL override, you must set the Create Temporary View session property in the mapping task properties.
Note: If you do not set the Create Temporary View property, the mapping runs without SQL ELT optimization.
Perform the following steps to set the property:
  1. In the mapping task, navigate to the SQL ELT Optimization section on the Runtime Options tab.
  2. Select Create Temporary View.
  3. Click Finish.
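Conceptually, a custom query cannot be pushed down as-is because the generated SQL needs a named relation to select from. The following sketch, which is an illustration and not the connector's actual mechanism, wraps a custom query as an inline view (a WITH clause) to show the idea behind the Create Temporary View property. All names are invented.

```python
# Illustrative only: wrap a custom source query as a named inline view so
# that generated downstream SQL has a relation to select from. The connector's
# real temporary-view handling may differ; names are invented.
def wrap_custom_query(custom_query: str, view_name: str, outer_select: str) -> str:
    return f"WITH {view_name} AS ({custom_query}) {outer_select}"

sql = wrap_custom_query(
    custom_query="SELECT id, amount FROM `project.dataset.sales` WHERE amount > 100",
    view_name="src_view",
    outer_select="SELECT id, SUM(amount) AS total FROM src_view GROUP BY id",
)
```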

Context based optimization for multiple targets

When you configure a mapping to write to multiple Google BigQuery targets or to write to the same Google BigQuery target table in two Target transformations, you can further optimize the write operation when you configure full SQL ELT optimization.
To optimize, you can configure an insert, update, upsert, delete, or data driven operation for each target individually. You can select the same Google BigQuery target table in multiple Target transformations and perform different operations for each of the Target transformations that run independently of each other.
When you configure a mapping enabled for full SQL ELT optimization to write to the same Google BigQuery target table in two Target transformations, you can specify the optimization context for the slowly changing dimension type 2 merge scenario.
You can enable the SCD Type 2 merge when you write to the same Google BigQuery table in two Target transformations and perform different operations for each of the Target transformations, where you use one target to insert data and the other target to update data. Data Integration combines the queries for both targets and issues a Merge query.
Note: Multi-insert mode is not applicable for Google BigQuery targets.

Understanding an SCD type 2 merge mapping

The SCD Type 2 merge mapping uses a BigQuery source and two target transformations that write to the same Google BigQuery table. One target transformation updates the table while the other transformation inserts data to the Google BigQuery table.
The following image shows a mapping that writes slowly changing dimension data to a Google BigQuery target table:
Add Expression and Lookup transformations to compare source data against the existing target data. You enter the lookup conditions and the source columns that you want Data Integration to compare against the existing target.
For each source row without a matching row in the target, the Expression transformation marks the new row. For each source row with a matching row in the target, the Expression transformation compares existing source and target columns with the MD5() function. If those columns do not match, the Expression marks the existing target row as an inactive row and inserts a new target row as an active row. The mapping then splits into two data groups using the Router transformation.
You must generate a UUID value through the Expression transformation and add it as a unique ID column, which is also the first column in the target. Additionally, you can add an active status flag, an MD5() hash value, and start and end timestamp columns to write to the target through the Expression transformation.
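The helper columns described above can be pictured with a small sketch. This is an illustration of the documented pattern, not the product's implementation; the column and port names are examples.

```python
import hashlib
import uuid

# Illustration of the SCD Type 2 helper columns: a generated UUID as the
# unique ID, an MD5 hash of the tracked columns for cheap change detection,
# and an active status flag set to 1 for new rows. Names are examples.
def scd2_columns(row: dict, tracked_columns: list) -> dict:
    concatenated = "|".join(str(row[c]) for c in tracked_columns)
    return {
        "row_id": str(uuid.uuid4()),                            # unique ID column
        "row_hash": hashlib.md5(concatenated.encode()).hexdigest(),  # change detection
        "active_flag": 1,                                       # new rows are active
    }

cols = scd2_columns({"cust_id": 7, "city": "Oslo"}, ["cust_id", "city"])
```

Comparing `row_hash` values for matching keys is what lets the mapping decide whether a source row is unchanged, changed, or new.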
The first data flow from the Router transformation passes only new rows to the Expression transformation. The Expression transformation inserts the new rows into the target, assigns a UUID value, and sets the start timestamp, the MD5() hash value, and the active status to 1 for each new row.
In the second data flow, the Router transformation passes only changed rows to the Expression transformation. For each changed row, the Expression transformation sets the active status to 0 and adds the end timestamp for the existing row in the target.
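As a hedged sketch of how Data Integration could combine the two target operations into a single Merge query, the following builds a BigQuery-style MERGE statement in which the matched branch closes out the current row and the not-matched branch inserts the new row. This is an illustration, not the connector's actual generated SQL; table and column names are invented.

```python
# Illustrative only: combine the update target (close out the active row)
# and the insert target (add the new row) into one MERGE statement, as the
# SCD Type 2 merge optimization described above suggests. Names are invented.
def build_scd2_merge(target: str, staging: str, key: str, hash_col: str) -> str:
    return (
        f"MERGE {target} T USING {staging} S "
        f"ON T.{key} = S.{key} AND T.active_flag = 1 "
        f"WHEN MATCHED AND T.{hash_col} != S.{hash_col} THEN "
        f"UPDATE SET T.active_flag = 0, T.end_ts = CURRENT_TIMESTAMP() "
        f"WHEN NOT MATCHED THEN INSERT ROW"
    )

merge_sql = build_scd2_merge(
    "`project.dataset.dim_customer`", "`project.dataset.staging`",
    "cust_id", "row_hash",
)
```

Issuing one MERGE instead of separate insert and update statements is what makes the combined write cheaper than running the two targets independently.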

Clean stop a SQL ELT optimization job

When a task enabled for SQL ELT optimization is running, you can clean stop the job to terminate all the issued statements and processes spawned by the job.
You can use this option for mappings enabled for SQL ELT optimization that use the Google BigQuery V2 connection either in the source or target transformation, or both.
Use the Clean Stop option on the My Jobs page in Data Integration and the All Jobs and Running Jobs page in Monitor.
See the following exceptions before you clean stop a SQL ELT optimization task:

SQL ELT optimization using a Google BigQuery V2 connection

You can configure SQL ELT optimization for a mapping that contains a Google BigQuery V2 connection. SQL ELT optimization enhances the mapping performance. You can configure full SQL ELT optimization when you read data from a Google Cloud Storage source and write to a Google BigQuery target.
You can configure SQL ELT optimization for a mapping task to read from or write data to Google BigQuery objects associated with different projects in different Google service accounts within the same region.

SQL ELT optimization compatibility

You can configure the task to push transformations, variables, functions, and operators to the database.
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, and function, the Secure Agent processes the transformation logic.

Functions with Google BigQuery V2

The following table lists the functions that can be pushed to the Google BigQuery database by using full SQL ELT optimization:
ABS()              IN()           MOD()            SYSDATE()
ADD_TO_DATE()      INSTR()        POWER()          SYSTIMESTAMP()
AVG()              IS_DATE()      REG_REPLACE()    TAN()
CEIL()             IS_NUMBER()    REPLACECHR()     TANH()
CHR()              IS_SPACES()    REPLACESTR()     TO_BIGINT
CONCAT()           ISNULL()       ROUND(DATE)      TO_CHAR(DATE)
COS()              LAST_DAY()     ROUND(NUMBER)    TO_CHAR(NUMBER)
COSH()             LENGTH()       RPAD()           TO_CHAR(STRING)
COUNT()            LN()           RTRIM()          TO_DATE()
DATE_COMPARE()     LOG()          SIGN()           TO_DECIMAL()
DATE_DIFF()        LOWER()        SIN()            TO_FLOAT()
DECODE()           LPAD()         SINH()           TO_INTEGER()
EXP()              LTRIM()        SQRT()           TRUNC(DATE)
FLOOR()            MAX()          STDDEV()         TRUNC(NUMBER)
GET_DATE_PART()    MD5()          SUBSTR()         UPPER()
IIF()              MIN()          SUM()            VARIANCE()

Rules and guidelines

When you push functions to Google BigQuery, adhere to the following guidelines:
MD5() function
When you push the MD5 function to Google BigQuery, adhere to the following guidelines:

Operators with Google BigQuery V2

When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators in the database. If there is no equivalent operator, the Secure Agent processes the transformation logic.
The following table lists the SQL ELT optimization operators that you can push to the Google BigQuery database by using full SQL ELT optimization:
Operator    Operator
+           =
-           >=
*           <=
/           !=
%           AND
||          OR
>           NOT
<

Rules and guidelines

When the argument in the NOT operator is a null value, the mapping runs with source SQL ELT optimization or without SQL ELT optimization.
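The reason a null argument blocks full pushdown of NOT can be pictured with SQL's three-valued logic, where NOT NULL evaluates to NULL rather than true or false. The following minimal model is an illustration of that semantics, not product code.

```python
# Minimal model of NOT under SQL three-valued logic: a null (None) input
# stays null instead of flipping to a boolean, which can diverge from
# engine-side logic and is a plausible reason to fall back from full pushdown.
def sql_not(value):
    return None if value is None else not value
```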

Transformations with Google BigQuery V2

When you configure SQL ELT optimization, the Secure Agent tries to push the configured transformation to Google BigQuery.
The following list summarizes the availability of transformations that you can push down to Google BigQuery:

Aggregator transformation

You can configure full SQL ELT optimization to push an Aggregator transformation to process in Google BigQuery.
You can perform the following aggregate calculations:
When you configure an Aggregator transformation, you must use each of the incoming ports either in an aggregate function or in a group by field to define how to group data for aggregate expressions.
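The rule above, that every incoming port must appear either in an aggregate function or in a group by field, can be sketched as a small validator that builds the corresponding GROUP BY query. This is an illustration with invented port names, not the connector's query generator.

```python
# Sketch of the Aggregator rule: every incoming port must be either grouped
# or aggregated for the generated GROUP BY query to be valid. Names invented.
def validate_aggregator(ports, group_by, aggregates):
    """aggregates maps an incoming port name to its aggregate expression."""
    uncovered = set(ports) - set(group_by) - set(aggregates)
    if uncovered:
        raise ValueError(f"ports neither grouped nor aggregated: {sorted(uncovered)}")
    select_list = list(group_by) + [
        f"{expr} AS {port}_agg" for port, expr in aggregates.items()
    ]
    return "SELECT " + ", ".join(select_list) + " FROM src GROUP BY " + ", ".join(group_by)

query = validate_aggregator(
    ports=["region", "amount"],
    group_by=["region"],
    aggregates={"amount": "SUM(amount)"},
)
```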

Lookup transformation

You can configure full SQL ELT optimization to push a Lookup transformation to process in Google BigQuery. This applies to both connected and unconnected lookups.
You can add the following lookups:
When you look up data and the lookup condition finds multiple matches, the lookup returns all rows. In a mapping with Google BigQuery as the target, you must set the Multiple Matches option for the lookup object to Return all rows. If you set Multiple Matches to any option other than Return all rows, the agent ignores it.
When you configure a Lookup transformation based on a Google BigQuery source, adhere to the following guidelines:
When you configure an unconnected Lookup transformation, consider the following rules:

SQL transformation

You can use an SQL transformation to push supported scalar functions to Google BigQuery. When you configure SQL ELT optimization for a mapping, you can use Java or SQL user-defined functions (UDFs) in a SQL transformation and run queries with the Google BigQuery target endpoint.
You can use only the SELECT clause SQL statement to push down a function. The following snippet demonstrates the syntax of a simple SELECT SQL query:
SELECT <function_name1>(~Arg~), <function_name2> (~Arg~)...
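A concrete instance of that pattern can be built by substituting hypothetical function and argument names for the `<function_name>(~Arg~)` placeholders. The function and column names below are examples only.

```python
# Build a SELECT-clause-only statement of the documented form
# SELECT <function_name1>(~Arg~), <function_name2>(~Arg~)...
# Function and column names are hypothetical examples.
def build_sql_tx_query(calls):
    return "SELECT " + ", ".join(f"{fn}({arg})" for fn, arg in calls)

query = build_sql_tx_query([("UPPER", "first_name"), ("LENGTH", "last_name")])
```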
You can push a SQL transformation with the following restrictions:

Variables with Google BigQuery V2

When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent variables in the database. If there is no equivalent variable, the Secure Agent processes the transformation logic.
The following table lists the SQL ELT optimization variables that you can use in a Google BigQuery database. An X indicates that the variable can be pushed to the Google BigQuery database by using full SQL ELT optimization.
Variable             SQL ELT optimization
SESSSTARTTIME        X
SYSDATE              -
WORKFLOWSTARTTIME    -

Data types with Google BigQuery V2

The following table lists the Google Cloud Storage data types based on the file format type that can be pushed to the Google BigQuery database:
File format type    Google Cloud Storage data type
Delimited           BigInt, Number, String
Avro                Binary, Byte, Double, Float, Int, Long, String
Parquet             Binary, Date, Decimal, Double, Float, Int32, Int64, Int96, String
JSON                Double, Int, Long, String
The following table lists the Google BigQuery native data types that can be mapped to the comparable transformation data types in a mapping configured with SQL ELT optimization:
Google BigQuery data type    Transformation data type
Boolean                      String
Date                         Date/Time
DateTime                     Date/Time
Float                        Double
Integer                      BigInt
Numeric                      Decimal (default precision 28, scale 9)
String                       String
Byte                         Byte
Time                         Date/Time
Timestamp                    Date/Time
When you set the custom property OptimizeCastsInPDO:true in the Google BigQuery V2 connection optional properties, you can map the following date and time data types in a Google BigQuery source to a target in a mapping enabled with SQL ELT optimization:
Source data type    Supported target data type
Date                Date, Date/Time
Time                Time
Date/Time           Date, Time, Date/Time
Timestamp           Time, Timestamp
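The compatibility pairs above can be expressed as a simple lookup, which mirrors the documented table for the OptimizeCastsInPDO:true case; this is a restatement of the table, not additional product behavior.

```python
# The documented source-to-target date/time compatibility pairs when
# OptimizeCastsInPDO:true is set, expressed as a lookup table.
SUPPORTED_TARGETS = {
    "Date":      {"Date", "Date/Time"},
    "Time":      {"Time"},
    "Date/Time": {"Date", "Time", "Date/Time"},
    "Timestamp": {"Time", "Timestamp"},
}

def cast_supported(source_type: str, target_type: str) -> bool:
    return target_type in SUPPORTED_TARGETS.get(source_type, set())
```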

Read from and write to Google BigQuery

You can configure SQL ELT optimization in a mapping to read from and write to Google BigQuery using a Google BigQuery V2 connection.

Example

You work in a motorbike retail company with more than 30,000 dealerships and 2000 inspection centers globally. The company stores millions of records in Google BigQuery hosted on GCP. You want to use Data Integration to perform some transformations on the data before you write back to Google BigQuery.
Use a Google BigQuery V2 connection in the mapping to read from the Google BigQuery source and write the processed data to the Google BigQuery target. Configure full SQL ELT optimization in the mapping to enhance the performance.

Supported features

You must configure a Google BigQuery V2 connection with simple or hybrid mode when you enable SQL ELT optimization in a mapping task.
Note: If you configure a Google BigQuery V2 connection with complex mode, the Secure Agent logs an SQL ELT optimization validation error in the session log file and the mappings run in the Informatica runtime environment without full SQL ELT optimization.
When you configure SQL ELT optimization, the mappings support the following advanced properties for a Google BigQuery V2 source:
When you configure SQL ELT optimization, the mappings support the following advanced properties for a Google BigQuery V2 connected and unconnected lookup:
When you configure SQL ELT optimization, the mappings support the following properties for a Google BigQuery V2 target:
Note: If you configure target advanced properties that are not supported, the Secure Agent logs a validation error in the session logs and the mappings run in the Informatica runtime environment without full SQL ELT optimization.

Rules and guidelines for mappings that read from and write to Google BigQuery

When you configure SQL ELT optimization in a mapping that reads from and writes to Google BigQuery, consider the following guidelines:

Read from Google Cloud Storage and write to Google BigQuery

You can configure SQL ELT optimization for a mapping that uses a Google Cloud Storage connection in the Source transformation to read from Google Cloud Storage and a Google BigQuery V2 connection in the Target transformation to write to Google BigQuery.

Example

You work for a rapidly growing data science organization. Your organization develops software products to analyze financials, building financial graphs connecting people profiles, companies, jobs, advertisers, and publishers. The organization uses infrastructure based on Google Cloud Platform and stores its data in Google Cloud Storage files. The organization plans to implement a business intelligence service to build visualization and perform real-time analysis. You can load data from Google Cloud Storage to Google BigQuery by configuring the transformations to support the adequate data warehouse model and the consuming requirements.
Create a Google Cloud Storage V2 connection to read data from the Google Cloud Storage source. Create a Google BigQuery V2 connection and use SQL ELT optimization to write data to the Google BigQuery target to enhance the performance and reduce the cost involved.

Supported features

When you configure SQL ELT optimization, the Google Cloud Storage V2 connection supports the following properties:
When you configure SQL ELT optimization, the mappings support the following properties for a Google Cloud Storage V2 source:

Rules and guidelines for mappings that read from Google Cloud Storage V2 source

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from a Google Cloud Storage V2 source and writes to a Google BigQuery target:

Read from Amazon S3 and write to Google BigQuery

You can configure SQL ELT optimization for a mapping that uses an Amazon S3 V2 connection in the Source transformation to read from Amazon S3 and a Google BigQuery connection in the Target transformation to write to Google BigQuery.

Example

You work for a healthcare organization. Your organization offers a suite of services to manage electronic medical records, patient engagement, telephonic health services, and care coordination services. The organization uses infrastructure based on Amazon Web Services and stores its data on Amazon S3. The management plans to load data to a data warehouse to perform healthcare analytics and create data points to improve operational efficiency. To load data from an Amazon S3 based storage object to Google BigQuery, you must use ETL and ELT with the required transformations that support the data warehouse model.
Use an Amazon S3 V2 connection to read data from a file object in an Amazon S3 source and a Google BigQuery connection to write to a Google BigQuery target. Configure full SQL ELT optimization in the mapping to optimize the performance.

Amazon S3 prerequisites

You need to complete the following prerequisites in Amazon S3 before you can read data from an Amazon S3 source and write to Google BigQuery:
For more information on configuring these prerequisites in Amazon S3, see the Amazon S3 documentation.

Supported features

When you configure SQL ELT optimization, the Amazon S3 V2 source supports the following connection properties:
When you configure SQL ELT optimization, the mappings support the following properties for an Amazon S3 V2 source:
When you configure SQL ELT optimization, the mapping supports the following transformations:
Note: You can run a mapping that reads from an Amazon S3 source and writes to a Google BigQuery target, both belonging to different regions.
For information about the configurations for the listed options, see the help for the Amazon S3 V2 Connector.

Rules and guidelines for mappings that read from Amazon S3 source

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from an Amazon S3 source and writes to a Google BigQuery target:

Rules and guidelines for SQL ELT optimization

Certain rules and guidelines apply when you enable a mapping for SQL ELT optimization to a Google BigQuery database.
When you configure a Google BigQuery source, Google Cloud Storage source, or Google BigQuery target, adhere to the following guidelines:

Troubleshooting a SQL ELT optimization task

Mapping fails when configured to read date, timestamp, or datetime information and write to default date/time format
When you configure a mapping to read date, timestamp, or datetime information from a string column and process the data with the default date/time format to write to a Google BigQuery target, the mapping fails with the following error:
[ERROR] The Secure Agent failed to run the full SQL ELT query due to the following error: [Invalid timestamp: '12/31/1972 00:00:00.000001']
To resolve this issue, set the JVM option -DHonorInfaDateFormat=true for the Secure Agent.
Perform the following steps to configure the JVM option in Administrator:
  1. Select Administrator > Runtime Environments.
  2. On the Runtime Environments page, select the Secure Agent machine that runs the mapping.
  3. Click Edit.
  4. In the System Configuration Details section, select Data Integration Server as the Service and DTM as the Type.
  5. Edit the JVMOption system property and set the value to -DHonorInfaDateFormat=true.
  6. Click Save.
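The failure above is easy to reproduce conceptually: '12/31/1972 00:00:00.000001' is an MM/DD/YYYY-style string, not a valid default timestamp literal, so it only parses when an explicit format is honored, which is the role the -DHonorInfaDateFormat=true option plays for the Secure Agent. The sketch below just demonstrates the parsing point; it is not product code.

```python
from datetime import datetime

# The rejected value from the error message parses fine once an explicit
# MM/DD/YYYY format is applied, illustrating why honoring the configured
# date format resolves the failure.
value = "12/31/1972 00:00:00.000001"
parsed = datetime.strptime(value, "%m/%d/%Y %H:%M:%S.%f")
iso = parsed.isoformat()
```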
Mapping fails when configured to read time data in string data type and write to date/time data type
When you set the JVM option system property for the DTM type to -DHonorInfaDateFormat=false for the Secure Agent, set the optional property OptimizeCastsInPDO=true in the Google BigQuery connection, and run a mapping that reads time data of the string data type from an Amazon S3 or Google Cloud Storage source and writes it to a date/time data type, the mapping fails with the following error:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Invalid timestamp: '12/31/1972 00:00:00.000001']
To resolve this issue, set the JVM option -DHonorInfaDateFormat=true for the Secure Agent and also provide the source data with the format provided in the DateTime Format String advanced session property in the mapping task.
Perform the following steps to configure the JVM option in Administrator:
  1. Select Administrator > Runtime Environments.
  2. On the Runtime Environments page, select the Secure Agent machine that runs the mapping.
  3. Click Edit.
  4. In the System Configuration Details section, select Data Integration Server as the Service and DTM as the Type.
  5. Edit the JVMOption system property and set the value to -DHonorInfaDateFormat=true.
  6. Click Save.