You can use SQL ELT optimization to push the transformation logic to the Google BigQuery database.
SQL ELT optimization
When you run a task configured for SQL ELT optimization, Data Integration converts the transformation logic to an SQL query. Data Integration sends the query to the database, and the database runs the query. The amount of transformation logic that Data Integration pushes to the database depends on the database, the transformation logic, and the mapping configuration. Data Integration processes all transformation logic that it cannot push to a database.
Configure SQL ELT optimization for a mapping in the task properties. You cannot configure SQL ELT optimization for a mapping in advanced mode.
SQL ELT optimization types
When you apply SQL ELT optimization, the task pushes transformation logic to the source or target database based on the optimization type you specify in the task properties. Data Integration translates the transformation logic into SQL queries or Google BigQuery commands and sends them to the Google BigQuery database. The database runs the SQL queries or Google BigQuery commands to process the transformations.
You can configure the following types of SQL ELT optimization in a mapping:
Full
Data Integration first pushes as much of the transformation logic as possible to process in the target database. If the target database cannot process some of the transformation logic, Data Integration pushes that logic to the source database for processing. Data Integration processes all the remaining transformation logic that it cannot push to the target or source database. This applies to mappings that read from or write to Google BigQuery.
When you select full SQL ELT optimization for mappings that read from Google Cloud Storage and write to Google BigQuery, Data Integration pushes as much of the transformation logic as possible to process in the target database. Data Integration processes all the transformation logic that it cannot push to the target database.
Source
Data Integration pushes down as much of the transformation logic as possible to process in the source database.
When you select source SQL ELT optimization, Data Integration pushes the transformation logic for all the supported transformations downstream in the mapping, but excludes the target transformation.
SQL ELT optimization scenarios
You can configure SQL ELT optimization for the following scenarios in mappings:
Important: To configure SQL ELT optimization using the Google BigQuery V2 Connector, verify that your organization has the Mappings-Advanced SQL ELT Optimization license. To get the license, contact Global Customer Support.
Supported SQL ELT optimization scenarios in mappings:

Source and target endpoints | Description | SQL ELT optimization type
Google BigQuery source and Google BigQuery target | Reads from and writes to Google BigQuery using the Google BigQuery V2 connection. | Full, Source
Google Cloud Storage source and Google BigQuery target | Reads from Google Cloud Storage using a Google Cloud Storage V2 connection and writes to Google BigQuery using a Google BigQuery V2 connection. | Full
Amazon S3 source and Google BigQuery target | Reads from Amazon S3 using an Amazon S3 V2 connection and writes to Google BigQuery using a Google BigQuery V2 connection. | Full
Note: You can use the Secure Agent or the Hosted Agent to run mappings enabled with SQL ELT optimization.
SQL ELT optimization preview
Before you run a mapping task configured for SQL ELT optimization, you can preview whether SQL ELT optimization is possible when you create the mapping. You can preview SQL ELT optimization from the SQL ELT Optimization panel in the Mapping Designer.
After you select the required SQL ELT optimization options and run the preview, Data Integration creates and runs a temporary SQL ELT optimization preview mapping task. When the job completes, Data Integration displays the SQL queries to be executed and any warnings in the SQL ELT Optimization panel. The warning messages help you understand which transformations in the configured mapping are not applicable for SQL ELT optimization. If SQL ELT optimization fails, Data Integration lists any queries generated up to the point of failure. You can edit the mapping and fix the required transformations before you run the mapping for SQL ELT optimization.
You can also view the temporary job created under My Jobs and download the session log to view the queries generated.
For more information about how to preview SQL ELT optimization, see the topic "SQL ELT optimization preview" in Mappings in the Data Integration help.
Configuring SQL ELT optimization
To optimize a mapping, add the mapping to a task, and then configure SQL ELT optimization in the mapping task.
1. Create a mapping task.
2. In the SQL ELT Optimization section on the Runtime Options tab, set the SQL ELT optimization value to Full or To Source.
3. If full SQL ELT optimization is not available, select how Data Integration handles SQL ELT optimization in the SQL ELT Optimization Fallback Option menu:
- Partial SQL ELT. Default. Data Integration pushes as much transformation logic as possible to the source database. The task processes any transformation logic that it can't push to a database. You can use Partial SQL ELT only when you read from and write to Google BigQuery.
- Non SQL ELT. The task runs without SQL ELT optimization.
- Fail Task. Data Integration fails the task.
Note: The fallback options are not applicable to mappings in advanced mode.
When you run the mapping task, the transformation logic is pushed to the Google BigQuery database.
Configuring a custom query or an SQL override for the Google BigQuery source object
You can push down a custom query or an SQL override to Google BigQuery.
Before you run a task that contains a custom query as the source object or you configure an SQL override, you must set the Create Temporary View session property in the mapping task properties.
Note: If you do not set the Create Temporary View property, the mapping runs without SQL ELT optimization.
Perform the following steps to set the property:
1. In the mapping task, navigate to the SQL ELT Optimization section on the Runtime Options tab.
2. Select Create Temporary View.
3. Click Finish.
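For example, with Create Temporary View enabled, a custom query such as the following can be used as the source object and pushed down to Google BigQuery. The project, dataset, table, and column names are hypothetical:

SELECT c.customer_id, c.customer_name, c.region
FROM `my-project.sales_ds.customers` c
WHERE c.status = 'ACTIVE'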
Context based optimization for multiple targets
When you configure a mapping to write to multiple Google BigQuery targets or write to the same Google BigQuery target table in two Target transformations, you can further optimize the write operation when you configure full SQL ELT optimization.
To optimize, you can configure an insert, update, upsert, delete, or data driven operation for each target individually. You can select the same Google BigQuery target table in multiple Target transformations and perform different operations for each of the Target transformations so that they run independently of each other.
When you configure a mapping enabled for full SQL ELT optimization to write to the same Google BigQuery target table in two Target transformations, you can specify the optimization context for a slowly changing dimension (SCD) type 2 merge scenario.
You can enable the SCD Type 2 merge when you write to the same Google BigQuery table in two Target transformations and perform different operations for each of the Target transformations, where you use one target to insert data and the other target to update data. Data Integration combines the queries for both targets and issues a MERGE query.
Note: Multi-insert mode is not applicable for Google BigQuery targets.
Understanding an SCD type 2 merge mapping
The SCD Type 2 merge mapping uses a Google BigQuery source and two Target transformations that write to the same Google BigQuery table. One Target transformation updates the table while the other inserts data into the Google BigQuery table.
The following image shows a mapping that writes slowly changing dimension data to a Google BigQuery target table:
Add Expression and Lookup transformations to compare source data against the existing target data. You enter the lookup conditions and the source columns that you want Data Integration to compare against the existing target.
For each source row without a matching row in the target, the Expression transformation marks the row as new. For each source row with a matching row in the target, the Expression transformation compares the existing source and target columns with the MD5() function. If those columns do not match, the Expression transformation marks the existing target row as inactive and inserts a new target row as active. The mapping then splits the data into two groups using the Router transformation.
You must generate a UUID value through the Expression transformation and add it as a unique ID column and also as the first column in the target. Additionally, you can add an active status flag, MD5() hash value, start timestamp, and end timestamp columns to write to the target through the Expression transformation.
The first data flow from the Router transformation passes only new rows to the Expression transformation. The Expression transformation inserts the new rows into the target. It also assigns a UUID value and sets the start timestamp, the MD5() hash value, and the active status of 1 for each new row.
In the second data flow, the Router transformation passes only changed rows to the Expression transformation. The Expression transformation updates the changed rows in the target, setting the active status to 0 and adding the end timestamp for the existing row.
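With the SCD Type 2 merge context set, Data Integration combines the insert and update queries for the two targets into a single statement. The following is a minimal sketch of the kind of MERGE statement that might be issued; the table, column, and key names are hypothetical:

MERGE `my-project.sales_ds.dim_customer` tgt
USING `my-project.sales_ds.stg_customer` src
ON tgt.customer_key = src.customer_key AND tgt.active_flag = 1
WHEN MATCHED AND tgt.md5_hash != src.md5_hash THEN
  UPDATE SET active_flag = 0, end_ts = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (uuid_id, customer_key, customer_name, md5_hash, active_flag, start_ts)
  VALUES (GENERATE_UUID(), src.customer_key, src.customer_name, src.md5_hash, 1, CURRENT_TIMESTAMP())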
Clean stop a SQL ELT optimization job
When a task enabled for SQL ELT optimization is running, you can clean stop the job to terminate all the issued statements and processes spawned by the job.
You can use this option for mappings enabled for SQL ELT optimization that use the Google BigQuery V2 connection either in the source or target transformation, or both.
Use the Clean Stop option on the My Jobs page in Data Integration and on the All Jobs and Running Jobs pages in Monitor.
See the following exceptions before you clean stop a SQL ELT optimization task:
•When you clean stop a task enabled for source SQL ELT optimization that reads from or writes to Google BigQuery, and the source or target properties in the mapping contain pre-SQL or post-SQL statements, the job continues to run the target post-SQL query even if the SELECT query is terminated.
•When you start a job enabled for full SQL ELT optimization and clean stop it immediately, and if the mapping is configured to create a new target at runtime, the table is created even if the job is terminated.
SQL ELT optimization using a Google BigQuery V2 connection
You can configure SQL ELT optimization for a mapping that contains a Google BigQuery V2 connection. SQL ELT optimization enhances the mapping performance. You can configure full SQL ELT optimization when you read data from a Google Cloud Storage source and write to a Google BigQuery target.
You can configure SQL ELT optimization for a mapping task to read from or write data to Google BigQuery objects associated with different projects in different Google service accounts within the same region.
SQL ELT optimization compatibility
You can configure the task to push transformations, variables, functions, and operators to the database.
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, or function, the Secure Agent processes the transformation logic.
Functions with Google BigQuery V2
You can push the following functions to the Google BigQuery database by using full SQL ELT optimization:
ABS(), ADD_TO_DATE(), AVG(), CEIL(), CHR(), CONCAT(), COS(), COSH(), COUNT(), DATE_COMPARE(), DATE_DIFF(), DECODE(), EXP(), FLOOR(), GET_DATE_PART(), IIF(), IN(), INSTR(), IS_DATE(), IS_NUMBER(), IS_SPACES(), ISNULL(), LAST_DAY(), LENGTH(), LN(), LOG(), LOWER(), LPAD(), LTRIM(), MAX(), MD5(), MIN(), MOD(), POWER(), REG_REPLACE(), REPLACECHR(), REPLACESTR(), ROUND(DATE), ROUND(NUMBER), RPAD(), RTRIM(), SIGN(), SIN(), SINH(), SQRT(), STDDEV(), SUBSTR(), SUM(), SYSDATE(), SYSTIMESTAMP(), TAN(), TANH(), TO_BIGINT, TO_CHAR(DATE), TO_CHAR(NUMBER), TO_CHAR(STRING), TO_DATE(), TO_DECIMAL(), TO_FLOAT(), TO_INTEGER(), TRUNC(DATE), TRUNC(NUMBER), UPPER(), VARIANCE()
Rules and guidelines
When you push functions to Google BigQuery, adhere to the following guidelines:
•To push the IS_DATE() function to Google BigQuery, you must configure the output field in the Expression transformation to a column of the String data type.
•When you push the IS_DATE() function to Google BigQuery and use the SS.US format argument and specify values with the SS.MS or SS format, the IS_DATE() returns true.
•When you push the IS_DATE() function to Google BigQuery and use the MON format argument and specify values with the MONTH format, the IS_DATE() returns true.
•When you push the IS_DATE() function to Google BigQuery and use the MONTH format argument and specify values with the MON format, the IS_DATE() returns true.
•When you use Is_Number(), Is_Spaces(), and Is_Date() in an Expression transformation, the output field type supports only integer and string data types.
•When a mapping runs without SQL ELT optimization, the IS_DATE() function returns Boolean values of 0 or 1 to the Google BigQuery target table. When the mapping runs with SQL ELT optimization, the IS_DATE() function returns Boolean values of true or false to the Google BigQuery target table.
•To push the TO_CHAR(DATE) function to Google BigQuery, you must use the following string and format arguments:
- YYYY
- YY
- RR
- Q
- MM
- MON
- MONTH
- DD
- DDD
- DY
- DAY
- HH12
- HH24
- MI
- SS
- SS.MS
- SS.US
- am
- AM
- pm
- PM
•To push the TO_DATE(string, format) function to Google BigQuery, you must use the following format arguments:
- YYYY
- YY
- RR
- MM
- MON
- MONTH
- DD
- HH12
- HH24
- MI
- SS
- SS.MS
- SS.US
- am
- AM
- pm
- PM
•To push the ADD_TO_DATE(date, format, amount) or TRUNC(date, format) function to Google BigQuery, you must use the following format arguments:
- YYYY
- YY
- YYY
- Y
- MM
- MON
- MONTH
- D
- DD
- DDD
- DY
- DAY
- HH
- HH12
- HH24
- MI
- SS
- MS
- US
•To push the GET_DATE_PART(date, format) function to Google BigQuery, you must use the following format arguments:
- YYYY
- YY
- YYY
- Y
- MM
- MON
- MONTH
- DD
- DDD
- DY
- DAY
- HH
- HH12
- HH24
- MI
- SS
- MS
- US
•When you push the GET_DATE_PART() function to the Google BigQuery database and specify null in the format argument, the mapping runs without SQL ELT optimization.
•When you push the LAST_DAY() function to the Google BigQuery database and specify Date/Time or Timestamp value in the date argument, the LAST_DAY() function pushes only the date values to the Google BigQuery target. You might encounter a data mismatch when you compare a mapping that runs with full SQL ELT optimization and a mapping that runs without SQL ELT optimization.
•To push the ROUND(DATE) function to Google BigQuery, you must use the following format arguments:
- DD
- DDD
- DY
- DAY
- HH
- HH12
- HH24
- MI
- SS
- MS
•When you push the ROUND(DATE) function to the Google BigQuery database and use NS (nanoseconds) in the format argument, the mapping runs without SQL ELT optimization or with source SQL ELT optimization.
•To push the ROUND(NUMBER) function to Google BigQuery, you must use a numeric value of the following data types:
- Decimal
- Numeric
- NULL
•To push the INSTR() function to Google BigQuery, you must only define the input_field and string arguments.
•When you push the SYSTIMESTAMP() function to the Google BigQuery database, do not specify any format arguments. If you do not specify any format arguments, the Google BigQuery database returns the complete timestamp.
•If you use a % operator in an Expression transformation, the mapping converts the % operator to the MOD() function and pushes the MOD() function to Google BigQuery. See the sketch after this list.
The MOD() function supports arguments of the Int64 and Numeric data types. When you push the MOD() function to the Google BigQuery database, ensure that the arguments are of the same data type. You can specify the arguments in the following formats:
- MOD(Int64, Int64)
- MOD(Numeric, Numeric)
•When you push the TRUNC(DATE) function to the Google BigQuery database and specify a NULL value in the format argument, the mapping runs without SQL ELT optimization.
•When you push the SYSDATE() function to the Google BigQuery database and the mapping runs with SQL ELT optimization, the function returns the current date and time based on the time zone associated with the Google BigQuery database.
When you push the SYSDATE() function to the Google BigQuery database and the mapping runs without SQL ELT optimization, the function returns the current date and time based on the time zone associated with the machine where the Secure Agent runs.
•When you push the SUBSTR() function to the Google BigQuery database, you must specify a value of the String data type in the string argument. If you pass a numeric value, the mapping runs without SQL ELT optimization.
•When you push the SUBSTR() function to the Google BigQuery database, the value of the length argument must be an integer greater than 0. If you specify a negative integer value for the length argument, the mapping runs without SQL ELT optimization.
•When you push the EXP() function to the Google BigQuery database and specify a value of the Numeric or Double data type for the exponent argument, you might encounter a data mismatch in the decimal values when you compare a mapping that runs with full SQL ELT optimization and a mapping that runs without SQL ELT optimization.
•When you push the RPAD() or LPAD() function to the Google BigQuery database, you must specify a value of the String data type in the first_string argument. If you specify a value other than the String data type in the first_string argument, the mapping runs without SQL ELT optimization.
•When you push the RPAD() or LPAD() function to the Google BigQuery database and specify an empty string in the second_string or third_string argument, the mapping runs without full SQL ELT optimization.
•When you push the REPLACECHR() function to the Google BigQuery database to write Numeric data to the Google BigQuery target, you can see a data mismatch between the results when you compare a mapping that runs with full SQL ELT optimization against a mapping disabled for SQL ELT optimization.
If the mapping runs with full SQL ELT optimization, the trailing zeroes after the decimal are not considered while casting. However, if you run a mapping with SQL ELT optimization disabled, the Secure Agent casts the trailing zeroes after the decimal when casting from Numeric to String data types.
•When you push the REPLACECHR() or REPLACESTR() function to the Google BigQuery database, the microseconds available in the timestamp data are considered in the cast string for a mapping that runs with full SQL ELT optimization.
You might encounter a data mismatch when you compare a mapping that runs with full SQL ELT optimization and a mapping that runs without SQL ELT optimization. The microseconds are not considered in the mapping that runs without SQL ELT optimization.
• When you push the REPLACESTR() or REPLACECHR() function to the Google BigQuery database and specify special characters in the format arguments in a nested function, ensure that the nested function does not contain a single backslash. You can use a double backslash in the nested function.
You might encounter a data mismatch when you compare a mapping that runs with full SQL ELT optimization and a mapping that runs without SQL ELT optimization when you use a double backslash in the nested function.
• When you push the REPLACESTR() or REPLACECHR() function using an Expression transformation with date/time data types to Google BigQuery using full SQL ELT optimization, the default date format of the date/time data types returned for a mapping with SQL ELT optimization is YYYY-MM-DD HH24:MI:SS.US, whereas for a mapping without SQL ELT optimization it is MM/DD/YYYY HH24:MI:SS.US.
To fix the issue in a mapping without SQL ELT optimization, use the TO_CHAR function to return the string date in the MM/DD/YYYY HH24:MI:SS.US format. For example, to get a similar result, use replacechr(1, TO_CHAR(col6_date), '09','1').
•When you push down the Is_Number() function for Float data types with NaN, -inf, and +inf values, the Is_Number() function returns true.
•When you push the Is_Date() function using an Expression transformation with the YYYY-MM-DD format and the data contains values in both the YYYY-MM-DD and YYYY/MM/DD formats, the Is_Date() function returns true only for YYYY-MM-DD. When you push the Is_Date() function with the YYYY/MM/DD format and the data contains values in both the YYYY-MM-DD and YYYY/MM/DD formats, the Is_Date() function returns true only for YYYY/MM/DD.
•When you push the Is_Number() function to process in Google BigQuery from a Secure Agent machine on Windows, the Is_Number() function returns false for the following format: '45.45d-2'
•When you use the TO_CHAR(String) function, the string value must not contain a backslash (\). Otherwise, the mapping fails.
• When you configure an IN function that returns a value of string data type in an Expression transformation and writes the value to an integer data type in full SQL ELT optimization, the mapping fails with the following error:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Bad int64 value: false]
• When you configure an IN function that returns a value of integer data type in an Expression transformation and writes the value to an integer data type in full SQL ELT optimization, the mapping fails with the following error:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Query column 1 has type BOOL which cannot be inserted into column COL_INT, which has type INT64 at [3:4]]
•When you configure an IN function that returns a value of integer data type in an Expression transformation and writes the value to any data type in the target in source SQL ELT optimization, the mapping fails with the following error:
The following error occurred: [For input string: "true"]
•When you run a mapping enabled with full SQL ELT optimization and if the arguments are null in the DATE_COMPARE function, the mapping runs without SQL ELT optimization.
•When you configure a mapping enabled with full SQL ELT optimization, the mapping switches to source SQL ELT optimization or runs without SQL ELT optimization if the operands of +, -, *, or / operators contain NULL in the expression, aggregator, or filter transformations.
•When you push the REG_REPLACE() function to the Google BigQuery database and specify a backslash (\) in the column name or a nested function, the mapping fails with the following error:
[ERROR] The Secure Agent failed to run the full SQL ELT query due to the following error: [Cannot parse regular expression: invalid escape sequence: \o]
•When you push the REG_REPLACE() function to the Google BigQuery database, ensure that the column name or nested function does not contain a single backslash. The function only supports single backslashes that are followed by a digit or another backslash.
You might encounter a data mismatch when you compare a mapping that runs with full SQL ELT optimization and a mapping that runs without SQL ELT optimization when you use a double backslash in the nested function.
•You cannot use the following escape sequences with the regular expression in the REG_REPLACE() function:
Escape sequence | Description
\1 | Back reference
\b | Backspace (use \010)
\cK | Control char ^K (for example, use \001)
\e | Escape (use \033)
\g1 | Back reference
\g{1} | Back reference
\g{+1} | Back reference
\g{-1} | Back reference
\g{name} | Named back reference
\g<name> | Subroutine call
\g'name' | Subroutine call
\k<name> | Named back reference
\k'name' | Named back reference
\lX | Lowercase X
\ux | Uppercase x
\L...\E | Lowercase text ...
\K | Reset beginning of $0
\N{name} | Named Unicode character
\R | Line break
\U...\E | Uppercase text ...
\X | Extended Unicode sequence
\%d123 | Decimal character 123
\%xFF | Hex character FF
\%o123 | Octal character 123
\%u1234 | Unicode character 0x1234
\%U12345678 | Unicode character 0x12345678
•You cannot use the following empty strings with the regular expression in the REG_REPLACE() function:
Empty string | Description
\g | At beginning of subtext being searched
\G | At end of last match
\Z | At end of text, or before newline at end of text
(?=re) | Before text matching re
(?!re) | Before text not matching re
(?<=re) | After text matching re
(?<!re) | After text not matching re
re& | Before text matching re
re@= | Before text matching re
re@! | Before text not matching re
re@<= | After text matching re
re@<! | After text not matching re
\zs | Sets start of match (= \K)
\ze | Sets end of match
\%^ | Beginning of file
\%$ | End of file
\%V | On screen
\%# | Cursor position
\%'m | Mark m position
\%23l | In line 23
\%23c | In column 23
\%23v | In virtual column 23
•When you configure the DATE_DIFF function in an Expression transformation and the transformation output is set to double or decimal data type in a mapping enabled with SQL ELT optimization, the function returns an integer value. However, when you run the mapping without SQL ELT optimization, the function returns a double value. Also, you might encounter a data mismatch when the transformation output is set to integer data type.
•When you set the optional property OptimizeCastsInPDO in the Google BigQuery connection, you can compare the following data types in the DATE_DIFF function:
Date1 argument | Date2 argument | Format argument
Date | Date | Year, Month, and Day
Date | Datetime | Year, Month, Day, Hour, Minute, Second, Millisecond, and Microsecond
Datetime | Date | Year, Month, Day, Hour, Minute, Second, Millisecond, and Microsecond
Datetime | Datetime | Year, Month, Day, Hour, Minute, Second, Millisecond, and Microsecond
Time | Time | Hour, Minute, Second, Millisecond, and Microsecond
Timestamp | Timestamp | Day, Hour, Minute, Second, Millisecond, and Microsecond
•When you set the format argument to year, month, nanosecond, or null in the DATE_DIFF function, the mapping runs with source SQL ELT optimization or without SQL ELT optimization.
•When you configure DECODE or IIF functions along with AND, OR, or NOT IN operators in a mapping enabled with SQL ELT optimization, the mapping might fail.
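The following sketch illustrates the % operator conversion referenced earlier in this list. An expression such as col1 % col2 on Int64 columns might be pushed down along these lines; the project, dataset, table, and column names are hypothetical:

SELECT MOD(col1, col2) AS remainder
FROM `my-project.my_ds.my_table`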
MD5() function
When you push the MD5 function to Google BigQuery, adhere to the following guidelines:
- You can use only the string data type as the return type.
- The MD5 function in a mapping enabled with SQL ELT optimization uses BASE64 semantics as the string format by default. To use BASE16 semantics as the string format, set the optional property UseBase16ForMd5 in the Google BigQuery V2 connection. However, when you run the mapping without SQL ELT optimization, it uses BASE16 semantics as the string format.
- When you configure the MD5 function with BASE16 semantics, the function output differs in a mapping enabled with or without SQL ELT optimization.
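To see how the two string formats differ, the following standalone GoogleSQL query shows the BASE64 and BASE16 renderings of the same MD5 hash. TO_BASE64 and TO_HEX are standard BigQuery functions; the input literal is only illustrative:

SELECT TO_BASE64(MD5('abc')) AS base64_hash,  -- kAFQmDzST7DWlj99KOF/cg==
       TO_HEX(MD5('abc')) AS base16_hash      -- 900150983cd24fb0d6963f7d28e17f72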
Operators with Google BigQuery V2
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators in the database. If there is no equivalent operator, the Secure Agent processes the transformation logic.
You can push the following operators to the Google BigQuery database by using full SQL ELT optimization:
+, -, *, /, %, ||, =, >=, <=, !=, >, <, AND, OR, NOT
Rules and guidelines
When the argument in the NOT operator is a null value, the mapping runs with source SQL ELT optimization or without SQL ELT optimization.
Transformations with Google BigQuery V2
When you configure SQL ELT optimization, the Secure Agent tries to push the configured transformation to Google BigQuery.
You can push down the following transformations to Google BigQuery:
•Aggregator
•Expression
•Filter
•Joiner
•Lookup
•Rank
•Router
•Sorter
•SQL
•Union
Aggregator transformation
You can configure full SQL ELT optimization to push an Aggregator transformation to process in Google BigQuery.
You can perform the following aggregate calculations:
•AVG
•COUNT
•MAX
•MIN
•SUM
•STDDEV
•VARIANCE
When you configure an Aggregator transformation, you must use each of the incoming ports either in an aggregate function or in a group by field to define how to group data for aggregate expressions.
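For example, an Aggregator transformation that groups an orders source by region and aggregates an amount port might be pushed down as a query along these lines; the project, dataset, table, and column names are hypothetical:

SELECT region,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount,
       COUNT(amount) AS order_count
FROM `my-project.sales_ds.orders`
GROUP BY region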
Lookup transformation
You can configure full SQL ELT optimization to push a Lookup transformation to process in Google BigQuery. This applies to both connected and unconnected lookups.
You can add the following lookups:
•Cached
•Uncached
•Unconnected with cached
When you look up data and the lookup condition finds multiple matches, the lookup returns all rows. In a mapping with Google BigQuery as the target, you must set the Multiple Matches option for the lookup object to Return all rows. If you set Multiple Matches to any option other than Return all rows, the agent ignores it.
When you configure a Lookup transformation based on a Google BigQuery source, adhere to the following guidelines:
•If there are null values in a lookup column, the mapping does not push the rows with null values to the Google BigQuery target. However, if you run the mapping without full SQL ELT optimization, the rows with null values are written to the target.
•When you specify multiple lookup conditions, ensure that at least one of the lookup conditions uses the Equals operator.
•Ensure that you specify the same Google BigQuery region ID for the source, lookup, and target connections.
•When you use a Lookup transformation, ensure that you select the Lookup caching enabled property in the lookup advanced properties.
•When you use an unconnected lookup and use an Expression transformation to assign the unconnected Lookup transformation output to a variable port, the mapping runs without SQL ELT optimization.
•When you use a completely parameterized lookup condition where the input parameter holds the default value and you specify an override from the task using the parameter file, the task does not honor the override and runs with the default value.
When you configure an unconnected Lookup transformation, consider the following rules:
•You must select the Multiple Matches property value as Report error in the unconnected lookup properties for SQL ELT optimization to work.
•You can only configure an Expression transformation for an output received from an unconnected lookup.
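Conceptually, a lookup that is pushed down is evaluated inside the generated query rather than in the agent. The following is a rough sketch of the kind of join a pushed-down query might contain for a lookup condition on customer_id; the generated SQL is not documented here, and the project, dataset, table, and column names are hypothetical:

SELECT src.order_id,
       src.customer_id,
       lkp.customer_name
FROM `my-project.sales_ds.orders` src
LEFT JOIN `my-project.sales_ds.customers` lkp
  ON src.customer_id = lkp.customer_id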
SQL transformation
You can use an SQL transformation to push supported scalar functions to Google BigQuery. When you configure SQL ELT optimization for a mapping, you can use Java or SQL user-defined functions (UDFs) in a SQL transformation and run queries with the Google BigQuery target endpoint.
You can use only the SELECT clause SQL statement to push down a function. The following snippet demonstrates the syntax of a simple SELECT SQL query; the function and literal are only illustrative:
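SELECT UPPER('sample_text') AS upper_text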
You can push a SQL transformation with the following restrictions:
•You can configure only a SQL query in the SQL transformation. You cannot enable a stored procedure when you push down to Google BigQuery.
•The SQL query must be a simple SELECT statement without FROM and WHERE clauses. The SQL transformation only supports functions with a simple SELECT statement.
•You can only use a SQL transformation when the SELECT statement is present in the query property. If the entire query containing the SELECT statement comes from a parameterized input port, SQL ELT optimization fails.
•If any SQL error occurs, the error is added to the SQLError field by default. However, when you run a mapping enabled with SQL ELT optimization, the SQLError field remains Null.
•The NumRowsAffected field records the number of rows affected while computing the output buffer. However, for the SQL transformation, NumRowsAffected is 0 because the query runs for all the records at the same time.
•Google BigQuery supports only passive behavior for SQL transformations, where support for dynamic queries is limited.
•User-defined functions that contain special characters in the function name are supported. You need to enclose the full UDF name in backtick (`) characters if it contains special characters. See the example after this list.
•You cannot specify user-defined functions in a legacy SQL query.
•You cannot use subqueries or join conditions in the SQL transformation.
•You cannot use a temporary UDF in the SQL transformation.
•You cannot use the following parameterization scenarios:
- Entire query as a parameter
- Field names in a query as a parameter
- In-out and input parameters in a query
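The following example shows a persistent SQL UDF invoked from a SELECT statement, with the full UDF name enclosed in backticks because the hypothetical project ID contains a hyphen. The project, dataset, and UDF names are hypothetical:

SELECT `my-project.my_dataset.normalize_name`('ACME Corp') AS normalized_name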
Variables with Google BigQuery V2
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent variables in the database. If there is no equivalent variable, the Secure Agent processes the transformation logic.
The following table lists the SQL ELT optimization variables that can be used in a Google BigQuery database. Columns marked with an X indicate that the variable can be pushed to the Google BigQuery database by using full SQL ELT optimization.
Variable | SQL ELT optimization
SESSSTARTTIME | X
SYSDATE | -
WORKFLOWSTARTTIME | -
Data types with Google BigQuery V2
The following table lists the Google BigQuery native data types that can be mapped to the comparable transformation data types in a mapping configured with SQL ELT optimization:
Google BigQuery data type | Transformation data type
Boolean | String
Date | Date/Time
DateTime | Date/Time
Float | Double
Integer | BigInt
Numeric | Decimal (default precision 28, scale 9)
String | String
Byte | Byte
Time | Date/Time
Timestamp | Date/Time
When you set the custom property OptimizeCastsInPDO:true in the Google BigQuery V2 connection optional properties, you can map the following date and time data types in a Google BigQuery source to a target in a mapping enabled with SQL ELT optimization:
Source data type | Supported target data type
Date | Date, Date/Time
Time | Time
Date/Time | Date, Time, Date/Time
Timestamp | Time, Timestamp
Read from and write to Google BigQuery
You can configure SQL ELT optimization in a mapping to read from and write to Google BigQuery using a Google BigQuery V2 connection.
Example
You work in a motorbike retail company with more than 30,000 dealerships and 2000 inspection centers globally. The company stores millions of records in Google BigQuery hosted on GCP. You want to use Data Integration to perform some transformations on the data before you write back to Google BigQuery.
Use a Google BigQuery V2 connection in the mapping to read from the Google BigQuery source and write the processed data to the Google BigQuery target. Configure full SQL ELT optimization in the mapping to enhance the performance.
Supported features
You must configure a Google BigQuery V2 connection with simple or hybrid mode when you enable SQL ELT optimization in a mapping task.
Note: If you configure a Google BigQuery V2 connection with complex mode, the Secure Agent logs an SQL ELT optimization validation error in the session log file and the mappings run in the Informatica runtime environment without full SQL ELT optimization.
When you configure SQL ELT optimization, the mappings support the following advanced properties for a Google BigQuery V2 source:
•Source Type - Single, Query, Multiple Objects, and Parameter
•Query options - Filter. Supports both simple and advanced filter conditions. You can use the source filter in conjunction with the Filter transformation in the mapping.
•Source Dataset ID
•Source Table Name
•Number of Rows to Read
•Job Poll Interval In Seconds
•pre SQL - using standard SQL query (see the example after this list)
•post SQL - using standard SQL query
•pre SQL Configuration
•post SQL Configuration
•SQL Override Query - using standard SQL query
•Billing Project ID
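For example, a pre SQL statement written in standard SQL might clear a staging table before the read; the project, dataset, table, and column names are hypothetical:

DELETE FROM `my-project.sales_ds.stg_orders` WHERE load_date < CURRENT_DATE()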
When you configure SQL ELT optimization, the mappings support the following advanced properties for a Google BigQuery V2 connected and unconnected lookup:
•Source type - Single
•Source type - Query
•Source type - Standard and materialized views
•Source Object Type - Parameter
•Source Dataset ID
•Source Table Name
•Job Poll Interval In Seconds
•pre SQL - using standard SQL query
•post SQL - using standard SQL query
•pre SQL Configuration
•post SQL Configuration
•SQL Override Query - using standard SQL query
•Billing Project ID
When you configure SQL ELT optimization, the mappings support the following properties for a Google BigQuery V2 target:
•Target Object Type - Single, Parameter
•Operation
- Insert
- Update
- Upsert
- Delete
- Data Driven
Note: You can implement the Update Strategy through target operations.
•Data Driven Condition
•UpdateMode
•Enable Merge
•Target Dataset ID
•Target Table Name
•Create Disposition for the Insert operation. Supports only the Create if never option.
•Write Disposition for the Insert operation. Supports only the Write append option.
•Write Mode. Use Bulk mode to push data into Google BigQuery.
•Truncate target table
•Job Poll Interval In Seconds
•pre SQL - using standard SQL query
•post SQL - using standard SQL query
•pre SQL Configuration
•post SQL Configuration
•Billing Project ID
Note: If you configure target advanced properties that are not supported, the Secure Agent logs a validation error in the session logs and the mappings run in the Informatica runtime environment without full SQL ELT optimization.
Rules and guidelines for mappings that read from and write to Google BigQuery
When you configure SQL ELT optimization in a mapping that reads from and writes to Google BigQuery, consider the following guidelines:
•When you perform an upsert operation, you must select the Enable Merge option in the target advanced properties.
•You cannot use system variables in filters.
•If a mapping contains a Filter transformation and also a filter in the Source transformation, the mapping consolidates the filter conditions from both these transformations to filter the records. However, it is recommended that you use only one of these filters at a time in a mapping.
•You cannot apply a filter for query and multiple source objects.
•A native filter cannot contain a sub-query.
•When you select the source type as Query, ensure that you do not select the Retain existing fields at runtime option on the Fields tab. Otherwise, the mapping fails with the following error:
Error: The Secure Agent failed to run the full SQL ELT query due to the following error: [Field not found inside table]
•When you configure a Target transformation in a mapping with a delete operation and the source type uses a query that contains Union ALL, the mapping fails. To avoid this error, before you run the mapping, you need to select the Enable Merge property in the target advanced properties. The mapping issues a merge query and runs successfully.
•If the source data that the mapping reads contains Binary data types, data preview for SQL ELT optimization fails.
Read from Google Cloud Storage and write to Google BigQuery
You can configure SQL ELT optimization for a mapping that uses a Google Cloud Storage connection in the Source transformation to read from Google Cloud Storage and a Google BigQuery V2 connection in the Target transformation to write to Google BigQuery.
Example
You work for a rapidly growing data science organization. Your organization develops software products to analyze financials, building financial graphs connecting people profiles, companies, jobs, advertisers, and publishers. The organization uses infrastructure based on Google Cloud Platform and stores its data in Google Cloud Storage files. The organization plans to implement a business intelligence service to build visualization and perform real-time analysis. You can load data from Google Cloud Storage to Google BigQuery by configuring the transformations to support the adequate data warehouse model and the consuming requirements.
Create a Google Cloud Storage V2 connection to read data from the Google Cloud Storage source. Create a Google BigQuery V2 connection and use SQL ELT optimization to write data to the Google BigQuery target to enhance the performance and reduce the cost involved.
Supported features
When you configure SQL ELT optimization, the Google Cloud Storage V2 connection supports the following properties:
•Service Account ID
•Service Account Key
•Project ID
When you configure SQL ELT optimization, the mappings support the following properties for a Google Cloud Storage V2 source:
•Source connection, connection parameter
•Source Type - Single, parameter
•Parameter
•Format - Delimited, Avro, Parquet, and JSON. ORC is not applicable.
Note: None is not supported.
•Delimited file formatting options
- Delimiter
- Qualifier
- Header Line Number
- First Data Row
•Google Cloud Storage Path
•Source File Name
•Is Directory
Rules and guidelines for mappings that read from Google Cloud Storage V2 source
Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from a Google Cloud Storage V2 source and writes to a Google BigQuery target:
•The source fields must start with a letter or an underscore and can contain letters, numbers, and underscores up to a maximum of 300 characters. You cannot read source fields with special characters.
•When you read a boolean integer column and write to a boolean string column in a mapping, the mapping fails.
•When you write data with the Numeric data types to a Google BigQuery target created at runtime, where the source column has precision greater than 28, the mapping runs without SQL ELT optimization.
•When you set the JVM option system property for the DTM type to -DHonorInfaDateFormat=true for the Secure Agent and configure a mapping with SQL ELT optimization, the mapping fails with the following error if it reads the date data type that is not in the YYYY-MM-DD format:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Failed to parse input string "1972-12-31"]
•When you map a string data type in the source to a time data type in the target, and the data is in the format HH24:MI:SS.US, the mapping fails with the following error:
[Invalid timestamp: '00:00:00.000001']
Read from Amazon S3 and write to Google BigQuery
You can configure SQL ELT optimization for a mapping that uses an Amazon S3 V2 connection in the Source transformation to read from Amazon S3 and a Google BigQuery connection in the Target transformation to write to Google BigQuery.
Example
You work for a healthcare organization. Your organization offers a suite of services to manage electronic medical records, patient engagement, telephonic health services, and care coordination services. The organization uses infrastructure based on Amazon Web Services and stores its data on Amazon S3. The management plans to load data to a data warehouse to perform healthcare analytics and create data points to improve operational efficiency. To load data from an Amazon S3 based storage object to Google BigQuery, you must use ETL and ELT with the required transformations that support the data warehouse model.
Use an Amazon S3 V2 connection to read data from a file object in an Amazon S3 source and a Google BigQuery connection to write to a Google BigQuery target. Configure full SQL ELT optimization in the mapping to optimize the performance.
Amazon S3 prerequisites
You need to complete the following prerequisites in Amazon S3 before you can read data from an Amazon S3 source and write to Google BigQuery:
•To load data from an Amazon S3 data source, you must:
- Specify the URI for the Amazon S3 source.
- Provide your access key ID and secret access key to access the Amazon S3 bucket.
- Set the minimum required policy AmazonS3ReadOnlyAccess on your Amazon S3 source data.
•Enable the BigQuery Data Transfer Service for your project. To enable the BigQuery Data Transfer Service, you must be granted the owner role for your project.
For more information on configuring these prerequisites in Amazon S3, see the Amazon S3 documentation.
Supported features
When you configure SQL ELT optimization, the following Amazon S3 V2 connection properties are supported:
•Access Key
•Secret Key
•Folder Path
When you configure SQL ELT optimization, the mappings support the following properties for an Amazon S3 V2 source:
•Source connection parameter
•Source Type - Single, Parameter
•Object
•Parameter
•Format - Flat, Avro, Parquet, and JSON. ORC and None are not applicable.
•Flat file formatting options:
- Delimiter
- First Data Row
•Source Type
•Folder Path
•File Name
When you configure SQL ELT optimization, the mapping supports the following transformations:
•Expression
•Filter
Note: You can run a mapping that reads from an Amazon S3 source and writes to a Google BigQuery target even when they belong to different regions.
For information about the configurations for the listed options, see the help for the Amazon S3 V2 Connector.
Rules and guidelines for mappings that read from Amazon S3 source
Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from an Amazon S3 source and writes to a Google BigQuery target:
•Do not map the boolean data type in Amazon S3 to the boolean data type in Google BigQuery. Otherwise, the mapping fails.
•When you edit the metadata in the mapping, you cannot add or remove source fields or change the scale and precision of data types. However, you can edit the field data types.
•When you read data in AVRO, JSON, or CSV format, ensure that the date is in YYYY-MM-DD format and time is in hh:mm:ss format in the DATE, TIME, DATETIME, and TIMESTAMP columns.
•The source fields must start with a letter or an underscore and can contain letters, numbers, and underscores up to a maximum of 300 characters. You cannot read source fields with special characters.
•When you write data with the Numeric data types to a Google BigQuery target created at runtime, where the source column has precision greater than 28, the mapping runs without SQL ELT optimization.
•When you write the DATE, TIME, or DATETIME data types to a Google BigQuery target, you must match the agent time zone with the time zone of the Google BigQuery application.
•In a mapping enabled with SQL ELT optimization, you cannot read a single directory from multiple subdirectories. When you select the source type as Directory in the advanced source properties to read objects stored in subdirectories from an Amazon S3 source, you must select the Enable Recursive Read option. Otherwise, the mapping runs without SQL ELT optimization.
•When you write data from Avro or Parquet file formats in an Amazon S3 source to a Google BigQuery target created at run time, you must delete the Filename field in the mapping.
•When you configure a lookup from an Amazon S3 or a Google Cloud Storage V2 object in a mapping, the mapping runs without SQL ELT optimization.
•When you read data from a smaller dataset such as Transaction Processing Council Ad-hoc/decision support benchmark (TPC-H) scale factor 1 or below and run a mapping enabled with SQL ELT optimization, the mapping takes 30% more time to process the data as compared to the mapping that runs without SQL ELT optimization.
•When you read from an Amazon S3 source and write to a Google BigQuery target, the time taken to load data to the Google BigQuery staging in the first and subsequent mapping runs for the same dataset and resources is inconsistent.
•When you configure a mapping enabled with SQL ELT optimization to read a boolean integer column and write to a boolean string column, the mapping fails.
•When you read data from an Amazon S3 source and write to a Google BigQuery target, it takes a few minutes to initialize the transfer to the Google BigQuery target.
•When you upload a file in an Amazon S3 bucket and then immediately run a data transfer task, the source file is not detected. Wait for at least five minutes and then run the mapping again.
• When you run a mapping enabled with SQL ELT optimization to read data with wildcard characters from an Amazon S3 source and write to Google BigQuery, the mapping runs without SQL ELT optimization and fails with the following error:
Wild card character option is not valid in the native mode of execution
•When you set the JVM option system property for the DTM type to -DHonorInfaDateFormat=true for the Secure Agent and configure a mapping with SQL ELT optimization, the mapping fails with the following error if it reads the date data type that is not in the YYYY-MM-DD format:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Failed to parse input string "1972-12-31"]
•When you map a string data type in the source to a time data type in the target, and the data is in the format HH24:MI:SS.US, the mapping fails with the following error:
[Invalid timestamp: '00:00:00.000001']
Rules and guidelines for SQL ELT optimization
Certain rules and guidelines apply when you enable a mapping for SQL ELT optimization to a Google BigQuery database.
When you configure a Google BigQuery source, Google Cloud Storage source, or Google BigQuery target, adhere to the following guidelines:
•The Service Account ID associated with the Google BigQuery V2 connection must have permissions to access Google Cloud Storage buckets and files.
•You cannot enable full SQL ELT optimization for a mapping task when the target table contains columns of record data type or repeated columns.
•You cannot enable full SQL ELT optimization for a mapping task when the task contains a mapping with a single transformation connected to multiple transformations downstream or multiple transformations connected to a single transformation.
•You must ensure that the column header names in the Google Cloud Storage source file do not contain Unicode characters. Otherwise, the mapping fails.
• When you enable SQL ELT optimization for a mapping with multiple pipelines to write to the same Google BigQuery target and the Truncate target table option is enabled in each pipeline, the target table is truncated for each pipeline when data is inserted into the target.
For example, if there are two pipelines, in pipeline 1 the target table is truncated and then the data is inserted. Similarly, in pipeline 2 the target table is truncated and the data is inserted into the target table. Hence, the target table contains only the data from pipeline 2.
When you run a mapping without SQL ELT optimization and the mapping contains multiple pipelines, the target tables are truncated at once for all pipelines and then the data is inserted.
•When you configure a Filter transformation or specify a filter condition, you must ensure that you do not specify special characters. Use the ASCII value for the special character in the filter condition.
•When you parameterize the Google BigQuery V2 connection, source, and target, and provide values in the mapping task using a parameter file, the default values for the parameters are not overridden with the values in the parameter file.
•If the Google Cloud Storage source file contains a column of the Boolean data type, the SQL ELT optimization query fails.
•You must ensure that the Google BigQuery source object does not contain any partitions.
•When you read from a Google BigQuery source and edit the metadata for the source fields, the Secure Agent ignores the changes to the metadata.
•If the Google BigQuery source and target objects reside in the same region other than US, do not specify the Region ID explicitly in the connection.
•You must not specify a legacy SQL query in the pre SQL and post SQL advanced source or target properties.
•A mapping run without SQL ELT optimization fails if any of the Pre-SQL and Post-SQL commands fail in a multi-statement query. Previously, such mappings were successful.
•When you specify custom query as a source object and specify a dataset name in the Source Dataset ID source advanced property, the mapping runs without full SQL ELT optimization.
•When you specify custom query as a source object and specify an SQL override query, you must specify a dataset name in the Source Dataset ID source advanced property.
•When you specify custom query as a source object and specify an SQL override query with different column names, ensure that the data types and the order of the columns that appear in the SQL override query matches the data types and order in which they appear in the custom query.
•When you select a view as a source object that contains columns of the Record data type or repeated columns and create a new target at runtime, a validation error appears in the session logs and the mapping runs without full SQL ELT optimization.
•To load data into columns of date, time, datetime, or timestamp in a Google BigQuery target, you must pass the data through the TO_DATE() function as an expression and map the results to the target column.
•When you set SCD Type 2 merge optimization context for a mapping, you cannot use filter, joiner, and custom SQL query.
•If the mapping contains a Router transformation output connected to a Sequence Generator transformation, the mapping does not push down the mapping logic to the point where the transformation is supported and runs without SQL ELT optimization.
•When you push down a Router transformation with IIF and IS_SPACES() functions in a mapping that reads from and writes to Google BigQuery, and the Boolean values are 0 and 1, the mapping fails. When the Boolean values are true and false, the mapping runs successfully.
•When you use multiple functions within a transformation and one of the functions cannot be pushed to Google BigQuery, the mapping runs without SQL ELT optimization.
•When the mapping contains multiple pipelines and a function within one of the transformations cannot be pushed to Google BigQuery, the mapping does not push down the mapping logic to the point where the transformation is supported, and the mapping runs without SQL ELT optimization.
•When you read from or write data to Google BigQuery objects associated with different projects in different Google service accounts that reside in different regions, the mapping runs without SQL ELT optimization.
•When you use the data driven operation to write data to a Google BigQuery target and enable the Disable Duplicate Update Rows target advanced property, the Secure Agent ignores the Disable Duplicate Update Rows property.
•When you read data from a Google BigQuery source that contains duplicate update keys and enable the Disable Duplicate Update Rows target advanced property, the Secure Agent ignores the Disable Duplicate Update Rows property.
•When you configure a mapping that includes any of the following datetime scenarios, the mapping runs without SQL ELT optimization:
- Map data from the TIME data type to any other date/time data type
- Map data from the DATE data type to the TIME data type
- Compare data of the TIME and TIMESTAMP data types with the DATE or DATETIME data types
• When you use $$$SESSSTARTTIME variable in a custom query, the variable returns the session start time as a string value. Use the following syntax to convert the string values to timestamp or datetime:
- SELECT PARSE_DATETIME('%m/%d/%Y %H:%M:%E6S', '$$$SESSSTARTTIME' ) as t1;
- SELECT cast(substr(cast('$$$SESSSTARTTIME' as string),0,19) as datetime FORMAT 'MM/DD/YYYY HH24:MI:SS') as datetime;
Ensure that the time zones of the Google BigQuery project and the agent machine are the same.
•When you set the OptimizeCastsInPDO:true custom property in the advanced settings for a Google BigQuery V2 connection, the SESSSTARTTIME, SYSDATE, SYSTIMESTAMP, and TO_DATE functions return data of DATETIME data type.
•When you configure a native filter in the Source transformation, ensure that you do not prefix the field name with the table name and dataset name in the filter expression. Otherwise, the mapping fails.
•When you configure an insert operation and set the Write Disposition property as Write Truncate in the target transformation properties, the mapping appends the records to the target table instead of truncating the target table before loading data. To configure a truncation when you insert records, you need to select the Truncate target table option in the target advanced properties.
•When you create a target table at runtime and perform DML operations, the mapping might fail if the expression port returns a null value of a non-integer data type.
Troubleshooting a SQL ELT optimization task
Mapping fails when configured to read date, timestamp, or datetime information and write to default date/time format
When you configure a mapping to read date, timestamp, or datetime information from a string column and process the data with the default date/time format to write to a Google BigQuery target, the mapping fails with the following error:
[ERROR] The Secure Agent failed to run the full SQL ELT query due to the following error: [Invalid timestamp: '12/31/1972 00:00:00.000001']
To resolve this issue, set the JVM option -DHonorInfaDateFormat=true for the Secure Agent.
Perform the following steps to configure the JVM option in Administrator:
1. Select Administrator > Runtime Environments.
2. On the Runtime Environments page, select the Secure Agent machine that runs the mapping.
3. Click Edit.
4. In the System Configuration Details section, select Data Integration Server as the Service and DTM as the Type.
5. Edit the JVMOption system property and set the value to -DHonorInfaDateFormat=true.
6. Click Save.
Mapping fails when configured to read time data in string data type and write to date/time data type
When you set the JVM option system property for the DTM type to -DHonorInfaDateFormat=false for the Secure Agent, set the optional property OptimizeCastsInPDO=true in the Google BigQuery connection, and run a mapping that reads time data of the String data type from an Amazon S3 or Google Cloud Storage source and writes it to a date/time data type, the mapping fails with the following error:
The Secure Agent failed to run the full SQL ELT query due to the following error: [Invalid timestamp: '12/31/1972 00:00:00.000001']
To resolve this issue, set the JVM option -DHonorInfaDateFormat=true for the Secure Agent and also provide the source data with the format provided in the DateTime Format String advanced session property in the mapping task.
Perform the following steps to configure the JVM option in Administrator:
1. Select Administrator > Runtime Environments.
2. On the Runtime Environments page, select the Secure Agent machine that runs the mapping.
3. Click Edit.
4. In the System Configuration Details section, select Data Integration Server as the Service and DTM as the Type.
5. Edit the JVMOption system property and set the value to -DHonorInfaDateFormat=true.
6. Click Save.