When you configure SQL ELT optimization, Data Integration tries to push the configured transformation to Snowflake.
You can use full SQL ELT to push the following transformations to Snowflake:
•Aggregator
•Expression
•Filter
•Hierarchy Processor
•Joiner
•Lookup
•Normalizer
•Rank
•Router
•Sequence Generator
•SQL*
•Sorter
•Union
•Update Strategy*
*Doesn't apply to mappings in advanced mode.
Note: The Router transformation applies only to source SQL ELT optimization, and the Hierarchy Processor transformation applies only to mappings in advanced mode.
For more information about configuring transformations, see Transformations.
Aggregator transformation
You can configure full SQL ELT optimization to push an Aggregator transformation to process in Snowflake.
Aggregate calculations
You can perform the following aggregate calculations:
•AVG
•COUNT
•MAX
•MIN
•MEDIAN
•SUM
•VARIANCE
Incoming ports
When an incoming port in an Aggregator transformation is not used in an aggregate function or in a group by field, Data Integration applies the ANY_VALUE() function to that column. In this case, the output is not deterministic because the ANY_VALUE() function returns any value from the port. However, in advanced mode, when the incoming port is not part of the group by field, the MAX() function is used instead. The Aggregator transformation also generates an additional column with the value 1. However, this column is dropped and not used in the insert part of the SQL ELT query.
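For example, consider a hypothetical mapping that groups employee data by DEPT, sums SAL, and passes through an unused incoming port EMP_NAME. The pushed-down query might resemble the following. The table and column names are illustrative.
SELECT DEPT, SUM(SAL), ANY_VALUE(EMP_NAME) FROM EMPLOYEES GROUP BY DEPT;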
Expression transformation
You can configure full SQL ELT optimization to push an Expression transformation to process in Snowflake.
You can add an Expression transformation to each of the sources in the mapping, followed by a join downstream in the mapping. You can also add multiple Expression transformations that branch out from a transformation and then converge into a single transformation downstream in the mapping.
When you configure an Expression transformation, consider the following rules to include variables in the expression:
•You cannot use a variable that takes the value assigned while processing a previous row to perform calculations in the current row. If you do, the mapping runs without SQL ELT optimization.
•Variables can be nested, but you cannot refer to a variable before it is defined in the expression. If the variables are not defined in that order, the mapping runs without SQL ELT optimization.
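For example, the following hypothetical variable definitions are invalid because AGEPLUS1 is defined before AGEPLUS2:
AGEPLUS1 = AGEPLUS2 + 1
AGEPLUS2 = AGE + 1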
Here, AGEPLUS1 refers to AGEPLUS2 and remains unresolved.
Hierarchy Processor transformation
In advanced mode, you can configure a Hierarchy Processor transformation to read hierarchical or relational input from an Amazon S3 V2 or Microsoft Azure Data Lake Storage Gen2 source and write relational or hierarchical output to a Snowflake target.
The Hierarchy Processor transformation processes hierarchical fields that represent a struct or an array.
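For example, a field named ADDRESS might represent a struct with CITY and ZIP elements, and a field named PHONES might represent an array of strings. The field names are illustrative.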
You can configure a Hierarchy Processor transformation with the following restrictions:
•If the array element of the hierarchical output field contains the fixed-point Number data type, the mapping runs without SQL ELT optimization.
•Decimal values of the Double data type are written in exponential notation.
For example, when you write a Double data type 2341.6789 to an output field in the Snowflake target, the output appears as 2.341678900000000e+03.
•If you select Use input group or incoming fields as a data source and read hierarchical or relational input from the source that contains more than one row and write as hierarchical output to a Snowflake target, Data Integration duplicates records for each row.
To avoid writing duplicate rows to the target, either select Inherit parent's data sources as a data source or filter child fields from the data source using the filter condition.
•To write Integer or Bigint data type from a Struct field to a Snowflake target, select advanced.custom.property from the Session Property Name list, and then enter the following value in the mapping task:
DisableAdvancedMappingRuntimeValidation=true
•You can't write data from an Avro or Parquet file that contains multi-level struct fields to a Snowflake target.
Lookup transformation
You can configure full SQL ELT optimization to push a Lookup transformation to process in Snowflake. You can push both a connected and an unconnected lookup.
When the mapping contains an unconnected lookup, you can also nest the unconnected lookup function with other expression functions. For example, :LKP.U_LOOKUP(Upper(argument1), argument)
Lookup objects
Consider the following rules when you configure lookups:
•You can configure a lookup for Snowflake when the Source transformation uses the following sources:
- Amazon S3
- Google Cloud Storage*
- Microsoft Azure Data Lake Storage Gen2
- Snowflake
*Doesn't apply to mappings in advanced mode.
•You can configure a lookup for an Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, or Snowflake object only when the Source transformation uses the corresponding Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, or Snowflake source.
Connected lookup
In a mapping with a Snowflake Data Cloud source and target, set the Multiple Matches option for the lookup object to Return all rows.
Unconnected lookup
When you configure an unconnected Lookup transformation, consider the following rules:
•Do not configure an expression for an output received from an unconnected lookup.
•In a mapping with a Snowflake Data Cloud source and target, where some of the input fields from the unconnected Lookup transformation are not mapped to the Snowflake target object, the select query includes all the unmapped fields.
Multiple matches behavior in a connected and unconnected lookup
If you enable SQL ELT optimization for a mapping that contains a connected or unconnected lookup, you must follow these guidelines:
•In an unconnected lookup, always set the Multiple Matches option to Report Error. When you look up data and the lookup condition finds multiple matches, all the matching rows are selected and the task runs with SQL ELT optimization. If you set Multiple Matches to any option other than Report Error, the mapping runs without SQL ELT optimization.
•In a connected lookup, you can set the Multiple Matches option to Return all rows or Report Error. When you set the Multiple Matches option to Report Error, you can set the Lkp_apdo_allow_report_error custom flag in the task advanced session properties to determine how Data Integration handles multiple matches:
- When you set the property to Yes and there are multiple matches in the data, the multiple match policy is ignored and the job runs successfully with SQL ELT optimization.
- When you do not set the property and there are multiple matches in the data, Data Integration considers the policy and displays a warning message. SQL ELT optimization is ignored and the task fails.
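For example, to ignore the multiple match policy, you might add Lkp_apdo_allow_report_error as a custom property with the value Yes in the advanced session properties of the task.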
FileName port
When you configure a lookup for an Amazon S3 source in a mapping that contains an Amazon S3 source and a Snowflake target, remove the FileName port from both the Amazon S3 source and the lookup object. The FileName port is not applicable.
Lookup query object
When you use a lookup object as a query in a Lookup transformation in a mapping to lookup data in Snowflake, specify the database and schema in the advanced lookup properties or in the additional JDBC URL parameters in the Snowflake Data Cloud connection.
Normalizer transformation
You can configure a Normalizer transformation in a mapping to return a row for each instance of the multiple-occurring data.
For example, a relational source includes four fields with quarterly sales data. You can configure a Normalizer transformation to generate a separate output row for each quarter.
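To illustrate, a hypothetical input row with the columns STORE_ID=101, Q1_SALES=100, Q2_SALES=200, Q3_SALES=300, and Q4_SALES=400 normalizes into four output rows, one for each quarter. Each output row contains the store ID, one sales value, and a generated column ID that identifies the quarter. The column names are illustrative.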
In advanced mode, you can configure a Normalizer transformation with the following restrictions:
•To normalize multiple-occurring fields from different groups, map each group to a unique target, and exclude fields that belong to other groups from the list of incoming fields, even when those groups are not mapped to the target.
•To write normalized data to multiple targets, map all generated column IDs of normalized fields to the corresponding targets.
Router transformation
You can configure source SQL ELT optimization to push a Router transformation to the database for processing.
When you configure a Router transformation, connect or map only one output group to the target transformation.
Sequence Generator transformation
When you configure a Sequence Generator transformation, you can connect the NEXTVAL port to single or multiple ports in the transformations that follow the Sequence Generator transformation.
You can push a Sequence Generator transformation with the following restrictions:
•You cannot use a shared sequence in a Sequence Generator transformation.
•You can add only an Expression or a Target transformation after a Sequence Generator transformation in a mapping.
Guidelines for using a Sequence Generator transformation in mappings in advanced mode and SQL ELT mode
A Sequence Generator transformation in advanced mode and SQL ELT mode follows the same rules as in mappings. However, to use a Sequence Generator transformation in a mapping in advanced mode or SQL ELT mode, you must also set a property in the mapping task properties that specifies the name of the sequence in Snowflake.
1. On the Runtime Options tab in the task properties, navigate to the Advanced Session Properties section.
2. Add a custom property and enter the session property value in the following format:
Pushdown.<Sequence transformation name in mapping>.SequenceName=<sequence name to create in Snowflake>
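For example, for a Sequence Generator transformation named seq_orders that must create a sequence named ORDERS_SEQ in Snowflake, enter the following value. Both names are hypothetical.
Pushdown.seq_orders.SequenceName=ORDERS_SEQ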
When you use multiple Sequence Generator transformations, the name of the sequence object to create in the Snowflake database must be unique for each sequence. Otherwise, the transformation runs the CREATE SEQUENCE IF NOT EXISTS query twice with the same name. The first sequence runs as expected, but the second sequence, which has the same name as the first, does not override it. As a result, the next value generated for the second sequence is incorrect. However, when the sequence runs with a unique name for the sequence object, that sequence is not updated again, regardless of the number of times you run the mapping.
Note: If you disable SQL ELT optimization and run the mapping, the sequence values restart from the initial start value. If you delete a mapping, make sure that you also drop the corresponding sequences from the Snowflake database.
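For example, to drop a hypothetical sequence that a mapping created, you might run the following query in Snowflake:
DROP SEQUENCE IF EXISTS MYDB.MYSCHEMA.ORDERS_SEQ;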
SQL transformation
You can use an SQL transformation only to push certain functions and shared sequences.
Use functions to run queries
You can include functions in an entered query in an SQL transformation and run queries with the Snowflake target endpoint.
You must use only a SELECT statement to push a function. Specify the column name in the select query or function. Do not push functions using statements such as "SELECT * FROM TABLE". For an example, see the query after the following list of functions.
You can use the following functions in an entered query:
•UUID_STRING
•RANDOM
•RANDSTR
•SIGN
•CURRENT_REGION
•CURRENT_ACCOUNT
•CURRENT_ROLE
•CURRENT_USER
•CURRENT_DATABASE
•CURRENT_SCHEMA
•DAYNAME
•SPLIT
•SPLIT_PART
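For example, an entered query similar to the following pushes supported functions to Snowflake. The table and column names are illustrative.
SELECT UUID_STRING(), DAYNAME(ORDER_DATE), SPLIT_PART(FULL_NAME, ' ', 1) FROM ORDERS;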
To use the CURRENT_ROLE, CURRENT_DATABASE, and CURRENT_SCHEMA functions in an SQL transformation, make sure that you provide the database, role, and schema name in the additional JDBC URL parameters field in the Snowflake Data Cloud connection. If you do not specify the values in the connection, Data Integration inserts null values into the target.
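For example, the additional JDBC URL parameters might look like the following. The values are hypothetical.
db=MYDB&schema=PUBLIC&role=ANALYST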
Reuse shared sequence
You can push a mapping with a shared sequence defined in an SQL transformation to a Snowflake endpoint. Data Integration writes the data to the target in the same sequence as in the Snowflake source.
Get the shared sequence from Snowflake and define the sequence in an entered query in an SQL transformation.
Specify the shared sequence in the entered query in the following syntax: SELECT <Snowflake_database_name>.<Snowflake_schema_name>.<sequence_name>.NEXTVAL
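For example, for a hypothetical shared sequence ORDERS_SEQ in the MYDB database and MYSCHEMA schema:
SELECT MYDB.MYSCHEMA.ORDERS_SEQ.NEXTVAL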
User-defined functions
You can configure a custom query in an SQL transformation to read from Java or SQL user-defined functions (UDF) in Snowflake.
The following guidelines apply for UDFs:
•You cannot read UDFs that have newline characters in the UDF name.
•If the UDF contains array parameters, the mapping runs without SQL ELT optimization.
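For example, a custom query similar to the following reads from a hypothetical SQL UDF. The object names are illustrative.
SELECT MYDB.MYSCHEMA.AREA_OF_CIRCLE(RADIUS) FROM MEASUREMENTS;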
Union transformation
You can push a Union transformation with the following restrictions:
•The Source transformation in the mapping must only include Snowflake source objects.
•A mapping runs without SQL ELT optimization when the source is Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2.
Update Strategy transformation
You cannot use an Update Strategy transformation in mappings in advanced mode.
You can instead use the update and upsert operations in the Target transformation to write to Snowflake.