
Target objects and operations

In a Target transformation, you can use a single object or parameter as the target type.
When you choose the target type, you can select the operation to insert, update, upsert, or delete data in a Snowflake target. You can also use the data driven operation to define expressions that flag rows for an insert, update, delete, or reject operation.

Target operation restrictions

Some rules apply when you configure a target operation.

Parameterized target

When you parameterize the target object and connection and enable the Allow parameter to be overridden at run time option in a transformation, you cannot override the object name with a fully qualified name such as db.schema.tablename. Instead, pass the db=<dbname>&schema=<schemaname> values in the Additional JDBC URL Parameters field in the Snowflake Data Cloud connection.
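For example, for a hypothetical database named MYDB and schema named MYSCHEMA, you might enter the following value in the Additional JDBC URL Parameters field:
db=MYDB&schema=MYSCHEMA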

Update columns

You need to specify the temporary key column to update data in or delete data from a Snowflake target. If the Snowflake target does not include a primary key column, click Add to add a temporary key. You can select multiple columns.
If the records from the source tables contain duplicate primary keys, perform one of the following tasks in mappings to update or delete records in Snowflake:
When you configure a mapping in advanced mode, you must additionally consider the following guidelines:

Data driven operation

When you configure a data driven expression to update the rows in Snowflake from a mapping in advanced mode, you need to specify the third argument in the IIF condition. If you do not specify a third argument in the IIF condition, the agent treats the operation as an insert for all non-matching rows.
For example, if you specify the condition IIF(COL_INTEGER=2,DD_UPDATE) without a third argument, such as DD_DELETE in the expression IIF(COL_INTEGER=2,DD_UPDATE,DD_DELETE), the agent performs an insert by default for any row where the value of COL_INTEGER is not equal to 2.
In this example, the condition COL_INTEGER=2 resolves to 1 and COL_INTEGER!=2 resolves to 0, where 1 is the internal ID for the update operation and 0 is the internal ID for the insert operation. When you do not specify the third argument, its value defaults to 0.
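To make the default explicit, the two-argument expression above behaves the same as the following expression, where the insert operation is specified as the third argument:
IIF(COL_INTEGER = 2, DD_UPDATE, DD_INSERT)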
You can use data driven operation with the following restrictions for mappings in advanced mode:

General restrictions

In advanced mode, mappings that write data to more than 500 columns fail with the following error: HTTP POST request failed due to IO error

Passthrough partitioning

You can configure partitioning to optimize the mapping performance at run time when you write data to Snowflake targets.
The partition type controls how the agent distributes data among partitions at partition points. You can define the partition type as passthrough partitioning. With partitioning, the agent distributes rows of target data based on the number of partitions that you define.

Specify a target

You can use an existing target or create a new target to write data to Snowflake. If you choose to create a new target, the agent creates the target when it runs the task.
Specify the Snowflake database name and schema for the target object in the following format: <database name>/<schema name>. Ensure that the path for the database and schema that you specify for the target object is in uppercase letters. If you do not enter the path, the Secure Agent considers the schema and database name that you specified in the Additional JDBC URL Parameters field in the Snowflake Data Cloud connection properties.
Specify the TableType as table. The TableType property is optional.
To create a new target object at run time, specify the object name and the path where you want to create the table in Snowflake.
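For example, to create a target table named CUSTOMER_TGT in a hypothetical database MYDB and schema MYSCHEMA, you might enter CUSTOMER_TGT as the object name and the following path:
MYDB/MYSCHEMA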

Restrictions for a Snowflake target

The following rules apply when you configure the target:

Override the update operation

You can specify an update override to override the update query that the Secure Agent generates for the update operation.
When you configure an update override, the Secure Agent uses the query that you specify, stages the data in files, and then loads that data into a temporary table using the Snowflake loader's COPY command. The data from the temporary table is then loaded to the Snowflake target table. The syntax that you specify for the update query must be supported by Snowflake.
Specify the update query in the following format:
UPDATE <Target table name>
SET <Target table name>.<Column1> = :TU.<Column1>,
    <Target table name>.<Column2> = :TU.<Column2>,
    …
    <Target table name>.<ColumnN> = :TU.<ColumnN>
FROM :TU
WHERE <Target table name>.<Update Column1> = :TU.<Update Column1>
    AND <Target table name>.<Update Column2> = :TU.<Update Column2>
    AND …
    <Target table name>.<Update ColumnN> = :TU.<Update ColumnN>
where :TU represents the incoming data source for the target port.
The Secure Agent replaces :TU with a temporary table name while running the mapping and does not validate the update query.
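For example, for a hypothetical target table EMP with an update column EMP_ID and a data column EMP_NAME, the update override might look like the following query:
UPDATE EMP SET EMP.EMP_NAME = :TU.EMP_NAME FROM :TU WHERE EMP.EMP_ID = :TU.EMP_ID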
When you configure an update override in a mapping to write to Snowflake, consider the following rules:

Optimize the .csv file size

When you create a mapping to write to Snowflake, you can specify the size of the local staging .csv file in bytes. Specify the local staging file size property, csvFileSize, in the Additional Write Runtime Parameters field in the advanced Snowflake target properties. The default file size is 50 MB.
If the intended file size is 50 MB, calculate the csvFileSize value in bytes, for example, 50*1024*1024, and then specify 52428800 as the csvFileSize value. For optimal performance, keep the number and size of the .csv files balanced.
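For example, to stage files of approximately 100 MB each, you might calculate 100*1024*1024 and enter the following value in the Additional Write Runtime Parameters field:
csvFileSize=104857600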

Configuring the batch size and the number of local staging files

To achieve optimal performance of a mapping that writes data to Snowflake, let the mapping use the default values for both the batch size and the number of local staging files based on the amount of data that it processes. Unless your use case requires specific values, it is recommended that you run the mapping with the default values.
The priority that the mapping gives to these two configurations depends on which threshold is reached first. If the mapping satisfies the criteria for the number of local staging files before reaching the specified batch row size, the number of local staging files takes precedence since that threshold is reached first.
For example, the following table shows the number of insert commands issued for various configurations of the batch size and the number of local staging files in a write operation to Snowflake:
Number of Source Rows | Row Size | Default File Size | Batch Row Size | Number of Local Staging Files | Number of Inserts
4 | 52019206 bytes | 52428800 bytes | 4 | 1 | 2
4 | 52019206 bytes | 52428800 bytes | 4 | Not configured. Considers the default value of 64. | 1
4 | 52019206 bytes | 52428800 bytes | 1 | Not configured. Considers the default value of 64. | 4
4 | 52019206 bytes | 52428800 bytes | 3 | 1 | 2
4 | 52019206 bytes | 52428800 bytes | 1 | 1 | 4
Scenario 1: The batch row size is 4 and the number of local staging files is 1.
With a batch row size of 4 and 1 local staging file, two insert statements are executed in Snowflake when processing data.
When the file size exceeds the default 50 MB because there are two rows, a new file is created for the next incoming row. Given that the number of local staging files is set to 1, a stage can hold a maximum of one file. Hence, two stages are created, each holding one file with two rows. Each of the file sizes is 104038412 bytes, which is 52019206 bytes per row. For each stage, an insert command is issued, leading to two insert commands.
Scenario 2: The batch row size is 4 and the number of local staging files value is not set.
Using the default value of 64 for the local staging files, the mapping considers the batch row size of 4 because the local staging file threshold is not reached, and writes all four rows in a single insert.
Scenario 3: The batch row size is 1 and the number of local staging files value is not set.
With a batch row size of 1 and no specific local staging files value set, the mapping issues four insert statements for four rows as the staging files threshold is not reached.
Scenario 4: The batch row size is 3 and the number of local staging files value is 1.
Using a batch row size of 3 and one local staging file, the mapping issues two insert commands after processing two rows in the first stage and two rows in the second stage.
Scenario 5: The batch row size is 1 and the number of local staging files value is 1.
With a batch row size of 1 and one local staging file, four insert commands are issued, one for each row, because the batch row size of 1 is reached before the staging file threshold.

Configure load properties in mappings

You can configure write properties to load data to Snowflake in the Additional Write Runtime Parameters field in the Snowflake Data Cloud advanced target properties of the Target transformation.
The following table lists some of the additional runtime parameters that you can specify to load data to Snowflake:
Property | Description
oneBatch | Process all data in a single batch. Type is Boolean. Default is false.
remoteStage | Specifies the internal stage to use. External stages are not applicable. Type is String. Default is "~" (user stage).
onError | Specifies the action to perform when an error is encountered while loading data from a file, for example, ABORT_STATEMENT, CONTINUE, or SKIP_FILE. Type is String. Default is CONTINUE.
compressFileByPut | Compress the file by using the PUT command. Type is Boolean. Default is false.
compressDataBeforePut | Compress the data before the PUT command. The loader compresses the data to the gzip format before uploading the data. Type is Boolean. Default is true.
copyEmptyFieldAsEmpty | The COPY command option to set incoming empty fields as null. Type is Boolean.
enablePGZIP | Enables parallelism for the file compression. Type is Boolean. Default is true.
onError=ABORT_STATEMENT&oneBatch=true | Loads the entire data in a single batch and stops the task if an error occurs. Simultaneously, validates the user-specified reject file path and writes the error records to this file and to the session log. Type is String for onError and Boolean for oneBatch.
When you set values in the Additional Write Runtime Parameters field, every configured partition initializes a new loader instance, and the configured values apply to all the partitions.

Example 1. Compress files

You want to compress files by using the PUT command before loading data to Snowflake.
Specify the following compression option: compressDataBeforePut=false&compressFileByPut=true
If you specify both the options as true, Snowflake considers the compressDataBeforePut option.

Example 2. Replace empty values as null

You want to write incoming fields that contain empty values as NULL while loading the data to Snowflake.
Specify the copyEmptyFieldAsEmpty Boolean option and set the value to true or false based on your requirement.
Consider the following scenarios before you configure the copyEmptyFieldAsEmpty Boolean parameter:

Example 3. Write data in a single batch

You want to write source data in a single batch to Snowflake. If an error is encountered while loading, you also want to capture the errors.
Specify the onError=ABORT_STATEMENT&oneBatch=true property based on your requirement.
While loading in a single batch, if an error occurs, the Secure Agent checks for the specified reject file name, runs the COPY command, validates the reject file, and then passes the file name to capture the errors, if any.

Configure additional runtime parameters to run mappings in advanced mode

In advanced mode, you can configure additional properties to write data to Snowflake. Specify the parameters in the Additional Write Runtime Parameters field in the Target transformation.
You can configure the following additional runtime parameters to write data to Snowflake:
autopushdown
Optional. Determines whether the automatic query SQL ELT is enabled.
If you enable SQL ELT and the query runs on an advanced cluster, the cluster application pushes part of the query to process in Snowflake, thereby optimizing the performance of these queries.
Default is on when the connector uses a compatible Spark version. When the connector does not use a compatible Spark version, the default value is off.
continueOnError
Optional. Determines whether the COPY command aborts the operation when you enter data that is not valid. For example, you specify a JSON format for a variant data type column that is not valid.
The values include on and off. When you specify the value as on, the COPY command continues even if an error occurs. If you specify off, the COPY command aborts when an error occurs. Default is off.
It is recommended that you keep the option as off. Otherwise, when an error is encountered while copying data into Snowflake, some of the data might be missing.
parallelism
The size of the thread pool to use when the Secure Agent uploads or downloads data between Snowflake and the advanced cluster. Default is 4.
Do not change the default value unless you need to increase or decrease the throughput. When you want a high throughput, do not set the parallelism to an arbitrarily large number. A high parallelism value might lead to undesired results and slow down the operation.
purge
Determines whether the Secure Agent deletes the temporary files created when transferring data from an advanced cluster to Snowflake through the external data transfer. The possible values are on and off. Default is off.
If you set this parameter to off, the Secure Agent automatically deletes the temporary files. Purging works only for data transfers from an advanced cluster to Snowflake, but not for transfers from Snowflake to the advanced cluster. If you set this parameter to on, the Secure Agent does not automatically delete the temporary files.
usestagingtable
Optional. Determines whether the data loading operation uses a staging table.
Snowflake creates a staging table with a temporary name. If the data loading operation is successful, Snowflake drops the original target table and renames the staging table to the original target table name. If the data loading operation fails, Snowflake drops the staging table and the target table retains the data that it contained before the operation.
Snowflake strongly recommends that you use a staging table. To create a staging table, you must have sufficient privileges to run the COPY command to create a table. If you do not have permissions to create a table, you can load directly without using a staging table.
The values include on and off. If you specify the usestagingtable parameter as on, Snowflake uses a staging table. If you specify the value as off, Snowflake directly loads the data into the target table. Default is on.
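For example, assuming the same ampersand-separated key=value format as the other write runtime parameters, you might combine these properties in the Additional Write Runtime Parameters field as follows:
autopushdown=off&parallelism=8&usestagingtable=on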