Target preparation varies depending on whether you use the Superpipe feature for high performance streaming of data to Snowflake target tables or write data to intermediate stage files.
With Superpipe
If you use the Superpipe feature, complete the following steps:
1. Create a Data Ingestion and Replication user. Use the following SQL statement:
create user INFACMI_User password 'Xxxx@xxx';
2. Create a new user role and grant it to the Data Ingestion and Replication user. Use the following SQL statements:
create role INFACMI_superpipe;
grant role INFACMI_superpipe to user INFACMI_User;
3. Grant usage on the Snowflake virtual warehouse to the new role. Use the following SQL statement:
grant usage on warehouse warehouse_name to role INFACMI_superpipe;
4. Grant usage on the Snowflake database to the new role. Use the following SQL statement:
grant usage on database INFACMI_DB1 to role INFACMI_superpipe;
5. Create a new schema. Use the following SQL statements:
use database INFACMI_DB1;
create schema sh_superpipe;
6. Grant create stream, create view, and create table privileges on the new Snowflake schema to the new role. Use the following SQL statement:
grant create stream, create view, create table, usage on schema INFACMI_DB1.sh_superpipe to role INFACMI_superpipe;
7. Set the default role for the newly created user. Use the following SQL statement:
alter user INFACMI_User set default_role=INFACMI_superpipe;
8. Define a Snowflake Data Cloud connection to the target. You must use the KeyPair option as the authentication method. See Connectors and Connections > Snowflake Data Cloud connection properties.
9. Generate a private key with OpenSSL version 3.x.x and one of the following ciphers:
- AES128
- AES128-CBC
- AES256
- AES-256-CBC
Use openssl commands to generate and format the private key.
10. Generate the public key. Use an openssl command in which the -in option references the file (rsa_key.p8) that contains the encrypted private key.
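The openssl commands for steps 9 and 10 can be sketched as follows. This is an illustration only: the passphrase is a placeholder, and the AES-256-CBC cipher is chosen as one of the supported options. Verify the exact flags against your OpenSSL 3.x installation.

```shell
# Step 9 (sketch): generate a 2048-bit RSA key and encrypt it as a PKCS#8
# private key with the AES-256-CBC cipher. The passphrase is a placeholder.
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes-256-cbc \
  -inform PEM -out rsa_key.p8 -passout pass:MyPassphrase

# Step 10 (sketch): extract the public key from the encrypted private key.
openssl rsa -in rsa_key.p8 -passin pass:MyPassphrase -pubout -out rsa_key.pub
```

The resulting rsa_key.pub content (without the BEGIN/END header lines) is the key value that you assign to the Snowflake user in step 11.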
11. In Snowflake, assign the public key to the Snowflake user. Use the following SQL command:
alter user INFACMI_User set rsa_public_key='key_value';
Next step: When you create an ingestion and replication task, select the Superpipe option on the Target page of the task wizard. You can also optionally specify a Merge Frequency value, which controls the frequency at which change data rows are merged and applied to the Snowflake target table.
Without Superpipe
If you do NOT use the Superpipe feature for Snowflake targets, complete the following steps as the ACCOUNTADMIN user:
1. Create a Data Ingestion and Replication user. Use one of the following SQL statements:
create user INFACMI_User password 'Xxxx@xxx';
or
create or replace user INFACMI_User password 'Xxxx@xxx';
2. Create a new role and grant the role to the Data Ingestion and Replication user. Use the following SQL statements:
create role INFA_CMI_Role;
grant role INFA_CMI_Role to user INFACMI_User;
3. Grant usage on the Snowflake virtual warehouse to the new role. Use the following SQL statement:
grant usage on warehouse CMIWH to role INFA_CMI_Role;
4. Grant usage and create schema privileges on the Snowflake database to the new role. Use the following SQL statement:
grant usage, create schema on database CMIDB to role INFA_CMI_Role;
5. Set the default role for the newly created user. Use the following SQL statement:
alter user INFACMI_User set default_role=INFA_CMI_Role;
Also, as the INFACMI_User, create a new schema:
create schema CMISchema;
Note: If the user's default role is used for ingestion and replication tasks and does not have the required privileges, the following error will be issued at runtime:
SQL compilation error: Object does not exist, or operation cannot be performed.
For Snowflake-managed Iceberg tables
Regardless of whether you use the Superpipe option, you can choose to store data and metadata in Snowflake-managed Iceberg tables in external cloud storage instead of in Snowflake tables. To prepare for the use of Iceberg tables, perform the following tasks:
•Create and configure an external volume in Snowflake to store an identity and access management (IAM) entity that Snowflake can use to connect to your external cloud storage. The steps depend on the type of cloud storage, for example, Amazon S3, Google Cloud Storage, or Azure storage. For instructions, see the Snowflake documentation.
•Create a role with the minimum access control privileges required to create the external volume and Iceberg tables and to use the external volume and table schema. For instructions, see the Snowflake documentation.
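As a sketch only, an external volume for Amazon S3 might be created with a statement like the following. The volume name, bucket, and IAM role ARN are placeholders; see the Snowflake documentation for the exact syntax for your cloud provider.

```sql
-- Sketch: external volume backed by an S3 bucket (placeholder values).
CREATE EXTERNAL VOLUME iceberg_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'iceberg-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-iceberg-role'
    )
  );
```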
Usage considerations
•Database Ingestion and Replication provides alternative methods of moving data to a Snowflake Data Cloud target:
- If you select the Superpipe option when defining an ingestion task, the ingestion job uses the Snowpipe Streaming API to stream rows of data directly to the target tables with low latency. This method is available for all load types. You must use KeyPair authentication.
- If you do not use Superpipe, ingestion jobs first write the data to data files in an internal stage, which has the name you specify in the task definition.
•If you do not use Superpipe and the internal stage specified in the target properties for an ingestion job does not exist, Database Ingestion and Replication automatically creates the stage by running the following SQL command:
create stage if not exists "Schema"."Stage_Bucket"
For the command to run successfully, the following privilege must be granted to your user role:
GRANT CREATE STAGE ON SCHEMA "Schema" TO ROLE <your_role>;
•You can enable database ingestion and replication jobs to store data in an open format in Apache Iceberg managed tables in an external cloud Amazon S3, Google Cloud Storage, or Azure Storage location instead of in Snowflake. For more information, see Using Snowflake-managed Iceberg tables to store data in external cloud storage.
•When you define a connection for a Snowflake Data Cloud target, you must set the Additional JDBC URL Parameters field to database=target_database_name. Otherwise, when you try to define the target in the database ingestion and replication task wizard, an error message reports that the list of schemas cannot be retrieved.
•When you define a connection for a Snowflake target that uses the KeyPair option as the authentication method and you generate the private key with OpenSSL version 3.x.x, use one of the following ciphers, which Data Ingestion and Replication supports:
- AES128
- AES128-CBC
- AES256
- AES-256-CBC
- CAMELLIA-256-CBC
- CAMELLIA-256-CTR
- ARIA-256-CTR
•Database ingestion and replication incremental load jobs and combined initial and incremental load jobs generate a recovery table named INFORMATICA_CDC_RECOVERY on the target to store internal service information that prevents jobs restarted after a failure from propagating previously processed data again. This recovery table is generated in the same schema as the target tables.
•Database ingestion and replication incremental load and combined load jobs that have a Snowflake target create transient _LOG tables to stage data on the target instead of permanent tables, by default. To prevent permanent tables from being created, the Snowflake Time Travel and Fail-safe features are disabled by default. The use of transient tables helps reduce _LOG table storage consumption and associated costs. With transient tables, however, you can't use the Superpipe option. If you need to create permanent tables, you can set the target snowflakeStagingTransientTable custom property to false.
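For example, to force permanent _LOG staging tables instead of transient ones, the target custom property described above might be set as follows:

```
snowflakeStagingTransientTable=false
```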
•For Snowflake targets, you cannot alter the scale of NUMBER fields or change the data type of an existing field to a different data type because Snowflake does not support these actions.
•If the source tables contain columns that have special characters in the name, you can replace the special characters with a provided choice of string in the Snowflake target.
To replace the special characters with a provided choice of string in the generated target schema, perform the following actions before deploying the database ingestion and replication task:
1. Create a properties file to be used at the metadata-manager layer. Add the targetReplacementValue=<string> key-value pair to the properties file.
2. Open the Runtime Environment page in Administrator and edit the Secure Agent. Create a custom property in the Custom Configuration Details area:
▪ Select the Database Ingestion service.
▪ Select the DBMI_AGENT_ENV type.
▪ Enter DBMI_TASK_OVERRIDE_PROPERTIES as the property name.
▪ Enter the properties file location as the property value.
3. Set the targetReplacementValue=<string> custom property on the Target page of the task wizard.
Before running the task, add <jobname>. to the targetReplacementValue key in the properties file:
<jobname>.targetReplacementValue=<string>
If the property affects all jobs, add "alljobs." to the targetReplacementValue key:
alljobs.targetReplacementValue=<string>
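As an illustration, a properties file that replaces special characters with an underscore might look like this, where mytask1 is a hypothetical job name:

```
# Replace special characters with "_" for one job only:
mytask1.targetReplacementValue=_
# Or apply the replacement to all jobs:
alljobs.targetReplacementValue=_
```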
•Before stopping a database ingestion and replication incremental load or combined load job that has a Snowflake target and uses the Superpipe option, ensure that all ingested data in the change stream has been merged into the target. Otherwise, if the job is not restarted for an extended period or before the stream expires, any data that remains in the stream is lost and can’t be recovered.
•For incremental load and combined load jobs with a Snowflake target that uses the Superpipe option, change data rows might be merged and applied to the Snowflake target tables before the merge interval set in the target properties elapses. This behavior occurs whenever mutated rows are detected. The mutation triggers merge apply processing to begin right away, without waiting for the merge interval.
•If database ingestion and replication jobs replicate data from source JSON columns to Snowflake, Snowflake adds slashes (/) within the data on the target. You can set a target custom property to avoid the extraneous slashes. For more information, contact Informatica Global Customer Support.
Using Snowflake-managed Iceberg tables to store data in external cloud storage
You can enable database ingestion and replication jobs to store data and metadata in Snowflake-managed Iceberg tables in an external cloud location instead of in Snowflake tables. The external location can be in Amazon S3, Google Cloud Storage, or Azure Storage.
The external storage of data in an open format has the following benefits:
•Ability to use query processing engines other than Snowflake to process the data
•Interoperability across different data processing and analytics tools
•Better performance
•Robust security and governance
Database Ingestion and Replication automatically creates the Snowflake-managed Iceberg tables when you deploy ingestion and replication tasks enabled for Iceberg table use. You do not need to create the Iceberg tables manually.
Snowflake manages the metadata and catalog for the Iceberg tables. Snowflake uses an external volume that stores identity and access management (IAM) information to connect to the external storage location. In Snowflake, you must create and configure an external volume beforehand. For more information, see "Target preparation" in Snowflake targets.
When you configure ingestion and replication tasks, specify the following target custom properties to enable the use of Snowflake-managed Iceberg tables:
•writerTargetTableFormat. Required. Set this custom property to iceberg to enable the use of Snowflake-managed Iceberg tables.
•writerTargetTableExternalVolumeName. Required. Enter the name of the external volume that you created in the Snowflake account.
•writerTargetTableExternalStorageBaseLocation. Optional. The base URL for the external storage location. If you do not specify this value, the base URL from the external volume definition is used.
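For example, the three target custom properties might be set together as follows. The volume name and base URL are placeholders for values from your own Snowflake account:

```
writerTargetTableFormat=iceberg
writerTargetTableExternalVolumeName=iceberg_ext_vol
writerTargetTableExternalStorageBaseLocation=s3://my-bucket/iceberg/
```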
The ingestion and replication tasks can use any load type, any apply mode, and schema drift. The Superpipe option is allowed but not required.
The following limitations apply to using Iceberg tables:
•You must use permanent Snowflake-managed Iceberg tables, not transient or temporary tables.
•The external volume must be in the same cloud and region as the Snowflake account.
•Char and varchar data types with a limited size specification are not supported.
•Time and timestamp data types have microsecond precision.
Configure private connectivity to Snowflake
You can access Snowflake using AWS or Azure Private Link endpoints.
The AWS or Azure Private Link setup ensures that the connection to Snowflake uses the AWS or Azure internal network and does not take place over the public Internet.