Supported targets

The targets that Mass Ingestion Applications support depend on the sources specified for the application ingestion tasks.
The following table lists the targets that Mass Ingestion Applications support for each source type:
Source type
Supported target type
Adobe Analytics
Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Google Analytics
Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Marketo
Amazon Redshift, Amazon S3, Apache Kafka (incremental load), Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Microsoft Dynamics 365
Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
NetSuite
Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Oracle Fusion Cloud
  • Using REST API: Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
  • Using BICC: Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Salesforce
Amazon Aurora PostgreSQL, Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Oracle, Microsoft Azure SQL Database, Microsoft Fabric OneLake, and Snowflake
Salesforce Marketing Cloud
Amazon Redshift, Amazon S3, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
SAP
  • Using SAP ODP Extractor connector: Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
  • Using SAP Mass Ingestion connector: Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Azure SQL Database (initial load), Microsoft Fabric OneLake, Oracle, and Snowflake
ServiceNow
Amazon Redshift, Apache Kafka, Databricks Delta, Amazon S3, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Workday
  • Using SOAP: Amazon Redshift, Amazon S3, Apache Kafka (incremental load), Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
  • Using RaaS: Amazon Redshift, Amazon S3, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
Zendesk
Amazon Redshift, Amazon S3, Apache Kafka, Databricks Delta, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, and Snowflake
To determine the connectors to use for the target types, see Connectors and Connections > Mass Ingestion Applications connectors.

Guidelines for Amazon Redshift targets

Consider the following guidelines when you use Amazon Redshift targets:

Guidelines for Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets

Consider the following guidelines when you use Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Fabric OneLake, and Oracle Cloud Infrastructure (OCI) Object Storage targets:

Default directory structure of CDC files on Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, and Microsoft Fabric OneLake targets

Application ingestion jobs create directories on Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, and Microsoft Fabric OneLake targets to store information about change data processing.
The following directory structure is created by default on the targets:
Bucket
└───connection_folder
    └───job_folder
        ├───cycle
        │   ├───Completed
        │   │   ├───completed_cycle_folder
        │   │   │   └───Cycle-timestamp.csv
        │   │   │   ...
        │   │   └───completed_cycle_folder
        │   │       └───Cycle-timestamp.csv
        │   └───Contents
        │       ├───cycle_folder
        │       │   └───Cycle-contents-timestamp.csv
        │       │   ...
        │       └───cycle_folder
        │           └───Cycle-contents-timestamp.csv
        └───data
            └───object_name
                ├───Data
                │   ├───cycle_folder
                │   │   └───object_name_timestamp.csv
                │   │   ...
                │   └───cycle_folder
                │       └───object_name_timestamp.csv
                └───Schema
                    └───V1
                        └───object_name.schema
The following table describes the directories in the default structure:
Folder
Description
connection_folder
Contains the Mass Ingestion Applications objects. This folder is specified in the Folder Path field of the Amazon S3 connection properties, in the Directory Path field of the Microsoft Azure Data Lake Storage Gen2 connection properties, or in the Lakehouse Path field of the Microsoft Fabric OneLake connection properties.
Note: This folder is not created for Google Cloud Storage targets.
job_folder
Contains job output files. This folder is specified in the Directory field on the Target page of the application ingestion task wizard.
cycle/Completed
Contains a subfolder for each completed CDC cycle. Each cycle subfolder contains a completed cycle file.
cycle/Contents
Contains a subfolder for each CDC cycle. Each cycle subfolder contains a cycle contents file.
data
Contains output data files and schema files for each object.
data/object_name/Schema/V1
Contains a schema file.
Note: Mass Ingestion Applications does not save a schema file in this folder if the output files use the Parquet format.
data/object_name/Data
Contains a subfolder for each CDC cycle that produces output data files.

Cycle directories

Mass Ingestion Applications uses the following pattern to name cycle directories:
[dt=]yyyy-mm-dd-hh-mm-ss
The "dt=" prefix is added to cycle folder names if you select the Add Directory Tags check box on the Target page of the application ingestion task wizard.
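For example, if the Add Directory Tags check box is selected and a CDC cycle starts at 09:30:00 on July 14, 2024, the cycle folder name would be the following (the date and time are illustrative only):
dt=2024-07-14-09-30-00
Without the directory tag, the same folder would be named 2024-07-14-09-30-00.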

Cycle contents files

Cycle contents files are located in cycle/Contents/cycle_folder subdirectories. Cycle contents files contain a record for each object that has had a DML event during the cycle. If no DML operations occurred on an object in the cycle, the object does not appear in the cycle contents file.
Mass Ingestion Applications uses the following pattern to name cycle content files:
Cycle-contents-timestamp.csv
A cycle contents CSV file contains the following information:

Completed cycle files

Completed cycle files are located in cycle/Completed/completed_cycle_folder subdirectories. An application ingestion job creates a cycle file in this subdirectory after a cycle completes. If this file is not present, the cycle has not completed yet.
Mass Ingestion Applications uses the following pattern to name completed cycle files:
Cycle-timestamp.csv
A completed cycle CSV file contains the following information:

Output data files

The data files contain records that include the following information:

Custom directory structure for output files on Amazon S3, Google Cloud Storage, Microsoft Fabric OneLake, and ADLS Gen2 targets

You can configure a custom directory structure for the output files that initial load, incremental load, and combined initial and incremental load jobs write to Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage (ADLS) Gen2, and Microsoft Fabric OneLake targets if you do not want to use the default structure.

Initial loads

By default, initial load jobs write output files to tablename_timestamp subdirectories under the parent directory. For Amazon S3 and ADLS Gen2 targets, the parent directory is specified in the target connection properties if the Connection Directory as Parent check box is selected on the Target page of the task wizard.
For Google Cloud Storage targets, the parent directory is the bucket container specified in the Bucket field on the Target page of the task wizard.
For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
You can customize the directory structure to suit your needs. For example, for initial loads, you can write the output files under a root directory or directory path that is different from the parent directory specified in the connection properties to better organize the files for your environment or to find them more easily. Or you can consolidate all output files for a table directly in a directory with the table name rather than write the files to separate timestamped subdirectories, for example, to facilitate automated processing of all of the files.
To configure a directory structure, you must use the Data Directory field on the Target page of the ingestion task wizard. The default value is {TableName}_{Timestamp}, which causes output files to be written to tablename_timestamp subdirectories under the parent directory. You can configure a custom directory path by creating a directory pattern that consists of any combination of case-insensitive placeholders and directory names. The placeholders are:
A pattern can also include the following functions:
By default, the target schema is also written to the data directory. If you want to use a different directory for the schema, you can define a directory pattern in the Schema Directory field.
Example 1
You are using an Amazon S3 target and want to write output files and the target schema to the same directory, which is under the parent directory specified in the Folder Path field of the connection properties. In this case, the parent directory is idr-test/DEMO/. You want to write all of the output files for a table to a directory that has a name matching the table name, without a timestamp. You must complete the Data Directory field and select the Connection Directory as Parent check box.
Based on this configuration, the resulting directory structure is:
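For illustration, assume a hypothetical table named CUSTOMERS (this name is not part of the original example) and a Data Directory value of {TableName}. The output files for the table, along with its schema file, would then be written to a directory similar to the following:
idr-test/DEMO/CUSTOMERS/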
Example 2
You are using an Amazon S3 target and want to write output data files to a custom directory path and write the target schema to a separate directory path. To use the directory specified in the Folder Path field in the Amazon S3 connection properties as the parent directory for the data directory and schema directory, select Connection Directory as Parent. In this case, the parent directory is idr-test/DEMO/. In the Data Directory and Schema Directory fields, define directory patterns by using a specific directory name, such as data_dir and schema_dir, followed by the default {TableName}_{Timestamp} placeholder value. The placeholder creates tablename_timestamp destination directories.
Based on this configuration, the resulting data directory structure is:
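For illustration, assuming a Data Directory pattern of data_dir/{TableName}_{Timestamp} and a hypothetical table named CUSTOMERS, the data files would be written under a path similar to:
idr-test/DEMO/data_dir/CUSTOMERS_<timestamp>/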
And the resulting schema directory structure is:
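Under the same assumptions, with a Schema Directory pattern of schema_dir/{TableName}_{Timestamp}, the schema file would be written under a path similar to:
idr-test/DEMO/schema_dir/CUSTOMERS_<timestamp>/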

Incremental loads and combined initial and incremental loads

By default, incremental load and combined initial and incremental load jobs write cycle files and data files to subdirectories under the parent directory. However, you can create a custom directory structure to organize the files to best suit your organization's requirements.
This feature applies to application ingestion incremental load jobs that have a Salesforce source and Amazon S3, Google Cloud Storage, Microsoft Fabric OneLake, or Microsoft Azure Data Lake Storage (ADLS) Gen2 targets.
For Amazon S3 and ADLS Gen2 targets, the parent directory is set in the target connection properties if the Connection Directory as Parent check box is selected on the Target page of the task wizard.
For Google Cloud Storage targets, the parent directory is the bucket container specified in the Bucket field on the Target page of the task wizard.
For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
You can customize the directory structure to suit your needs. For example, you can write the data and cycle files under a target directory for the task instead of under the parent directory specified in the connection properties. Alternatively, you can 1) consolidate table-specific data and schema files under a subdirectory that includes the table name, 2) partition the data files and summary contents and completed files by CDC cycle, or 3) create a completely customized directory structure by defining a pattern that includes literal values and placeholders. For example, if you want to run SQL-type expressions to process the data based on time, you can write all data files directly to timestamp subdirectories without partitioning them by CDC cycle.
To configure a custom directory structure for an incremental load task, define a pattern for any of the following optional fields on the Target page of the ingestion task wizard:
Field
Description
Default
Task Target Directory
Name of a root directory to use for storing output files for an incremental load task.
If you select the Connection Directory as Parent option, you can still optionally specify a task target directory. It will be appended to the parent directory to form the root for the data, schema, cycle completion, and cycle contents directories.
This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
None
Connection Directory as Parent
Select this check box to use the parent directory specified in the connection properties.
This field is not available for the Microsoft Fabric OneLake target.
Selected
Data Directory
Path to the subdirectory that contains the data files.
In the directory path, the {TableName} placeholder is required if data and schema files are not partitioned by CDC cycle.
{TaskTargetDirectory}/data/{TableName}/data
Schema Directory
Path to the subdirectory in which to store the schema file if you do not want to store it in the data directory.
In the directory path, the {TableName} placeholder is required if data and schema files are not partitioned by CDC cycle.
{TaskTargetDirectory}/data/{TableName}/schema
Cycle Completion Directory
Path to the directory that contains the cycle completed file.
{TaskTargetDirectory}/cycle/completed
Cycle Contents Directory
Path to the directory that contains the cycle contents files.
{TaskTargetDirectory}/cycle/contents
Use Cycle Partitioning for Data Directory
Causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Selected
Use Cycle Partitioning for Summary Directories
Causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
Selected
List Individual Files in Contents
Lists individual data files under the contents subdirectory.
If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default, and all of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as the timestamp or date placeholders.
If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
Selected if Use Cycle Partitioning for Summary Directories is cleared; otherwise not selected.
A directory pattern consists of any combination of case-insensitive placeholders, shown in curly brackets { }, and specific directory names. The following placeholders are supported:
Note: The timestamp, year, month, and day placeholders indicate when the CDC cycle started when specified in patterns for data, contents, and completed directories, or indicate when the CDC job started when specified in the schema directory pattern.
Example 1
You want to accept the default directory settings for incremental load jobs as displayed in the task wizard. The target type is Amazon S3. Because the Connection Directory as Parent check box is selected by default, the parent directory path that is specified in the Folder Path field of the Amazon S3 connection properties is used. This parent directory is idr-test/dbmi/. You also must specify a task target directory name, in this case, s3_target, because the {TaskTargetDirectory} placeholder is used in the default patterns in the subsequent directory fields. The files in the data directory and schema directory will be grouped by table name because the {TableName} placeholder is included in their default patterns. Also, because cycle partitioning is enabled, the files in the data directory, schema directory, and cycle summary directories will be subdivided by CDC cycle.
Based on this configuration, the resulting data directory structure is:
If you drill down on the data folder and then on a table in that folder (pgs001_src_allint_init), you can access the data and schema subdirectories:
If you drill down on the data folder, you can access the timestamp directories for the data files:
If you drill down on cycle, you can access the summary contents and completed subdirectories:
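Putting these pieces together, the default directory patterns would produce a layout similar to the following outline (the timestamp subdirectories are placeholders that depend on when each CDC cycle and the job start):
idr-test/dbmi/s3_target/data/pgs001_src_allint_init/data/<cycle_timestamp>/
idr-test/dbmi/s3_target/data/pgs001_src_allint_init/schema/<job_start_timestamp>/
idr-test/dbmi/s3_target/cycle/contents/<cycle_timestamp>/
idr-test/dbmi/s3_target/cycle/completed/<cycle_timestamp>/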
Example 2
You want to create a custom directory structure for incremental load jobs that adds the subdirectories "demo" and "d1" in all of the directory paths except in the schema directory so that you can easily find the files for your demos. Because the Connection Directory as Parent check box is selected, the parent directory path (idr-test/dbmi/) that is specified in the Folder Path field of the Amazon S3 connection properties is used. You also must specify the task target directory because the {TaskTargetDirectory} placeholder is used in the patterns in the subsequent directory fields. The files in the data directory and schema directory will be grouped by table name. Also, because cycle partitioning is enabled, the files in the data, schema, and cycle summary directories will be subdivided by CDC cycle.
Based on this configuration, the resulting data directory structure is:
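One plausible realization of this configuration, assuming that the "demo" and "d1" subdirectories are added directly after the task target directory (the exact patterns used in the original example are not shown here), is:
idr-test/dbmi/s3_target/demo/d1/data/<table_name>/data/<cycle_timestamp>/
idr-test/dbmi/s3_target/data/<table_name>/schema/<job_start_timestamp>/
idr-test/dbmi/s3_target/demo/d1/cycle/contents/<cycle_timestamp>/
idr-test/dbmi/s3_target/demo/d1/cycle/completed/<cycle_timestamp>/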

Guidelines for Apache Kafka targets

Consider the following guidelines when you use Apache Kafka targets:

Guidelines for Databricks Delta targets

Consider the following guidelines when you use Databricks Delta targets:

Guidelines for Google BigQuery targets

Consider the following guidelines when you use Google BigQuery targets:

Guidelines for Microsoft Azure Synapse Analytics targets

Consider the following guidelines when you use Microsoft Azure Synapse Analytics targets:

Guidelines for Microsoft Azure SQL Database targets

Consider the following guidelines when you use a Microsoft Azure SQL Database target for all load types with Salesforce sources, or for initial loads with SAP sources that use the SAP Mass Ingestion connector. The guidelines apply to target instances such as Amazon RDS (Relational Database Service) for SQL Server, Microsoft Azure SQL Database, and Azure SQL Managed Instance:

Guidelines for Oracle targets

By default, Mass Ingestion Applications disables logging for the Oracle target tables to optimize performance. You can enable logging by setting the writerOracleNoLogging custom property to false on the Target page in the application ingestion task wizard.

Guidelines for PostgreSQL targets

Consider the following guidelines when you use PostgreSQL targets:

Guidelines for Snowflake targets

Target preparation

Target preparation varies depending on whether you use the Superpipe feature for high performance streaming of data to Snowflake target tables or write data to intermediate stage files.
With Superpipe
If you use the Superpipe feature, complete the following steps:
  1. Create a Mass Ingestion user. Use the following SQL statement:
     create user INFACMI_User password = 'Xxxx@xxx';
  2. Create a new user role and grant it to the Mass Ingestion user. Use the following SQL statements:
     create role INFACMI_superpipe;
     grant role INFACMI_superpipe to user INFACMI_User;
  3. Grant usage on the Snowflake virtual warehouse to the new role. Use the following SQL statement:
     grant usage on warehouse warehouse_name to role INFACMI_superpipe;
  4. Grant usage on the Snowflake database to the new role. Use the following SQL statement:
     grant usage on database INFACMI_DB1 to role INFACMI_superpipe;
  5. Create a new schema. Use the following SQL statements:
     use database INFACMI_DB1;
     create schema sh_superpipe;
  6. Grant create stream, create view, and create table privileges on the new Snowflake schema to the new role. Use the following SQL statement:
     grant create stream, create view, create table, usage on schema INFACMI_DB1.sh_superpipe to role INFACMI_superpipe;
  7. Set the default role for the newly created user. Use the following SQL statement:
     alter user INFACMI_User set default_role=INFACMI_superpipe;
  8. Define a Snowflake Data Cloud connection to the target. You must use the KeyPair option as the authentication method. See Connectors and Connections > Snowflake Data Cloud connection properties.
  9. Generate a private key with OpenSSL version 3.x.x and the PBE-SHA1-2DES or PBE-SHA1-3DES cipher. Use the following openssl commands to generate and format the private key:
     openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v1 PBE-SHA1-3DES -out rsa_key.p8
  10. Generate the public key. Use the following openssl command, in which the -in option references the file (rsa_key.p8) that contains the encrypted private key:
     openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
  11. In Snowflake, assign the public key to the Snowflake user. Use the following SQL command:
     alter user INFACMI_User set rsa_public_key='key_value';
Next step: When you create an ingestion task, select the Superpipe option on the Target page of the task wizard. You can also optionally specify a Merge Frequency value, which controls the frequency at which change data rows are merged and applied to the Snowflake target table.
Without Superpipe
If you do NOT use the Superpipe feature for Snowflake targets, complete the following steps as the ACCOUNTADMIN user:
  1. Create a Mass Ingestion user. Use one of the following SQL statements:
     create user INFACMI_User password = 'Xxxx@xxx';
     or
     create or replace user INFACMI_User password = 'Xxxx@xxx';
  2. Create a new role and grant the role to the Mass Ingestion user. Use the following SQL statements:
     create role INFA_CMI_Role;
     grant role INFA_CMI_Role to user INFACMI_User;
  3. Grant usage on the Snowflake virtual warehouse to the new role. Use the following SQL statement:
     grant usage on warehouse CMIWH to role INFA_CMI_Role;
  4. Grant usage and create schema privileges on the Snowflake database to the new role. Use the following SQL statement:
     grant usage, create schema on database CMIDB to role INFA_CMI_Role;
  5. Set the default role for the newly created user. Use the following SQL statement:
     alter user INFACMI_User set default_role=INFA_CMI_Role;
Also, as the INFACMI_User, create a new schema:
create schema CMISchema;
Note: If the user's default role is used for ingestion tasks and does not have the required privileges, the following error will be issued at runtime:
SQL compilation error: Object does not exist, or operation cannot be performed.
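If this error occurs, you can check the user's default role and that role's privileges by using standard Snowflake commands. For example (these statements are general Snowflake SQL, not steps from this procedure):
describe user INFACMI_User;
show grants to role INFA_CMI_Role;
If the default role lacks the required privileges, grant the missing privileges to that role or set a different default role with an ALTER USER ... SET DEFAULT_ROLE statement.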

Usage guidelines

Configure private connectivity to Snowflake

You can access Snowflake using AWS or Azure Private Link endpoints.
The AWS or Azure Private Link setup ensures that the connection to Snowflake uses the AWS or Azure internal network and does not take place over the public Internet.
To connect to the Snowflake account over the private AWS network, see AWS Private Link and Snowflake.
To connect to the Snowflake account over the private Azure network, see Azure Private Link and Snowflake.

Avro data types

Mass Ingestion Applications supports only some of the primitive and logical data types that Avro schemas provide.
A primitive data type is a type that allows you to represent a single data value. A logical type is an Avro primitive or complex type with extra attributes to represent a derived type. This topic applies to all target types that support Avro or Parquet output format.
The following table lists the primitive Avro data types that Mass Ingestion Applications supports:
Primitive data type
Description
INT
32-bit signed integer
LONG
64-bit signed integer
FLOAT
Single precision (32-bit) IEEE 754 floating-point number
DOUBLE
Double precision (64-bit) IEEE 754 floating-point number
BYTES
Sequence of 8-bit unsigned bytes
STRING
Unicode character sequence
The following table lists the logical Avro data types that Mass Ingestion Applications supports:
Logical data type
Description
DECIMAL
An arbitrary-precision signed decimal number of the form unscaled × 10^-scale
DATE
A date, without reference to a time or time zone.
TIME
A time of day with a precision of 1 millisecond or 1 microsecond, without reference to a time zone or date.
TIMESTAMP
A date and time value with a precision of 1 millisecond or 1 microsecond, without reference to a particular calendar or time zone.
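For reference, a logical type appears in an Avro schema as a primitive type annotated with additional attributes. The following fragment is a generic Avro schema example, not output produced by Mass Ingestion Applications, that declares a DECIMAL field backed by the BYTES primitive type:
{
  "name": "amount",
  "type": {"type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2}
}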