Property | Description |
---|---|
Task Name | Name of the file ingestion and replication task. The names of file ingestion and replication tasks must be unique within the organization. Task names can contain alphanumeric characters, spaces, and underscores. Names must begin with an alphabetic character or underscore. Task names are not case sensitive. |
Location | Project or folder in which the task will reside. |
Description | Optional description of the task. Maximum length is 1024 characters. |
Runtime Environment | Runtime environment that runs the task. File ingestion and replication tasks can run on a Secure Agent or Cloud Hosted Agent. They cannot run in a serverless runtime environment. |
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. The default value is the source directory specified in the connection. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression and add it as a Source Directory parameter. For more information, see Source and target parameters. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ (a short matching sketch follows this table). |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. For more information about skipping duplicate files, see Skip duplicate files. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. The maximum value of the batch varies based on whether the files are transferred through an intermediate staging area. |
Transfer Mode | File transfer mode. Select one of the following modes:
Note: If a binary file transfer is interrupted due to a network disruption, the file event displays an interrupted status. Run the file ingestion and replication job again to resume the transfer of the interrupted files. |
After File Pickup | Determines what to do with the source files after the files are transferred. Select one of the following options:
|
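The sample regular expression above can be tried out in isolation. The following is a minimal, hypothetical Python sketch that shows which file names such a pattern would select; it is not part of the product, and the file names are invented for illustration.

```python
import re

# Sample pattern from the File Pattern row above: names built from letters, digits,
# spaces, underscores, dots, hyphens, parentheses, or colons, ending in .doc, .docx, or .pdf.
FILE_PATTERN = re.compile(r"([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$")

candidates = ["orders_2024-05-01.pdf", "notes (final).docx", "image.png", "report.doc"]

for name in candidates:
    status = "picked up" if FILE_PATTERN.search(name) else "skipped"
    print(f"{name}: {status}")
```

In this sketch, image.png is skipped because the pattern only matches names that end in .doc, .docx, or .pdf.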
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. The default value is the source directory specified in the connection. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. A small worked example of this setting follows this table. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. |
Transfer Mode | File transfer mode. Select one of the following modes:
Note: If a binary file transfer is interrupted due to a network disruption, the file event displays an interrupted status. Run the file ingestion and replication job again to resume the transfer of the interrupted files. |
After File Pickup | Determines what to do with the source files after the files are transferred. Select one of the following options:
|
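As a worked example of the File Date filter described in the table above, the following hypothetical Python sketch computes the seven-day window from the weekly-schedule example. It only illustrates the date arithmetic; the file modification time is invented, and this is not product code.

```python
from datetime import datetime, timedelta

# "Days before today" example from the table: a weekly task with the value set to 7
# picks up files whose date falls between 7 days ago and the date on which it runs.
days_before_today = 7
run_time = datetime.now()                     # when the scheduled task runs
window_start = run_time - timedelta(days=days_before_today)

file_modified = run_time - timedelta(days=3)  # hypothetical file modified 3 days ago

if window_start <= file_modified <= run_time:
    print("File is inside the 7-day window and is picked up.")
else:
    print("File is outside the window and is skipped.")
```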
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. The default value is the source directory specified in the connection. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. |
After File Pickup | Determines what to do with the source files after the files are transferred. Select one of the following options:
|
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Amazon S3 folder path from where files are transferred, including the bucket name. The default value is the folder path value specified in the connection properties. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. Note: Ensure that you have sufficient privileges to access the bucket and specific folders. |
Add Parameters | Create an expression to add it as a Folder Path parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer. In the pattern, you can use the following wildcard characters:
|
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
The file path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the path that contains the list of files to pick up and enter the file path. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum value of the batch depends on whether the files transfer through an intermediate staging server. A file ingestion and replication task does not transfer files through an intermediate staging server if the files are transferred from the following source to target endpoints:
Consider the following guidelines when you define a batch size:
|
File Encryption Type | Type of Amazon S3 file encryption to use during file transfer. Select one of the following options:
|
S3 Accelerated Transfer | Select whether to use Amazon S3 Transfer Acceleration on the S3 bucket. To use Transfer Acceleration, accelerated transfer must be enabled for the bucket. The following options are available:
|
Minimum Download Part Size | Minimum download part size in megabytes when downloading a large file as a set of multiple independent parts. |
Multipart Download Threshold | Multipart download minimum threshold in megabytes that is used to determine when to download objects in multiple parts in parallel. A hypothetical boto3 analogue of these two options follows this table. |
After File Pickup | Determines what to do with the source files after the task streams them to the target. Select one of the following options:
|
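The Minimum Download Part Size and Multipart Download Threshold options correspond to the general multipart-download tuning that the AWS SDKs expose. The sketch below is a hypothetical boto3 analogue, not the task's internal implementation; the bucket name, object key, part sizes, and local path are invented.

```python
import boto3
from boto3.s3.transfer import TransferConfig

MB = 1024 * 1024

# Rough boto3 analogues of the options in the table above:
# - multipart_threshold ~ Multipart Download Threshold (use parallel parts above this size)
# - multipart_chunksize ~ Minimum Download Part Size (size of each independent part)
config = TransferConfig(multipart_threshold=32 * MB,
                        multipart_chunksize=16 * MB,
                        max_concurrency=8)

s3 = boto3.client("s3")  # credentials are read from the environment
s3.download_file("my-source-bucket", "incoming/orders.csv", "/tmp/orders.csv", Config=config)
```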
Option | Description |
---|---|
File Pickup | The file ingestion and replication task picks up files based on a file list. The file list consists of a comma-separated list of file names. The file list option populates automatically from the Cloud Integration Hub subscription and you can't edit it. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. |
After File Pickup | Determines what to do with the source files after the files are transferred. Select one of the following options:
|
Option | Description |
---|---|
File Pattern | File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. |
After File Pickup | Determines what to do with the source files after the files are transferred. Select one of the following options:
|
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | Transfer files from all subfolders under the defined source directory. |
File Pattern | File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum batch size varies, based on the following conditions:
Note: If you transfer files from Google Cloud Storage to Google BigQuery, the task transfers files with no intermediate staging server. |
After File Pickup | Determines the actions to be performed on the source files after the files transfer. The following options are available:
For example, if /root/archive is the archive directory, /root/test is the source directory, sub1 and sub2 are the directories within the source directory, and you choose to include files from sub-folders, then the folder structure of the archive directory is /root/archive/sub1, /root/archive/sub2. |
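The following minimal Python sketch reproduces the path arithmetic in the archive-directory example above. It only shows how the sub-folder structure carries over to the archive directory and is not product code.

```python
from pathlib import PurePosixPath

source_dir = PurePosixPath("/root/test")
archive_dir = PurePosixPath("/root/archive")

# Files picked up from sub-folders of the source directory (paths follow the example above).
picked_up = [PurePosixPath("/root/test/sub1/a.csv"),
             PurePosixPath("/root/test/sub2/b.csv")]

for src in picked_up:
    # The sub-folder structure under the source directory is recreated under the archive directory.
    print(archive_dir / src.relative_to(source_dir))
    # Prints /root/archive/sub1/a.csv and /root/archive/sub2/b.csv
```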
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub folders | This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer. Based on the file pattern that you have selected, enter the file name patterns. Select one of the following file patterns:
Identifies all files except for files whose name contains out, foo, and baz. Identifies all files that have an extension of doc, docx, or pdf. Identifies all text files except for files whose name contains out.txt. |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. |
Option | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. The Secure Agent must be able to access the directory. The use of slashes around the source folder path differs between connectors. Using slashes incorrectly will result in connection failures. For more information, see the Knowledge Base article 625869. Note: File listener can access files and directories on network shares with support for NFS and CIFS. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer. Based on the file pattern that you have selected, enter the file name patterns. The following file patterns are available:
Identifies all files except for files whose name contains out, foo, and baz. Identifies all files that have an extension of doc, docx, or pdf. Identifies all text files except for files whose name contains out.txt. |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. A conceptual sketch of such a check follows this table. |
Batch Size | The maximum number of files a file ingestion and replication task transfers in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. The maximum batch size varies, based on the following conditions:
Consider the following guidelines when you define the batch size:
|
After File Pickup | Determines what to do with source files after the files transfer. The following options are available:
|
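Conceptually, a stability check of the kind described by the Check file stability and Stability check interval options compares a file's size before and after the wait interval and treats an unchanged file as stable. The sketch below is a hypothetical illustration of that idea, not the agent's actual implementation; the file names are invented.

```python
import os
import time

def is_stable(path: str, interval_seconds: int = 10) -> bool:
    """Treat a file as stable if its size does not change across the check interval."""
    size_before = os.path.getsize(path)
    time.sleep(interval_seconds)   # stability check interval (10 through 300 seconds in the task)
    return os.path.getsize(path) == size_before

# Hypothetical run: stable files are processed; unstable files are skipped in this run.
for name in ["orders.csv", "still_uploading.tmp"]:
    if os.path.exists(name) and is_stable(name, interval_seconds=15):
        print(f"{name}: processed")
    else:
        print(f"{name}: skipped in this run")
```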
Advanced Source Property | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Microsoft Azure Blob Storage directory from where files are transferred, including the container name. The default value is the container path specified in the connection. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Folder Path parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from sub-folders present in the folder path. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer. You can use a regular expression or wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Do not transfer duplicate files. If files with the same name and file size were transferred by the same file ingestion and replication task, the task does not transfer them again, and the files are marked as duplicate in the job log. If this option is not selected, the task transfers all files. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum batch size varies, based on the following conditions:
Note: If you transfer files from Azure Blob Storage to Azure SQL Data Warehouse and Snowflake, the task transfers files with no intermediate staging. |
Advanced Source Property | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Microsoft Azure Data Lake Storage Gen2 folder path from where files are transferred. The default value is the container path specified in the connection. The source directory must start with a forward slash (/). You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from sub-folders present in the folder path. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer. You can use a regular expression or wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip duplicate files | Do not transfer duplicate files. If files with the same name and file size were transferred by the same file ingestion and replication task, the task does not transfer them again, and the files are marked as duplicate in the job log. If this option is not selected, the task transfers all files. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum batch size varies, based on the following conditions:
Note: If you transfer files from Microsoft Azure Data Lake Storage Gen2 to Azure SQL Data Warehouse, the task transfers files with no intermediate staging. |
Block Size (Bytes) | Divides a large file into smaller parts of the specified block size. When you read a large file, divide the file into smaller parts and configure concurrent connections to spawn the required number of threads to process the data in parallel. Default is 8388608 bytes (8 MB). A small worked example follows this table. |
After File Pickup | Determines what to do with source files after the files transfer. The following options are available:
For example, if /root/archive is the archive directory, /root/test is the source directory, sub1 and sub2 are the directories within the source directory, and you choose to include files from sub-folders, then the folder structure of archive directory is /root/archive/sub1, /root/archive/sub2. |
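As a worked example of the Block Size option above, the hypothetical sketch below shows how a large file divides into parts at the default 8388608-byte (8 MB) block size; the 100 MB file size is invented for the example.

```python
import math

block_size = 8_388_608           # default Block Size in bytes (8 MB)
file_size = 100 * 1024 * 1024    # hypothetical 100 MB source file

blocks = math.ceil(file_size / block_size)
print(f"{blocks} blocks of up to {block_size} bytes")   # 13 blocks

# Each block can then be read on its own connection, up to the configured
# number of concurrent connections, so the parts are processed in parallel.
```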
Advanced Source Property | Description |
---|---|
File Pickup | The file ingestion and replication task supports the following file pickup methods:
|
Source Directory | Microsoft Azure Data Lake Store directory from where files are transferred. The default value is the container path specified in the connection. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | This applies when File Pickup is By Pattern. Transfer files from sub-folders present in the folder path. |
File Pattern | This applies when File Pickup is By Pattern. File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
File path containing the list of files | This applies when File Pickup is By File List. Select this option to provide the file path that contains the list of files to pick up. Ensure that you enter a comma-separated list of file names in the file. |
File list | This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names. |
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the file ingestion and replication task does not transfer files that have the same name and file size as another file. The file ingestion and replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check file stability | Indicates whether to verify that a file is stable before a file ingestion and replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability check interval | This applies when you enable the Check file stability option. Time in seconds that a file ingestion and replication task waits to check the file stability. For example, if the stability time is 15 seconds, the file ingestion and replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum batch size varies, based on the following conditions:
Note: If you transfer files from Azure Blob Storage to Azure SQL Data Warehouse and Snowflake, the task transfers files with no intermediate staging server. |
Option | Description |
---|---|
File Pickup | The File Ingestion and Replication task supports the following file pickup methods:
|
Source Directory | Directory from where files are transferred. You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection. Note: A File Ingestion and Replication job fails if there's an empty file in the source directory. However, you can avoid selecting 0 KB files by using the size filter option. |
Add Parameters | Create an expression to add it as a Source Directory parameter. For more information, see Source and target parameters. |
Include files from sub-folders | Transfer files from all subfolders under the defined source directory. |
File Pattern | File name pattern to use for selecting the files to transfer. The pattern can be a regular expression or a pattern with wildcard characters. The following wildcard characters are allowed:
For example, you can specify the following regular expression: ([a-zA-Z0-9\s_\\.\-\(\):])+(.doc|.docx|.pdf)$ |
File Date | A date and time expression for filtering the files to transfer. Select one of the following options:
Click the calendar to select the date and the clock to select the time. For example, if you schedule the file ingestion and replication task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs. |
Time Zone | If you selected a File Date option, enter the time zone of the location where the files are located. |
File Size | Filters the files to transfer based on file size. Enter the file size, select the file size unit, and filter options. Select one of the following filter options:
|
Skip Duplicate Files | Indicates whether to skip duplicate files. If you select this option, the File Ingestion and Replication task does not transfer files that have the same name and file size as another file. The File Ingestion and Replication task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates. |
Check File Stability | Indicates whether to verify that a file is stable before a File Ingestion and Replication task attempts to pick it up. The task skips unstable files it detects in the current run. |
Stability Check Interval | This applies when you enable the Check file stability option. Time in seconds that a File Ingestion and Replication task waits to check the file stability. For example, if the stability time is 15 seconds, the File Ingestion and Replication task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files. The interval ranges from 10 through 300 seconds. Default is 10 seconds. |
Batch Size | The number of files a file ingestion and replication task can transfer in a batch. Default is 5. The maximum number of files you can transfer in a batch is 20. |
After File Pickup | Determines the actions to be performed on the source files after the files transfer. The following options are available:
For example, if lakehouse/Files/archive is the archive directory, lakehouse/Files/test is the source directory, sub1 and sub2 are the directories within the source directory, and you choose to include files from sub-folders, then the folder structure of the archive directory is lakehouse/Files/archive/sub1, lakehouse/Files/archive/sub2. |
System Variables | Description | Expression |
---|---|---|
BadFileDir * | Directory for reject files. It cannot include the following special characters: * ? < > " | , | ${$PMBadFileDir} |
CacheFileDir * | The location for the cache file. | ${$PMCacheDir} |
Date ** | The current date in ISO (yyyy-MM-dd) format. | ${system.date} |
Day ** | The day of the week | ${system.day} |
ExtProcDir * | Directory for external procedures. It cannot include the following special characters: * ? < > " | , | ${$PMExtProcDir} |
Hours ** | Hours | ${system.hours} |
JobId | The ID (or job number) of the current job. | ${system.jobid} |
LookupFileDir * | Directory for lookup files. It cannot include the following special characters: * ? < > " | , | ${$PMLookupFileDir} |
Minutes ** | Minutes | ${system.minutes} |
Month ** | Numerical month | ${system.month} |
Name | The name of the current Project. | ${system.name} |
RootDir * | Root directory accessible by the node. This is the root directory for other service process variables. It cannot include the following special characters: * ? < > " | , | ${$PMRootDir} |
RunId | The ID of the job run. | ${system.runid} |
Seconds ** | Seconds | ${system.seconds} |
SessionLogDir * | Directory for session logs. It cannot include the following special characters: * ? < > " | , | ${$PMSessionLogDir} |
SourceFileDir * | Directory for source files. It cannot include the following special characters: * ? < > " | , | ${$PMSourceFileDir} |
StorageDir * | Directory for run-time files. Workflow recovery files save to the $PMStorageDir configured in the PowerCenter Integration Service properties. Session recovery files save to the $PMStorageDir configured in the operating system profile. It cannot include the following special characters: * ? < > " | , | ${$PMStorageDir} |
TargetFileDir * | Directory for target files. It cannot include the following special characters: * ? < > " | , | ${$PMTargetFileDir} |
TempDir * | Directory for temporary files. It cannot include the following special characters: * ? < > " | , | ${$PMTempDir} |
Timestamp ** | The current date and time in ISO (yyyy-MM-dd HH:mm:ss) format. | ${system.timestamp} |
WorkflowLogDir * | The location for the workflow log file. | ${$PMWorkflowLogDir} |
Year ** | Year | ${system.year} |
* Values are fetched from the Data Integration Server. ** Time zone is the Secure Agent time zone. |
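For example, assuming the variables can be combined in a single expression, a hypothetical file-name expression such as orders_${system.date}_${system.jobid}.csv would resolve to something like orders_2024-05-01_1234.csv at run time, because ${system.date} returns the current date in yyyy-MM-dd format and ${system.jobid} returns the job number. The orders_ prefix and the resolved values are invented for illustration.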
Option | Description |
---|---|
Target Directory | Directory to where files are transferred. The default value is the target directory specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
If File Exists | Determines what to do with a file if a file with the same name already exists in the target directory. Select one of the following options:
|
Transfer Mode | File transfer mode. Select one of the following options:
Note: If a binary file transfer is interrupted due to a network disruption, the file event displays an interrupted status. Run the file ingestion and replication job again to resume the transfer of the interrupted files. |
Create intermediate file | Creates an intermediate file until the file is completely transferred to the target location. For example, if you transfer a file named file.txt from a source to a target, you see an intermediate file named file.txt_644a1f88 in the target location until the file is completely transferred. |
Option | Description |
---|---|
Target Directory | Directory to where files are transferred. The default value is the target directory specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
If File Exists | Determines what to do with a file if a file with the same name already exists in the target directory. Select one of the following options:
|
Transfer Mode | File transfer mode. Select one of the following options:
Note: If a binary file transfer is interrupted due to a network disruption, the file event displays an interrupted status. Run the file ingestion and replication job again to resume the transfer of the interrupted files. |
Create intermediate file | Creates an intermediate file until the file is completely transferred to the target location. For example, if you transfer a file named file.txt from a source to a target, you see an intermediate file named file.txt_644a1f88 in the target location until the file is completely transferred. |
Option | Description |
---|---|
Target Directory | Directory to where files are transferred. The default value is the target directory specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
If File Exists | Determines what to do with a file if a file with the same name already exists in the target directory. Select one of the following options:
|
Create intermediate file | Creates an intermediate file until the file is completely transferred to the target location. For example, if you transfer a file named file.txt from a source to a target, you see an intermediate file named file.txt_644a1f88 in the target location until the file is completely transferred. |
Option | Description |
---|---|
Target Table Name | Name of the table in Amazon Redshift to which the files are loaded. Note: You can transfer files only to a target table whose metadata matches that of the source files. Ensure that the target table exists before you run the job. |
Schema | The Amazon Redshift schema name. Default is the schema that is used while establishing the target connection. |
Add Parameters | Create an expression to add it as Schema and Target Table Name parameters. For more information, see Source and target parameters. |
Truncate Target Table | Truncate the target table before loading data to the table. |
Analyze Target Table | Analyzes the target table. The ANALYZE command collects statistics about the contents of tables in the database to help determine the most efficient execution plans for queries. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Vacuum Target Table | Vacuum the target table to recover disk space and sort rows in the table. Select one of the following recovery options:
Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
File Format and Copy Options | Select the format with which to copy data. Select one of the following options:
|
Property | Description |
---|---|
Copy Command | The Amazon Redshift COPY command appends the data to any existing rows in the table. If the Amazon S3 staging directory and the Amazon Redshift target belong to different regions, you must specify the region in the COPY command. For example, copy public.messages from '{{FROM-S3PATH}}' credentials 'aws_access_key_id={{ACCESS-KEY-ID}};aws_secret_access_key={{SECRET-ACCESS-KEY-ID}}' MAXERROR 0 REGION '' QUOTE '"' DELIMITER ',' NULL '' CSV; Where public is the schema and messages is the table name. For more information about the COPY command, see the AWS documentation. |
Property | Description |
---|---|
Pre SQL | SQL command to run before the file ingestion and replication task runs the COPY command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Post SQL | SQL command to run after the file ingestion and replication task runs the COPY command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
S3 Staging Directory | Specify the Amazon S3 staging directory. You must specify the Amazon S3 staging directory in <bucket_name/folder_name> format. The staging directory is deleted after the file ingestion and replication task runs. |
Upload to Redshift with no Intermediate Staging | Upload files from Amazon S3 to Amazon Redshift directly from the Amazon S3 source directory with no additional, intermediate staging. If you select this option, ensure that the Amazon S3 bucket and the Amazon S3 staging directory belong to the same region. If you do not select this option, ensure that the Amazon S3 staging directory and Amazon Redshift target belong to the same region. |
File Compression* | Determines whether or not files are compressed before they are transferred to the target directory. Select one of the following options:
|
File Encryption Type* | Type of Amazon S3 file encryption to use during file transfer. Select one of the following options:
Note: Client-side encryption does not apply to tasks where Amazon S3 is the source. |
S3 Accelerated Transfer* | Select whether to use Amazon S3 Transfer Acceleration on the S3 bucket. To use Transfer Acceleration, accelerated transfer must be enabled for the bucket. Select one of the following options:
|
Minimum Upload Part Size* | Minimum upload part size in megabytes when uploading a large file as a set of multiple independent parts. Use this option to tune the file load to Amazon S3. |
Multipart Upload Threshold* | Minimum threshold in megabytes that determines when to upload objects in multiple parts in parallel. |
*Not applicable when you read data from Amazon S3 to Amazon Redshift V2. |
Option | Description |
---|---|
Folder Path | Amazon S3 folder path to where files are transferred, including bucket name. The default value is the folder path specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Folder Path parameter. For more information, see Source and target parameters. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. Select one of the following options:
|
File Encryption Type | Type of Amazon S3 file encryption to use during file transfer. Select one of the following options:
|
If File Exists | Determines what to do with a file if a file with the same name already exists in the target directory. Select one of the following options:
|
S3 Accelerated Transfer | Select whether to use Amazon S3 Transfer Acceleration on the S3 bucket. To use Transfer Acceleration, accelerated transfer must be enabled for the bucket. Select one of the following options:
|
Minimum Upload Part Size | Minimum upload part size in megabytes when uploading a large file as a set of multiple independent parts. Use this option to tune the file load to Amazon S3. |
Multipart Upload Threshold | Minimum threshold in megabytes that determines when to upload objects in multiple parts in parallel. |
Option | Description |
---|---|
Database | Required. Name of the database in Databricks that contains the target table. You can use a relative value to pick the database value passed in the connection. To use a relative value, enter an ellipsis (...). |
Add Parameters | Create an expression to add it as Database and Table Name parameters. For more information, see Source and target parameters. |
Table Name | Required. Name of an existing table in Databricks. |
If Table Exists | Determines the action that the Secure Agent must take on a table if the table name matches the name of an existing table in the target database. Select one of the following options:
Default is Overwrite. |
Option | Description |
---|---|
Target Table Name | Specify the Google BigQuery target table name. Note: The metadata of the files that you transfer must match the target table. Ensure that the target table exists before you run the job. |
Dataset ID | Specify the Google BigQuery dataset name. |
Add Parameters | Create an expression to add it as Target Table Name and Dataset ID parameters. For more information, see Source and target parameters. |
Field Delimiter | Indicates whether Google BigQuery V2 Connector must allow field separators for the fields in a .csv file. |
Quote Character | Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table. |
Allow Quoted Newlines | Indicates whether Google BigQuery V2 Connector must allow the quoted data sections with newline character in a .csv file. |
Allow Jagged Rows | Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file. |
Skip Leading Rows | Specifies the number of top rows in the source file that Google BigQuery V2 Connector skips when loading the data. The default value is 0. |
Data format of the File | Specifies the data format of the source file. You can select one of the following data formats:
|
Write Disposition | Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists. You can select one of the following values:
|
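The CSV-related settings above (field delimiter, quote character, quoted newlines, jagged rows, and leading rows) map onto standard Google BigQuery load options. The following is a rough BigQuery SQL sketch that shows the correspondence only; the task builds its own load configuration, and the dataset, table, and URI names here are hypothetical.

```sql
-- Rough illustration of how the settings above map to BigQuery load options.
-- The task does not run this statement; dataset, table, and URI are hypothetical.
LOAD DATA INTO my_dataset.my_target_table
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/staging/*.csv'],
  field_delimiter = ',',          -- Field Delimiter
  quote = '"',                    -- Quote Character
  allow_quoted_newlines = true,   -- Allow Quoted Newlines
  allow_jagged_rows = true,       -- Allow Jagged Rows
  skip_leading_rows = 1           -- Skip Leading Rows
);
-- Using LOAD DATA OVERWRITE instead of LOAD DATA INTO replaces existing rows,
-- comparable to a truncating write disposition.
```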
Option | Description |
---|---|
Folder Path | Path in Google Cloud Storage where files are transferred. You can either enter the bucket name or the bucket name and folder name. For example, enter <bucket name> or <bucket name>/<folder name> Note: Do not use a single slash (/) at the beginning of the path. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the bucket specified in the connection. |
Add Parameters | Create an expression to add it as a Folder Path parameter. For more information, see Source and target parameters. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. Select one of the following options:
|
If File Exists | Determines the action that the Secure Agent must take with a file if a file with the same name exists in the target directory. Select one of the following options:
|
Option | Description |
---|---|
Target Directory | Directory to where files are transferred. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
Option | Description |
---|---|
Target Directory | Directory to where files are transferred. The Secure Agent must be able to access the directory. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
If File Exists | Determines what to do with a file if a file with the same name exists in the target directory. Select one of the following options:
|
Create intermediate file | Creates an intermediate file until the file is completely transferred to the target location. For example, if you transfer a file named file.txt from a source to a target, you see an intermediate file named file.txt_644a1f88 in the target location until the file is completely transferred. Important: Not applicable when you read data from Amazon S3 V2, Google Cloud Storage V2, Microsoft Azure Blob Storage v3, Microsoft Azure Data Lake Storage Gen2, and Hadoop Files V2. |
Option | Description |
---|---|
Blob Container | Microsoft Azure Blob Storage container, including folder path and container name. |
Add Parameters | Create an expression to add it as a Blob Container parameter. For more information, see Source and target parameters. |
Blob Type | Type of blob. Select one of the following options:
|
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. The following options are available:
|
Number of Concurrent Connections to Blob Store | Number of concurrent connections to the Microsoft Azure Blob Storage container. |
Target Property | Description |
---|---|
Target Directory | Directory to where files are transferred. The directory is created at run time if it does not exist. The directory path specified at run time overrides the path specified while creating a connection. The default value is the target directory specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. The following options are available:
|
If File Exists | Determines what to do with a file if a file with the same name exists in the target directory. The following options are available:
|
Block Size (Bytes) | Divides a large file into smaller parts of the specified block size. When you write a large file, divide the file into smaller parts and configure concurrent connections to spawn the required number of threads to process data in parallel. Default is 8388608 bytes (8 MB). |
Target Property | Description |
---|---|
Target Directory | Directory to where files are transferred. The default value is the target directory specified in the connection. You can enter a relative path. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the target directory specified in the connection. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. The following options are available:
|
If File Exists | Determines what to do with a file if a file with the same name exists in the target directory. The following options are available:
|
Property | Description |
---|---|
Ingestion Method | The ingestion method to load data to Microsoft Azure Synapse SQL. Select one of the following options:
|
Command Type | The command type for the ingestion method. Select one of the following options:
|
Property | Description |
---|---|
Target Table Name | Name of the table in Microsoft Azure Synapse SQL to which the files are loaded. Note: The metadata of the files that you transfer must match the target table. Ensure that the target table exists before you run the job. |
Add Parameters | Create an expression to add it as Target Table Name and Schema parameters. For more information, see Source and target parameters. |
Schema | The Microsoft Azure Synapse SQL schema name. You can enter a relative value to pick the schema value passed in the connection. To use a relative value, enter an ellipsis (...). |
Truncate Target Table | Truncate the target table before loading. |
Pre SQL | SQL command to run before the file ingestion and replication task runs the PolyBase or Copy command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Post SQL | SQL command to run after the file ingestion and replication task runs the PolyBase or Copy command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can select the following field delimiters from the list: ~ ` | . TAB 0x1e |
Quote Character | Specifies the quote character to skip when you write data to Microsoft Azure Synapse SQL. When you write data to Microsoft Azure Synapse SQL and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table. |
External Stage* | Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. File ingestion and replication tasks automatically populate the external stage path property with the values provided in the following properties in the Microsoft Azure Synapse SQL connection in Administrator:
You can override the value. |
File Compression* | Determines whether or not files are compressed before they are transferred to the target directory. The following options are available:
|
Number of Concurrent Connections* | Number of concurrent connections to extract data from the Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure Blob Part Size or Block Size to divide a large file into smaller parts. Default is 4. Maximum is 10. |
*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. |
Property | Description |
---|---|
File Format Definition | Applies to the PolyBase ingestion method. Transact-SQL CREATE EXTERNAL FILE FORMAT statement. For example: CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"') ) The following example creates an external file format for Parquet: CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH (FORMAT_TYPE = PARQUET) Similarly, you can create an external file format for JSON, Avro, and ORC. For more information about the CREATE EXTERNAL FILE FORMAT statement, see the Microsoft documentation. |
External Table Definition | Applies to the PolyBase ingestion method. Transact-SQL CREATE EXTERNAL TABLE statement. For example: CREATE EXTERNAL TABLE {{externalTable}} ( id INT, name NVARCHAR ( 100 ) ) WITH (LOCATION = '{{blobLocation}}', DATA_SOURCE = {{dataSourceName}}, FILE_FORMAT = {{fileFormatName}}) The following example creates an external table for Parquet: CREATE EXTERNAL TABLE {{externalTable}} (username VARCHAR(100), number INT, colour VARCHAR(100)) WITH (LOCATION = '{{blobLocation}}', DATA_SOURCE = {{dataSourceName}}, FILE_FORMAT = {{fileFormatName}}) Similarly, you can create an external table for JSON, Avro, and ORC. For more information about the CREATE EXTERNAL TABLE statement, see the Microsoft documentation. |
Insert SQL Definition | Applies to the PolyBase ingestion method. Transact-SQL INSERT statement. For example: INSERT INTO schema.table (id, name) SELECT id+5, name FROM {{externalTable}} The following example defines insert SQL for Parquet: INSERT INTO testing.test_parq (username, number, colour) SELECT username, number, colour FROM {{externalTable}}; Similarly, you can define insert SQL for JSON, Avro, and ORC. For more information about the INSERT statement, see the Microsoft documentation. An end-to-end example that assembles these PolyBase definitions appears after this table. |
Copy Command Definition | Applies to the COPY Command ingestion method. Transact-SQL COPY INTO statement. For example: COPY INTO schema.table FROM EXTERNALLOCATION WITH(CREDENTIAL = (AZURECREDENTIALS), FIELDTERMINATOR = ',', FIELDQUOTE = '') The following example defines a COPY command for Parquet: COPY INTO testing.test_parq FROM EXTERNALLOCATION WITH(CREDENTIAL = (AZURECREDENTIALS), FILE_TYPE = 'PARQUET') Similarly, you can define the COPY command for JSON, Avro, and ORC. For more information about the COPY INTO statement, see the Microsoft documentation. |
Pre SQL | SQL command to run before the file ingestion and replication task runs the PolyBase command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Post SQL | SQL command to run after the file ingestion and replication task runs the PolyBase command. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
External Stage* | Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage the files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. |
Number of Concurrent Connections* | Number of concurrent connections to extract data from the Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure Blob Part Size or Block Size to divide a large file into smaller parts. Default is 4. Maximum is 10. |
*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. |
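Taken together, the PolyBase definitions above run as a sequence. The following minimal sketch assembles the delimited-text examples from the table, using the same {{...}} placeholders, which the task resolves at run time; the target table schema.table is only an example name.

```sql
-- Minimal PolyBase load sequence assembled from the examples above (illustrative only).
-- {{...}} placeholders are resolved by the task; schema.table is an example target.
CREATE EXTERNAL FILE FORMAT {{fileFormatName}}
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

CREATE EXTERNAL TABLE {{externalTable}} (
    id INT,
    name NVARCHAR(100)
)
WITH (
    LOCATION = '{{blobLocation}}',
    DATA_SOURCE = {{dataSourceName}},
    FILE_FORMAT = {{fileFormatName}}
);

-- Load from the external table into the target table.
INSERT INTO schema.table (id, name)
SELECT id + 5, name FROM {{externalTable}};
```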
Option | Description |
---|---|
Target Directory | Directory to which files are transferred. The Secure Agent must be able to access the directory. |
Add Parameters | Create an expression to add it as a Target Directory parameter. For more information, see Source and target parameters. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. Select one of the following options:
|
If File Exists | Determines the action that the Secure Agent must take with a file if a file with the same name exists in the target directory. Select one of the following options:
Note: When you run a job with the append option for binary files, the content of the files is not appended. |
Property | Description |
---|---|
Warehouse | Overrides the name specified in the Snowflake Data Cloud connection. You can enter a relative value to pick the warehouse value passed in the connection. To enter a relative value, enter three periods (...). |
Add Parameters | Create an expression to add it as Warehouse, Database, Schema, and Target Table Name parameters. For more information, see Source and target parameters. |
Database | The database name of Snowflake Data Cloud. |
Schema | The schema name in Snowflake Data Cloud. |
Target Table Name | The table name of the Snowflake Data Cloud target table. The target table name is case-sensitive. Note: The metadata of the files that you transfer must match the target table. Ensure that the target table exists before you run the job. |
Role | Overrides the Snowflake Data Cloud user role specified in the connection. You can enter a relative value to pick the role value passed in the connection. To use a relative value, enter an ellipsis (...). |
Pre SQL | SQL statement to run on the target before the start of write operations. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Post SQL | SQL statement to run on the target table after a write operation completes. Note: This operation runs on every batch even if the Parallel Batch value is greater than 1. |
Truncate Target Table | Truncates the target table before inserting new rows. If you disable this option, the task inserts new rows without truncating the target table. |
File Format and Copy Options | The copy option and the file format to use when loading data to Snowflake Data Cloud. The copy option specifies the action that the task performs when it encounters an error while loading data from a file. For example, specify the copy option ON_ERROR = ABORT_STATEMENT to abort the COPY statement if any error is encountered. When you load files, you can specify the file format and define the rules for the data files. The task uses the specified file format and rules while bulk loading data into Snowflake Data Cloud tables. For an illustrative COPY statement, see the sketch after this table. The following formats are supported:
|
External Stage | Specifies the external stage directory to use for loading files into Snowflake Data Cloud tables. Ensure that the source folder path you specify is the same as the folder path provided in the URL of the external stage for the specific connection type in Snowflake Data Cloud. Applicable when the source for file ingestion and replication is Microsoft Azure Blob Storage or Amazon S3. The external stage is mandatory when you use the connection type Microsoft Azure Blob Storage V3, but is optional for Amazon S3 V2. If you do not specify an external stage for Amazon S3 V2, Snowflake Data Cloud creates an external stage by default. |
File Compression | Determines whether or not files are compressed before they are transferred to the target directory. The following options are available:
Applicable for all sources that support the file ingestion and replication task except for Microsoft Azure Blob Storage V3 and Amazon S3 V2. |
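A minimal sketch of a Snowflake COPY statement that combines the ON_ERROR copy option with a CSV file format, as described in File Format and Copy Options above. The task generates the actual statement; the database, schema, table, and stage names here are hypothetical.

```sql
-- Illustrative only; the task generates the actual COPY statement.
-- Database, schema, table, and stage names are hypothetical.
COPY INTO my_db.my_schema.my_target_table
FROM @my_external_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
ON_ERROR = ABORT_STATEMENT;
```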
Action | Description |
---|---|
Compress | To compress the files, select Compress. Then select one of the following action types:
You can choose to protect a .zip file by entering a password. If the .zip file contains multiple files, the same password applies to all the compressed files. |
Decompress | To decompress compressed files, select Decompress. Then select one of the following action types:
Use the action type that corresponds to the compression action type that was used to compress the file. For example, for a .zip file, use the Unzip method. Enter the correct password to decompress a password-protected .zip file. A job fails if you decompress zipped files with incorrect passwords. For example, if you have five files and four of them are protected with the same password, but one file has a different password, the job fails when it tries to decompress the file with the different password. When you open compressed files that don't use a password, file ingestion and replication ignores the password that you entered in the action. For example, if you have five files and only four of them are password protected, the job runs successfully and decompresses all five files. |
Encrypt | To encrypt files by using the PGP encryption method, select Encrypt. Then, select PGP and enter the key ID of the user who decrypts the file.
The key ID and the key passphrase are enabled. Note: For more information about securing files that file ingestion and replication transfers, see File Ingestion and Replication security. |
Decrypt | To decrypt PGP-encrypted files, select Decrypt. Then, select PGP and enter the key passphrase of the user of the target directory. Do not include spaces in key passphrases. |
File Operations | To perform operations on the files in the target directory, select File Operations. Then select one of the following action types:
If you choose the rename action, enter a variable to use as a suffix for the renamed file. |
Virus scan | To scan files for viruses by using the ICAP protocol, select Virus Scan. Then select ICAP and enter the ICAP server URL of the server where the files are scanned. ICAP returns a response code that indicates whether malware is detected in the files. Note: Use your organization's ICAP server. |