Amazon S3 Sources

You can use an Amazon S3 object as a source in a synchronization task, mapping, or mapping task.

You can use a single Amazon S3 standard object as a source in a synchronization task, mapping, or mapping task. When you configure the advanced source properties, you configure properties specific to Amazon S3.

Client-side Encryption for Amazon S3 Sources

Client-side encryption is a technique to encrypt data before transmitting the data to the Amazon S3 server.

When you enable client-side encryption for Amazon S3 sources, Amazon S3 unloads the data in encrypted format, and then pushes the data to the Secure Agent. The Secure Agent writes the data to the target based on the task or mapping logic.

To enable client-side encryption, you must provide a master symmetric key or customer master key in the connection properties. The Secure Agent encrypts the data by using the master symmetric key or customer master key.

Working with Multiple Files

To read multiple files, all files must be available in the same Amazon S3 bucket. When you want to read from multiple sources in the Amazon S3 bucket, you must create a .manifest file that lists every source file by its absolute path or directory path. You must specify the .manifest file name in the following format: <file_name>.manifest

For example, the .manifest file contains source files in the following format:

{
    "fileLocations": [{
        "URIs": [
            "dir1/dir2/file_1.csv",
            "dir1/dir2/dir4/file_2.csv",
            "dirA/dirB/file_3.csv",
            "dirA/dirB/file_4.csv"
        ]
    }, {
        "URIPrefixes": [
            "dir1/dir2/",
            "dir1/dir2/"
        ]
    }],
    "settings": {
        "stopOnFail": "true"
    }
}
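
The following Python sketch shows one way to generate a manifest in the layout shown above instead of writing it by hand. The function name build_manifest and its parameters are illustrative assumptions and are not part of the connector. The sketch uses only the keys that appear in the sample.

```python
import json


def build_manifest(uris=(), uri_prefixes=(), stop_on_fail=True):
    """Return the text of a .manifest file in the layout of the sample above.

    build_manifest is a hypothetical helper, not a connector API.
    """
    file_locations = []
    if uris:
        file_locations.append({"URIs": list(uris)})
    if uri_prefixes:
        file_locations.append({"URIPrefixes": list(uri_prefixes)})
    manifest = {
        "fileLocations": file_locations,
        # The sample stores the flag as the string "true", not a JSON boolean.
        "settings": {"stopOnFail": "true" if stop_on_fail else "false"},
    }
    return json.dumps(manifest, indent=4)


manifest_text = build_manifest(
    uris=["dir1/dir2/file_1.csv", "dirA/dirB/file_3.csv"],
    uri_prefixes=["dir1/dir2/"],
)
print(manifest_text)
```

Save the generated text under a name that follows the <file_name>.manifest format before you upload it to the bucket.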

The Data Preview tab displays the data of the first file listed in the URIs section of the .manifest file. If the URIs section is empty, the tab displays the first file in the folder specified in URIPrefixes.

You can specify an asterisk (*) wildcard in the file name to fetch files from the Amazon S3 bucket. Use the asterisk (*) wildcard to fetch all the files or only the files that match a name pattern. Specify the wildcard character as part of the file name.

For example, if you specify result*.txt, all files whose names start with the term result and end with the .txt file extension are read. If you specify result.*, all files whose names start with the term result are read regardless of the extension.
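
To preview which file names a pattern would select, you can approximate the matching locally. The following Python sketch uses the standard fnmatch module as a stand-in. It is an assumption that the connector matches names the same way, and the helper select_files and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(file_names, pattern):
    """Return the names that match an asterisk pattern (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical file names used only for illustration.
names = ["result_jan.txt", "result.csv", "result.txt", "summary.txt"]

txt_results = select_files(names, "result*.txt")
any_extension = select_files(names, "result.*")

print(txt_results)
print(any_extension)
```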

Use the wildcard character to specify files from a single folder. For example:

{
    "fileLocations": [{
        "URIs": [
            "dir1/dir2/file_1.csv",
            "dir1/dir2/dir4/file_2.csv"
        ]
    }, {
        "URIPrefixes": [
            "dir1/dir2/",
            "dir1/dir2/"
        ]
    }, {
        "WildcardURIs": [
            "multiread_wildcard/file_1/*.csv"
        ]
    }],
    "settings": {
        "stopOnFail": "true"
    }
}

You cannot use the wildcard character to specify folder names.
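
A manifest can be checked for this restriction before it is uploaded. The following Python sketch is a minimal check under the assumption that folders are separated by a forward slash. The function wildcard_in_file_name_only is a hypothetical helper and is not part of the connector.

```python
def wildcard_in_file_name_only(uri):
    """Return True if no asterisk appears in the folder part of the URI."""
    # Everything before the last "/" is treated as the folder path.
    folder, _, _file_name = uri.rpartition("/")
    return "*" not in folder


print(wildcard_in_file_name_only("multiread_wildcard/file_1/*.csv"))
print(wildcard_in_file_name_only("multiread_*/file_1/data.csv"))
```

The first call returns True because the asterisk is in the file name. The second call returns False because the asterisk is in a folder name.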

You can configure the stopOnFail property to control how errors are reported while reading multiple files. Set the value to true if you want the Secure Agent to display error messages when the read operation fails for any of the source files. If you set the value to false, the error messages appear only in the session log, and the Secure Agent skips the file that generated the error and continues to read the other files.
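
The following Python sketch illustrates the two behaviors in a simplified form. It is not the Secure Agent's implementation. The names read_all and fake_reader are hypothetical, and treating a true value as "raise the error to the caller" is an assumption based on the property name.

```python
import logging

logger = logging.getLogger("session")


def read_all(uris, read_one, stop_on_fail):
    """Read every URI with read_one and collect the rows."""
    rows = []
    for uri in uris:
        try:
            rows.extend(read_one(uri))
        except OSError as error:
            if stop_on_fail:
                # Assumed behavior for "true": surface the error immediately.
                raise
            # Behavior for "false": log the error, skip the file, continue.
            logger.error("Skipping %s: %s", uri, error)
    return rows


def fake_reader(uri):
    """Hypothetical reader that fails for one file name."""
    if uri.endswith("missing.csv"):
        raise FileNotFoundError(uri)
    return [uri + ":row1"]


rows = read_all(["a.csv", "missing.csv", "b.csv"], fake_reader, stop_on_fail=False)
print(rows)
```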

Partitioning

You can configure partitioning to optimize the mapping performance at run time when you read data from Amazon S3 sources.

The partition type controls how the Secure Agent distributes data among partitions at partition points. You can define the partition type as passthrough partitioning. With partitioning, the Secure Agent distributes rows of source data based on the number of threads that you define as partitions.

You can specify the value of the Number of Partition field in the Partition tab under the mapping task to configure partitioning for Amazon S3 sources. The Secure Agent configures the partition for Amazon S3 sources based on the value you enter in the Number of Partition field. By default, the value of the Number of Partition field is one.

The Secure Agent enables the partition according to the size of the Amazon S3 source file. The file name is appended with a number starting from 0 in the following format: <file name>_<number>
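
As a small illustration of the naming format, the following Python sketch lists the names for a given number of partitions. The function partition_names and the sample file name are hypothetical. Whether the number is appended before or after a file extension is not stated here, so the sketch appends it to the full name as an assumption.

```python
def partition_names(file_name, number_of_partitions):
    """Return names in the <file name>_<number> format, numbered from 0."""
    return [f"{file_name}_{index}" for index in range(number_of_partitions)]


names = partition_names("orders", 3)
print(names)
```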

Note: If you enable partitioning and the precision for a source column is less than the maximum data length in that column, you might receive unexpected results. To avoid unexpected results, set the precision for the source column to a value equal to or greater than the maximum data length in that column.
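
Before you enable partitioning, you can compare the configured precision with the data in a sample of the source file. The following Python sketch assumes a CSV file with a header row and measures length in characters, which might differ from how the precision is counted. The function columns_exceeding_precision and the sample data are hypothetical.

```python
import csv
import io


def columns_exceeding_precision(csv_text, precisions):
    """Return {column: longest length} for columns longer than their precision."""
    reader = csv.DictReader(io.StringIO(csv_text))
    longest = {column: 0 for column in precisions}
    for row in reader:
        for column in precisions:
            value = row[column] or ""
            longest[column] = max(longest[column], len(value))
    return {
        column: length
        for column, length in longest.items()
        if length > precisions[column]
    }


# Hypothetical sample data and precision values.
sample = "id,city\n1,Oslo\n2,Johannesburg\n"
too_long = columns_exceeding_precision(sample, {"id": 10, "city": 8})
print(too_long)
```

Any column that the function reports needs a higher precision before partitioning is enabled.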

You cannot configure partitioning for mappings that read data from multiple Amazon S3 sources.