You can use an Amazon S3 V2 object as a source in a mapping or a mapping task.
Specify the name and description of the Amazon S3 V2 source. Configure the Amazon S3 V2 source and advanced properties for the source object.
Data encryption in Amazon S3 V2 sources
You can decrypt data when you read from an Amazon S3 V2 source.
The following list shows the file types that each encryption type supports:
•Client-side encryption: Binary, Flat
•Server-side encryption: Avro, Binary, Delta, Flat, JSON, ORC, Parquet
•Server-side encryption with KMS: Avro, Binary, Delta, Flat, JSON, ORC, Parquet
•Informatica encryption: Binary, Flat
Client-side encryption for Amazon S3 V2 sources
Client-side encryption is a technique to encrypt data before transmitting the data to the Amazon S3 server.
You can read a client-side encrypted file in an Amazon S3 bucket. To read client-side encrypted files, you must provide a master symmetric key or customer master key in the connection properties. The Secure Agent decrypts the data by using the master symmetric key or customer master key.
When you generate a client-side encrypted file using a third-party tool, metadata for the encrypted file is generated. To read an encrypted file from Amazon S3, you must upload the encrypted file and the metadata for the encrypted file to the Amazon S3 bucket.
The metadata must include the following keys when you upload the encrypted file:
•Content-Type
•x-amz-meta-x-amz-key
•x-amz-meta-x-amz-unencrypted-content-length
•x-amz-meta-x-amz-matdesc
•x-amz-meta-x-amz-iv
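For illustration, the required metadata keys might be assembled as follows before you upload the encrypted object. The values shown are placeholders, not real encryption material; a third-party client-side encryption tool generates the actual values.

```python
# Sketch: metadata keys that must accompany a client-side encrypted
# object in the S3 bucket. Values here are placeholders only.
required_keys = [
    "Content-Type",
    "x-amz-meta-x-amz-key",
    "x-amz-meta-x-amz-unencrypted-content-length",
    "x-amz-meta-x-amz-matdesc",
    "x-amz-meta-x-amz-iv",
]

metadata = {
    "Content-Type": "application/octet-stream",
    "x-amz-meta-x-amz-key": "<envelope-encrypted data key, Base64>",
    "x-amz-meta-x-amz-unencrypted-content-length": "1024",
    "x-amz-meta-x-amz-matdesc": "{}",
    "x-amz-meta-x-amz-iv": "<initialization vector, Base64>",
}

# Verify that every required key is present before uploading.
missing = [k for k in required_keys if k not in metadata]
print(missing)  # an empty list means the metadata is complete
```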
Reading a client-side encrypted file
Perform the following tasks to read a client-side encrypted file:
1. Provide the master symmetric key when you create an Amazon S3 V2 connection.
Ensure that you provide a 256-bit AES encryption key in Base64 format.
2. Copy the local_policy.jar and US_export_policy.jar files from one of the following directories to the corresponding security directory in your Secure Agent installation:
- From <Secure Agent installation directory>/jdk/jre/lib/security/policy/unlimited/ to <Secure Agent installation directory>/jdk/jre/lib/security/
- From <Secure Agent installation directory>/jdk8/jre/lib/security/policy/unlimited/ to <Secure Agent installation directory>/jdk8/jre/lib/security/
3. Restart the Secure Agent.
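Step 1 requires a 256-bit AES key in Base64 format. As a sketch, a key of the expected shape can be generated like this (how you generate and manage the key in practice is up to your security policy):

```python
import os
import base64

# Sketch: generate a 256-bit (32-byte) AES key and encode it in
# Base64, the format expected for the master symmetric key in the
# Amazon S3 V2 connection properties.
key_bytes = os.urandom(32)  # 256 bits of random key material
key_b64 = base64.b64encode(key_bytes).decode("ascii")

print(len(key_bytes) * 8)  # 256
```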
Server-side encryption for Amazon S3 V2 sources
Server-side encryption is a technique to encrypt data using Amazon S3-managed encryption keys. Server-side encryption with KMS is a technique to encrypt data using the AWS KMS-managed customer master key.
Server-side encryption
To read a server-side encrypted file, select the encrypted file in the Amazon S3 V2 source.
Server-side encryption with KMS
To read a server-side encrypted file with KMS, specify the AWS KMS-managed customer master key in the Customer Master Key ID connection property and select the encrypted file in the Amazon S3 V2 source.
Note: You do not need to specify the encryption type in the advanced source properties.
Source types in Amazon S3 V2 sources
You can select the type of source from which you want to read data.
You can select the following types of sources from the Source Type option under the Amazon S3 V2 advanced source properties:
File
You must enter the bucket name that contains the Amazon S3 file.
Amazon S3 V2 Connector provides the option to override the value of the Folder Path and File Name properties during run time.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
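The folder-path precedence described above can be sketched as a small helper. The function name and behavior are illustrative, not the connector's actual implementation:

```python
def effective_folder_path(connection_path, override=None):
    """Sketch of the folder-path precedence rules.

    connection_path: folder path from the connection properties,
                     for example "my_bucket1/dir1".
    override: folder path from the advanced source properties.
    """
    if not override:
        # No override at run time: use the connection folder path.
        return connection_path
    if override.startswith("/"):
        # A leading slash appends the override to the connection path.
        return connection_path + override
    # A full <bucket>/<folder> override replaces the connection path.
    return override

print(effective_folder_path("my_bucket1/dir1", "/dir2"))
print(effective_folder_path("my_bucket1/dir1", "my_bucket2/dir2"))
```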
Directory
You must select a source file when you create the mapping, and select Directory as the source type at run time. When you select Directory as the source type, the value of File Name is honored only when you use wildcard characters to specify the folder path or file name, or when you recursively read files from directories.
For the read operation, if you provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path from the advanced source properties. If you do not provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path that you specify during the connection creation.
Use the following rules and guidelines to select Directory as the source type:
- All the source files in the directory must contain the same metadata.
- All the files must have data in the same format. For example, delimiters, header fields, and escape characters must be the same.
- All the files under a specified directory are parsed. The files under subdirectories are parsed only when you recursively read files from directories.
Reading from multiple files
You can read multiple flat files from Amazon S3.
You can use the following types of manifest files:
•Custom manifest file
•Amazon Redshift manifest file
Custom manifest file
You can read multiple flat files from Amazon S3. To read multiple flat files, all the files must be available in the same Amazon S3 bucket.
When you want to read from multiple sources in the Amazon S3 bucket, you must create a .manifest file that contains all the source files with the respective absolute path or directory path. You must specify the .manifest file name in the following format: <file_name>.manifest.
The custom manifest file contains the following tags:
•URIs. Specify the path for the files relative to the bucket name.
•URIPrefixes. Specify the path for the directory relative to the bucket name.
•WildcardURIs. Specify an asterisk (*) wildcard in the file name to fetch flat files from the Amazon S3 bucket. Use the asterisk (*) wildcard to fetch all the files or only the files that match the name pattern.
You can specify URIs, URIPrefixes, WildcardURIs, or all sections within fileLocations in the .manifest file.
You cannot use wildcard characters to specify folder names. For example, entries such as { "WildcardURIs": [ "multiread_wildcard/dir1*/", "multiread_wildcard/*/" ] } are not supported.
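Putting these tags together, a custom .manifest file might look like the following sketch. The bucket-relative folder and file names are hypothetical:

```json
{
  "fileLocations": [
    { "URIs": [ "dir1/file1.csv", "dir1/file2.csv" ] },
    { "URIPrefixes": [ "dir2/" ] },
    { "WildcardURIs": [ "dir3/part-*.csv" ] }
  ]
}
```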
The Data Preview tab displays the data of the first file available in the URI specified in the .manifest file. If the URI section is empty, the first file in the folder specified in URIPrefixes is displayed.
Amazon Redshift manifest file
You can use an Amazon Redshift manifest file created by the UNLOAD command to read multiple flat files from Amazon S3. All flat files must have the same metadata and must be available in the same Amazon S3 bucket.
Create a .manifest file and list all the source files with the URL that includes the bucket name and full object path for the file. You must specify the .manifest file name in the following format: <file_name>.manifest.
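As an illustration, an Amazon Redshift manifest file might look like the following sketch. The bucket and file names are hypothetical:

```json
{
  "entries": [
    { "url": "s3://my_bucket/customer_0000_part_00", "mandatory": true },
    { "url": "s3://my_bucket/customer_0001_part_00", "mandatory": false }
  ]
}
```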
Amazon S3 V2 Connector uses the mandatory tag to determine whether to continue reading the remaining files in the .manifest file, based on the following scenarios:
- If the value of the mandatory tag is true and the S3 bucket does not have the specified source file, Amazon S3 V2 Connector does not read the remaining files in the .manifest file, and the mapping task fails.
- If the value of the mandatory tag is false and the S3 bucket does not have the specified file, Amazon S3 V2 Connector continues to read the remaining files in the .manifest file in sequence.
- If the .manifest file does not contain any files, the mapping task fails.
By default, the value of the mandatory tag is false.
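The mandatory-tag behavior described above can be sketched as follows. This is illustrative logic, not the connector's actual code:

```python
def files_to_read(manifest_entries, existing_files):
    """Sketch of the mandatory-tag rules for a Redshift-style manifest.

    manifest_entries: list of {"url": ..., "mandatory": bool} dicts.
    existing_files: set of URLs actually present in the S3 bucket.
    """
    if not manifest_entries:
        # An empty manifest fails the mapping task.
        raise ValueError("manifest contains no files")
    selected = []
    for entry in manifest_entries:
        url = entry["url"]
        mandatory = entry.get("mandatory", False)  # false by default
        if url not in existing_files:
            if mandatory:
                # A missing mandatory file stops the read and fails the task.
                raise FileNotFoundError("mandatory file missing: " + url)
            continue  # a missing optional file is skipped
        selected.append(url)
    return selected
```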
Reading the source object path
When you import source objects, the Secure Agent appends a FileName field to the imported source object. The FileName field stores the absolute path of the source file from which the Secure Agent reads the data at run time.
For example, a directory contains a number of files and each file contains multiple records that you want to read. You select the directory as source type in the Amazon S3 V2 source advanced properties. When you run the mapping, the Secure Agent reads each record and stores the absolute path of the respective source file in the FileName field.
The FileName field is applicable to the following file formats:
•Avro
•Binary. Applicable only to mappings.
•ORC
•Parquet
Note: Avoid using FileName as the column name in the source data. FileName is a reserved keyword. The name is case sensitive.
The FileName field uses the following format: <absolute path of the file including the file name>
Note: By default, the FileName field in a source object uses a hyphen (-) in the S3 endpoint. For example, s3-us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
To change the format to use a period (.), set the JVM option changeS3EndpointForFileNamePort = true. For example, s3.us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
SQL ELT optimization
You can enable full SQL ELT optimization when you want to load data from Amazon S3 sources to your data warehouse in Amazon Redshift. While loading the data to Amazon Redshift, you can transform the data as per your data warehouse model and requirements. When you enable full SQL ELT optimization on a mapping task, the mapping logic is pushed to the AWS environment to leverage AWS commands. Full SQL ELT optimization is enabled by default in mapping tasks.
For more information on SQL ELT optimization, see the help for Amazon Redshift V2 Connector. If your use case involves loading data to any other supported cloud data warehouse, see the connector help for the applicable cloud data warehouse.