You can use an Amazon S3 V2 object as a source in a mapping or a mapping task.
Specify the name and description of the Amazon S3 V2 source. Configure the Amazon S3 V2 source and advanced properties for the source object.
Data encryption in Amazon S3 V2 sources
You can decrypt data when you read from an Amazon S3 V2 source.
The following list shows the file types that each encryption type supports:
•Client-side encryption: Binary, Flat
•Server-side encryption: Avro, Binary, Delta, Flat, JSON, ORC, Parquet
•Server-side encryption with KMS: Avro, Binary, Delta, Flat, JSON, ORC, Parquet
•Informatica encryption: Binary, Flat
Client-side encryption for Amazon S3 V2 sources
Client-side encryption is a technique to encrypt data before transmitting the data to the Amazon S3 server.
You can read a client-side encrypted file in an Amazon S3 bucket. To read client-side encrypted files, you must provide a master symmetric key or customer master key in the connection properties. The Secure Agent decrypts the data by using the master symmetric key or customer master key.
When you generate a client-side encrypted file using a third-party tool, metadata for the encrypted file is generated. To read an encrypted file from Amazon S3, you must upload the encrypted file and the metadata for the encrypted file to the Amazon S3 bucket.
The metadata must include the following keys when you upload the encrypted file:
•Content-Type
•x-amz-meta-x-amz-key
•x-amz-meta-x-amz-unencrypted-content-length
•x-amz-meta-x-amz-matdesc
•x-amz-meta-x-amz-iv
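For illustration, the required metadata keys might be assembled as follows before you upload the encrypted object. The values shown are placeholders, not real encryption material; a third-party client-side encryption tool generates the actual values.

```python
# Sketch: metadata keys that must accompany a client-side encrypted
# object in the S3 bucket. Values here are placeholders only.
required_keys = [
    "Content-Type",
    "x-amz-meta-x-amz-key",
    "x-amz-meta-x-amz-unencrypted-content-length",
    "x-amz-meta-x-amz-matdesc",
    "x-amz-meta-x-amz-iv",
]

metadata = {
    "Content-Type": "application/octet-stream",
    "x-amz-meta-x-amz-key": "<envelope-encrypted data key, Base64>",
    "x-amz-meta-x-amz-unencrypted-content-length": "1024",
    "x-amz-meta-x-amz-matdesc": "{}",
    "x-amz-meta-x-amz-iv": "<initialization vector, Base64>",
}

# Verify that every required key is present before uploading.
missing = [k for k in required_keys if k not in metadata]
print(missing)  # an empty list means the metadata is complete
```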
Reading a client-side encrypted file
Perform the following tasks to read a client-side encrypted file:
1. Provide the master symmetric key when you create an Amazon S3 V2 connection.
Ensure that you provide a 256-bit AES encryption key in Base64 format.
2. Copy the local_policy.jar and US_export_policy.jar files from one of the following directories to the corresponding security directory in your Secure Agent installation:
- From <Secure Agent installation directory>/jdk/jre/lib/security/policy/unlimited/ to <Secure Agent installation directory>/jdk/jre/lib/security/
- From <Secure Agent installation directory>/jdk8/jre/lib/security/policy/unlimited/ to <Secure Agent installation directory>/jdk8/jre/lib/security/
3. Restart the Secure Agent.
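Step 1 requires a 256-bit AES key in Base64 format. As a sketch, a key of the expected shape can be generated like this (how you generate and manage the key in practice is up to your security policy):

```python
import os
import base64

# Sketch: generate a 256-bit (32-byte) AES key and encode it in
# Base64, the format expected for the master symmetric key in the
# Amazon S3 V2 connection properties.
key_bytes = os.urandom(32)  # 256 bits of random key material
key_b64 = base64.b64encode(key_bytes).decode("ascii")

print(len(key_bytes) * 8)  # 256
```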
Server-side encryption for Amazon S3 V2 sources
Server-side encryption is a technique to encrypt data using Amazon S3-managed encryption keys. Server-side encryption with KMS is a technique to encrypt data using the AWS KMS-managed customer master key.
Server-side encryption
To read a server-side encrypted file, select the encrypted file in the Amazon S3 V2 source.
Server-side encryption with KMS
To read a server-side encrypted file with KMS, specify the AWS KMS-managed customer master key in the Customer Master Key ID connection property and select the encrypted file in the Amazon S3 V2 source.
Note: You do not need to specify the encryption type in the advanced source properties.
Source types in Amazon S3 V2 sources
You can select the type of source from which you want to read data.
You can select the following types of sources from the Source Type option under the Amazon S3 V2 advanced source properties:
File
You must enter the bucket name that contains the Amazon S3 file.
Amazon S3 V2 Connector provides the option to override the value of the Folder Path and File Name properties during run time.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
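The folder-path precedence described above can be sketched as a small helper. The function name and behavior are illustrative, not the connector's actual implementation:

```python
def effective_folder_path(connection_path, override=None):
    """Sketch of the folder-path precedence rules.

    connection_path: folder path from the connection properties,
                     for example "my_bucket1/dir1".
    override: folder path from the advanced source properties.
    """
    if not override:
        # No override at run time: use the connection folder path.
        return connection_path
    if override.startswith("/"):
        # A leading slash appends the override to the connection path.
        return connection_path + override
    # A full <bucket>/<folder> override replaces the connection path.
    return override

print(effective_folder_path("my_bucket1/dir1", "/dir2"))
print(effective_folder_path("my_bucket1/dir1", "my_bucket2/dir2"))
```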
Directory
You must select a source file when you create the mapping, and select Directory as the source type at run time. When you select Directory as the source type, the value of File Name is honored only when you use wildcard characters to specify the folder path or file name, or when you recursively read files from directories.
For the read operation, if you provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path from the advanced source properties. If you do not provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path that you specify during the connection creation.
Use the following rules and guidelines to select Directory as the source type:
- All the source files in the directory must contain the same metadata.
- All the files must have data in the same format. For example, delimiters, header fields, and escape characters must be the same.
- All the files under a specified directory are parsed. The files under subdirectories are parsed only when you recursively read files from directories.
Reading from multiple files
You can read multiple flat files from Amazon S3.
You can use the following types of manifest files:
•Custom manifest file
•Amazon Redshift manifest file
Custom manifest file
You can read multiple flat files from Amazon S3. To read multiple flat files, all the files must be available in the same Amazon S3 bucket.
When you want to read from multiple sources in the Amazon S3 bucket, you must create a .manifest file that contains all the source files with the respective absolute path or directory path. You must specify the .manifest file name in the following format: <file_name>.manifest.
The custom manifest file contains the following tags:
•URIs. Specify the path for the files relative to the bucket name.
•URIPrefixes. Specify the path for the directory relative to the bucket name.
•WildcardURIs. Specify an asterisk (*) wildcard in the file name to fetch flat files from the Amazon S3 bucket. Use the asterisk (*) wildcard to fetch all the files or only the files that match the name pattern.
You can specify URIs, URIPrefixes, WildcardURIs, or all sections within fileLocations in the .manifest file.
You cannot use wildcard characters to specify folder names. For example, entries such as { "WildcardURIs": [ "multiread_wildcard/dir1*/", "multiread_wildcard/*/" ] } are not supported.
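Putting these tags together, a custom .manifest file might look like the following sketch. The bucket-relative folder and file names are hypothetical:

```json
{
  "fileLocations": [
    { "URIs": [ "dir1/file1.csv", "dir1/file2.csv" ] },
    { "URIPrefixes": [ "dir2/" ] },
    { "WildcardURIs": [ "dir3/part-*.csv" ] }
  ]
}
```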
The Data Preview tab displays the data of the first file available in the URI specified in the .manifest file. If the URI section is empty, the first file in the folder specified in URIPrefixes is displayed.
Amazon Redshift manifest file
You can use an Amazon Redshift manifest file created by the UNLOAD command to read multiple flat files from Amazon S3. All flat files must have the same metadata and must be available in the same Amazon S3 bucket.
Create a .manifest file and list all the source files with the URL that includes the bucket name and full object path for the file. You must specify the .manifest file name in the following format: <file_name>.manifest.
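As an illustration, an Amazon Redshift manifest file might look like the following sketch. The bucket and file names are hypothetical:

```json
{
  "entries": [
    { "url": "s3://my_bucket/customer_0000_part_00", "mandatory": true },
    { "url": "s3://my_bucket/customer_0001_part_00", "mandatory": false }
  ]
}
```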
Amazon S3 V2 Connector uses the mandatory tag to determine whether to continue reading the remaining files in the .manifest file, based on the following scenarios:
- If the value of the mandatory tag is true and the S3 bucket does not have the specified source file, Amazon S3 V2 Connector does not read the remaining files in the .manifest file, and the mapping task fails.
- If the value of the mandatory tag is false and the S3 bucket does not have the specified file, Amazon S3 V2 Connector continues to read the remaining files in the .manifest file in sequence.
- If the .manifest file does not contain any files, the mapping task fails.
By default, the value of the mandatory tag is false.
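The mandatory-tag behavior described above can be sketched as follows. This is illustrative logic, not the connector's actual code:

```python
def files_to_read(manifest_entries, existing_files):
    """Sketch of the mandatory-tag rules for a Redshift-style manifest.

    manifest_entries: list of {"url": ..., "mandatory": bool} dicts.
    existing_files: set of URLs actually present in the S3 bucket.
    """
    if not manifest_entries:
        # An empty manifest fails the mapping task.
        raise ValueError("manifest contains no files")
    selected = []
    for entry in manifest_entries:
        url = entry["url"]
        mandatory = entry.get("mandatory", False)  # false by default
        if url not in existing_files:
            if mandatory:
                # A missing mandatory file stops the read and fails the task.
                raise FileNotFoundError("mandatory file missing: " + url)
            continue  # a missing optional file is skipped
        selected.append(url)
    return selected
```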
Reading the source object path
When you import source objects, the Secure Agent appends a FileName field to the imported source object. The FileName field stores the absolute path of the source file from which the Secure Agent reads the data at run time.
For example, a directory contains a number of files and each file contains multiple records that you want to read. You select the directory as source type in the Amazon S3 V2 source advanced properties. When you run the mapping, the Secure Agent reads each record and stores the absolute path of the respective source file in the FileName field.
The FileName field is applicable to the following file formats:
•Avro
•Binary. Applicable only to mappings.
•ORC
•Parquet
Note: Avoid using FileName as the column name in the source data. FileName is a reserved keyword. The name is case sensitive.
The FileName field uses the following format: <absolute path of the file including the file name>
Note: By default, the FileName field in a source object uses a hyphen (-) in the S3 endpoint. For example, s3-us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
To change the format to use a period (.), set the JVM option changeS3EndpointForFileNamePort = true. For example, s3.us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
SQL ELT optimization
You can enable full SQL ELT optimization when you want to load data from Amazon S3 sources to your data warehouse in Amazon Redshift. While loading the data to Amazon Redshift, you can transform the data as per your data warehouse model and requirements. When you enable full SQL ELT optimization on a mapping task, the mapping logic is pushed to the AWS environment to leverage AWS commands. Full SQL ELT optimization is enabled by default in mapping tasks.
For more information on SQL ELT optimization, see the help for Amazon Redshift V2 Connector. If your use case involves loading data to any other supported cloud data warehouse, see the connector help for the applicable cloud data warehouse.