Amazon S3

Create an Amazon S3 connection to read data from or write data to Amazon S3 in file formats such as Avro, flat, binary, ORC, and Parquet.

Feature snapshot

Operation    Support
Read         Yes
Write        Yes

Before you begin

Before you configure the connection properties, you'll need to get information from your AWS account.
The following video shows you how to get information from your AWS account:
https://infa.media/3CuOKFQ

Connection properties

The following table describes the Amazon S3 connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Access Key
Access key to access the Amazon S3 bucket.
Enter the access key value based on the following authentication methods:
  • - Basic authentication. Enter the actual access key value.
  • - IAM authentication. Don't enter the access key value.
  • - Temporary security credentials using assume role. Enter the access key of an IAM user with no permissions to access the Amazon S3 bucket.
  • - Assume role for EC2. Don't enter the access key value.
  • - Credential profile file authentication. Don't enter the access key value.
  • - Federated user single sign-on. Don't enter the access key value.
Secret Key
Secret access key to access the Amazon S3 bucket. The secret key is associated with the access key and uniquely identifies the account.
Enter the secret access key value based on the following authentication methods:
  • - Basic authentication. Enter the actual secret access key value.
  • - IAM authentication. Don't enter the secret access key value.
  • - Temporary security credentials using assume role. Enter the secret access key of an IAM user with no permissions to access the Amazon S3 bucket.
  • - Assume role for EC2. Don't enter the secret access key value.
  • - Credential profile file authentication. Don't enter the secret access key value.
  • - Federated user single sign-on. Don't enter the secret access key value.
IAM Role ARN
The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role assumed by the user to use the dynamically generated temporary security credentials.
Enter the value of this property if you want to use the temporary security credentials to access the AWS resources.
Note: Even if you remove the IAM role that enables the agent to access the Amazon S3 bucket and create a connection, the test connection is successful.
For more information about how to get the ARN of the IAM role, see the AWS documentation.
External Id
Provides more secure access to the Amazon S3 bucket when the Amazon S3 bucket is in a different AWS account.
Use EC2 Role to Assume Role
Enables the EC2 role to assume another IAM role specified in the IAM Role ARN option.
Note: The EC2 role must have a policy attached with a permission to assume an IAM role from the same or different account.
By default, the Use EC2 Role to Assume Role check box is not selected.
Folder Path
Bucket name or complete folder path to the Amazon S3 objects.
Don't use a slash at the end of the folder path. For example, <bucket name>/<my folder name>.
Master Symmetric Key
A 256-bit AES encryption key in the Base64 format when you use client-side encryption. You can generate a key using a third-party tool.
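As one possible way to produce such a key, the following minimal sketch uses only the Python standard library to generate 32 random bytes (256 bits) and encode them in Base64. The function name generate_master_symmetric_key is hypothetical and is not part of INFACore.

```python
import base64
import secrets


def generate_master_symmetric_key() -> str:
    """Return a random 256-bit key encoded as Base64 text."""
    raw_key = secrets.token_bytes(32)  # 32 bytes = 256 bits
    return base64.b64encode(raw_key).decode("ascii")
```

The returned string is 44 characters long. Store the key securely, because data encrypted with client-side encryption cannot be decrypted without it.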
Customer Master Key ID
The customer master key ID or alias name generated by AWS Key Management Service (AWS KMS) or the Amazon Resource Name (ARN) of your custom key for cross-account access.
You must generate the customer master key for the same region where the Amazon S3 bucket resides.
You can specify the following master keys:
  • - Customer generated customer master key. Enables client-side or server-side encryption.
  • - Default customer master key. Enables client-side or server-side encryption. Only the administrator user of the account can use the default customer master key ID to enable client-side encryption.
S3 Account Type
The type of the Amazon S3 account.
Select from the following options:
  • - Amazon S3 Storage. Enables you to use the Amazon S3 services.
  • - S3 Compatible Storage. Enables you to use the endpoint for a third-party storage provider such as Scality RING or MinIO.
Default is Amazon S3 Storage.
REST Endpoint
The S3 storage endpoint required for S3 compatible storage.
Enter the S3 storage endpoint in HTTP or HTTPS format.
For example, http://s3.isv.scality.com.
Region Name
The AWS region of the bucket that you want to access.
Select one of the following regions:
  • - Asia Pacific (Mumbai)
  • - Asia Pacific (Jakarta)
  • - Asia Pacific (Osaka)
  • - Asia Pacific (Seoul)
  • - Asia Pacific (Singapore)
  • - Asia Pacific (Sydney)
  • - Asia Pacific (Tokyo)
  • - Asia Pacific (Hong Kong)
  • - AWS GovCloud (US)
  • - AWS GovCloud (US-East)
  • - Canada (Central)
  • - China (Beijing)
  • - China (Ningxia)
  • - EU (Ireland)
  • - EU (Frankfurt)
  • - EU (London)
  • - EU (Milan)
  • - EU (Paris)
  • - EU (Stockholm)
  • - South America (Sao Paulo)
  • - Middle East (Bahrain)
  • - US East (N. Virginia)
  • - US East (Ohio)
  • - US ISO East
  • - US ISOB East (Ohio)
  • - US ISO West
  • - US West (N. California)
  • - US West (Oregon)
Default is US East (N. Virginia).
Federated SSO IdP
SAML 2.0-enabled identity provider for the federated user single sign-on to use with the AWS account.
Amazon S3 connector supports only the ADFS 3.0 identity provider. Select None if you don't want to use federated user single sign-on.
Other Authentication Type
Select one of the following authentication types:
  • - NONE
  • - Credential Profile File Authentication
Select the Credential Profile File Authentication option to access the Amazon S3 credentials from a credential file that contains the access key and secret key.
Enter the credential profile file path and the profile name to establish the connection with Amazon S3.
You can use permanent IAM credentials or temporary session tokens when you configure the Credential Profile File Authentication.
Default is NONE.
Credential Profile File Path
Specifies the credential profile file path.
If you don't enter the credential profile path, the Secure Agent uses the credential profile file present in the following default location in your home directory:
~/.aws/credentials
Profile Name
Name of the profile in the credential profile file used to get the credentials.
If you don't enter the profile name, the credentials from the default profile in the credential profile file are used.
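To illustrate how a credential profile file and a profile name relate, the following sketch parses text in the standard AWS credentials file layout and returns the entries for one profile. It only demonstrates the file format and is not how the Secure Agent reads the file. The function name load_profile and the sample values are hypothetical. The key names aws_access_key_id, aws_secret_access_key, and aws_session_token are the ones that the AWS credentials file format uses.

```python
import configparser
from pathlib import Path

# Default location described above: ~/.aws/credentials
DEFAULT_CREDENTIALS_PATH = Path.home() / ".aws" / "credentials"

# Placeholder content, not real credentials.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLE_DEFAULT_ACCESS_KEY
aws_secret_access_key = EXAMPLE_DEFAULT_SECRET_KEY

[etl_profile]
aws_access_key_id = EXAMPLE_ETL_ACCESS_KEY
aws_secret_access_key = EXAMPLE_ETL_SECRET_KEY
aws_session_token = EXAMPLE_SESSION_TOKEN
"""


def load_profile(text: str, profile_name: str = "default") -> dict:
    """Return the credentials stored under one profile of a credentials file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile_name):
        raise KeyError(f"profile not found: {profile_name}")
    section = parser[profile_name]
    return {
        "access_key": section["aws_access_key_id"],
        "secret_key": section["aws_secret_access_key"],
        # Present only when the profile holds temporary session credentials.
        "session_token": section.get("aws_session_token"),
    }
```

To read an actual file, pass DEFAULT_CREDENTIALS_PATH.read_text() as the text argument. When no profile name is given, the sketch falls back to the default profile, which mirrors the behavior described for the Profile Name property.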
S3 VPC Endpoint Type
The VPC endpoint type for Amazon S3.
You can enable private communication with Amazon S3 by selecting a VPC endpoint.
Select one of the following VPC endpoint types:
  • - None
  • - Gateway Endpoint
  • - Interface Endpoint
Default is None.
Endpoint DNS Name for Amazon S3
The DNS name for the Amazon S3 interface endpoint.
Enter the DNS name in the following format:
bucket.<DNS name of the interface endpoint>
STS VPC Endpoint Type
Applicable when you select the S3 VPC interface endpoint.
The VPC endpoint type for AWS STS.
When you select IAM Role ARN or Federated SSO IdP, configure the STS VPC endpoint.
Endpoint DNS Name for AWS STS service
The DNS name for the AWS STS interface endpoint.
KMS VPC Endpoint Type
Applicable when you select the interface endpoint.
The VPC endpoint type for AWS KMS.
When you select Customer Master Key ID, configure the KMS VPC endpoint.
Endpoint DNS Name for AWS KMS service
The DNS name for the AWS KMS interface endpoint.

Federated user single sign-on connection properties

Configure the following properties when you select ADFS 3.0 in Federated SSO IdP:
Property
Description
Federated User Name
User name of the federated user to access the AWS account through the identity provider.
Federated User Password
Password for the federated user to access the AWS account through the identity provider.
IdP SSO URL
Single sign-on URL of the identity provider for AWS.
SAML Identity Provider ARN
ARN of the SAML identity provider that the AWS administrator created to register the identity provider as a trusted provider.
Role ARN
ARN of the IAM role assumed by the federated user.

Read properties

The following table describes the advanced source properties that you can configure in the Python code to read from Amazon S3:
Property
Description
Source Type
Type of the source from which you want to read data.
You can select the following source types:
  • - File
  • - Directory
Default is File.
Folder Path
Overwrites the bucket name or folder path of the Amazon S3 source file.
If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file in the <my_bucket2>/<dir2> folder path that you specify in this property.
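The folder path rules above can be sketched as a small function. This is an illustration of the documented behavior, not the Secure Agent's implementation. The name resolve_folder_path is hypothetical, and treating an empty property value as "use the connection path" is an assumption.

```python
def resolve_folder_path(connection_path: str, property_path: str = "") -> str:
    """Combine the connection folder path with the Folder Path property."""
    connection_path = connection_path.rstrip("/")
    relative = property_path.strip("/")
    if not relative:
        # Assumption: no override given, so the connection path is used as is.
        return connection_path
    if property_path.startswith("/"):
        # A leading slash means "relative to the connection folder path".
        return f"{connection_path}/{relative}"
    # Otherwise the property value replaces the connection folder path.
    return property_path.rstrip("/")
```

For example, resolve_folder_path("my_bucket1/dir1", "/dir2") returns "my_bucket1/dir1/dir2", while resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2") returns "my_bucket2/dir2".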
File Name
Overwrites the Amazon S3 source file name.
Incremental File Load
Indicates whether you want to incrementally load files when you use a directory as the source for a mapping in advanced mode. When you incrementally load files, the mapping task reads and processes only files in the directory that have changed since the mapping task last ran.
Allow Wildcard Characters
Indicates whether you want to use wildcard characters for the directory source type.
If you select this option, you can use the question mark (?) and asterisk (*) wildcard characters in the folder path or file name.
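The following sketch shows how the two wildcard characters could select file names. It assumes that the question mark matches exactly one character and that the asterisk matches zero or more characters. The connector's exact matching rules may differ, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern that uses only ? and * into a regular expression."""
    parts = []
    for char in pattern:
        if char == "?":
            parts.append(".")   # assumed: exactly one character
        elif char == "*":
            parts.append(".*")  # assumed: zero or more characters
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def match_names(names: list, pattern: str) -> list:
    """Return the names that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]
```

With the names sales_2023.csv, sales_2024.csv, and orders.csv, the pattern sales_202?.csv selects the first two files and *.csv selects all three.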
Enable Recursive Read
Indicates whether you want to read flat, Avro, JSON, ORC, or Parquet files recursively from the specified folder and its subfolders and files. Applicable when you select the directory source type.
Encryption Type
Method you want to use to decrypt data.
You can select one of the following encryption types:
  • - None
  • - Informatica encryption
Default is None.
Note: You cannot select the client-side encryption, server-side encryption, or server-side encryption with KMS encryption types.
Staging Directory
Path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents the read operation and 11 represents the write operation.
For example, InfaS3Staging000703115851268912800_0.
The temporary files are created within the new directory.
The staging directory source property does not apply to Avro, ORC, and Parquet files.
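The staging folder name format described above can be unpacked with a short parser, for example when you inspect or clean up a staging directory. This is a sketch based only on the documented format. The names StagingFolder and parse_staging_folder are hypothetical, and the timestamp is kept as text because its internal layout is not documented here.

```python
import re
from typing import NamedTuple


class StagingFolder(NamedTuple):
    operation: str   # "read" for code 00, "write" for code 11
    timestamp: str   # kept as text; its layout is not documented
    partition: int


_STAGING_NAME = re.compile(r"InfaS3Staging(00|11)(\d+)_(\d+)")


def parse_staging_folder(name: str) -> StagingFolder:
    """Split a staging folder name into operation, timestamp, and partition."""
    match = _STAGING_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a staging folder name: {name}")
    code, timestamp, partition = match.groups()
    operation = "read" if code == "00" else "write"
    return StagingFolder(operation, timestamp, int(partition))
```

Applied to the example name InfaS3Staging000703115851268912800_0, the parser reports a read operation for partition 0.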
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Decompresses data when you read data from Amazon S3.
You can choose to decompress the data in the following formats:
  • - None
  • - Bzip2
  • - Gzip
  • - Lzo
Default is None.
You can decompress data for a mapping in advanced mode if the mapping reads data from a JSON file in Bzip2 format.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
Download Part Size
Size, in bytes, of each part when an Amazon S3 object is downloaded in multiple parts.
Default is 5 MB. Use this property when you run a mapping to read a file of flat format type.
Multiple Download Threshold
Minimum threshold size to download an Amazon S3 object in multiple parts.
To download the object in multiple parts in parallel, ensure that the file size of an Amazon S3 object is greater than the value you specify in this property. Default is 10 MB.
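To show how Download Part Size and Multiple Download Threshold interact, the following sketch estimates how many parts an object is split into. It assumes that 1 MB equals 1,048,576 bytes and that the object is divided into equal parts with a smaller final part. The function name download_part_count is hypothetical, and the connector's actual splitting logic may differ.

```python
import math

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def download_part_count(object_size: int,
                        part_size: int = 5 * MB,
                        threshold: int = 10 * MB) -> int:
    """Estimate the number of parts used to download an object."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if object_size <= threshold:
        # Not larger than the threshold: downloaded as a single part.
        return 1
    return math.ceil(object_size / part_size)
```

With the default values, a 25 MB object is estimated at five parts, while an object of exactly 10 MB is downloaded as a single part because it does not exceed the threshold.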
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration to a maximum of 12 hours in the AWS console and then enter the same time duration in this property.
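A value for Temporary Credential Duration can be checked before it is used, as in the following sketch. The upper bound of 43,200 seconds corresponds to the 12 hours mentioned above. Treating 900 seconds as the lower bound is an assumption based on the minimum session duration that AWS STS accepts. The function name is hypothetical.

```python
DEFAULT_DURATION_SECONDS = 900
MAX_DURATION_SECONDS = 12 * 60 * 60  # 12 hours = 43,200 seconds


def check_temporary_credential_duration(
        seconds: int = DEFAULT_DURATION_SECONDS) -> int:
    """Return the duration if it lies within the supported range."""
    # Assumption: 900 seconds is also the smallest accepted value.
    if not DEFAULT_DURATION_SECONDS <= seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between {DEFAULT_DURATION_SECONDS} "
            f"and {MAX_DURATION_SECONDS} seconds, got {seconds}"
        )
    return seconds
```

Remember that a value above 900 seconds also has to be allowed for the IAM role in the AWS console, as described above.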

Write properties

The following table describes the advanced target properties that you can configure in the Python code to write to Amazon S3:
Property
Description
Overwrite File(s) If Exists
Overwrites an existing target file.
Default is true.
Folder Path
Bucket name or folder path where you want to write the Amazon S3 target file. The path that you enter here overrides the path specified for the target configured to create at runtime.
If applicable, include the folder name that contains the target file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and <my_bucket2>/<dir2> folder path in this property, the Secure Agent writes the file in the <my_bucket2>/<dir2> folder path that you specify in this property.
File Name
Creates a new file name or overwrites an existing target file name.
Encryption Type
Method you want to use to encrypt data.
Select one of the following encryption types:
  • - None
  • - Client Side Encryption
  • - Server Side Encryption
  • - Server Side Encryption with KMS
  • - Informatica Encryption
Default is None.
Staging Directory
Enter the path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents the read operation and 11 represents the write operation.
For example, InfaS3Staging000703115851268912800_0.
The temporary files are created within the new directory.
The staging directory target property does not apply to Avro, ORC, and Parquet files.
File Merge
This property is not applicable for Amazon S3 V2 Connector.
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Compresses data when you write data to Amazon S3.
You can compress the data in the following formats:
  • - None
  • - Bzip2
  • - Deflate
  • - Gzip
  • - Lzo
  • - Snappy
  • - Zlib
Default is None.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
Object Tags
The key value pairs to add single or multiple tags to the objects stored on the Amazon S3 bucket.
You can either enter the key value pairs or specify the file path that contains the key value pairs.
Use this property when you run a mapping to write a file of flat format type.
TransferManager Thread Pool Size
The number of threads to write data in parallel.
Default is 10. Use this property when you run a mapping to write a file of flat format type.
Amazon S3 V2 Connector uses the AWS TransferManager API to upload a large object in multiple parts to Amazon S3.
When the file size is more than 5 MB, you can configure multipart upload to upload the object in multiple parts in parallel. If you set the value of TransferManager Thread Pool Size to greater than 50, the value reverts to 50.
Merge Partition Files
Determines whether the Secure Agent must merge the number of partition files as a single file or maintain separate files based on the number of partitions specified to write data to the Amazon S3 V2 targets.
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration to a maximum of 12 hours in the AWS console and then enter the same time duration in this property.
Part Size
Size, in bytes, of each part when an Amazon S3 object is uploaded in multiple parts.
Default is 5 MB. Use this property when you run a mapping to write a file of flat format type.
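As a rough illustration of how TransferManager Thread Pool Size and Part Size affect a write, the following sketch applies the documented cap of 50 threads and estimates the number of upload parts. It assumes that 1 MB equals 1,048,576 bytes and that files of 5 MB or less are uploaded in one piece. The function names are hypothetical, and the connector's actual behavior may differ.

```python
import math

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
MAX_THREAD_POOL_SIZE = 50


def effective_thread_pool_size(requested: int = 10) -> int:
    """Apply the documented cap: values greater than 50 revert to 50."""
    if requested < 1:
        raise ValueError("thread pool size must be at least 1")
    return min(requested, MAX_THREAD_POOL_SIZE)


def upload_part_count(file_size: int, part_size: int = 5 * MB) -> int:
    """Estimate the number of parts used to upload a file."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if file_size <= 5 * MB:
        # Assumption: multipart upload applies only above 5 MB.
        return 1
    return math.ceil(file_size / part_size)
```

For example, a requested pool size of 80 results in 50 threads, and a 12 MB file with the default part size is estimated at three parts.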