  

Amazon S3 V2 objects in mappings

You can define source and target properties to run mappings and mapping tasks using an Amazon S3 V2 connection. You can also parameterize the connection and objects, and use a parameter file in the task properties to overwrite the connection and object properties at runtime. Additionally, you can specify file formatting options for Amazon S3 V2 objects.
You can use Amazon S3 V2 objects in mappings configured in advanced mode.

Amazon S3 V2 sources in mappings

In a mapping, you can configure a Source transformation to represent an Amazon S3 V2 object as the source to read data from Amazon S3.
The following table describes the Amazon S3 V2 source properties that you can configure in a source transformation:
Property
Description
Connection Name
Name of the Amazon S3 V2 source connection. Select a source connection or click New Parameter to define a new parameter for the source connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Source Type
Source type. Select one of the following types:
  • - Single Object
  • - Parameter. Select Parameter to define the source type when you configure the mapping task.
Object
Name of the source object.
When you select an object, you can also select a .manifest file object when you want to read from multiple files.
Parameter
Select an existing parameter for the source object or click New Parameter to define a new parameter for the source object. The Parameter property appears only if you select Parameter as the source type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Format
Specifies the file format that the Amazon S3 V2 Connector uses to read data from Amazon S3.
You can select the following file format types:
  • - None1
  • - Flat
  • - Avro
  • - ORC
  • - Parquet
  • - JSON2
  • - Delta1
  • - Discover Structure2
Default is None. If you select None as the format type, the Secure Agent reads data from Amazon S3 files in binary format.
You cannot use parameterized sources when you select the discover structure format.
Open the Formatting Options dialog box to define the format of the file.
For more information, see File formatting options.
Intelligent Structure Model2
Applies to Discover Structure format type. Determines the underlying patterns in a sample file and auto-generates a model for files with the same data and structure.
Select one of the following options to associate a model with the transformation:
  • - Select. Select an existing model.
  • - New. Create a new model. Select Design New to create the model. Select Auto-generate from sample file for Intelligent Structure Discovery to generate a model based on sample input that you select.
Select one of the following options to validate the XML source object against an XML-based hierarchical schema:
  • - Source object doesn't require validation.
  • - Source object requires validation against a hierarchical schema. Select to validate the XML source object against an existing or a new hierarchical schema.
When you create a mapping task, on the Runtime Options tab, you configure how Data Integration handles the schema mismatch. You can choose to skip the mismatched files and continue to run the task or stop the task when the task encounters the first file that does not match.
For more information, see Components.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
The following table describes the advanced source properties:
Property
Description
Source Type
Type of the source from which you want to read data.
You can select the following source types:
  • - File
  • - Directory
Default is File.
Directory source type doesn't apply to Delta files.
For more information, see Source types in Amazon S3 V2 sources.
Folder Path
Overwrites the bucket name or folder path of the Amazon S3 source file.
If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties. See the sketch after this table for an illustration of this rule.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
File Name
Overwrites the Amazon S3 source file name.
Incremental File Load2
Indicates whether you want to incrementally load files when you use a directory as the source for a mapping in advanced mode. When you incrementally load files, the mapping task reads and processes only files in the directory that have changed since the mapping task last ran.
For more information, see Incrementally loading files.
Allow Wildcard Characters2
Indicates whether you want to use wildcard characters for the directory source type.
If you select this option, you can use the question mark (?) and asterisk (*) wildcard characters in the folder path or file name.
For more information, see Wildcard characters.
Recursive Directory Read2
Indicates whether you want to read flat, Avro, JSON, ORC, or Parquet files recursively from the specified folder and its subfolders and files. Applicable when you select the directory source type.
For more information, see Recursively read files from directories.
Encryption Type
Method you want to use to decrypt data.
You can select one of the following encryption types:
  • - None
  • - Informatica encryption
Default is None.
Note: You cannot select the client-side encryption, server-side encryption, or server-side encryption with KMS encryption types.
Staging Directory1
Path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents a read operation and 11 represents a write operation.
For example, InfaS3Staging000703115851268912800_0.
The temporary files are created within the new directory.
The staging directory source property does not apply to Avro, ORC, Parquet, and Delta files.
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Decompresses data when you read data from Amazon S3.
You can choose to decompress data in the following formats:
  • - None
  • - Bzip22
  • - Gzip
  • - Lzo
Default is None.
You can decompress data for a mapping in advanced mode if the mapping reads data from a JSON file in Bzip2 format.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
Download Part Size1
Part size, in bytes, that the Secure Agent uses to download an Amazon S3 object.
Default is 5 MB. Use this property when you run a mapping to read a file of flat format type.
This property applies only to flat files.
Multiple Download Threshold1
Minimum threshold size to download an Amazon S3 object in multiple parts.
To download the object in multiple parts in parallel, ensure that the file size of an Amazon S3 object is greater than the value you specify in this property. Default is 10 MB.
This property applies only to flat files.
Temporary Credential Duration
The time duration for which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration to a maximum of 12 hours in the AWS console and then enter the same time duration in this property.
Tracing Level
This property is not applicable for Amazon S3 V2 Connector.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
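The folder path behavior described in the Folder Path property follows a simple resolution rule. The following Python sketch is an illustration only, not the connector's implementation; the function name and the example paths are hypothetical:

    def resolve_folder_path(connection_path, override_path):
        # connection_path is the folder path from the connection, such as "my_bucket1/dir1".
        # override_path is the Folder Path advanced property, such as "/dir2" or "my_bucket2/dir2".
        if not override_path:
            return connection_path                  # no override: use the connection folder path
        if override_path.startswith("/"):
            return connection_path + override_path  # starts with a slash: append to the connection folder path
        return override_path                        # includes a bucket name: override the connection folder path

    print(resolve_folder_path("my_bucket1/dir1", "/dir2"))            # my_bucket1/dir1/dir2
    print(resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2"))  # my_bucket2/dir2

The same rule applies to the Folder Path property in the advanced target properties.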

Amazon S3 V2 targets in mappings

In a mapping, you can configure a Target transformation to represent an Amazon S3 V2 object as the target to write data to Amazon S3.
Specify the name and description of the Amazon S3 V2 target. Configure the Amazon S3 V2 target and advanced properties for the target object.
The following table describes the Amazon S3 V2 target properties that you can configure in a Target transformation:
Property
Description
Connection
Name of the Amazon S3 V2 target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Target Type
Target type. Select one of the following types:
  • - Single Object
  • - Parameter: Select Parameter to define the target type when you configure the mapping task.
Object
Name of the target object.
You can select an existing object or create an object at runtime. When you create an object at runtime, enter a name and the path for the target object.
Parameter
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object. The Parameter property appears only if you select Parameter as the target type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Create Target
Creates a target.
Enter a name and path for the target object. You can use parameters defined in a parameter file in the target name.
Format
Specifies the file format that the Amazon S3 V2 Connector uses to write data to Amazon S3.
You can select the following file format types:
  • - None 1
  • - Flat
  • - Avro
  • - ORC
  • - Parquet
  • - JSON2
  • - Delta1
Default is None. If you select None as the format type, the Secure Agent writes data to Amazon S3 files in binary format.
Open the Formatting Options dialog box to define the format of the file.
For more information about format options, see File formatting options.
Operation
Type of the target operation.
You can perform only the insert operation on an Amazon S3 V2 target.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
The following table describes the Amazon S3 V2 advanced target properties that you can configure in a Target transformation:
Property
Description
Overwrite File(s) If Exists
Overwrites an existing target file.
Default is true. For more information about overwriting the existing files, see Overwriting existing files.
Folder Path
Bucket name or folder path where you want to write the Amazon S3 target file. The path that you enter here overrides the path specified for a target object created at runtime.
If applicable, include the folder name that contains the target file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent writes the file to the <my_bucket2>/<dir2> folder path that you specify in this property.
File Name
Creates a new file name or overwrites an existing target file name.
Encryption Type
Method you want to use to encrypt data.
Select one of the following encryption types:
  • - None
  • - Client Side Encryption1
  • - Server Side Encryption
  • - Server Side Encryption with KMS
  • - Informatica Encryption
Default is None.
For more information about the encryption type, see Data encryption in Amazon S3 V2 targets.
Staging Directory1
Enter the path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents a read operation and 11 represents a write operation.
For example, InfaS3Staging000703115851268912800_0
The temporary files are created within the new directory.
The staging directory target property does not apply to Avro, ORC, Parquet, and Delta files.
File Merge
This property is not applicable for Amazon S3 V2 Connector.
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Compresses data when you write data to Amazon S3.
You can compress the data in the following formats:
  • - None
  • - Bzip22
  • - Deflate
  • - Gzip
  • - Lzo
  • - Snappy
  • - Zlib
Default is None.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
For more information about the compression format, see Data compression in Amazon S3 V2 sources and targets.
Object Tags
Key-value pairs that add one or more tags to the objects stored in the Amazon S3 bucket.
You can either enter the key-value pairs or specify the path of the file that contains the key-value pairs.
Use this property when you run a mapping to write a file of flat format type. For more information about the object tags, see Object tag.
This property applies only to flat files.
TransferManager Thread Pool Size1
The number of threads to write data in parallel.
Default is 10. Use this property when you run a mapping to write a file of flat format type.
Amazon S3 V2 Connector uses the AWS TransferManager API to upload a large object in multiple parts to Amazon S3.
When the file size is more than 5 MB, you can configure multipart upload to upload the object in multiple parts in parallel; for an illustration of how these multipart settings relate to the AWS SDK, see the sketch after this table. If you set the value of TransferManager Thread Pool Size to a value greater than 50, the value reverts to 50.
This property applies only to flat files.
Merge Partition Files1
Determines whether the Secure Agent merges the partition files into a single file or maintains separate files based on the number of partitions specified when writing data to Amazon S3 V2 targets.
This property applies only to flat files.
Temporary Credential Duration
The time duration for which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration to a maximum of 12 hours in the AWS console and then enter the same time duration in this property.
Part Size1
Part size, in bytes, that the Secure Agent uses to upload an Amazon S3 object.
Default is 5 MB. Use this property when you run a mapping to write a file of flat format type.
This property applies only to flat files.
Forward Rejected Rows
This property is not applicable for Amazon S3 V2 Connector.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
When you create a mapping and the column name in the Amazon S3 source or target object contains special characters, the Secure Agent replaces the special characters with an underscore (_) and the mapping fails.
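The multipart-related properties, such as Part Size, Multiple Download Threshold, and TransferManager Thread Pool Size, control how large objects are transferred in parts. The following boto3 sketch is only an analogy that shows how the same concepts appear in the AWS SDK; it is not the connector's internal code, and the bucket, key, and file names are hypothetical:

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Multipart settings comparable to the connector's advanced properties:
    # multipart_threshold ~ Multiple Download Threshold, multipart_chunksize ~ Part Size,
    # max_concurrency ~ TransferManager Thread Pool Size.
    config = TransferConfig(
        multipart_threshold=10 * 1024 * 1024,  # use multiple parts only above 10 MB
        multipart_chunksize=5 * 1024 * 1024,   # 5 MB part size
        max_concurrency=10,                    # number of parallel transfer threads
    )

    s3 = boto3.client("s3")
    s3.upload_file("students.csv", "my_bucket", "output/students.csv", Config=config)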

Amazon S3 V2 lookups

You can use Amazon S3 V2 objects in a connected and an unconnected cached Lookup transformation.
For more information about the Lookup transformation, see Transformations.

File formatting options

When you select the format of an Amazon S3 file, you can configure the formatting options.
The following table describes the formatting options for Avro, Parquet, JSON, ORC, and delimited flat files:
Property
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • - Read from data file. Imports the schema from the file in Amazon S3.
  • - Import from Schema File. Imports the schema from a schema definition file on your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
The following table describes the formatting options for flat files:
Property
Description
Read from data file
Imports the schema from the file in Amazon S3.
If you select Read from data file and use the JSON2 file format, you can select one of the following options:
  • - Data elements to sample. The number of rows to read from the metadata.
  • - Memory available to process data. The memory that the parser uses to read the JSON sample schema and process it. You can increase the parser memory. Default is 2 MB.
Import from schema file
Imports the schema from a schema definition file on your local machine.
If you select Import from schema file, you can select Schema File to upload a schema file.
You cannot upload a schema file when you select the Create Target option to write data to Amazon S3.
Flat File Type
The type of flat file.
Select one of the following options:
  • - Delimited. Reads a flat file that contains column delimiters.
  • - Fixed Width. Reads a flat file with fields that have a fixed length. You must select the file format in the Fixed Width File Format option.
    If you do not have a fixed-width file format, click New > Components > Fixed Width File Format to create one.
Delimiter
Character used to separate columns of data. You can use characters such as a comma, tab, colon, or semicolon. To set a tab as a delimiter, type the tab character in any text editor, and then copy and paste the tab character in the Delimiter field. For an illustration of how the delimiter, qualifier, and escape character interact, see the sketch after this table.
Escape Char
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
You can specify a character or \<decimal value>. When you specify \<decimal value>, the agent considers the ASCII character for the decimal value as the escape character.
For example, if you specify \64, the agent considers the ASCII character @.
To ignore the escape character, specify \0.
Qualifier
Quote character that defines the boundaries of data.
You can set the qualifier as single quote or double quote.
Qualifier Mode
Specify the qualifier behavior for the target object.
You can select one of the following options:
  • - Minimal. Applies qualifier to data that has a delimiter value in the data. Otherwise, the Secure Agent does not apply the qualifier when writing data to the target.
  • - All. Applies the qualifier to all non-empty columns.
Default mode is minimal.
Code Page
Select the code page that the agent must use to read or write data.
Amazon S3 V2 Connector supports the following code pages:
  • - MS Windows Latin 1. Select for ISO 8859-1 Western European data.
  • - UTF-8. Select for Unicode and non-Unicode data.
  • - Shift-JIS. Select for double-byte character data.
  • - ISO 8859-15 Latin 9 (Western European).
  • - ISO 8859-2 Eastern European.
  • - ISO 8859-3 Southeast European.
  • - ISO 8859-5 Cyrillic.
  • - ISO 8859-9 Latin 5 (Turkish).
  • - IBM EBCDIC International Latin-1.
Disable escape char when a qualifier is set
Check to disable the escape character when a qualifier value is already set.
Header Line Number
Specify the line number that you want to use as the header when you read data from Amazon S3. You can also read a file that does not have a header.
To read data from a file with no header, specify the value of the Header Line Number field as 0. To read data from a file with a header, set the value of the Header Line Number field to a value that is greater than or equal to one.
This property is applicable when you preview the source data and at runtime for the mapping.
Default is 1.
First Data Row1
Specify the line number from where you want the Secure Agent to read data. You must enter a value that is greater than or equal to one.
To read data from the header, the value of the Header Line Number and the First Data Row fields should be the same. Default is 1.
This property is applicable during runtime and data preview to read a file. This property is applicable during data preview to write a file.
Target Header
Select whether you want to write data to a target with a header or without a header in the flat file. You can select the With Header or Without Header option.
This property is not applicable when you read data from an Amazon S3 source.
Distribution Column1
Specify the name of the column that is used to create multiple target files during run time.
This property is not applicable when you read data from an Amazon S3 source. For more information about the distribution column, see Distribution column.
Max Rows To Preview
Not applicable to Amazon S3 V2 Connector.
Row Delimiter
Character used to separate rows of data. You can set values as \r, \n, and \r\n.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
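The delimiter, qualifier, and escape character options work together when the agent writes delimited data. As a rough analogy only, and not the connector's parser, Python's csv module uses the same concepts; QUOTE_MINIMAL roughly corresponds to the Minimal qualifier mode, where only values that contain the delimiter are enclosed in the qualifier:

    import csv
    import io

    rows = [["id", "name", "note"], ["1", "Smith, Jane", "value with a delimiter"]]

    buffer = io.StringIO()
    # delimiter ~ Delimiter, quotechar ~ Qualifier, QUOTE_MINIMAL ~ Minimal qualifier mode
    writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

    print(buffer.getvalue())
    # id,name,note
    # 1,"Smith, Jane",value with a delimiter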
The following table describes the formatting options for JSON files:
Property
Description
Data elements to sample1
Specify the number of rows to read to find the best match to populate the metadata.
Memory available to process data1
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB.
If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
Read multiple-line JSON files
Not applicable.
1Applies only to mappings in advanced mode.

Delta files

You can read from and write to Delta files in Amazon S3.
A Delta file consists of the following components:
Consider the following rules and guidelines when you read from and write to Delta files:

Rules and guidelines for setting formatting options

You must set the appropriate formatting options when you select the Amazon S3 file format types.
Use the following guidelines when you select the format types and set the formatting options:

Specifying a target

You can use an existing target or create a target to hold the results of a mapping. If you choose to create the target, the Secure Agent creates the target when you run the task.
To specify the target properties, follow these steps:
    1. Select the Target transformation in the mapping.
    2. On the Incoming Fields tab, configure field rules to specify the fields to include in the target.
    3. To specify the target, click the Target tab.
    4. Select the target connection.
    5. For the target type, choose Single Object or Parameter.
    6. Specify the target object or parameter. You must specify a .csv target file name.
    7. Click Formatting Options if you want to configure the formatting options for the file, and click OK.
    8. Click Select and choose a target object. You can select an existing target object or create a new target object at run time and specify the object name.
    The following image shows the Target Object box, where you can select an existing target object or create a new target object at run time.
    9. Specify the advanced properties for the target, if needed.

Rules and guidelines for creating a target

Consider the following rules and guidelines when you use the Create Target property:
Rules and guidelines for the path in target object
Consider the following rules and guidelines when you specify a path in the Create Target property:

Amazon S3 V2 parameterization

You can parameterize the Source and Target objects using input parameters, and the data in the advanced properties using in-out parameters.
You can parameterize the file name and target folder location for Amazon S3 V2 target objects to pass the file name and folder location at run time. If the folder does not exist, the Secure Agent creates the folder structure dynamically.
If you configure a mapping with the following criteria, the mapping fails:

Parameterization using timestamp

You can append time stamp information to the file name to show when the file is created. You can use parameterization using timestamp when you create a mapping to write a file of flat format type.
You cannot parameterize using timestamp in mappings in advanced mode.
When you specify a file name for the target file, include special characters based on Apache STRFTIME function formats that the mapping task uses to include time stamp information in the file name. You must enable the Handle Special Characters option to handle any special characters in the %[mod] format included in the file name. You can use the STRFTIME function formats in a mapping, as shown in the example after the following table.
If you enable Handle Special Characters, the Secure Agent ignores the input and output parameters in Create Target.
The following table describes some common STRFTIME function formats that you might use in a mapping or mapping task:
Special Character
Description
%d
Day as a two-decimal number, with a range of 01-31.
%m
Month as a two-decimal number, with a range of 01-12.
%y
Year as a two-decimal number without the century, with a range of 00-99.
%Y
Year including the century, for example 2015.
%T
Time in 24-hour notation, equivalent to %H:%M:%S.
%H
Hour in 24-hour clock notation, with a range of 00-23.
%l
Hour in 12-hour clock notation, with a range of 01-12.
%M
Minute as a decimal, with a range of 00-59.
%S
Second as a decimal, with a range of 00-60.
%p
Either AM or PM.
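Python's strftime uses the same format specifiers, so the following sketch only illustrates how a time stamp expands in a hypothetical target file name; the actual substitution is performed by the mapping task at run time:

    from datetime import datetime

    # Hypothetical target file name that uses STRFTIME format specifiers from the table above.
    file_name_pattern = "student_records_%Y%m%d_%H%M%S.csv"
    print(datetime(2015, 4, 21, 13, 5, 9).strftime(file_name_pattern))
    # student_records_20150421_130509.csv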

Parameterization using a parameter file

You can parameterize an Amazon S3 V2 target file using a parameter file. You can use a parameter file when you create a mapping to write a file of flat format type.
Perform the following steps to parameterize an Amazon S3 V2 target file using a parameter file:
  1. Create an Amazon S3 V2 target object.
  2. Specify the values of the Target File Name as $p1 and Target Object Path as $p2 in the Create Target option.
  3. Define the parameters that you added for the target object name and target object path in the parameter file.
    For example:
    $p1=filename
    $p2=path
  4. Place the parameter file in the following location:
    <Informatica Cloud Secure Agent\apps\Data_Integration_Server\data\userparameters>
  5. Specify the parameter file name on the Runtime Options tab of the mapping task.
  6. Save and run the mapping task.

Rules and guidelines for mappings in advanced mode

Consider the following guidelines when you create a mapping in advanced mode:

Mapping in advanced mode example

You work for one of the largest community colleges, which maintains millions of records in its ongoing student database. The college has more than 10,000 faculty members teaching at 45 campuses and 700 locations across the globe. The college has a very large IT infrastructure, and about 15 TB of information is downloaded on a daily basis from the Internet.
To avoid performance, scalability, and high cost challenges, the college plans to port all of its data from its operational data stores to Amazon S3 within a short span of time. Create a mapping that runs in advanced mode to achieve faster performance when you read the records from the source and write them to an Amazon S3 target.
    1. In Data Integration, click New > Mappings > Mapping.
    2. In the Mapping Designer, click Switch to Advanced.
    The following image shows the Switch to Advanced button in the Mapping Designer header.
    3. In the Switch to Advanced dialog box, click Switch to Advanced.
    The Mapping Designer updates the mapping canvas to display the transformations and functions that are available in advanced mode.
    4. Enter a name, location, and description for the mapping.
    5. On the Source transformation, specify a name and description in the general properties.
    6. On the Source tab, perform the following steps to provide the source details to read data from the source:
      a. In the Connection field, select the required source connection.
      b. In the Source Type field, select the type of the source.
      c. In the Object field, select the required object.
      d. In the Advanced Properties section, provide the appropriate values.
    7. On the Fields tab, map the source fields to the target fields.
    8. On the Target transformation, specify a name and description in the general properties.
    9. On the Target tab, perform the following steps to provide the target details to write data to the Amazon S3 target:
      a. In the Connection field, select the Amazon S3 V2 target connection.
      b. In the Target Type field, select the type of the target.
      c. In the Object field, select the required object.
      d. In the Operation field, select the required operation.
      e. In the Advanced Properties section, provide appropriate values for the advanced target properties.
    10. Map the source and target.
    11. Click Save > Run to validate the mapping.
    In Monitor, you can monitor the status of the logs after you run the task.