You can define source and target properties to run mappings and mapping tasks using an Amazon S3 V2 connection. You can also parameterize the connection and objects, and use a parameter file in the task properties to overwrite the connection and object properties at runtime. Additionally, you can specify file formatting options for Amazon S3 V2 objects.
You can use Amazon S3 V2 objects in mappings configured in advanced mode.
Amazon S3 V2 sources in mappings
In a mapping, you can configure a Source transformation to represent an Amazon S3 V2 object as the source to read data from Amazon S3.
The following table describes the Amazon S3 V2 source properties that you can configure in a source transformation:
Property
Description
Connection Name
Name of the Amazon S3 V2 source connection. Select a source connection or click New Parameter to define a new parameter for the source connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Source Type
Source type. Select one of the following types:
- Single Object
- Parameter. Select Parameter to define the source type when you configure the mapping task.
Object
Name of the source object.
When you select an object, you can select a .manifest file object if you want to read from multiple files.
Parameter
Select an existing parameter for the source object or click New Parameter to define a new parameter for the source object. The Parameter property appears only if you select Parameter as the source type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Format
Specifies the file format that the Amazon S3 V2 Connector uses to read data from Amazon S3.
You can select the following file format types:
- None1
- Flat
- Avro
- ORC
- Parquet
- JSON2
- Delta1
- Discover Structure2
Default is None. If you select None as the format type, the Secure Agent reads data from Amazon S3 files in binary format.
You cannot use parameterized sources when you select the discover structure format.
Formatting Options
Opens the Formatting Options dialog box where you define the format of the file.
Intelligent Structure Model
Applies to the Discover Structure format type. Determines the underlying patterns in a sample file and auto-generates a model for files with the same data and structure.
Select one of the following options to associate a model with the transformation:
- Select. Select an existing model.
- New. Create a new model. Select Design New to create the model. Select Auto-generate from sample file for Intelligent Structure Discovery to generate a model based on sample input that you select.
Select one of the following options to validate the XML source object against an XML-based hierarchical schema:
- Source object doesn't require validation.
- Source object requires validation against a hierarchical schema. Select to validate the XML source object against an existing or a new hierarchical schema.
When you create a mapping task, on the Runtime Options tab, you configure how Data Integration handles the schema mismatch. You can choose to skip the mismatched files and continue to run the task or stop the task when the task encounters the first file that does not match.
For more information, see Components.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
The following table describes the advanced source properties:
Property
Description
Source Type
Type of the source from which you want to read data.
You can select the following source types:
- File
- Directory
Default is File.
Directory source type doesn't apply to Delta files.
Folder Path
Overwrites the bucket name or folder path of the Amazon S3 source file.
If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/), in the /<folder_name> format, the Secure Agent appends it to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the Secure Agent reads from the <my_bucket1>/<dir1>/<dir2> folder path.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
File Name
Overwrites the Amazon S3 source file name.
Incremental File Load2
Indicates whether you want to incrementally load files when you use a directory as the source for a mapping in advanced mode. When you incrementally load files, the mapping task reads and processes only files in the directory that have changed since the mapping task last ran.
Recursive Directory Read
Indicates whether you want to read flat, Avro, JSON, ORC, or Parquet files recursively from the specified folder and its subfolders. Applies when you select the directory source type.
Encryption Type
You can select one of the following encryption types:
- None
- Informatica encryption
Default is None.
Note: You cannot select the client-side encryption, server-side encryption, or server-side encryption with KMS encryption types.
Staging Directory1
Path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents a read operation and 11 represents a write operation.
For example, InfaS3Staging000703115851268912800_0.
The temporary files are created within the new directory.
The staging directory source property does not apply to Avro, ORC, Parquet, and Delta files.
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Decompresses data when you read data from Amazon S3.
You can choose to decompress data in the following formats:
- None
- Bzip22
- Gzip
- Lzo
Default is None.
You can decompress data for a mapping in advanced mode if the mapping reads data from a JSON file in Bzip2 format.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
Download Part Size1
Part size, in bytes, of an Amazon S3 object that the Secure Agent downloads in each part.
Default is 5 MB. Use this property when you run a mapping to read a file of flat format type.
This property applies only to flat files.
Multiple Download Threshold1
Minimum threshold size to download an Amazon S3 object in multiple parts.
To download the object in multiple parts in parallel, ensure that the file size of an Amazon S3 object is greater than the value you specify in this property. Default is 10 MB.
This property applies only to flat files.
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set a maximum duration of up to 12 hours in the AWS console and then enter the same duration in this property.
Tracing Level
This property is not applicable for Amazon S3 V2 Connector.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
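To illustrate how the download properties interact, consider the defaults described above: a 50 MB flat file exceeds the 10 MB multiple download threshold, so the Secure Agent downloads it in ten 5 MB parts in parallel, while an 8 MB file falls below the threshold and is downloaded as a single object.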
Amazon S3 V2 targets in mappings
In a mapping, you can configure a Target transformation to represent an Amazon S3 V2 object as the target to write data to Amazon S3.
Specify the name and description of the Amazon S3 V2 target. Configure the Amazon S3 V2 target and advanced properties for the target object.
The following table describes the Amazon S3 V2 target properties that you can configure in a Target transformation:
Property
Description
Connection
Name of the Amazon S3 V2 target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Target Type
Target type. Select one of the following types:
- Single Object
- Parameter. Select Parameter to define the target type when you configure the mapping task.
Object
Name of the target object.
You can select an existing object or create an object at runtime. When you create an object at runtime, enter a name and the path for the target object.
Parameter
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object. The Parameter property appears only if you select Parameter as the target type.
If you want to overwrite the parameter at runtime, select the Allow parameter to be overridden at run time option when you create a parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties.
Create Target
Creates a target.
Enter a name and path for the target object. You can use parameters defined in a parameter file in the target name.
You can perform only the insert operation on an Amazon S3 V2 target.
The following table describes the Amazon S3 V2 advanced target properties that you can configure in a Target transformation:
Property
Description
Overwrite File(s) If Exists
Overwrites an existing target file.
Default is true. For more information about overwriting the existing files, see Overwriting existing files.
Folder Path
Bucket name or folder path where you want to write the Amazon S3 target file. The path that you enter here overrides the path specified for the target configured to create at runtime.
If applicable, include the folder name that contains the target file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/), in the /<folder_name> format, the Secure Agent appends it to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the Secure Agent writes to the <my_bucket1>/<dir1>/<dir2> folder path.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent writes the file to the <my_bucket2>/<dir2> folder path that you specify in this property.
File Name
Creates a new file name or overwrites an existing target file name.
Staging Directory1
Path of the local staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file. Default staging directory is the /temp directory on the machine that hosts the Secure Agent.
When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the following format: InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents a read operation and 11 represents a write operation.
For example, InfaS3Staging000703115851268912800_0
The temporary files are created within the new directory.
The staging directory target property does not apply to Avro, ORC, Parquet, and Delta files.
File Merge
This property is not applicable for Amazon S3 V2 Connector.
Hadoop Performance Tuning Options
This property is not applicable for Amazon S3 V2 Connector.
Compression Format
Compresses data when you write data to Amazon S3.
You can compress the data in the following formats:
- None
- Bzip22
- Deflate
- Gzip
- Lzo
- Snappy
- Zlib
Default is None.
Note: Amazon S3 V2 Connector does not support the Lzo compression format even though the option appears in this property.
Object Tags
Key-value pairs that add single or multiple tags to the objects stored in the Amazon S3 bucket.
You can either enter the key-value pairs or specify the path of the file that contains the key-value pairs.
Use this property when you run a mapping to write a file of flat format type. For more information about the object tags, see Object tag.
This property applies only to flat files.
TransferManager Thread Pool Size1
The number of threads to write data in parallel.
Default is 10. Use this property when you run a mapping to write a file of flat format type.
Amazon S3 V2 Connector uses the AWS TransferManager API to upload a large object in multiple parts to Amazon S3.
When the file size is more than 5 MB, you can configure multipart upload to upload an object in multiple parts in parallel. If you set the value of TransferManager Thread Pool Size to greater than 50, the value reverts to 50.
This property applies only to flat files.
Merge Partition Files1
Determines whether the Secure Agent merges the partition files into a single file or maintains separate files based on the number of partitions specified to write data to the Amazon S3 V2 targets.
This property applies only to flat files.
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set a maximum duration of up to 12 hours in the AWS console and then enter the same duration in this property.
Part Size1
Part size, in bytes, of an Amazon S3 object that the Secure Agent uploads in each part.
Default is 5 MB. Use this property when you run a mapping to write a file of flat format type.
This property applies only to flat files.
Forward Rejected Rows
This property is not applicable for Amazon S3 V2 Connector.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
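As an illustration of the upload defaults described above, a 100 MB flat file uploaded with the default 5 MB part size is split into 20 parts, and with the default TransferManager Thread Pool Size of 10, up to ten of those parts are uploaded in parallel.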
When you create a mapping and the column name in the Amazon S3 source or target object contains special characters, the Secure Agent replaces the special characters with an underscore (_) and the mapping fails.
Amazon S3 V2 lookups
You can use Amazon S3 V2 objects in a connected and an unconnected cached Lookup transformation.
For more information about the Lookup transformation, see Transformations.
File formatting options
When you select the format of an Amazon S3 file, you can configure the formatting options.
The following table describes the formatting options for Avro, Parquet, JSON, ORC, and delimited flat files:
Property
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
- Read from data file. Imports the schema from the file in Amazon S3.
- Import from Schema File. Imports schema from a schema definition file on your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
The following table describes the formatting options for flat files:
Property
Description
Read from data file
Imports the schema from the file in Amazon S3.
If you select Read from data file and use the JSON2 file format, you can select one of the following options:
- Data elements to sample. The number of rows to read to populate the metadata.
- Memory available to process data. The memory that the parser uses to read the JSON sample schema and process it. You can increase the parser memory. Default is 2 MB.
Import from schema file
Imports schema from a schema definition file on your local machine.
If you select Import from schema file, you can select Schema File to upload a schema file.
You cannot upload a schema file when you select the Create Target option to write data to Amazon S3.
Flat File Type
The type of flat file.
Select one of the following options:
- Delimited. Reads a flat file that contains column delimiters.
- Fixed Width. Reads a flat file with fields that have a fixed length.
You must select the file format in the Fixed Width File Format option.
If you do not have a fixed-width file format, click New > Components > Fixed Width File Format to create one.
Delimiter
Character used to separate columns of data. You can set the delimiter to a comma, tab, colon, semicolon, or another character. To set a tab as the delimiter, type the tab character in any text editor, and then copy and paste the tab character into the Delimiter field.
Escape Char
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
You can specify a character or \<decimal value>. When you specify \<decimal value>, the agent considers the ASCII character for the decimal value as the escape character.
For example, if you specify \64, the agent considers the ASCII character @.
To ignore the escape character, specify \0.
Qualifier
Quote character that defines the boundaries of data.
You can set the qualifier as single quote or double quote.
Qualifier Mode
Specify the qualifier behavior for the target object.
You can select one of the following options:
- Minimal. Applies the qualifier only to data that contains the delimiter value. Otherwise, the Secure Agent does not apply the qualifier when writing data to the target.
- All. Applies the qualifier to all non-empty columns.
Default is Minimal.
Code Page
Select the code page that the agent must use to read or write data.
Amazon S3 V2 Connector supports the following code pages:
- MS Windows Latin 1. Select for ISO 8859-1 Western European data.
- UTF-8. Select for Unicode and non-Unicode data.
- Shift-JIS. Select for double-byte character data.
- ISO 8859-15 Latin 9 (Western European).
- ISO 8859-2 Eastern European.
- ISO 8859-3 Southeast European.
- ISO 8859-5 Cyrillic.
- ISO 8859-9 Latin 5 (Turkish).
- IBM EBCDIC International Latin-1.
Disable escape char when a qualifier is set
Select to disable the escape character when a qualifier value is set.
Header Line Number
Specify the line number that you want to use as the header when you read data from Amazon S3. You can also read a file that does not have a header.
To read data from a file with no header, specify the value of the Header Line Number field as 0. To read data from a file with a header, set the value of the Header Line Number field to a value that is greater than or equal to one.
This property is applicable when you preview the source data and at runtime for the mapping.
Default is 1.
First Data Row1
Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one.
To read data from the header, the value of the Header Line Number and the First Data Row fields must be the same. Default is 1.
This property applies at runtime and during data preview when you read a file, and during data preview when you write a file.
Target Header
Select whether you want to write data to a target that contains a header or without a header in the flat file. You can select With Header or Without Header options.
This property is not applicable when you read data from an Amazon S3 source.
Distribution Column1
Specify the name of the column that is used to create multiple target files during run time.
This property is not applicable when you read data from an Amazon S3 source. For more information about the distribution column, see Distribution column.
Max Rows To Preview
Not applicable to Amazon S3 V2 Connector.
Row Delimiter
Character used to separate rows of data. You can set values as \r, \n, and \r\n.
1Doesn't apply to mappings in advanced mode.
2Applies only to mappings in advanced mode.
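For example (illustrative data), consider a comma-delimited source file read with a double-quote qualifier, Header Line Number set to 1, and First Data Row set to 2:
id,name,notes
1,"Smith, Jane",regular
2,Lee,new
The agent treats the first line as the header, starts reading data from the second line, and the qualifier preserves the embedded comma in "Smith, Jane".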
The following table describes the formatting options for JSON files:
Property
Description
Data elements to sample1
Specify the number of rows to read to find the best match to populate the metadata.
Memory available to process data1
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB.
If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
Read multiple-line JSON files
Not applicable.
1Applies only to mappings in advanced mode.
Delta files
You can read from and write to Delta files in Amazon S3.
A Delta file consists of the following components:
•Parquet files where the data is stored.
•JSON files where the metadata and data change logs are stored.
Each transaction that modifies the data results in a new JSON file. The JSON files are stored in the _delta_log directory.
Consider the following rules and guidelines when you read from and write to Delta files:
•You cannot read and write Delta files in a mapping in SQL ELT mode or a mapping task enabled with SQL ELT optimization.
•You cannot use source partitioning or target partitioning when you read from or write to Delta files.
•When you read from a Delta file and edit the metadata, do not change the data type. Otherwise, the mapping fails. You can only change the precision of the data types.
• If you select the Delta format type and select Import from schema file as the value of the Schema Source formatting option, you can only upload a schema file in the JSON format.
The following sample shows a schema file for a Delta file:
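A minimal sketch, assuming the Spark-style struct layout that Delta uses for its schema metadata, with hypothetical column names:
{
  "type": "struct",
  "fields": [
    {"name": "emp_id", "type": "integer", "nullable": true, "metadata": {}},
    {"name": "emp_name", "type": "string", "nullable": true, "metadata": {}}
  ]
}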
•When there is a change in the metadata of a Delta file, you cannot write data to the same Delta file in its current folder path or bucket. You must specify a different path or bucket name.
Rules and guidelines for setting formatting options
You must set the appropriate formatting options when you select the Amazon S3 file format types.
Use the following guidelines when you select the format types and set the formatting options:
•You can use JSON format only for mappings in advanced mode.
•When you create a mapping and do not click the Formatting Options tab, the Secure Agent considers the Format Type as None by default.
•If you select an Avro, JSON, ORC, or Parquet format type and select Read from data file as the value of the Schema Source formatting option, you cannot configure the delimiter, escapeChar, and qualifier options.
•If you select an Avro, JSON, ORC, or Parquet format type and select Import from schema file as the value of the Schema Source formatting option, you can only upload a schema file in the Schema File property field. You cannot configure the delimiter, escapeChar, and qualifier options.
• If you select the flat format type and select Import from schema file as the value of the Schema Source formatting option, you can only upload a schema file in the JSON format.
The following sample shows a schema file for a flat file:
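A minimal sketch, assuming a simple JSON column-list layout with hypothetical column names and attributes:
{
  "Columns": [
    {"Name": "emp_id", "Type": "integer", "Precision": "10", "Scale": "0"},
    {"Name": "emp_name", "Type": "string", "Precision": "255", "Scale": "0"}
  ]
}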
•Set the appropriate Formatting Options for the Avro, JSON, ORC, or Parquet format types that you select to avoid the following exception:
invalid character encapsulated
•You cannot select the Read multiple-line JSON files checkbox in the formatting options, as Amazon S3 V2 does not support the feature.
•When you run a mapping with an Amazon S3 V2 source whose columns contain values of Parquet datetime data types, the timestamp values that are mapped to the target do not appear in UTC format unless you enable full SQL ELT optimization.
Specifying a target
You can use an existing target or create a target to hold the results of a mapping. If you choose to create the target, the Secure Agent creates the target when you run the task.
To specify the target properties, follow these steps:
1. Select the Target transformation in the mapping.
2. On the Incoming Fields tab, configure field rules to specify the fields to include in the target.
3. To specify the target, click the Target tab.
4. Select the target connection.
5. For the target type, choose Single Object or Parameter.
6. Specify the target object or parameter. You must specify a .csv target file name.
- To create a target file at run time, enter the name for the target file including the extension. For example, Accounts.csv.
7. Click Formatting Options if you want to configure the formatting options for the file, and click OK.
8. Click Select and choose a target object. You can select an existing target object or create a new target object at run time and specify the object name.
9. Specify the advanced properties for the target, if needed.
Rules and guidelines for creating a target
Consider the following rules and guidelines when you use the Create Target property:
•The target name can contain alphanumeric characters. You can use only a period (.), an underscore (_), an at sign (@), a dollar sign ($), and a percent sign (%) as special characters in the file name.
•When you write an Avro, ORC, or Parquet file using Create Target and the target directory has a file name that contains a colon (:), the mapping fails.
•If you specify a target name that already exists, you do not get a warning message. However, the Secure Agent overwrites the existing target with the same file name.
•When you write an Avro, ORC, or Parquet file using the Create Target option, you cannot provide a Null data type.
•When you configure the precision of a string column of a JSON file in the source and select Create Target, the default precision of the string column is retained.
•When you select the file or directory source type, select Create New at Runtime, and run the mapping, the tomcat log shows the following exception even if the mapping succeeds:
Internal error. Encountered an error because invalid path element [Partition] was encountered. Contact Informatica Global Customer Support.
•When you run a mapping in advanced mode with an Amazon S3 source, and create a new target object by selecting Create New at Runtime, the mapping is successful, but the special characters in the source are replaced by an underscore (_).
•When you write data to a flat file created at runtime, the target flat file contains a blank line at the end of the file.
Rules and guidelines for the path in target object
Consider the following rules and guidelines when you specify a path in the Create Target property:
•If you specify the path, the Secure Agent creates the target object in the path that you specify in this property and within the bucket that you specify in the Folder Path connection property. The Secure Agent creates the target object in the following format: <bucket_name>/<path_name>/<target_object_name>.
The Secure Agent considers only the bucket and ignores the path that you specify in the Folder Path connection property.
For example, specify the path as folder1/folder2 and target object name as Records. Specify <bucket_name>/folder3 as the Folder Path in the connection property. The Secure Agent creates the target object in the following location: <bucket_name>/folder1/folder2/Records.
•If you do not specify the path, the Secure Agent creates the target object within the bucket that you specify in the Folder Path connection property, in the following format: <bucket_name>/<target_object_name>.
For example, if you do not specify the path and specify the target object name as Records, the Secure Agent creates the target object within the bucket that you specify in the Folder Path connection property in the following location: <bucket_name>/Records.
•Do not specify a bucket name in the path.
- For mappings, if you specify a bucket name in the path, the bucket name is considered as a folder in the folder path. For example, if you specify the path as <bucket_name>/<path_name>, <bucket_name> is considered as a folder.
- For mappings in advanced mode, if you specify a bucket name in the path, the bucket name is ignored.
Amazon S3 V2 parameterization
You can parameterize the Source and Target objects by using input parameters, and the data in the advanced properties by using in-out parameters.
You can parameterize the file name and target folder location for Amazon S3 V2 target objects to pass the file name and folder location at run time. If the folder does not exist, the Secure Agent creates the folder structure dynamically.
If you configure a mapping with the following criteria, the mapping fails:
•Parameterized source and target
•The Allow parameter to be overridden at run time checkbox is selected
•Source object is selected within a folder during the mapping task creation
Parameterization using timestamp
You can append time stamp information to the file name to indicate when the file is created. You can parameterize using a timestamp when you create a mapping to write a file of flat format type.
You cannot parameterize using timestamp in mappings in advanced mode.
When you specify a file name for the target file, include special characters based on Apache STRFTIME function formats that the mapping task uses to include time stamp information in the file name. You must enable the Handle Special Characters option to handle any special characters in the %[mod] format included in the file name. You can use the STRFTIME function formats in a mapping.
If you enable Handle Special Characters, the Secure Agent ignores the input and output parameters in Create Target.
The following table describes some common STRFTIME function formats that you might use in a mapping or mapping task:
Special Character
Description
%d
Day as a two-digit decimal number, with a range of 01-31.
%m
Month as a two-digit decimal number, with a range of 01-12.
%y
Year as a two-digit decimal number without the century, with a range of 00-99.
%Y
Year including the century, for example 2015.
%T
Time in 24-hour notation, equivalent to %H:%M:%S.
%H
Hour in 24-hour clock notation, with a range of 00-23.
%l
Hour in 12-hour clock notation, with a range of 01-12.
%M
Minute as a decimal, with a range of 00-59.
%S
Second as a decimal, with a range of 00-60.
%p
Either AM or PM.
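For example, with a hypothetical file name, if you enable the Handle Special Characters option and specify the target file name as target_%Y%m%d_%H%M%S.csv, a task that runs on January 30, 2015 at 14:25:36 creates a file named target_20150130_142536.csv.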
Parameterization using a parameter file
You can parameterize an Amazon S3 V2 target file using a parameter file. You can use a parameter file when you create a mapping to write a file of flat format type.
Perform the following steps to parameterize an Amazon S3 V2 target file using a parameter file:
1. Create an Amazon S3 V2 target object.
2. In the Create Target option, specify the value of the Target File Name as $p1 and the Target Object Path as $p2.
3. Define the parameters that you added for the target object name and target object path in the parameter file.
For example:
$p1=filename
$p2=path
4. Place the parameter file in the following location:
5. Specify the parameter file name on the Runtime Options tab of the mapping task.
6. Save and run the mapping task.
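For instance, with hypothetical values, the parameter file might contain:
$p1=accounts.csv
$p2=sales/daily
When the task runs, the Secure Agent creates the accounts.csv target file in the sales/daily folder and creates the folder structure dynamically if it does not exist.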
Rules and guidelines for mappings in advanced mode
Consider the following guidelines when you create a mapping in advanced mode:
•When you run a mapping in advanced mode, a folder of the following format is created in the target and multiple target files are generated within the folder: <target_foldername>.<file_extension>.
•When a mapping in advanced mode writes data to an Amazon S3 file, the file is replaced with a folder and the target file is generated inside the folder. If you create another mapping in advanced mode that references the same Amazon S3 file, the file cannot be found and the mapping fails with the following error message:
Operation failed: Index: 0, Size: 0.
•When you configure a read operation to read from a file, the file name must not begin with an underscore (_) or a period (.).
•When you configure a Lookup transformation and select the Report Error option to report an error on multiple matches, the data from the unmatched columns is written to the target without displaying an error message.
•When you read data from a JSON file and one of the rows contains an incorrect boolean value, all rows are rejected and a null value is written to the target.
•When you read data from a JSON file that contains unicode characters, data inconsistencies might occur.
•If you select data types that Amazon S3 V2 Connector does not support, the mapping might either fail or reject the rows.
•When there are empty array elements such as "{"Elements":[]}" as part of the JSON sample data file for metadata resolution, the JSON parser fails with the following error:
[SDK_APP_COM_20000] error [Array must contain at least 1 element for projection].
Provide the entire schema as a sample data row without any empty array for metadata resolution. You can use the Import from schema file option to upload this file.
• When there are empty struct elements such as "{"Elements":{}}" as part of the JSON sample data file for metadata resolution, the JSON parser fails with the following error:
Struct must contain at least one key :: fields
Provide the entire schema as a sample data row without any empty structs for metadata resolution. You can use the Import from schema file option to upload this file.
•The JSON parser interprets the data type for an array element using the first value from the array. For example, if the first value is an integer and subsequently contains long values, the metadata is resolved as an integer. During runtime, the entire row is dropped because the long value cannot fit into the DTM buffer.
Provide the entire schema as a sample data row with the first array or struct elements containing the data types or sub-fields that are required for the metadata resolution. You can use the Import from schema file option to upload the file.
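For example, with illustrative data, if the first sample row contains {"values": [1]}, the values element resolves to an integer, and a later row such as {"values": [123456789012345]} is dropped at runtime. Placing a row such as {"values": [123456789012345]} first in the sample resolves the element as a long.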
•There is a limit of 2 MB to the first row size used to interpret metadata from a JSON file. If the first row in the data file is larger than 2 MB, the JSON parser fails.
Decrease the sample data row size by removing the additional tags from the struct and array elements. The JSON parser only requires the first element within struct and array elements. Provide the data types or sub-fields that are required for metadata resolution in the first element. You can use the Import from schema file option to upload the file.
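A minimal sketch of such a trimmed sample row, with hypothetical field names, that keeps only the first element of each array and struct:
{"id": 1, "tags": ["electronics"], "address": {"city": "Boston", "zip": "02101"}}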
•The staging directory source and target advanced property is not applicable. However, you must specify a staging directory on Amazon S3 in advanced configurations. For more information, see Administrator.
•When you set the qualifier mode to Minimal and use an escape character, the characters are not escaped and quoted in the target. Set the qualifier mode to All.
•When you set the qualifier mode to All and do not specify a value for the qualifier, \0 (NUL) is considered as the qualifier.
•When you parameterize a flat source file, the FileName field appears in the source fields. As a workaround, add a rule in the incoming fields of the target to exclude the FileName field.
•If the table or schema names contain a slash (/), the mapping fails.
•If the JSON data that you read from a source fails to align with the source schema defined in the schema definition file, the data written to the target appears corrupted.
Mapping in advanced mode example
You work for one of the largest community colleges, which maintains millions of records in its ongoing student database. The college has more than 10,000 faculty members teaching at 45 campuses and 700 locations across the globe. The college has a very large IT infrastructure, and about 15 TB of information is downloaded on a daily basis from the Internet.
To avoid performance, scalability, and high cost challenges, the college plans to port its entire data from its operational data stores to Amazon S3 within a short span of time. Create a mapping that runs in advanced mode to achieve faster performance when you read all the records and write them to an Amazon S3 target.
1. In Data Integration, click New > Mappings > Mapping.
2. In the Mapping Designer, click Switch to Advanced.
3. In the Switch to Advanced dialog box, click Switch to Advanced.
The Mapping Designer updates the mapping canvas to display the transformations and functions that are available in advanced mode.
4. Enter a name, location, and description for the mapping.
5. On the Source transformation, specify a name and description in the general properties.
6. On the Source tab, perform the following steps to provide the source details to read data from the source:
a. In the Connection field, select the required source connection.
b. In the Source Type field, select the type of the source.
c. In the Object field, select the required object.
d. In the Advanced Properties section, provide the appropriate values.
7. On the Fields tab, map the source fields to the target fields.
8. On the Target transformation, specify a name and description in the general properties.
9. On the Target tab, perform the following steps to provide the target details to write data to the Amazon S3 target:
a. In the Connection field, select the Amazon S3 V2 target connection.
b. In the Target Type field, select the type of the target.
c. In the Object field, select the required object.
d. In the Operation field, select the required operation.
e. In the Advanced Properties section, provide appropriate values for the advanced target properties.
10. Map the source and target.
11. Click Save > Run to validate the mapping.
In Monitor, you can monitor the status of the logs after you run the task.