You can use an Amazon Redshift V2 object as a source in a mapping. You can use tables, views, or materialized views as Amazon Redshift V2 sources.
When you configure the advanced source properties, configure properties specific to Amazon Redshift V2. You can encrypt data, retain the staging files on Amazon S3, and securely unload the results of a query from Amazon Redshift to files on Amazon S3.
Read from Amazon Redshift with or without staging
When you configure a read operation, you have the option to read data from the Amazon Redshift source with or without staging data on the S3 bucket.
You can set the read mode to direct or staging in the Source and Lookup transformations. You cannot configure direct mode in a mapping in advanced mode.
Staging read mode
If you set the read mode to staging, the agent creates a staging file in the directory that you specify in the mapping properties. Specify a staging directory on the Secure Agent machine that has sufficient disk space for the volume of data that you want to process.
When you run the mapping, the UNLOAD command unloads the data from the Amazon Redshift table to the Amazon S3 bucket, and then the agent copies the data from Amazon S3 to the staging directory on the agent machine. After the agent completes the read operation, the agent deletes the staging files.
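For example, if you set the staging directory to /data/rsstaging (an illustrative path) and expect to read about 50 GB of data from Amazon Redshift, the agent machine needs roughly 50 GB of free disk space in that directory to hold the staging files while the mapping runs.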
Direct read mode
When you set the read mode to direct, Data Integration directly accesses and retrieves the data from Amazon Redshift without staging the data.
When you configure the direct read mode, you can also specify the fetch size to determine the number of rows to retrieve from the Amazon Redshift source at a time. The advanced source attributes that apply to Amazon S3 staging in the mapping properties and the staging optimization property in the agent properties are not required for direct mode.
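For example, if you set the fetch size to 10000, the agent retrieves 10,000 rows from Amazon Redshift in each fetch. A larger fetch size can reduce the number of round trips to the database but increases memory use on the agent machine.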
To protect data, you can encrypt the data when you read it from a source.
Select the type of encryption in the Encryption Type field under the Amazon Redshift V2 advanced source properties on the Runtime Options page. The Unload command creates staging files on Amazon S3 for server-side encryption with AWS-managed encryption keys or an AWS Key Management Service key.
Use the customer master key ID generated by AWS Key Management Service in the Unload command for server-side encryption.
You can select the following types of encryption:
None
The data is not encrypted.
SSE-S3
If you select the SSE-S3 encryption type, the Unload command creates the staging files in the Amazon S3 bucket and Amazon S3 encrypts the file using AWS-managed encryption keys for server-side encryption.
SSE-KMS
If you select the SSE-KMS encryption type, the Unload command creates the staging files in the Amazon S3 bucket and Amazon S3 encrypts the files using an AWS KMS-managed customer master key or Amazon Resource Name (ARN) for server-side encryption.
The AWS KMS-managed customer master key or ARN that you specify in the connection property must belong to the same region where Amazon S3 is hosted. For the general form of a key ARN, see the example after this list.
For example, if Amazon S3 is hosted in the US West (Oregon) region, you must use an AWS KMS-managed customer master key enabled in the same region when you select the SSE-KMS encryption type.
CSE-SMK
If you select the CSE-SMK encryption type, Amazon Redshift uploads the data to the Amazon S3 server by using the master symmetric key, and then loads the data by using the COPY command with the ENCRYPTED option and a private encryption key for additional security.
You must provide a master symmetric key ID in the connection property to enable the CSE-SMK encryption type.
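For the SSE-KMS encryption type, an AWS KMS key ARN takes the following general form. The region, account ID, and key ID shown here are illustrative:
arn:aws:kms:us-west-2:123456789012:key/1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d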
Unload command
You can use the Unload command to extract data from Amazon Redshift and create staging files on Amazon S3. The Unload command uses a secure connection to load data into one or more files on Amazon S3.
You can specify the Unload command options directly in the Unload Options field or specify the path to a text file that contains all the command options.
Enter the options in uppercase and use a semicolon to separate the options.
For example, DELIMITER = \036;ESCAPE = OFF;NULL=text;PARALLEL = ON;AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>
Note: You cannot specify the path to a text file that contains the Unload command options in a mapping in advanced mode.
If you run the Unload command as a pre-SQL or post-SQL command, specify the ALLOWOVERWRITE option to overwrite the existing objects.
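For example, you might run an Unload command similar to the following as a pre-SQL or post-SQL command. The table name, bucket name, and IAM role shown here are illustrative:
UNLOAD ('SELECT * FROM public.orders')
TO 's3://my-staging-bucket/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift_read'
DELIMITER '|'
ALLOWOVERWRITE;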
By default, the Unload Options field is empty.
Unload command options
The Unload command options extract data from Amazon Redshift and load data to staging files on Amazon S3 in a particular format. You can delimit the data with a particular character or load data to multiple files in parallel.
To add options to the Unload command, use the Unload Options field.
You can set the following options:
DELIMITER
A single ASCII character to separate fields in the input file. You can use characters such as pipe (|), tilde (~), or tab (\t). The delimiter that you specify must not be part of the data. If the delimiter is part of the data, use ESCAPE to read the delimiter character as a regular character.
The default value is \036, the octal representation of the non-printable character known as the record separator.
It is recommended that you use the octal representation of non-printable characters as the DELIMITER.
ESCAPE
You can add an escape character before each occurrence of the delimiter character for CHAR and VARCHAR columns in delimited unload files. By default, the escape option is ON. To disable the escape option, specify OFF as the value of the escape option. For example, ESCAPE=OFF.
Note: If you enable the optimization property that you set for staging data, you must use ESCAPE=OFF in Unload command options. If you do not specify ESCAPE=OFF, the mapping runs without optimizing the staging performance.
REGION
You can use the REGION attribute when the Amazon S3 staging bucket is not in the same region as the cluster region. If Amazon Redshift resides in the US East (N. Virginia) region, you can use an Amazon S3 bucket residing in the Asia Pacific (Mumbai) region to create staging files. For example, REGION = ap-south-1.
PARALLEL
The Unload command writes data in parallel to multiple files, according to the number of slices in the cluster. Default is ON. If you turn the PARALLEL option off, the Unload command writes data serially. The maximum size of a data file is 6.5 GB.
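For example, on a cluster with four slices, PARALLEL = ON unloads the data to at least four staging files at the same time, while PARALLEL = OFF writes the staging files one at a time.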
NULL
You can use the NULL Unload command option to replace the null values in an Amazon Redshift source table with a string that you specify.
Enter the value of the NULL Unload command option in the following format: NULL=text. Do not add spaces when you enter the string value.
Note: The NULL Unload command option does not apply to mappings in advanced mode.
For more information about the NULL Unload command, see the AWS documentation.
AWS_IAM_ROLE
Specify the Amazon Resource Name (ARN) of the Amazon Redshift role to run the mapping on a Secure Agent installed on an Amazon EC2 system in the following format: AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>
For example: arn:aws:iam::123123456789:role/redshift_read
ADDQUOTES
ADDQUOTES is implemented with the UNLOAD command by default. The Unload command adds quotation marks to each data field. With added quotation marks, the UNLOAD command can read data values that contain the delimiter. If double quote (") is a part of data, use ESCAPE to read the double quote as a regular character.
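For example, with ADDQUOTES and a comma delimiter, a field value such as New York, NY is unloaded as "New York, NY", so the comma inside the value is not read as a field separator.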
Source partitioning
When you read data from Amazon Redshift, you can configure partitioning to optimize the mapping performance at run time. The partition type controls how the Secure Agent distributes data among partitions at partition points. You cannot configure source partitioning when you run a mapping in advanced mode.
You can define the partition type as key range partitioning. Configure key range partitioning to partition Amazon Redshift data based on the value of a field or set of fields. With key range partitioning, the Secure Agent distributes rows of source data based on the fields that you define as partition keys. The Secure Agent compares the field value to the range values for each partition and sends rows to the appropriate partition.
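For example, you might partition on an integer ORDER_ID field (an illustrative key) with three partitions and key ranges 1-1000, 1000-2000, and 2000-3000. The Secure Agent sends each row to the partition whose range contains its ORDER_ID value.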
Use key range partitioning for columns that have an even distribution of data values. Otherwise, the partitions might be of unequal size. For example, a column might have 10 rows between key values 1 and 1000, and 999 rows between key values 1001 and 2000.
With key range partitioning, a query for one partition might return rows sooner than another partition. Or, one partition can return rows while the other partitions are not returning rows. This situation occurs when the rows in the table are in a similar order as the key range. One query might be reading and returning rows while the other queries are reading and filtering the same rows.
Note: You can configure a partition key only of the Integer and String data types.
When you configure more than two partitions in a mapping, the Secure Agent ignores the values that you specify in the start range for the first partition and in the end range for the last partition. Instead, the Secure Agent treats the first partition as all values less than the end range of the first partition, and the last partition as all values greater than the value that you specify for the last partition.
For example, suppose you configure three partitions in a mapping and specify the start range for the first partition as 5, the end range for the first partition as 10, and the end range for the last partition as 90. The mapping runs successfully, but the Secure Agent ignores the boundary values and uses less than 10 for the first partition and greater than 90 for the last partition.
Note: When you configure key range partitioning for the source and create a target, the mapping task fails.
You cannot configure filters when you use source partitioning.