You can configure SQL ELT optimization from a mapping task to enhance the mapping performance.
After you create a mapping, add the mapping to a mapping task, and then configure SQL ELT optimization in the mapping task. You can select how Data Integration handles SQL ELT optimization in the SQL ELT Optimization Fallback Option menu on the Runtime Options tab.
The task converts the transformation logic to Amazon Redshift queries and sends the queries to Amazon Redshift, which processes the mapping logic in the Amazon Redshift database.
If your mapping contains multiple pipelines, you can define the flow run order to load the targets from the pipelines in a particular order.
You cannot use SQL ELT optimization when you run a mapping in advanced mode.
SQL ELT optimization types
When you apply SQL ELT optimization, the task pushes transformation logic to the source or target database based on the optimization type you specify in the task properties. Data Integration translates the transformation logic into SQL queries or Amazon Redshift commands and sends them to the Amazon Redshift database. The database runs the SQL queries or Amazon Redshift commands to process the transformations.
You can configure the following types of SQL ELT optimization in a mapping:
None
The task does not push down the transformation logic to the Amazon Redshift database.
Full
The task pushes as much of the transformation logic as possible to process in the target database. When a transformation is not supported in the mapping, the task partially pushes down the mapping logic to the point where the transformation is supported for SQL ELT optimization.
Source
The task pushes down as much of the transformation logic as possible to process in the source database. When you select source SQL ELT optimization, the task pushes the transformation logic for all the configured transformations downstream in the mapping until it reaches the Target transformation. It leaves out the Target transformation logic and pushes down the rest of the mapping logic to the source database.
Data Integration behavior with source and full SQL ELT optimization
When you select full or source SQL ELT optimization for a mapping that reads from or writes to Amazon Redshift, Data Integration analyzes the mapping from the source to the target or until it reaches a downstream transformation to determine whether to push down only partial or the entire mapping logic for processing to the database.
If all the transformations in the mapping are supported for SQL ELT optimization, the task pushes down the entire mapping logic to the database.
When a transformation is not supported in the mapping, the task partially pushes down the mapping logic to the point where the transformation is supported for SQL ELT optimization. Data Integration generates and executes a SELECT statement for the transformation logic that needs to be pushed down. Then, it reads the results of this SQL query and processes the remaining transformations in the mapping.
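For illustration only, with hypothetical table and column names, a Filter and an Expression transformation that are pushed down to the source might resolve into a single statement of roughly this shape:
SELECT cust_id, UPPER(cust_name) AS cust_name_upper
FROM public.customers
WHERE region = 'WEST';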
SQL ELT optimization scenarios
You can configure SQL ELT optimization using the Amazon Redshift V2 Connector or the Amazon Redshift ODBC Connector in mappings.
You can configure SQL ELT optimization for the following scenarios when you use the Amazon Redshift V2 Connector in mappings:
Note: You cannot configure SQL ELT optimization for a mapping task that is based on a mapping in advanced mode.
Source and target endpoints | Supported SQL ELT optimization scenarios in mappings | SQL ELT optimization type
Amazon S3 source and Amazon Redshift target | Reads from Amazon S3 and writes to Amazon Redshift using the Amazon Redshift V2 connection. When data is read from the source to the target, this connection uses the AWS commands. | Full
Amazon Redshift source and Amazon Redshift target | Reads from Amazon Redshift and writes to Amazon Redshift using the Amazon Redshift V2 connection. The source and target can be in the same or different cluster regions. | Source, Full. Note: The Secure Agent pushes the entire mapping logic or only the partial mapping logic, as applicable, for processing to Amazon Redshift.
Amazon Redshift source | Reads from Amazon Redshift using the Amazon Redshift V2 connection and writes to other targets. | Source
Note: You can use the Secure Agent or the Hosted Agent to run mappings enabled with SQL ELT optimization.
Note: You can configure SQL ELT optimization for a mapping that uses an Amazon Redshift ODBC connection to read from and write to Amazon Redshift. Informatica recommends that you use the Amazon Redshift V2 connection in mappings to configure SQL ELT optimization. If you cannot push down specific transformation logic using the Amazon Redshift V2 connection, you can explore configuring SQL ELT optimization using the Amazon Redshift ODBC connection.
You can configure the task to push transformations, functions, and operators to the database.
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators and functions in the database. If there is no equivalent operator or function, the Secure Agent processes the transformation logic.
Functions with Amazon Redshift V2
When you use SQL ELT optimization, the Secure Agent converts the expression in the transformation by determining equivalent functions in the database. If there is no equivalent function, the Secure Agent processes the transformation logic.
The following table summarizes the availability of SQL ELT functions in an Amazon Redshift database. An X indicates that the function can be pushed to the Amazon Redshift database. A dash (-) indicates that the function cannot be pushed to the database.
Function | SQL ELT | Function | SQL ELT | Function | SQL ELT
ABORT() | - | INITCAP() | X | REG_MATCH() | X
ABS() | X | INSTR() | X | REG_REPLACE | X
ADD_TO_DATE() | X | IS_DATE() | - | REPLACECHR() | X
AES_DECRYPT() | - | IS_NUMBER() | X | REPLACESTR() | X
AES_ENCRYPT() | - | IS_SPACES() | X | REVERSE() | -
ASCII() | X | ISNULL() | X | ROUND(DATE) | -
AVG() | X | LAST() | - | ROUND(NUMBER) | X
CEIL() | X | LAST_DAY() | X | RPAD() | X
CHOOSE() | - | LEAST() | - | RTRIM() | X
CHR() | X | LENGTH() | X | SET_DATE_PART() | -
CHRCODE() | - | LN() | X | SIGN() | X
COMPRESS() | - | LOG() | X | SIN() | X
CONCAT() | X | LOOKUP() | X | SINH() | -
COS() | X | LOWER() | X | SOUNDEX() | -
COSH() | - | LPAD() | X | SQRT() | X
COUNT() | X | LTRIM() | X | STDDEV() | X
CRC32() | - | MAKE_DATE_TIME() | - | SUBSTR() | X
CUME() | - | MAX() | X | SUM() | X
DATE_COMPARE() | X | MD5() | X | SYSTIMESTAMP() | X
DATE_DIFF() | X | MEDIAN() | X | TAN() | X
DECODE() | X | METAPHONE() | - | TANH() | -
DECODE_BASE64() | - | MIN() | X | TO_BIGINT | X
DECOMPRESS() | - | MOD() | X | TO_CHAR(DATE) | X
ENCODE_BASE64() | - | MOVINGAVG() | - | TO_CHAR(NUMBER) | X
EXP() | X | MOVINGSUM() | - | TO_DATE() | X
FIRST() | - | NPER() | - | TO_DECIMAL() | X
FLOOR() | X | PERCENTILE() | - | TO_FLOAT() | X
FV() | - | PMT() | - | TO_INTEGER() | X
GET_DATE_PART() | X | POWER() | X | TRUNC(DATE) | X
GREATEST() | - | PV() | - | TRUNC(NUMBER) | X
IIF() | X | RAND() | - | UPPER() | X
IN() | X | RATE() | - | VARIANCE() | X
INDEXOF() | - | REG_EXTRACT() | - | |
Operators with Amazon Redshift V2
When you use SQL ELT optimization, Data Integration converts the expression in the transformation by determining equivalent operators in the database. If there is no equivalent operator, Data Integration processes the transformation logic.
You can push these operators to the Amazon Redshift database by using full SQL ELT optimization.
The following table lists the operators that you can push to an Amazon Redshift database:
Operator | Operator
+ | =
- | >=
* | <=
/ | !=
% | AND
|| | OR
> | NOT
< |
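For example, a filter or expression condition that uses only these operators can be pushed down. The field names in the following condition are hypothetical:
STATUS != 'INACTIVE' AND (SALARY * 1.1 >= 50000 OR DEPT_ID = 20)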
Variables with Amazon Redshift V2
You can use full SQL ELT to push the SYSDATE variable to the Amazon Redshift database in an expression transformation.
Note: When you use full SQL ELT to push the SYSDATE variable, it corresponds to the date/time data type of the Amazon Redshift cluster, while for mappings without SQL ELT optimization the SYSDATE variable corresponds to the date/time data type of the agent.
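For example, an Expression transformation output field that uses the SYSDATE variable can be pushed down with full SQL ELT optimization. The field name LOAD_TS is hypothetical, shown as field = expression:
LOAD_TS = SYSDATE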
Configuring SQL ELT optimization
To optimize a mapping, add the mapping to a task, and then configure SQL ELT optimization in the mapping task.
1. Create a mapping task.
2. In the SQL ELT optimization section on the Runtime Options tab, set the SQL ELT optimization value to Full or To Source.
You can use Source SQL ELT optimization only when you read from an Amazon Redshift source.
Note: The optimization context type option is not applicable for an Amazon Redshift V2 mapping task.
3. If full SQL ELT optimization is not available, select how Data Integration handles SQL ELT optimization in the SQL ELT Optimization Fallback Option menu:
- Partial SQL ELT optimization. Default. Data Integration pushes as much transformation logic as possible to the source and target database. The task processes any transformation logic that it can't push to a database. You can use Partial SQL ELT optimization only when you read from and write to Amazon Redshift.
- Non SQL ELT optimization. The task runs without SQL ELT optimization.
- Fail Task. Data Integration fails the task.
Note: The fallback options are not applicable to mappings in advanced mode.
When you run the mapping task, the transformation logic is pushed to the Amazon Redshift database. To verify that the SQL ELT optimization has taken place, you can check the session log for the job. You can monitor the jobs that you initiated on the My Jobs page.
Transformations for Amazon Redshift V2 mappings
When you configure SQL ELT optimization, the Secure Agent tries to push the configured transformation to Amazon Redshift.
The following list summarizes the availability of transformations that you can push down to Amazon Redshift.
•Aggregator
•Expression
•Filter
•Joiner
•Lookup
•Sorter
•SQL
•Router
•Union
Aggregator transformation
You can configure full SQL ELT optimization to push an Aggregator transformation to process in Amazon Redshift.
Aggregate calculations
You can perform the following aggregate calculations:
•AVG
•COUNT
•MAX
•MIN
•MEDIAN
•SUM
•VARIANCE
Incoming fields
When you configure an Aggregator transformation and the incoming field is not used in an aggregate function or in a group by field, the output is not deterministic as the ANY_VALUE() function returns any value from the field.
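For illustration only, with hypothetical table and field names, a pushed-down aggregation that passes through a non-grouped field might resolve to a query of roughly this shape, which is why the passthrough value is not deterministic:
SELECT dept_id, ANY_VALUE(emp_name) AS emp_name, SUM(salary) AS total_salary
FROM public.employees
GROUP BY dept_id;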
Expression transformation
You can configure full SQL ELT optimization to push an Expression transformation to process in Amazon Redshift.
You can add an Expression transformation to each of the sources in the mapping, followed by a join downstream in the mapping. Additionally, you can add multiple Expression transformations that branch out from a transformation and then merge into a transformation downstream in the mapping.
When you configure an Expression transformation, consider the following rules to include variables in the expression:
•You cannot use a variable whose value is assigned while processing a previous row in calculations for the current row. If you do, the mapping runs without SQL ELT optimization.
•The variables can be nested, but you cannot refer to a variable before it is defined in the expression. If the variables are not defined in that order, the mapping runs without SQL ELT optimization, as in the example after this list, where AGEPLUS1 refers to AGEPLUS2 before AGEPLUS2 is defined and remains unresolved.
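A minimal sketch of such an out-of-order definition, assuming variable fields AGEPLUS1 and AGEPLUS2 and an incoming field AGE in an Expression transformation, shown as field = expression:
AGEPLUS1 = AGEPLUS2 + 1
AGEPLUS2 = AGE + 1
To run the mapping with SQL ELT optimization, define AGEPLUS2 before AGEPLUS1 so that each variable is resolved before it is referenced.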
Joiner transformation
You can configure a Joiner transformation between two Amazon S3 sources or two Amazon Redshift sources.
Lookup transformation
You can configure full SQL ELT optimization to push a Lookup transformation to process in Amazon Redshift. You can push both a connected and an unconnected lookup.
Consider the following rules when you configure a lookup in a mapping:
•You can configure a lookup to Amazon S3 only when the source used is Amazon S3.
•You can configure a lookup to Amazon Redshift when the source used is Amazon S3 or Amazon Redshift.
Sorter transformation
You can configure full SQL ELT optimization to push a Sorter transformation to process in Amazon Redshift.
When you configure a Sorter transformation, only the following sorter advanced properties are applicable:
•Distinct. You can remove the duplicate rows and create a distinct query.
•Null Treated Low. You can sort columns that have null values with nulls treated as low. The sort applies only within the generated query; the sort order is not honored on the Redshift target.
SQL transformation
You can use a SQL transformation to push Redshift-supported scalar functions to Amazon Redshift.
When you configure SQL ELT optimization for a mapping, you can use SQL user-defined functions (UDFs) in a SQL transformation and run queries with the Amazon Redshift target endpoint.
You can use only the SELECT clause SQL statement to push down a function. The following snippet demonstrates the syntax of a simple select SQL query:
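A representative form of the snippet, with placeholder function and field names. The tilde notation marks input fields, as in the example in the restrictions that follow:
SELECT <function1>(~input_field1~), <function2>(~input_field2~), ...
For example: SELECT SQUARE(~AGE~), SQRT(~SNAME~)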
You can push a SQL transformation with the following restrictions:
•You can configure only a SQL query in the SQL transformation. You cannot enable a stored procedure when you configure SQL ELT optimization to Amazon Redshift.
•The SQL query must be a simple SELECT statement without FROM and WHERE arguments. The SQL transformation supports only functions in a simple SELECT statement.
•When you specify a SELECT query, you must also specify the column names and the number of columns based on the functions. For example, when you specify the query select square(~AGE~), sqrt(~SNAME~), you must specify two output columns, one for the AGE function and one for the SNAME function; otherwise, the mapping fails.
•You can only use a SQL transformation when the SELECT statement is present in the query property. Even if an entire query containing the SELECT statement comes from a parameterized input port, the SQL ELT optimization fails.
•If any SQL error occurs, the error is added to the SQLError field by default. However, when the mapping runs with SQL ELT optimization, the SQLError field remains NULL.
•The NumRowsAffected field records the number of rows affected while computing the output buffer. However, for a SQL transformation, NumRowsAffected is 0 because the query runs for all records in a single pass rather than for each row.
•Amazon Redshift offers only passive behavior for SQL transformations, and support for dynamic queries is limited.
Router transformation
When you configure a Router transformation, you must connect or map only one output group to the target transformation.
Features
You can configure SQL ELT optimization for a mapping that reads from the following sources and writes to an Amazon Redshift target:
•Amazon Redshift V2 source
•Amazon S3 V2 source
When you configure a mapping, some parameters are not supported for a mapping enabled for SQL ELT optimization. You can refer to the list of parameters that each source supports.
Amazon Redshift V2 sources, targets, and lookups
You can configure an Amazon Redshift V2 connection in the source transformation with basic, IAM, and assume role authentication and enable SQL ELT optimization in the mapping task.
Connection properties
When you configure SQL ELT optimization, you can use the following advanced properties for an Amazon Redshift V2 source in the Amazon Redshift V2 connection:
•User name
•Password
•Access key and secret key
•JDBC URL
•Region
•Master symmetric key
•Customer master key ID
Source properties
When you configure SQL ELT optimization, you can use the following properties for an Amazon Redshift V2 source in the Source transformation:
•Source connection- Parameter, Allow parameter to be overridden at run time
•Source type - Single object, multiple objects, query, and parameter
•Filter
•Read mode - Only staging mode is applicable in mappings.
•S3 bucket name
•Enable compression
•Unload options
• Encryption type - SSE-S3, SSE-KMS, CSE-SMK
Note: You must specify the same master symmetric key in the target properties.
•Schema name
•Source table name
•Pre-SQL
•Post-SQL
•SQL query
•Select distinct
Target properties
When you configure SQL ELT optimization, you can use the following properties for an Amazon Redshift V2 target:
•Target connection- Parameter, Allow parameter to be overridden at run time
•Target type- Single object, parameter
•Allow parameter to be overridden at run time
•Target object- Existing target, Create new at runtime
•Operation- Insert, update, upsert, delete, or data driven
•Analyze target table
•Assume Role
•COPY command
- Region
- Truncatecolumn
- AWS_IAM_Role, only for Parquet and ORC files
•Pre-SQL
•Post-SQL
•Require null value for Char and Varchar
•Schema name
•Target table name
•Truncate target table before data upload
•Vacuum table
•Override Target Query
•Treat Source Rows As (Insert, Update, Upsert, Delete, and None). Select None when you use data driven operation.
•Max Errors per Upload Batch for INSERT
Note: If you configure source and target advanced properties that are not applicable, the mappings run in the Informatica runtime environment.
Lookup properties
When you configure SQL ELT optimization, you can use the following properties for Amazon Redshift V2 lookups:
•Lookup connection- Parameter, Allow parameter to be overridden at run time
•Source type - Single object, query, parameter
•Multiple matches - Report error. Applicable for unconnected lookups.
•Multiple matches- Return all rows. Applicable for connected lookups.
•S3 Bucket Name
•Enable Compression
•Unload Options
•Encryption Type
•Schema Name
•Source Table Name
•Pre-SQL
•Post-SQL
•SQL Query
•Select Distinct
Note: You can specify a parameter file in the mapping task to override the Amazon Redshift V2 source, lookup, and target connections and objects in a mapping.
Amazon S3 V2 source
You must configure an Amazon S3 V2 connection with basic, assume role, and IAM authentication when you enable SQL ELT optimization in a mapping task.
Source properties
When you configure SQL ELT optimization, you can use the following properties for an Amazon S3 V2 source in a mapping:
•Source connection
•Source type- Single object, parameter
•Parameter
•Encryption type- Client-side encryption (applicable to flat files), Server-side encryption, Server-side encryption with KMS
•Data compression- Gzip (applicable to flat and Parquet files), Deflate (applicable to Avro files), Snappy (applicable to Avro, Parquet, and ORC files), and Zlib (applicable to ORC files)
•File format
- Delimiter
- Qualifier
- Code Page
- Header Line Number
- First Data Row
- Source Type
- Folder Path
•File source type
•File name
•Format type
- Avro
- Parquet
- ORC
- JSON
- Delimited
SQL ELT optimization for multiple targets
You can add multiple Amazon Redshift V2 targets in a mapping.
When you configure a mapping to write to multiple Amazon Redshift V2 targets, you can further optimize the write operation when you configure full SQL ELT optimization.
To optimize, you can choose to configure an insert, update, upsert, delete, or data driven operation for multiple targets individually.
You can select the same or different Amazon Redshift V2 target table in multiple Target transformations and perform different operations for each of the Target transformations to run independent of each other.
If your mapping contains multiple pipelines, you can define the flow run order to load the targets from the pipelines in a particular order.
When you define the flow run order for multiple pipelines and also set -DEnableSingleCommit=true for a single pipeline, the -DEnableSingleCommit=true property is given precedence.
Rules and guidelines
Consider the following rules and guidelines when you configure full SQL ELT optimization for multiple targets:
•When you run a mapping and use the same target table in all the targets, the row count is different from the row count of the mapping that runs without SQL ELT optimization. Applicable when you use an Amazon S3 source and perform an upsert operation on multiple Amazon Redshift V2 targets.
•When you run a mapping and use the same target table in all the targets, the Secure Agent writes a different set of data to the target than the data of the mapping that runs without SQL ELT optimization. Applicable when you use an Amazon S3 source, perform an insert operation on multiple Amazon Redshift V2 targets, and enable the Truncate Target Table Before Data Load advanced target property.
Determine the order of processing for multiple targets
When you configure a mapping to write to multiple targets in a single pipeline, with each target configured for any write operation, the target operations are processed in a predefined order of delete, update, upsert, and insert.
To process the target operations in the predefined order, you need to set certain properties in the Secure Agent and in the task properties.
Set -DEnableSingleCommit=true in the Secure Agent properties
Perform the following tasks to set the property for the Secure Agent:
1. Open Administrator and select Runtime Environments.
2. Select the Secure Agent for which you want to set the property.
3. On the upper-right corner of the page, click Edit.
4. In the System Configuration Details section, select the Type as DTM for the Data Integration Service.
5. Edit the JVM options and set the property to -DEnableSingleCommit=true.
Set the EnableSingleCommit property in the task properties
Perform the following tasks to set the property in the task:
1. On the Runtime Options tab in the mapping task properties, navigate to the Advanced Session Properties section.
2. From the Session Property Name list, select Custom Properties, and set the Session Property Value to Yes.
Consider the following guidelines when the target operations are processed in a predefined order:
•After you run a mapping, the results are not displayed in the order of query execution. You can view the queries and the affected rows for each query in the session log.
•When you select the data driven target operation and set the Treat Source Rows As option to None for a target, consider the following guidelines:
- When you enter a single value, such as DD_INSERT, DD_UPDATE, or DD_DELETE, in the data driven condition, the target operation specified in the data driven condition is considered and the target operations are processed in a predefined order of delete, update, upsert, and insert.
- When you enter a nested IIF statement or DD_REJECT in the data driven condition, the data driven target operation is considered and the target operations are not processed in a predefined order.
•The order of execution of post SQL and pre SQL queries after the -DEnableSingleCommit=true property is set differs from the order of execution before the property was set.
Clean stop a SQL ELT optimization job
When a task enabled for SQL ELT optimization is running, you can clean stop the job to terminate all the issued statements and processes spawned by the job.
Use the Clean Stop option on the My Jobs page in Data Integration and the All Jobs and Running Jobs page in Monitor.
See the following exceptions before you clean stop a SQL ELT optimization task:
•When you clean stop a task enabled for SQL ELT optimization and the target or source properties in the mapping contain SQL statements, Data Integration attempts to revert all SQL statements that are in progress. SQL statements that are already committed cannot be reverted.
•When you run a mapping configured to create a new target at runtime and clean stop the job immediately, Data Integration creates the target table even if the job is terminated.
Rules and guidelines for SQL ELT optimization
Use the following rules and guidelines when configuring SQL ELT optimization for an Amazon Redshift database:
•To copy data from Amazon S3 to Amazon Redshift, you must use multiple data files by splitting large files. For more information, see the Amazon documentation.
•When you use an upper case in the column name of a JSON file and configure SQL ELT optimization, the mapping fails with the following error:
Error Reason: Invalid JSONPath format: Member is not an object.
•When you configure a mapping with SQL ELT optimization and define an ORC source with date/time values, the Secure Agent might not write a few values correctly to the Redshift target.
•When you define a Parquet source with decimal values having precision greater than 19,0 and use COPY command to write to a Redshift target, the mapping fails with SQL ELT optimization.
•When the data in delimited, Avro, or JSON files has values that are greater than the precision values, specify the attribute TRUNCATECOLUMNS=ON.
•For the ORC and Parquet file types, specify AWS_IAM_ROLE in the COPY command, to enable full SQL ELT optimization.
•You cannot use assume role authentication when the source has a Parquet or ORC file format and you enable full SQL ELT optimization for the mapping task.
•You cannot enable full SQL ELT optimization for a mapping task when the task contains a mapping with a single transformation connected to multiple transformations downstream and vice-versa.
•You cannot enable full SQL ELT optimization for an Avro source with Date, Decimal, and Timestamp data types.
•When you read data from an Amazon S3 flat file with Shift-JIS encoding, write to an Amazon Redshift target, and enable full SQL ELT optimization, the mapping fails.
•When you configure a mapping that reads an Amazon S3 Avro source with column names in uppercase letters and uses an Amazon Redshift V2 connection to write data to an Amazon Redshift target, the COPY command writes blank rows to the target.
•When you configure an update operation for data that contains the Time column for a mapping enabled with full SQL ELT optimization and you override the target query using the value specified in the Override Target Query field from the advanced target properties, the task runs but data is not processed.
•If the Amazon S3 bucket region and the Amazon Redshift region are different, specify the REGION attribute in the COPY command to enable full SQL ELT optimization, as shown in the example after this list. Applies to delimited, Avro, and JSON files.
•When you perform an update, upsert, or delete operation, ensure that you specify a primary key or an update column for the target table.
•When you perform an upsert operation and set the JVM option -DuseDeleteForUpsert=true, the target statistics of processed rows shows an additional row as compared to the case when you do not set the JVM option. The number of rows in the target table are the same in both the cases.
•When you perform an update, upsert, or delete operation on an Amazon Redshift target and specify the update column as a Date/Time data type in a different time zone, data is not written to the target. However, the task runs successfully. The issue occurs when you use an Amazon S3 Parquet source.
•When you perform an update operation, you cannot map an id column of a target table in the field mapping.
•Even if you configure a condition using a DD_INSERT, DD_UPDATE, or DD_DELETE data driven operation on a target object, the log contains queries for the remaining data driven operations as well. The mapping runs successfully.
•When the data driven condition contains only the DD_REJECT operation, the mapping runs without generating a query.
•When you parameterize a transformation in a mapping enabled for full SQL ELT optimization and configure a parameter file to override the input parameters, the Secure Agent ignores the overridden parameters.
•If the Union transformation has inputs from only a few source columns and you do not map the rest of the columns to the target, the columns that you do not map show null data in the target.
•If the input fields contain a decimal field and you do not map the decimal field to the target in a Union transformation, the mapping that runs with full SQL ELT optimization and uses an Amazon Redshift V2 connection fails with an error.
•If the mapping enabled for SQL ELT optimization contains Union and Aggregator transformations, include the incoming field from the aggregate function or group by field in the field mapping, or remove the field from the aggregate function or group by field altogether. Otherwise, the mapping fails.
•A mapping fails if the column of the Date or Datetime data type is not of the YYYY-DD-MM format.
•You must map all the fields from the SQL query to the target for the mappings enabled for SQL ELT optimization to run successfully.
•If the custom query contains duplicate columns, the mapping runs without SQL ELT optimization.
•When you run a mapping enabled for SQL ELT optimization that uses an Amazon Redshift V2 connection to update data with the float or integer data types in an Amazon Redshift target, the mapping might fail.
•When you parameterize an Expression transformation in a mapping task and configure a parameter file to override the parameters, the Secure Agent does not read the overridden parameters. The issue occurs when you configure full SQL ELT optimization for a mapping that uses an Amazon Redshift V2 connection.
•A mapping fails at runtime when you specify an advanced native filter in the following format: schema_name.table_name.column_name. When you define the advanced filter, do not qualify the table name with the schema name in the condition.
•When you use the query source type to read from Amazon Redshift, you can choose to retain the field metadata and save the mapping. Even if you edit the query and run the mapping, the field metadata specified at design time is retained.
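For illustration, the COPY command attributes in this list can be combined in the Copy Options field of the Target transformation using the semicolon-separated syntax used elsewhere in this guide. The region value here is a placeholder and the IAM role ARN is the sample value shown later in this guide:
REGION = us-west-2;TRUNCATECOLUMNS = ON;AWS_IAM_ROLE=arn:aws:iam::0093:role/redshift_role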
Rules for SQL ELT optimization in mappings that read from and write to Amazon Redshift
Consider the following guidelines when you configure full SQL ELT optimization for mappings that read from or write to Amazon Redshift:
•When the Amazon Redshift source and target are in the same cluster, the Secure Agent does not use the UNLOAD and COPY commands to stage the data on Amazon S3 and then write to Amazon Redshift. The Secure Agent directly performs an insert, update, upsert, delete, or data driven operation to write the data to Amazon Redshift and the performance of the task is improved.
•When you configure an update operation for data that contains the Time column for a mapping enabled with full SQL ELT optimization and you override the target query using the value specified in the Override Target Query field from the advanced target properties, the task runs but data is not processed.
• When you assume an IAM role and run mappings enabled with source or full SQL ELT optimization to read data from Redshift, the assume role is not honored.
To honor the assume role and run mappings successfully, you must specify the AWS_IAM_ROLE property and set its value in the Unload Options field of the Source transformation.
For example, AWS_IAM_ROLE=arn:aws:iam::0093:role/redshift_role
•If the source and target objects in a mapping point to Amazon S3 buckets in different regions, specify the REGION attribute in the COPY command to set the Amazon S3 bucket for target in the same region as Amazon S3 bucket for source.
• If the data that you read contains delimiters (|), quotes ("), and escape (\) characters and you set the Unload Options field in the Source transformation to ADDQUOTES;DELIMITER = \174;, the mapping fails.
To avoid this error, set the following properties in the Copy Options field in the Target transformation: ESCAPE;REMOVEQUOTES;CSV=OFF;QUOTE=OFF;DELIMITER = \174;
•When you read data from a single-column table of the Bigint, Datetime, or Boolean data type that contains NULL values, the null values are not written to the target.
To avoid this error, you must set IGNOREBLANKLINES in the Copy Options field of the Target transformation.
• When you configure a Source transformation in a mapping to read data and you set ESCAPE=OFF or ESCAPE=ON and ADDQUOTES in the Unload Options field to add escape characters and quotes in the data, null values are written to the target.
To avoid this, you must add QUOTE=\042 in the Copy Options field of the Target transformation and then run the mapping.
•If the data contains an escape (\) character when you read data, you must specify ESCAPE=OFF in the Unload Options field of the Source transformation. If you do not set the property, the escape character gets duplicated in the target. For example, the data output 12\12\2012 appears as 12\\12\\2012.
•When you read from and write data that contains float4 or double values, the float values show a slight change in the target.
•When you specify a user defined function in a transformation, the function name is not displayed correctly in the session log.
•If the mapping contains a Sequence Generator transformation, ensure that you map the NEXTVAL output field to the target. Do not map the CURRVAL output field to the target. Otherwise, the mapping task does not partially push down the mapping logic to the point where the transformation is supported, and the mapping runs without SQL ELT optimization.
•When you configure a mapping with multiple objects at the source where the table name and column name are the same or the table name is a substring of the column name, the mapping fails.
•You cannot perform data driven operations on target objects of the timestamptz data type.
•When you run a mapping in full SQL ELT optimization with multiple objects at the source using advanced filters, the UNLOAD and COPY commands do not work across different clusters and the mapping fails.
•When you define pre-SQL or post-SQL queries that modify the same object in a mapping where the source and target belong to different clusters, the task fails at runtime in full SQL ELT optimization.
•When a mapping contains multiple pipelines, the logic of only one pipeline is pushed to the source if an unsupported function is encountered. The other pipelines run without SQL ELT optimization.
Rules for source SQL ELT optimization in mappings that read from Amazon Redshift
Consider the following guidelines when you configure source SQL ELT optimization for mappings that read from Amazon Redshift:
•You can configure an SQL query or custom query in the advanced source property to push the mapping to Amazon Redshift.
•When you use a custom query as a source or an SQL override, the table name alias is not generated as expected. The name starts with "INF" in the SQL ELT optimization query.
•When you run a mapping by overriding the SQL query in full SQL ELT optimization, where the source column names are aliased, the mapping fails. Ensure that the alias names and the source column names are the same.
•You cannot push a Router transformation with multiple output groups to the Amazon Redshift source.
•When you use the query source type to read from Amazon Redshift, you can choose to retain the field metadata and save the mapping. Even if you edit the query and run the mapping, the field metadata specified at design time is retained.
•You cannot set the read mode to Direct in Source and Lookup transformations.
•COPY command options are not applicable for the timestamptz data type.
•When you use the [^\x00-\x7F]+ pattern in a REG_REPLACE() function and run a mapping with SQL ELT optimization, both the non-ASCII characters and spaces are replaced in the target. But, when you run the same mapping without SQL ELT optimization, only the non-ASCII characters are replaced and the spaces are retained in the target.
Rules and guidelines for adding multiple source objects
Consider the following rules and guidelines when you add multiple source objects:
•You must specify double quotes for the table name when you use a reserved word for the table name and the column name.
•You cannot use a self join when you add multiple source objects.
•When you use special characters in column names for an advanced relationship, the query formed is not correct and the mapping task fails.
•You can use the full outer-join condition only with the = and AND operators.
•When you override the schema name and configure an advanced filter on a related source object, the Secure Agent applies the advanced filter only on the parent object and not on the related source object.
•When you select parent and child objects that have a primary key and foreign key relationship, and the foreign key of the related object is also a primary key in the table, the mapping task fails when you create a target.
• When you select the Multiple Objects source type, add a source object, for example, emp, and define a primary key and foreign key relationship on different columns, for example, emp.id and dept.d_id, the mapping fails with the following error:
[FATAL] Unload/Copy command failed with error: Invalid operation: column emp.d_id does not exist.
The Select Related Objects list shows the join condition for the dept related object as emp.d_id=dept.d_id, even though the emp table does not have a d_id column.
•When you select the Multiple Objects source type, you cannot use a period (.) in the table name.
Rules and guidelines for functions
Use the following rules and guidelines when pushing functions to an Amazon Redshift database:
•To push TO_DATE() and TO_CHAR() to Amazon Redshift, you must define the string and format arguments. See the example after this list.
•If you use the NS format as part of the ADD_TO_DATE() function, the agent does not push the function to Amazon Redshift.
•If you use any of the following formats as part of the TO_CHAR() and TO_DATE() functions, the agent does not push the function to Amazon Redshift:
- NS
- SSSS
- SSSSS
- RR
•To push TRUNC(DATE), GET_DATE_PART(), and DATE_DIFF() to Amazon Redshift, you must use the following formats:
- D
- DDD
- HH24
- MI
- MM
- MS
- SS
- US
- YYYY
•You can use REPLACESTR() only to replace a single string value with a new string value.
•To push SUBSTR() to Amazon Redshift, you must define an integer value for the length argument.
•When you push MD5() to Amazon Redshift, the Secure Agent returns all the MD5 fields in the lower case. However, when you run a mapping without SQL ELT optimization, the Secure Agent returns the MD5 fields in the upper case.
•When you use the IS_NUMBER function in a transformation and the input data contains d or D, for example in formats such as +3.45d+32 or +3.45D-32, the function returns False or 0.
•When you use the IN function in an expression, you must not include the CaseFlag attribute.
•When you use the IN function and the arguments contain date and timestamp values, you must include the TO_DATE function in the expression.
•When you use the REG_REPLACE function in an Expression transformation, ensure that the expressions used in the argument are supported by AWS.
•When you use the ISNULL() function in an Expression transformation, Data Integration parses the column values differently when the source and target in the mapping are from the same cluster environment or across different cluster environments. The UNLOAD and COPY commands parse NULL and empty string values differently when the source and target are in different clusters.
You can enable the useTempTableForRedshiftAPDO property when you push down functions from mappings where the source and target are across different clusters. When you set the property and run the mapping, the mapping considers NULL and empty string values in the columns as NULL only. However, when the source and target are in the same cluster, Data Integration considers NULL as NULL and empty strings as empty strings in the columns.
Mappings without SQL ELT optimization parse the ISNULL() function differently. When you select the Treat NULL Value as NULL option, the mapping considers NULL and empty string values as NULL. When you don't select the Treat NULL Value as NULL option, the mapping considers NULL and empty string values as empty strings.
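For example, expressions of the following form, with hypothetical field names and explicit format strings that avoid the unsupported formats listed above, can be pushed to Amazon Redshift:
TO_DATE(ORDER_DATE_STR, 'MM/DD/YYYY HH24:MI:SS')
TO_CHAR(ORDER_DATE, 'YYYY-MM-DD')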
Rules and guidelines for aggregate functions
Use the following rules and guidelines when pushing aggregate functions to an Amazon Redshift database:
•You cannot use conditional clauses in the aggregate expression functions.
•You can use non-aggregate functions in aggregate expressions.
•You cannot use nested aggregate functions directly. For example, you cannot specify SUM(AVG(col1)) in the aggregate expression function columns. Use nested aggregations by adding multiple aggregate transformations in the mapping.
•You can parameterize the GROUP BY fields and aggregate functions in a mapping task.
•When you use STDDEV or VARIANCE functions for an expression that consists of a single value and run the mapping, the result of the function is NULL. When you run the mapping without pushing it down, the result of the function is 0.
•During the field mapping, you must map only the fields that you added in the GROUP BY port. Otherwise, the mapping runs without SQL ELT optimization and logs an error message.
•When you do not specify a port from an Amazon S3 flat file source in the GROUP BY clause of the aggregate function and map the port to a Redshift target, the mapping task runs successfully without SQL ELT optimization with the following message:
Pushdown optimization to the source stops before transformation [Aggregator] because [f_varchar] is a non-group by passthrough port, which is not allowed.
The mapping fails when you push down a mapping with an Amazon S3 Avro or Parquet source.
•A mapping enabled for full or source SQL ELT optimization which contains a table column with a regular expression argument in the REG_MATCH() function fails. However, the mapping runs successfully when you do not enable SQL ELT optimization.
•When you configure the REG_MATCH() expression:
- Use double slashes to parse expressions in full SQL ELT optimization. For example, use REG_MATCH(searchstring,'\\D\\W').
- Use a single slash to parse expressions without SQL ELT optimization. For example, use REG_MATCH(searchstring,'\D\W').
•When you pass timestamp and date data values through an ASCII() function, the ASCII() function parses the values differently with and without SQL ELT optimization.
In a mapping without SQL ELT optimization, the ASCII() function returns the first digit of the day, while in full and source SQL ELT optimization, the function returns the first digit of the year.
For example, to read a date time MM/DD/YYYY HH24:MI:SS.US for a session, the ASCII() value returns the first character of MM without SQL ELT optimization and the first character of YYYY with SQL ELT optimization.
As a workaround, you can set the DateTime Format String property value to yyyy-mm-dd in the Advanced Session Properties when you enable mappings without SQL ELT optimization.
Rules and guidelines for Router transformation
Consider the following rules and guidelines for a Router transformation:
•For a source field, do not edit the metadata for an id column from string to int.
•When the source has a field of the binary data type, the mapping fails with the following error:
[Full Pushdown Optimization Failed Due To PARQUET : Not Supported data type binary.]
•When you create conditions for router groups and connect the router groups to target tables, and one of the conditions fails, the mapping does not process the rest of the conditions either. The mapping runs without SQL ELT optimization.
Rules and guidelines for Lookup transformation
When you configure a Lookup transformation, adhere to the following guidelines:
•You cannot configure a dynamic lookup cache or a persistent cache.
•You cannot configure the advanced lookup properties for connected and unconnected lookups.
•Even if you select an advanced lookup property for a Lookup transformation, the Secure Agent ignores the advanced property. An error message does not appear in the logs.
•If you add the ADDQUOTE option in the Unload command for an Amazon Redshift lookup, you must also add the QUOTE option as QUOTE=" in the Copy command for the Amazon Redshift target.
•If the source and target in a mapping point to Amazon S3 buckets in different regions, specify the REGION attribute in the COPY command to set the Amazon S3 bucket for target in the same region as Amazon S3 bucket for source.
•If you configure the CSE-SMK encryption type for an Amazon Redshift source or lookup object, ensure that you specify a master symmetric key in the target properties.
•You can only specify the = operator in a completely parameterized lookup condition. If you specify operators such as <, <=, >, >=, or != in a complex condition, the mapping fails.
•When you configure a lookup for an Amazon S3 source, remove the FileName field from both the Amazon S3 source and lookup object. The FileName field is not applicable.
Connected lookups
Consider the following rules and guidelines for a connected Lookup transformation:
•You must select the Multiple Matches property value as Return all rows in the connected lookup properties for SQL ELT optimization to work.
•When an Amazon S3 location contains source files having the same prefix, for example, abc.txt.1 and abc.txt.2, the COPY command tries to load both the files to an Amazon Redshift target and the mapping might fail.
Unconnected lookups
Consider the following rules and guidelines for an unconnected Lookup transformation:
•You must select the Multiple Matches property value as Report error in the unconnected lookup properties for SQL ELT optimization to work. However, when multiple matches are encountered in the lookup table, the Secure Agent does not report an error. Hence, ensure that multiple matches are not encountered in the lookup table.
•If you select the Return Any property, the mapping task runs without SQL ELT optimization.
•You cannot use operators for unconnected lookups in a lookup expression. Use an additional expression transformation to include the operator.
•When you use the same column names in the source and unconnected lookup tables, and enable full SQL ELT optimization, the mapping fails with a duplicate column error. To run the mapping successfully, rename the incoming fields using a prefix or postfix and use the fields in the lookup expression.
Rules and guidelines for SQL transformation
Consider the following rules and guidelines for a SQL transformation:
•When you configure a SQL transformation in a mapping enabled for SQL ELT optimization and use the DATE_PART function in a query, the task fails. You can only configure a SQL transformation using the simple SELECT statement for any Redshift supported scalar function with the prescribed format.
•When you configure a SQL transformation in a mapping with user-defined functions that have date, decimal, or smallint data types, the mapping fails. As a workaround, configure user-defined functions in Redshift only with the corresponding transformation data types supported in Amazon Redshift.
•When you run a mapping with a SQL transformation and define user-defined functions (UDFs) with Unicode or special characters, enclose the schema and UDF in double quotes, as shown in the example after this list.
•When you run a mapping with a SQL transformation that has multiple SELECT queries, the mapping fails. Amazon Redshift supports only SQL transformations with a single simple SELECT query.
•When you use user-defined functions and scalar functions together in the same query to partially push down a mapping logic, the mapping fails if the target data type does not match with the source type. As a workaround, you can enable full SQL ELT optimization or define only scalar functions in the query.
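A minimal sketch of a SQL transformation query that encloses the schema and UDF in double quotes. The schema, UDF, and field names are hypothetical:
SELECT "public"."f_normalize_amount"(~AMOUNT~)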