Custom Queries
A custom SQL query is a SELECT statement that overrides the default SQL query in a customized data object.
A custom query overrides the default SQL query that the Data Integration Service uses to read data from a relational source. The custom query also overrides the simple query settings that you define when you enter a source filter, use sorted ports, enter a user-defined join, or select distinct ports.
You can create a custom query to perform SQL operations that are valid in the database language but not available in the transformation language. When you define a custom query in a customized data object, you can reuse the object in multiple mappings or profiles.
Use the following guidelines when you create a custom query in a customized data object:
- •In the SELECT statement, list the column names in the order in which they appear in the source transformation.
- •Enclose all database reserved words in quotes.
- •Add an escape character before a dollar sign ($). If the $ has preceding forward slash (\) characters, add an escape character (\) to both the forward slash and dollar sign characters, for example, enter $ as \$ and \$ as \\\$.
If you use a customized data object to perform a self-join, you must enter a custom SQL query that includes the self-join. You can use a customized data object with a custom query as a source in a mapping. The source database executes the query before it passes data to the Data Integration Service. You can create a custom query to add sources to an empty customized data object. You can also use a custom query to override the default SQL query.
Custom Query Optimization
The Data Integration Service can push a custom query to run in a relational data object for increased performance. Choose to push a custom query if the query forms a valid subquery for the database.
When you use a custom query to read data in a relational data object, the Data Integration Service can optimize the query by pushing the query to run in the database. The Data Integration Service can push the custom query if the query forms a valid subquery for the database. If the SQL syntax for the custom query is not valid in a subquery for the database, the resulting query fails to run.
If you push a custom query to a relational database other than IBM DB2, you must specify an alias for each expression in the select list that is not a column reference. The aliases allow the Data Integration Service to refer to the expressions in the select list.
See the database documentation for information about valid SQL syntax for aliases and subqueries.
Default Query
The Data Integration Service generates a default SQL query that it uses to read data from relational sources. You can override the default query in a customized data object or an instance of a relational data object.
You can override the default query through the simple or advanced query. Use the simple query to select distinct values, enter a source filter, sort ports, or enter a user-defined join. Use the advanced query to create a custom SQL query for reading data from the sources. The custom query overrides the default and simple queries.
If any table name or column name contains a database reserved word, you can create and maintain a reserved words file, reswords.txt. Create the reswords.txt file on any machine the Data Integration Service can access.
When the Data Integration Service runs a mapping, it searches for the reswords.txt file. If the file exists, the Data Integration Service places quotes around matching reserved words when it executes SQL against the database. If you override the default query, you must enclose any database reserved words in quotes.
When the Data Integration Service generates the default query, it delimits table and field names containing the following characters with double quotes:
/ + - = ~ ` ! % ^ & * ( ) [ ] { } ' ; ? , < > \ | <space>
Creating a Reserved Words File
Create a reserved words file if any table name or column name in the customized data object contains a database reserved word.
You must have administrator privileges to configure the Data Integration Service to use the reserved words file.
1. Create a file called "reswords.txt."
2. Create a section for each database by entering the database name within square brackets, for example, [Oracle].
3. Add the reserved words to the file below the database name.
For example:
[Oracle]
OPTION
START
where
number
[SQL Server]
CURRENT
where
number
Entries are not case sensitive.
4. Save the reswords.txt file.
5. In Informatica Administrator, select the Data Integration Service.
6. Edit the custom properties.
7. Add the following custom property:
Name | Value |
---|
Reserved Words File | <path>\reswords.txt |
8. Restart the Data Integration Service.
Hints
You can add hints to the source SQL query to pass instructions to a database optimizer. The optimizer uses the hints to choose a query run plan to access the source.
The Hints field appears in the Query view of a relational data object instance or a customized data object. The source database must be Oracle, Sybase, IBM DB2, or Microsoft SQL Server. The Hints field does not appear for other database types.
When the Data Integration Service generates the source query, it adds the SQL hints to the query exactly as you enter them in the Developer tool. The Data Integration Service does not parse the hints. When you run the mapping that contains the source, the mapping log shows the query with the hints in the query.
The Data Integration Service inserts the SQL hints in a position in the query depending on the database type. Refer to your database documentation for information about the syntax for hints.
Oracle
The Data Integration Service add hints directly after the SELECT/UPDATE/INSERT/DELETE keyword.
SELECT /*+ <hints> */ FROM …
The '+' indicates the start of hints.
The hints are contained in a comment (/* ... */ or --... until end of line)
Sybase
The Data Integration Service adds hints after the query. Configure a plan name in the hint.
SELECT … PLAN <plan>
select avg(price) from titles plan "(scalar_agg (i_scan type_price_ix titles )"
IBM DB2
You can enter the optimize-for clause as a hint. The Data Integration Service adds the clause at the end of the query.
SELECT … OPTIMIZE FOR <n> ROWS
The optimize-for clause tells the database optimizer how many rows the query might process. The clause does not limit the number of rows. If the database processes more than <n> rows, then performance might decrease.
Microsoft SQL Server
The Data Integration Service adds hints at the end of the query as part of an OPTION clause.
SELECT … OPTION ( <query_hints> )
Hints Rules and Guidelines
Use the following rules and guidelines when you configure hints for SQL queries:
- •If you enable pushdown optimization or if you use a semi-join in a relational data object, then the original source query changes. The Data Integration Service does not apply hints to the modified query.
- •You can combine hints with join and filter overrides, but if you configure a SQL override, the SQL override takes precedence and the Data Integration Service does not apply the other overrides.
- •The Query view shows a simple view or an advanced view. If you enter a hint with a filter, sort, or join override on the simple view, and you the Developer tool shows the full query override on the advanced view.
Creating Hints
Create hints to send instructions to the database optimizer to determine a query plan.
1. Open the customized data object or the relational data object instance.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Hints field.
The Hints dialog box appears.
7. Enter the hint in the SQL Query field.
The Developer tool does not validate the hint.
8. Click OK.
9. Save the data object.
Select Distinct
You can select unique values from sources in a customized data object or a relational data object instance with the select distinct option. When you enable select distinct, the Data Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Use the select distinct option to filter source data. For example, you might use the select distinct option to extract unique customer IDs from a table that lists total sales. When you use the relational data object in a mapping, the Data Integration Service filters data earlier in the data flow, which can increase performance.
Using Select Distinct
Select unique values from a relational source with the Select Distinct property.
1. Open the customized data object or relational data object instance.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Enable the Select Distinct option.
7. Save the customized data object.
Filters
You can enter a filter value in a custom query. The filter becomes the WHERE clause in the query SELECT statement. Use a filter to reduce the number of rows that the Data Integration Service reads from the source table.
Entering a Source Filter
Enter a source filter to reduce the number of rows the Data Integration Service reads from the relational source.
1. Open the customized data object or the relational data object instance.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Filter field.
The SQL Query dialog box appears.
7. Enter the filter condition in the SQL Query field.
You can select column names from the Columns list.
8. Click OK.
9. Click Validate to validate the filter condition.
10. Save the data object.
Sorted Ports
You can sort rows in the default query for a customized data object or a relational data object instance. Select the ports to sort by. The Data Integration Service adds the ports to the ORDER BY clause in the default query.
You might sort the source rows to increase performance when you include the following transformations in a mapping:
- •Aggregator. When you configure an Aggregator transformation for sorted input, you can send sorted data by using sorted ports. The group by ports in the Aggregator transformation must match the order of the sorted ports in the customized data object.
- •Joiner. When you configure a Joiner transformation for sorted input, you can send sorted data by using sorted ports. Configure the order of the sorted ports the same in each customized data object.
Note: You can also use the Sorter transformation to sort relational and flat file data before Aggregator and Joiner transformations.
Sorting Column Data
Use sorted ports to sort column data in a customized data object or relational data object instance. When you use the data object as a read transformation in a mapping or mapplet, you can pass the sorted data to transformations downstream from the read transformation.
1. Open the customized data object or relational data object instance.
2. Select the Read view.
The following figure shows the Read view of a customized data object that is open in the editor:
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Sort field.
The Sort dialog box appears.
7. To specify a column as a sorted port, click the New button.
8. Select the column and sort type, either ascending or descending.
The Developer tool sorts the columns in the order in which they appear in the Sort dialog box.
10. Click OK.
In the Query properties, the Developer tool displays the sort columns in the Sort field.
11. Click Validate to validate the sort syntax.
12. Save the data object.
User-Defined Joins
You can configure a user-defined join in a customized data object or relational data object instance. A user-defined join defines the condition to join data from multiple sources in the same data object.
When you add a user-defined join to a customized data object or a relational data object instance, you can use the data object as a read transformation in a mapping. The source database performs the join before it passes data to the Data Integration Service. Mapping performance increases when the source tables are indexed.
Creat a user-defined join to join data from related sources. The user-defined join overrides the default inner join that the Data Integration creates based on the related keys in each source. When you enter a user-defined join, enter the contents of the WHERE clause that specifies the join condition. If the user-defined join performs an outer join, the Data Integration Service might insert the join syntax in the WHERE clause or the FROM clause, based on the database syntax.
You might need to enter a user-defined join in the following circumstances:
- •Columns do not have a primary key-foreign key relationship.
- •The datatypes of columns used for the join do not match.
- •You want to specify a different type of join, such as an outer join.
Use the following guidelines when you enter a user-defined join in a customized data object or relational data object instance:
- •Do not include the WHERE keyword in the user-defined join.
- •Enclose all database reserved words in quotes.
- •If you use Informatica join syntax, and Enable quotes in SQL is enabled for the connection, you must enter quotes around the table names and the column names if you enter them manually. If you select tables and columns when you enter the user-defined join, the Developer tool places quotes around the table names and the column names.
User-defined joins join data from related resources in a database. To join heterogeneous sources, use a Joiner transformation in a mapping that reads data from the sources. To perform a self-join, you must enter a custom SQL query that includes the self-join.
Entering a User-Defined Join
Configure a user-defined join in a customized data object or relational data object to define the join condition for the data object sources.
1. Open the customized data object or relational data object instance.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Join field.
The SQL Query dialog box appears.
7. Enter the user-defined join in the SQL Query field.
You can select column names from the Columns list.
8. Click OK.
9. Click Validate to validate the user-defined join.
10. Save the data object.
Outer Join Support
You can use a customized data object to perform an outer join of two sources in the same database. When the Data Integration Service performs an outer join, it returns all rows from one source resource and rows from the second source resource that match the join condition.
Use an outer join when you want to join two resources and return all rows from one of the resources. For example, you might perform an outer join when you want to join a table of registered customers with a monthly purchases table to determine registered customer activity. You can join the registered customer table with the monthly purchases table and return all rows in the registered customer table, including customers who did not make purchases in the last month. If you perform a normal join, the Data Integration Service returns only registered customers who made purchases during the month, and only purchases made by registered customers.
With an outer join, you can generate the same results as a master outer or detail outer join in the Joiner transformation. However, when you use an outer join, you reduce the number of rows in the data flow which can increase performance.
You can enter two kinds of outer joins:
- •Left. The Data Integration Service returns all rows for the resource to the left of the join syntax and the rows from both resources that meet the join condition.
- •Right. The Data Integration Service returns all rows for the resource to the right of the join syntax and the rows from both resources that meet the join condition.
Note: Use outer joins in nested query statements when you override the default query.
You can enter an outer join in a user-defined join or in a custom SQL query.
Informatica Join Syntax
When you enter join syntax, use the Informatica or database-specific join syntax. When you use the Informatica join syntax, the Data Integration Service translates the syntax and passes it to the source database during a mapping run.
Note: Always use database-specific syntax for join conditions.
When you use Informatica join syntax, enclose the entire join statement in braces ({Informatica syntax}). When you use database syntax, enter syntax supported by the source database without braces.
When you use Informatica join syntax, use table names to prefix column names. For example, if you have a column named FIRST_NAME in the REG_CUSTOMER table, enter “REG_CUSTOMER.FIRST_NAME” in the join syntax. Also, when you use an alias for a table name, use the alias within the Informatica join syntax to ensure the Data Integration Service recognizes the alias.
You can combine left outer or right outer joins with normal joins in a single data object. You cannot combine left and right outer joins. Use multiple normal joins and multiple left outer joins. Some databases limit you to using one right outer join.
When you combine joins, enter the normal joins first.
Normal Join Syntax
You can create a normal join using the join condition in a customized data object or relational data object instance.
When you create an outer join, you must override the default join. As a result, you must include the normal join in the join override. When you include a normal join in the join override, list the normal join before outer joins. You can enter multiple normal joins in the join override.
To create a normal join, use the following syntax:
{ source1 INNER JOIN source2 on join_condition }
The following table displays the syntax for normal joins in a join override:
Syntax | Description |
---|
source1 | Source resource name. The Data Integration Service returns rows from this resource that match the join condition. |
source2 | Source resource name. The Data Integration Service returns rows from this resource that match the join condition. |
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator. |
For example, you have a REG_CUSTOMER table with data for registered customers:
CUST_ID | FIRST_NAME | LAST_NAME |
---|
00001 | Marvin | Chi |
00002 | Dinah | Jones |
00003 | John | Bowden |
00004 | J. | Marks |
The PURCHASES table, refreshed monthly, contains the following data:
TRANSACTION_NO | CUST_ID | DATE | AMOUNT |
---|
06-2000-0001 | 00002 | 6/3/2000 | 55.79 |
06-2000-0002 | 00002 | 6/10/2000 | 104.45 |
06-2000-0003 | 00001 | 6/10/2000 | 255.56 |
06-2000-0004 | 00004 | 6/15/2000 | 534.95 |
06-2000-0005 | 00002 | 6/21/2000 | 98.65 |
06-2000-0006 | NULL | 6/23/2000 | 155.65 |
06-2000-0007 | NULL | 6/24/2000 | 325.45 |
To return rows displaying customer names for each transaction in the month of June, use the following syntax:
{ REG_CUSTOMER INNER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID | DATE | AMOUNT | FIRST_NAME | LAST_NAME |
---|
00002 | 6/3/2000 | 55.79 | Dinah | Jones |
00002 | 6/10/2000 | 104.45 | Dinah | Jones |
00001 | 6/10/2000 | 255.56 | Marvin | Chi |
00004 | 6/15/2000 | 534.95 | J. | Marks |
00002 | 6/21/2000 | 98.65 | Dinah | Jones |
The Data Integration Service returns rows with matching customer IDs. It does not include customers who made no purchases in June. It also does not include purchases made by non-registered customers.
Left Outer Join Syntax
You can create a left outer join with a join override. You can enter multiple left outer joins in a single join override. When using left outer joins with other joins, list all left outer joins together, after any normal joins in the statement.
To create a left outer join, use the following syntax:
{ source1 LEFT OUTER JOIN source2 on join_condition }
The following tables displays syntax for left outer joins in a join override:
Syntax | Description |
---|
source1 | Source resource name. With a left outer join, the Data Integration Service returns all rows in this resource. |
source2 | Source resource name. The Data Integration Service returns rows from this resource that match the join condition. |
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator. |
For example, using the same REG_CUSTOMER and PURCHASES tables described in
Normal Join Syntax, you can determine how many customers bought something in June with the following join override:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID | FIRST_NAME | LAST_NAME | DATE | AMOUNT |
---|
00001 | Marvin | Chi | 6/10/2000 | 255.56 |
00002 | Dinah | Jones | 6/3/2000 | 55.79 |
00003 | John | Bowden | NULL | NULL |
00004 | J. | Marks | 6/15/2000 | 534.95 |
00002 | Dinah | Jones | 6/10/2000 | 104.45 |
00002 | Dinah | Jones | 6/21/2000 | 98.65 |
The Data Integration Service returns all registered customers in the REG_CUSTOMERS table, using null values for the customer who made no purchases in June. It does not include purchases made by non-registered customers.
Use multiple join conditions to determine how many registered customers spent more than $100.00 in a single purchase in June:
{REG_CUSTOMER LEFT OUTER JOIN PURCHASES on (REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID AND PURCHASES.AMOUNT > 100.00) }
The Data Integration Service returns the following data:
CUST_ID | FIRST_NAME | LAST_NAME | DATE | AMOUNT |
---|
00001 | Marvin | Chi | 6/10/2000 | 255.56 |
00002 | Dinah | Jones | 6/10/2000 | 104.45 |
00003 | John | Bowden | NULL | NULL |
00004 | J. | Marks | 6/15/2000 | 534.95 |
You might use multiple left outer joins if you want to incorporate information about returns during the same time period. For example, the RETURNS table contains the following data:
CUST_ID | CUST_ID | RETURN |
---|
00002 | 6/10/2000 | 55.79 |
00002 | 6/21/2000 | 104.45 |
To determine how many customers made purchases and returns for the month of June, use two left outer joins:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID LEFT OUTER JOIN RETURNS on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID | FIRST_NAME | LAST_NAME | DATE | AMOUNT | RET_DATE | RETURN |
---|
00001 | Marvin | Chi | 6/10/2000 | 255.56 | NULL | NULL |
00002 | Dinah | Jones | 6/3/2000 | 55.79 | NULL | NULL |
00003 | John | Bowden | NULL | NULL | NULL | NULL |
00004 | J. | Marks | 6/15/2000 | 534.95 | NULL | NULL |
00002 | Dinah | Jones | 6/10/2000 | 104.45 | NULL | NULL |
00002 | Dinah | Jones | 6/21/2000 | 98.65 | NULL | NULL |
00002 | Dinah | Jones | NULL | NULL | 6/10/2000 | 55.79 |
00002 | Dinah | Jones | NULL | NULL | 6/21/2000 | 104.45 |
The Data Integration Service uses NULLs for missing values.
Right Outer Join Syntax
You can create a right outer join with a join override. The right outer join returns the same results as a left outer join if you reverse the order of the resources in the join syntax. Use only one right outer join in a join override. If you want to create more than one right outer join, try reversing the order of the source resources and changing the join types to left outer joins.
When you use a right outer join with other joins, enter the right outer join at the end of the join override.
To create a right outer join, use the following syntax:
{ source1 RIGHT OUTER JOIN source2 on join_condition }
The following table displays syntax for a right outer join in a join override:
Syntax | Description |
---|
source1 | Source resource name. The Data Integration Service returns rows from this resource that match the join condition. |
source2 | Source resource name. With a right outer join, the Data Integration Service returns all rows in this resource. |
join_condition | Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator. |
PreSQL and PostSQL Commands
You can create SQL commands in a customized data object or relational data object instance. You can run the SQL commands to execute SQL statements such as insert, update, and delete. The Data Integration Service runs the SQL commands against the source relational resource.
When you run the mapping, the Data Integration Service runs PreSQL commands against the source database before it reads the source. You might want to use a PreSQL command if you have to update the source before you use the source in the mapping. For example, you can configure a PreSQL command to delete records in the source and update the source to load the latest records in the source database. When you run the mapping, the source that the mapping uses contains the latest records.
The Data Integration Service can also run PostSQL commands against the source database after it writes to the target. For example, you might want to use a PostSQL command to delete the records in the source after the records are loaded to the mapping target. You can configure a PostSQL command to delete the records after the mapping writes to the target.
Use the following guidelines when you configure PreSQL and PostSQL commands:
- •Use any command that is valid for the database type. The Data Integration Service does not allow nested comments, even though the database might allow them.
- •Use a semicolon (;) to separate multiple statements. The Data Integration Service issues a commit after each statement.
- •The Data Integration Service ignores semicolons within /* ... */.
- •If you need to use a semicolon outside comments, you can escape it with a backslash (\). When you escape the semicolon, the Data Integration Service ignores the backslash, and it does not use the semicolon as a statement separator.
- •The Developer tool does not validate the SQL in a PreSQL and PostSQL commands.
Adding Pre- and Post-Mapping SQL Commands
You can add pre- and post-mapping SQL commands to a customized data object or relational data object instance. The Data Integration Service runs the SQL commands when you use the data object in a mapping.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation
4. Select the Advanced properties.
5. Enter a pre-mapping SQL command in the PreSQL field.
6. Enter a post-mapping SQL command in the PostSQL field.
7. Save the customized data object.
Creating a Custom Query
Create a custom query to issue a special SELECT statement for reading data from the sources. The custom query overrides the default query that the Data Integration Service issues to read source data.
1. Open the customized data object or the relational data object instance.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the advanced query.
6. Select Use custom query.
The Data Integration Service displays the query it issues to read source data.
7. Change the query or replace it with a custom query.
8. If you want to push the custom query to the relational data source, select Push custom query to database.
The Data Integration Service does not the push custom query to the database by default. Choose to push the custom query if the query forms a valid subquery for the database.
9. Save the data object.