Developer Tool Guide > Physical Data Objects > Custom Queries
  

Custom Queries

A custom SQL query is a SELECT statement that overrides the default SQL query in a customized data object.
A custom query overrides the default SQL query that the Data Integration Service uses to read data from a relational source. The custom query also overrides the simple query settings that you define when you enter a source filter, use sorted ports, enter a user-defined join, or select distinct ports.
You can create a custom query to perform SQL operations that are valid in the database language but not available in the transformation language. When you define a custom query in a customized data object, you can reuse the object in multiple mappings or profiles.
Use the following guidelines when you create a custom query in a customized data object:
If you use a customized data object to perform a self-join, you must enter a custom SQL query that includes the self-join. You can use a customized data object with a custom query as a source in a mapping. The source database executes the query before it passes data to the Data Integration Service. You can create a custom query to add sources to an empty customized data object. You can also use a custom query to override the default SQL query.

Custom Query Optimization

The Data Integration Service can push a custom query to run in a relational data object for increased performance. Choose to push a custom query if the query forms a valid subquery for the database.
When you use a custom query to read data in a relational data object, the Data Integration Service can optimize the query by pushing the query to run in the database. The Data Integration Service can push the custom query if the query forms a valid subquery for the database. If the SQL syntax for the custom query is not valid in a subquery for the database, the resulting query fails to run.
If you push a custom query to a relational database other than IBM DB2, you must specify an alias for each expression in the select list that is not a column reference. The aliases allow the Data Integration Service to refer to the expressions in the select list.
See the database documentation for information about valid SQL syntax for aliases and subqueries.

Default Query

The Data Integration Service generates a default SQL query that it uses to read data from relational sources. You can override the default query in a customized data object or an instance of a relational data object.
You can override the default query through the simple or advanced query. Use the simple query to select distinct values, enter a source filter, sort ports, or enter a user-defined join. Use the advanced query to create a custom SQL query for reading data from the sources. The custom query overrides the default and simple queries.
If any table name or column name contains a database reserved word, you can create and maintain a reserved words file, reswords.txt. Create the reswords.txt file on any machine the Data Integration Service can access.
When the Data Integration Service runs a mapping, it searches for the reswords.txt file. If the file exists, the Data Integration Service places quotes around matching reserved words when it executes SQL against the database. If you override the default query, you must enclose any database reserved words in quotes.
When the Data Integration Service generates the default query, it delimits table and field names containing the following characters with double quotes:
/ + - = ~ ` ! % ^ & * ( ) [ ] { } ' ; ? , < > \ | <space>

Creating a Reserved Words File

Create a reserved words file if any table name or column name in the customized data object contains a database reserved word.
You must have administrator privileges to configure the Data Integration Service to use the reserved words file.
    1. Create a file called "reswords.txt."
    2. Create a section for each database by entering the database name within square brackets, for example, [Oracle].
    3. Add the reserved words to the file below the database name.
    For example:
    [Oracle]
    OPTION
    START
    where
    number
    [SQL Server]
    CURRENT
    where
    number
    Entries are not case sensitive.
    4. Save the reswords.txt file.
    5. In Informatica Administrator, select the Data Integration Service.
    6. Edit the custom properties.
    7. Add the following custom property:
    Name
    Value
    Reserved Words File
    <path>\reswords.txt
    8. Restart the Data Integration Service.

Hints

You can add hints to the source SQL query to pass instructions to a database optimizer. The optimizer uses the hints to choose a query run plan to access the source.
The Hints field appears in the Query view of a relational data object instance or a customized data object. The source database must be Oracle, Sybase, IBM DB2, or Microsoft SQL Server. The Hints field does not appear for other database types.
When the Data Integration Service generates the source query, it adds the SQL hints to the query exactly as you enter them in the Developer tool. The Data Integration Service does not parse the hints. When you run the mapping that contains the source, the mapping log shows the query with the hints in the query.
The Data Integration Service inserts the SQL hints in a position in the query depending on the database type. Refer to your database documentation for information about the syntax for hints.

Oracle

The Data Integration Service add hints directly after the SELECT/UPDATE/INSERT/DELETE keyword.
SELECT /*+ <hints> */ FROM …
The '+' indicates the start of hints.
The hints are contained in a comment (/* ... */ or --... until end of line)

Sybase

The Data Integration Service adds hints after the query. Configure a plan name in the hint.
SELECT … PLAN <plan>
select avg(price) from titles plan "(scalar_agg (i_scan type_price_ix titles )"

IBM DB2

You can enter the optimize-for clause as a hint. The Data Integration Service adds the clause at the end of the query.
SELECT … OPTIMIZE FOR <n> ROWS
The optimize-for clause tells the database optimizer how many rows the query might process. The clause does not limit the number of rows. If the database processes more than <n> rows, then performance might decrease.

Microsoft SQL Server

The Data Integration Service adds hints at the end of the query as part of an OPTION clause.
SELECT … OPTION ( <query_hints> )

Hints Rules and Guidelines

Use the following rules and guidelines when you configure hints for SQL queries:

Creating Hints

Create hints to send instructions to the database optimizer to determine a query plan.
    1. Open the customized data object or the relational data object instance.
    2. Select the Read view.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the simple query.
    6. Click Edit next to the Hints field.
    The Hints dialog box appears.
    7. Enter the hint in the SQL Query field.
    The Developer tool does not validate the hint.
    8. Click OK.
    9. Save the data object.

Select Distinct

You can select unique values from sources in a customized data object or a relational data object instance with the select distinct option. When you enable select distinct, the Data Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Use the select distinct option to filter source data. For example, you might use the select distinct option to extract unique customer IDs from a table that lists total sales. When you use the relational data object in a mapping, the Data Integration Service filters data earlier in the data flow, which can increase performance.

Using Select Distinct

Select unique values from a relational source with the Select Distinct property.
    1. Open the customized data object or relational data object instance.
    2. Select the Read view.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the simple query.
    6. Enable the Select Distinct option.
    7. Save the customized data object.

Filters

You can enter a filter value in a custom query. The filter becomes the WHERE clause in the query SELECT statement. Use a filter to reduce the number of rows that the Data Integration Service reads from the source table.

Entering a Source Filter

Enter a source filter to reduce the number of rows the Data Integration Service reads from the relational source.
    1. Open the customized data object or the relational data object instance.
    2. Select the Read view.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the simple query.
    6. Click Edit next to the Filter field.
    The SQL Query dialog box appears.
    7. Enter the filter condition in the SQL Query field.
    You can select column names from the Columns list.
    8. Click OK.
    9. Click Validate to validate the filter condition.
    10. Save the data object.

Sorted Ports

You can sort rows in the default query for a customized data object or a relational data object instance. Select the ports to sort by. The Data Integration Service adds the ports to the ORDER BY clause in the default query.
You might sort the source rows to increase performance when you include the following transformations in a mapping:
Note: You can also use the Sorter transformation to sort relational and flat file data before Aggregator and Joiner transformations.

Sorting Column Data

Use sorted ports to sort column data in a customized data object or relational data object instance. When you use the data object as a read transformation in a mapping or mapplet, you can pass the sorted data to transformations downstream from the read transformation.
    1. Open the customized data object or relational data object instance.
    2. Select the Read view.
    The following figure shows the Read view of a customized data object that is open in the editor:
    The Read view shows the ReportDefinition relational data object and the Output transformation.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the simple query.
    6. Click Edit next to the Sort field.
    The Sort dialog box appears.
    7. To specify a column as a sorted port, click the New button.
    8. Select the column and sort type, either ascending or descending.
    9. Repeat steps Using Sorted Ports and Using Sorted Ports to select other columns to sort.
    The Developer tool sorts the columns in the order in which they appear in the Sort dialog box.
    10. Click OK.
    In the Query properties, the Developer tool displays the sort columns in the Sort field.
    11. Click Validate to validate the sort syntax.
    12. Save the data object.

User-Defined Joins

You can configure a user-defined join in a customized data object or relational data object instance. A user-defined join defines the condition to join data from multiple sources in the same data object.
When you add a user-defined join to a customized data object or a relational data object instance, you can use the data object as a read transformation in a mapping. The source database performs the join before it passes data to the Data Integration Service. Mapping performance increases when the source tables are indexed.
Creat a user-defined join to join data from related sources. The user-defined join overrides the default inner join that the Data Integration creates based on the related keys in each source. When you enter a user-defined join, enter the contents of the WHERE clause that specifies the join condition. If the user-defined join performs an outer join, the Data Integration Service might insert the join syntax in the WHERE clause or the FROM clause, based on the database syntax.
You might need to enter a user-defined join in the following circumstances:
Use the following guidelines when you enter a user-defined join in a customized data object or relational data object instance:
User-defined joins join data from related resources in a database. To join heterogeneous sources, use a Joiner transformation in a mapping that reads data from the sources. To perform a self-join, you must enter a custom SQL query that includes the self-join.

Entering a User-Defined Join

Configure a user-defined join in a customized data object or relational data object to define the join condition for the data object sources.
    1. Open the customized data object or relational data object instance.
    2. Select the Read view.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the simple query.
    6. Click Edit next to the Join field.
    The SQL Query dialog box appears.
    7. Enter the user-defined join in the SQL Query field.
    You can select column names from the Columns list.
    8. Click OK.
    9. Click Validate to validate the user-defined join.
    10. Save the data object.

Outer Join Support

You can use a customized data object to perform an outer join of two sources in the same database. When the Data Integration Service performs an outer join, it returns all rows from one source resource and rows from the second source resource that match the join condition.
Use an outer join when you want to join two resources and return all rows from one of the resources. For example, you might perform an outer join when you want to join a table of registered customers with a monthly purchases table to determine registered customer activity. You can join the registered customer table with the monthly purchases table and return all rows in the registered customer table, including customers who did not make purchases in the last month. If you perform a normal join, the Data Integration Service returns only registered customers who made purchases during the month, and only purchases made by registered customers.
With an outer join, you can generate the same results as a master outer or detail outer join in the Joiner transformation. However, when you use an outer join, you reduce the number of rows in the data flow which can increase performance.
You can enter two kinds of outer joins:
Note: Use outer joins in nested query statements when you override the default query.
You can enter an outer join in a user-defined join or in a custom SQL query.

Informatica Join Syntax

When you enter join syntax, use the Informatica or database-specific join syntax. When you use the Informatica join syntax, the Data Integration Service translates the syntax and passes it to the source database during a mapping run.
Note: Always use database-specific syntax for join conditions.
When you use Informatica join syntax, enclose the entire join statement in braces ({Informatica syntax}). When you use database syntax, enter syntax supported by the source database without braces.
When you use Informatica join syntax, use table names to prefix column names. For example, if you have a column named FIRST_NAME in the REG_CUSTOMER table, enter “REG_CUSTOMER.FIRST_NAME” in the join syntax. Also, when you use an alias for a table name, use the alias within the Informatica join syntax to ensure the Data Integration Service recognizes the alias.
You can combine left outer or right outer joins with normal joins in a single data object. You cannot combine left and right outer joins. Use multiple normal joins and multiple left outer joins. Some databases limit you to using one right outer join.
When you combine joins, enter the normal joins first.

Normal Join Syntax

You can create a normal join using the join condition in a customized data object or relational data object instance.
When you create an outer join, you must override the default join. As a result, you must include the normal join in the join override. When you include a normal join in the join override, list the normal join before outer joins. You can enter multiple normal joins in the join override.
To create a normal join, use the following syntax:
{ source1 INNER JOIN source2 on join_condition }
The following table displays the syntax for normal joins in a join override:
Syntax
Description
source1
Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
source2
Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
join_condition
Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.
For example, you have a REG_CUSTOMER table with data for registered customers:
CUST_ID
FIRST_NAME
LAST_NAME
00001
Marvin
Chi
00002
Dinah
Jones
00003
John
Bowden
00004
J.
Marks
The PURCHASES table, refreshed monthly, contains the following data:
TRANSACTION_NO
CUST_ID
DATE
AMOUNT
06-2000-0001
00002
6/3/2000
55.79
06-2000-0002
00002
6/10/2000
104.45
06-2000-0003
00001
6/10/2000
255.56
06-2000-0004
00004
6/15/2000
534.95
06-2000-0005
00002
6/21/2000
98.65
06-2000-0006
NULL
6/23/2000
155.65
06-2000-0007
NULL
6/24/2000
325.45
To return rows displaying customer names for each transaction in the month of June, use the following syntax:
{ REG_CUSTOMER INNER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID
DATE
AMOUNT
FIRST_NAME
LAST_NAME
00002
6/3/2000
55.79
Dinah
Jones
00002
6/10/2000
104.45
Dinah
Jones
00001
6/10/2000
255.56
Marvin
Chi
00004
6/15/2000
534.95
J.
Marks
00002
6/21/2000
98.65
Dinah
Jones
The Data Integration Service returns rows with matching customer IDs. It does not include customers who made no purchases in June. It also does not include purchases made by non-registered customers.

Left Outer Join Syntax

You can create a left outer join with a join override. You can enter multiple left outer joins in a single join override. When using left outer joins with other joins, list all left outer joins together, after any normal joins in the statement.
To create a left outer join, use the following syntax:
{ source1 LEFT OUTER JOIN source2 on join_condition }
The following tables displays syntax for left outer joins in a join override:
Syntax
Description
source1
Source resource name. With a left outer join, the Data Integration Service returns all rows in this resource.
source2
Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
join_condition
Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.
For example, using the same REG_CUSTOMER and PURCHASES tables described in Normal Join Syntax, you can determine how many customers bought something in June with the following join override:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID
FIRST_NAME
LAST_NAME
DATE
AMOUNT
00001
Marvin
Chi
6/10/2000
255.56
00002
Dinah
Jones
6/3/2000
55.79
00003
John
Bowden
NULL
NULL
00004
J.
Marks
6/15/2000
534.95
00002
Dinah
Jones
6/10/2000
104.45
00002
Dinah
Jones
6/21/2000
98.65
The Data Integration Service returns all registered customers in the REG_CUSTOMERS table, using null values for the customer who made no purchases in June. It does not include purchases made by non-registered customers.
Use multiple join conditions to determine how many registered customers spent more than $100.00 in a single purchase in June:
{REG_CUSTOMER LEFT OUTER JOIN PURCHASES on (REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID AND PURCHASES.AMOUNT > 100.00) }
The Data Integration Service returns the following data:
CUST_ID
FIRST_NAME
LAST_NAME
DATE
AMOUNT
00001
Marvin
Chi
6/10/2000
255.56
00002
Dinah
Jones
6/10/2000
104.45
00003
John
Bowden
NULL
NULL
00004
J.
Marks
6/15/2000
534.95
You might use multiple left outer joins if you want to incorporate information about returns during the same time period. For example, the RETURNS table contains the following data:
CUST_ID
CUST_ID
RETURN
00002
6/10/2000
55.79
00002
6/21/2000
104.45
To determine how many customers made purchases and returns for the month of June, use two left outer joins:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID LEFT OUTER JOIN RETURNS on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }
The Data Integration Service returns the following data:
CUST_ID
FIRST_NAME
LAST_NAME
DATE
AMOUNT
RET_DATE
RETURN
00001
Marvin
Chi
6/10/2000
255.56
NULL
NULL
00002
Dinah
Jones
6/3/2000
55.79
NULL
NULL
00003
John
Bowden
NULL
NULL
NULL
NULL
00004
J.
Marks
6/15/2000
534.95
NULL
NULL
00002
Dinah
Jones
6/10/2000
104.45
NULL
NULL
00002
Dinah
Jones
6/21/2000
98.65
NULL
NULL
00002
Dinah
Jones
NULL
NULL
6/10/2000
55.79
00002
Dinah
Jones
NULL
NULL
6/21/2000
104.45
The Data Integration Service uses NULLs for missing values.

Right Outer Join Syntax

You can create a right outer join with a join override. The right outer join returns the same results as a left outer join if you reverse the order of the resources in the join syntax. Use only one right outer join in a join override. If you want to create more than one right outer join, try reversing the order of the source resources and changing the join types to left outer joins.
When you use a right outer join with other joins, enter the right outer join at the end of the join override.
To create a right outer join, use the following syntax:
{ source1 RIGHT OUTER JOIN source2 on join_condition }
The following table displays syntax for a right outer join in a join override:
Syntax
Description
source1
Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
source2
Source resource name. With a right outer join, the Data Integration Service returns all rows in this resource.
join_condition
Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.

PreSQL and PostSQL Commands

You can create SQL commands in a customized data object or relational data object instance. You can run the SQL commands to execute SQL statements such as insert, update, and delete. The Data Integration Service runs the SQL commands against the source relational resource.
When you run the mapping, the Data Integration Service runs PreSQL commands against the source database before it reads the source. You might want to use a PreSQL command if you have to update the source before you use the source in the mapping. For example, you can configure a PreSQL command to delete records in the source and update the source to load the latest records in the source database. When you run the mapping, the source that the mapping uses contains the latest records.
The Data Integration Service can also run PostSQL commands against the source database after it writes to the target. For example, you might want to use a PostSQL command to delete the records in the source after the records are loaded to the mapping target. You can configure a PostSQL command to delete the records after the mapping writes to the target.
Use the following guidelines when you configure PreSQL and PostSQL commands:

Adding Pre- and Post-Mapping SQL Commands

You can add pre- and post-mapping SQL commands to a customized data object or relational data object instance. The Data Integration Service runs the SQL commands when you use the data object in a mapping.
    1. Open the customized data object.
    2. Select the Read view.
    3. Select the Output transformation
    4. Select the Advanced properties.
    5. Enter a pre-mapping SQL command in the PreSQL field.
    6. Enter a post-mapping SQL command in the PostSQL field.
    7. Save the customized data object.

Creating a Custom Query

Create a custom query to issue a special SELECT statement for reading data from the sources. The custom query overrides the default query that the Data Integration Service issues to read source data.
    1. Open the customized data object or the relational data object instance.
    2. Select the Read view.
    3. Select the Output transformation.
    4. Select the Query properties.
    5. Select the advanced query.
    6. Select Use custom query.
    The Data Integration Service displays the query it issues to read source data.
    7. Change the query or replace it with a custom query.
    8. If you want to push the custom query to the relational data source, select Push custom query to database.
    The Data Integration Service does not the push custom query to the database by default. Choose to push the custom query if the query forms a valid subquery for the database.
    9. Save the data object.