Diagnosing and tuning poorly performing Oracle queries

This is a big topic, but we'll focus on what a bad query plan looks like, what a good query plan looks like, and one example of how a poorly performing query was improved. There is no generic way to tune all poorly performing queries, but this page should give you the tools to identify and diagnose a poorly performing query, illustrated by how this particular query was tuned. The first part of this how-to is PIM-specific, but the important parts are specific only to Oracle.

The first step is to find the poorly performing query. In our example, a PIM Desktop user issued a query and noticed poor performance:

[Screenshot: PIM Desktop query exhibiting poor performance (images/download/attachments/98009941/image2016-7-26_13_10_40.png)]

Next, go through the logs to find the query. As PIM uses Hibernate, the following categories (org.hibernate.SQL and org.hibernate.type.descriptor.sql.BasicBinder) had to be set to TRACE in the log4j.xml file to enable logging of SQL statements as well as their bind parameters:

<!-- Use "TRACE" to see the debug output from UDA and activate tracing in stored procedures.
     Check also server.properties property "db.default.debug.show_sql" -->
<category name="UDA" additivity="false">
    <priority value="TRACE"/>
    <appender-ref ref="UdaAppender"/>
</category>
<!-- Use "TRACE" to see the debug output from Hibernate
     org.hibernate.cache          Log all second-level cache activity
     org.hibernate.hql.ast.AST    Log HQL and SQL ASTs during query parsing
     org.hibernate.jdbc           Log all JDBC resource acquisition
     org.hibernate.pretty         Log the state of all entities (max 20 entities) associated with the session at flush time
     org.hibernate.SQL            Log all SQL DML statements as they are executed (include org.hibernate.type to see binds and values)
     org.hibernate.secure         Log all JAAS authorization requests
     org.hibernate.tool.hbm2ddl   Log all SQL DDL statements as they are executed
     org.hibernate.transaction    Log transaction related activity
     org.hibernate.type           Log all JDBC parameters
     org.hibernate                Log everything -->
<category name="org.hibernate.SQL" additivity="false">
    <level value="TRACE"/>
    <appender-ref ref="UdaAppender"/>
</category>
<category name="org.hibernate.type.descriptor.sql.BasicBinder" additivity="false">
    <level value="TRACE"/>
    <appender-ref ref="UdaAppender"/>
</category>

Changes to this file do not require a server restart. After the query is executed again from PIM Desktop, search the logs for the generated SQL statement.
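
With both categories at TRACE, org.hibernate.SQL logs each statement with ? placeholders and BasicBinder logs one line per bind value. The excerpt below is only an illustrative sketch (the exact layout varies by Hibernate version); substituting each logged value for its placeholder, in order, yields a runnable statement:

TRACE org.hibernate.SQL - select articlerev0_."ID", articlerev0_."Identifier" from "ArticleRevision" articlerev0_ ... where articlerev0_."RevisionID"=? and ... and articlerev0_."EntityID"=?
TRACE org.hibernate.type.descriptor.sql.BasicBinder - binding parameter [1] as [BIGINT] - 1
TRACE org.hibernate.type.descriptor.sql.BasicBinder - binding parameter [2] as [BIGINT] - 1100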

Here's the query after we've replaced the bind parameters with the logged values and wrapped it in an EXPLAIN PLAN statement:

set linesize 1000
explain plan for
select articlerev0_."ID", articlerev0_."Identifier"
from "ArticleRevision" partition(P_PRODUCT) articlerev0_
left outer join "ArticleDetail" partition(P_PRODUCT) articledet1_
    on articlerev0_."ID" = articledet1_."ArticleRevisionID"
    and articledet1_."DeletionTimestamp" = TO_TIMESTAMP('9999-12-31','yyyy-mm-dd')
left outer join "ArticleLang" partition(P_PRODUCT) articlelan2_
    on articlerev0_."ID" = articlelan2_."ArticleRevisionID"
    and (
        articlelan2_."LanguageID" = 9
        and articlelan2_."Res_LK_Text100_01" = 'LT'
        and articlelan2_."Res_LK_Int_01" = 0
        and articlelan2_."ChannelID" = 1
        and articlelan2_."EntityID" = 21007
    )
    and articlelan2_."DeletionTimestamp" = TO_TIMESTAMP('9999-12-31','yyyy-mm-dd')
left outer join "ArticleAttribute" partition(P_PRODUCT) articleatt3_
    on articlerev0_."ID" = articleatt3_."ArticleRevisionID"
    and articleatt3_."NameInKeyLanguage" = 'Heel Type'
    and articleatt3_."DeletionTimestamp" = TO_TIMESTAMP('9999-12-31','yyyy-mm-dd')
left outer join "ArticleAttributeValue" partition(P_PRODUCT) articleatt4_
    on articleatt3_."ID" = articleatt4_."ArticleAttributeID"
    and (
        articleatt4_."Identifier" = 'ALL'
        and articleatt4_."LanguageID" = 9
    )
    and articleatt4_."DeletionTimestamp" = TO_TIMESTAMP('9999-12-31','yyyy-mm-dd')
where articlerev0_."RevisionID" = 1
    and articlerev0_."DeletionTimestamp" = TO_TIMESTAMP('9999-12-31','yyyy-mm-dd')
    and articledet1_."Res_Bit_02" = 1
    and articlelan2_."Res_Text250_05" = '41'
    and articleatt4_."Value" is not null
    and articlerev0_."EntityID" = 1100;
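
After the EXPLAIN PLAN statement completes, the plan can be read back from the plan table with the standard DBMS_XPLAN package:

-- display the most recently explained plan
select * from table(DBMS_XPLAN.DISPLAY);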

Explaining the plan for this query produced the following. Notice the difference between the estimated and the actual row counts? This causes Oracle to choose an incorrect plan, with a wrong join order and many nested loops.

[Screenshot: execution plan showing the mismatch between estimated and actual row counts (images/download/attachments/98009941/image2016-7-26_13_36_54.png)]

Also notice that an INDEX RANGE SCAN was performed on index XAK2_ArticleAttribute in table ArticleAttribute, which resulted in an incorrect estimated row count.
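
To compare estimated and actual row counts yourself, one standard approach (a sketch, run from the same session) is the GATHER_PLAN_STATISTICS hint combined with DBMS_XPLAN.DISPLAY_CURSOR:

-- run the statement once with runtime statistics collection enabled
select /*+ GATHER_PLAN_STATISTICS */ articlerev0_."ID", articlerev0_."Identifier"
from ... -- remainder of the query as above
-- then show the last cursor with estimated (E-Rows) and actual (A-Rows) counts
select * from table(DBMS_XPLAN.DISPLAY_CURSOR(null, null, 'ALLSTATS LAST'));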

To investigate further, a 10053 optimizer trace is needed. In the same session from which you issued the query and the explain plan, do the following (pay attention to the comments): turn on tracing, execute the query, then turn off tracing. Then view the trace file, preferably with a utility:

-- activate trace file generation
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- (re-execute the query and the EXPLAIN PLAN here)
-- deactivate trace file generation
ALTER SESSION SET EVENTS '10053 trace name context off';
-- get the file name of the session's trace file
select u_dump.value || '/' || instance.value || '_ora_' || v$process.spid
       || nvl2(v$process.traceid, '_' || v$process.traceid, null) || '.trc' "Trace File"
from V$PARAMETER u_dump
cross join V$PARAMETER instance
cross join V$PROCESS
join V$SESSION on v$process.addr = V$SESSION.paddr
where u_dump.name = 'user_dump_dest'
and instance.name = 'instance_name'
and V$SESSION.audsid = sys_context('userenv','sessionid');
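
On Oracle 11g and later there is a simpler alternative to the query above: v$diag_info reports the current session's trace file directly:

-- 11g+: location of the current session's trace file
select value from v$diag_info where name = 'Default Trace File';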

Navigate to the Explain Plan section of the trace file to confirm you are looking at the proper query in the trace file:

----- Explain Plan Dump -----
----- Plan Table -----
============
Plan Table
============
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                       | Rows | Bytes | Cost | Time     | Pstart  | Pstop   |
-----------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                            |      |       |   10 |          |         |         |
|  1 | NESTED LOOPS                       |                            |    1 |   116 |   10 | 00:00:01 |         |         |
|  2 | NESTED LOOPS                       |                            |    1 |   116 |   10 | 00:00:01 |         |         |
|  3 | NESTED LOOPS                       |                            |    1 |    84 |    7 | 00:00:01 |         |         |
|  4 | PARTITION REFERENCE SINGLE         |                            |    1 |    55 |    5 | 00:00:01 | KEY(AP) | KEY(AP) |
|  5 | TABLE ACCESS BY LOCAL INDEX ROWID  | ArticleAttribute           |    1 |    55 |    5 | 00:00:01 | 2       | 2       |
|  6 | INDEX RANGE SCAN                   | XAK2_ArticleAttribute      |    1 |       |    4 | 00:00:01 | 2       | 2       |
|  7 | TABLE ACCESS BY GLOBAL INDEX ROWID | ArticleRevision            |    1 |    29 |    2 | 00:00:01 | 2       | 2       |
|  8 | INDEX UNIQUE SCAN                  | PK_ArticleRevision         |    1 |       |    1 | 00:00:01 |         |         |
|  9 | PARTITION REFERENCE SINGLE         |                            |    1 |       |    2 | 00:00:01 | KEY(AP) | KEY(AP) |
| 10 | INDEX UNIQUE SCAN                  | XAK1_ArticleAttributeValue |    1 |       |    2 | 00:00:01 | KEY(AP) | KEY(AP) |
| 11 | TABLE ACCESS BY LOCAL INDEX ROWID  | ArticleAttributeValue      |    1 |    32 |    3 | 00:00:01 | 1       | 1       |
-----------------------------------------------------------------------------------------------------------------------------

In my trace file the explain plan dump for the query starts at line 6613.

Using a trace file viewer is recommended. The general steps to find the issue are:

  1. Find the proper 'optimizing query block' (OPTIMIZING QB) section for your SQL (if using v10053.exe, look for the OPTIMIZING QB section on the left-hand side).

  2. Look for the Access path analysis for the table containing the wrong statistic (ArticleAttribute in our case).

  3. Look for the column behind the index that produced the wrong estimated row count. In our case, the XAK2_ArticleAttribute index is on the NameInKeyLanguage column.

At this point you'll want to familiarize yourself with how Oracle uses histograms to manage statistics.
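
To see what the data dictionary has recorded for the suspicious column, you can query DBA_TAB_COL_STATISTICS (the owner shown is the schema used later in this document):

select column_name, num_distinct, num_buckets, histogram
from dba_tab_col_statistics
where owner = 'HBC_HPM_MASTER'
and table_name = 'ArticleAttribute'
and column_name = 'NameInKeyLanguage';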

[Screenshot: 10053 trace viewer showing the access path analysis and histogram data for the NameInKeyLanguage column (images/download/attachments/98009941/image2016-7-27_15_40_57.png)]

If you look closely at section 3, you'll notice that the NDV (Number of Distinct Values) is 383, but the #Bkts (buckets) of the HtBal (height-balanced) histogram is only 255. The NameInKeyLanguage value of 'Heel Type' was unfortunately not in one of the buckets, which causes the "Using prorated density: 0.000000 of col #54 as selectvity of out-of-range/non-existent value pred" message you see. This leads the optimizer to incorrectly conclude that the value doesn't exist in the table and that joining to this table would return no rows for the query (a prorated density of 0).

To fix this, we need to delete the histogram. Below is a script that deletes all histograms on NVARCHAR2 columns with more than 254 distinct values: with at most 254 buckets, the histogram cannot cover all values, so a height-balanced histogram isn't desirable in this scenario (in general, height-balanced histograms are almost useless for varchar2 columns).

-- Deletes all height-balanced histograms on NVARCHAR2 columns with more than 254 distinct values
BEGIN
    FOR COL_ITEM IN (
        SELECT stats.OWNER, stats.TABLE_NAME, stats.COLUMN_NAME
        FROM DBA_TAB_COL_STATISTICS stats
        INNER JOIN DBA_TAB_COLUMNS cols
            ON stats.TABLE_NAME = cols.TABLE_NAME
            AND stats.COLUMN_NAME = cols.COLUMN_NAME
            AND stats.OWNER = cols.OWNER
        WHERE cols.DATA_TYPE = 'NVARCHAR2'
        AND stats.HISTOGRAM <> 'NONE'
        AND stats.NUM_DISTINCT > 254
        AND stats.OWNER IN ('HBC_HPM_MAIN','HBC_HPM_MASTER','HBC_HPM_SUPPLIER')
        ORDER BY stats.OWNER, stats.TABLE_NAME, stats.COLUMN_NAME
    )
    LOOP
        -- COL_STAT_TYPE => 'HISTOGRAM' removes only the histogram and keeps the base column statistics
        DBMS_STATS.DELETE_COLUMN_STATS(
            OWNNAME       => COL_ITEM.OWNER,
            TABNAME       => '"' || COL_ITEM.TABLE_NAME || '"',
            COLNAME       => '"' || COL_ITEM.COLUMN_NAME || '"',
            COL_STAT_TYPE => 'HISTOGRAM');
    END LOOP;
END;
/
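
You can verify the result with the same dictionary view used inside the script; a simplified check (it omits the NVARCHAR2 filter, so legitimately retained histograms on other column types may still appear) is:

select owner, table_name, column_name, histogram
from dba_tab_col_statistics
where owner in ('HBC_HPM_MAIN','HBC_HPM_MASTER','HBC_HPM_SUPPLIER')
and histogram <> 'NONE'
and num_distinct > 254;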

Finally, to prevent Oracle's automatic statistics job from regenerating the histogram, call the SET_TABLE_PREFS procedure.

Pass SIZE 1 for the column in the METHOD_OPT preference to suppress histogram creation:

-- Mixed-case table and column names must be enclosed in double quotes, as shown below;
-- without the quotes, Oracle looks for the upper-case names and the call fails.
BEGIN
    DBMS_STATS.SET_TABLE_PREFS('HBC_HPM_MASTER', '"ArticleAttribute"', 'METHOD_OPT',
        'for all columns size skewonly for columns size 1 "NameInKeyLanguage"');
END;
/
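
To confirm the preference was stored, DBMS_STATS.GET_PREFS can be called from SQL (parameter order is pname, ownname, tabname; the table name is double-quoted as in the call above):

select DBMS_STATS.GET_PREFS('METHOD_OPT', 'HBC_HPM_MASTER', '"ArticleAttribute"') as method_opt
from dual;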

After fixing this, and a similar issue with the Res_Text250_05 column of table ArticleLang, we get the following plan. While the estimated and actual row counts still differ, the join order is now much more efficient (good: ArticleLang, ArticleDetail, ArticleAttribute, ArticleAttributeValue vs. the previous bad order: ArticleAttribute, ArticleDetail, ArticleLang, ArticleAttributeValue). In general, for outer joins the most limiting tables should be joined first, which for PIM means ArticleAttribute and ArticleAttributeValue should usually be joined last, as they are larger than the Article table by some factor.

[Screenshot: improved execution plan after removing the histograms (images/download/attachments/98009941/image2016-7-27_17_52_39.png)]