Tips to Write Effective Queries and Explain Plan

SQL Statement Processing Phases
Detect Driving Table
Several Tips to write better queries
Explain Plan
Explain Plan Operations Reference (Table Access Methids, Index Access Methods, Join Operations, Sort Operations)
AutoTrace

SQL Statement Processing Phases
The four statement processing phases in SQL are parsing binding, executing and fetching.

PARSE: During the Parse Phase, The oracle server searches for the statement in the shared pool, checks the syntax and semantics and then determines the execution plan. This is the most expensive phase.

BIND: It scans the statement for bind variables and assigns a value to each variable.

EXECUTE: The Server applies the parse tree to the data buffers, performs necessary I/O and sorts for DML statements.

FETCH: Retrieves rows for a SELECT statement during the fetch phase. Each fetch retrieves multiple rows, using an array fetch.

Optimizer Method and how to know the Driving Table.
A small "golden rule" is that your driving table should be the one that eliminates the must rows and you should choose that one first. But…. Where to specify the driving Table?
Oracle processes result sets a table at a time. It starts by retrieving all the data for the first (driving) table. Once this data is retrieved it is used to limit the number of rows processed for subsequent (driven) tables. In the case of multiple table joins, the driving table limits the rows processed for the first driven table. Once processed, this combined set of data is the driving set for the second driven table etc. Roughly translated into English, this means that it is best to process tables that will retrieve a small number of rows first. The optimizer will do this to the best of its ability regardless of the structure of the DML, but the following factors may help.
Both the Rule and Cost based optimizers select a driving table for each query. If a decision cannot be made, the order of processing is from the end of the FROM clause to the start. Therefore, you should always place your driving table at the end of the FROM clause. Subsequent driven tables should be placed in order so that those retrieving the most rows are nearer to the start of the FROM clause. Confusingly, the WHERE clause should be writen in the opposite order, with the driving tables conditions first and the final driven table last. ie.

RBO (Rule Based Optimizer) chooses the driving order by taking the tables in the FROM clause RIGHT to LEFT.
CBO (Cost Based Optimizer) determines join order from costs derived from gathered statistics. If there are no statistics, if the optimizer_mode IS COST then CBO chooses the driving order of tables from LEFT to RIGHT in the FROM clause, IF the optimizer_mode IS CHOOSE then RBO method will be used (RIGHT to LEFT).

Hence, it is important to know the driving table, which has the smallest number of rows in a query. The driving table is important because it is retrieved first, and the rows from the second table are then merged into the result set from the first table. If three tables are being joined, select the intersection table as the driving table. The intersection table is the table that has many tables dependent on it. So because usually you will use CBO (if you don't know ask your DBA!!!), you should do something like:

SELECT something
FROM detail, master
WHERE detail.key = master.key

Now, there are usually other conditions in the WHERE clause and the optimizer (RULE - based) will decide which table to use as a driving table depending on what conditions is present and what indexes is available. In this case the detail table will be the driving table because of the explicit condition on the key column in the table

SELECT something
FROM detail, master
WHERE detail.key = master.key
AND detail.key = 'some value'

While in this case the master table will be driving table

SELECT something
FROM detail, master
WHERE detail.key = master.key
AND master.key = 'some value'

So you have to look at whole statement and of course this is relevant for RULE based approach ONLY!

The case for the Cost Based Optimizer (CBO) is that if it has enough statistical information about the tables, it can construct the plan based on data, not just structure and might use another access path for the same statement as the RULE based approach would use. The best way to identify the driving table is to look at the Explain plan for the query.

The WHERE clause is the main decision maker about which indexes to use. You should always try to use your unique indexes first, and then if that is not possible then use a non-unique index. For a query to use an index, one or more fields from that index need to be mentioned in the WHERE clause. On concatenated indexes the index will only be used if the first field in the index is mentioned.On 10g that in not needed any more!!!
The more of its fields are mentioned in the where clause, the better an index is used.

The TIPS to write better queries
Although two SQL statements may produce the same result, Oracle may process one faster than the other. You can use the results of the EXPLAIN PLAN statement to compare the execution plans and costs of the two statements and determine which is more efficient. Following are some tips that help in writing efficient queries.

0. Existence of a row
Do not use ‘Select count(*)…’ to test the existence of a row. Instead, open an explicit cursor, fetch once, and then check cursor%NOTFOUND :

If you are going to insert a row or update one if that exists, instead of:
DECLARE
/* Declare variables which will be used in SQL statements */
v_LastName VARCHAR2(10) := 'Pafumi';
v_NewMajor VARCHAR2(10) := 'Computer';
v_exists number := 0;
BEGIN
   Select count(1) into v_exists from students
      WHERE last_name = v_LastName;

   If v_exists = 1 then
     /* Update the students table. */
     UPDATE students
       SET major = v_NewMajor
       WHERE last_name = v_LastName;
   else
    INSERT INTO students (ID, last_name, major)
      VALUES (10020, v_LastName, v_NewMajor);
END IF;
END;
/

Try to perform the following, is much faster !!!!
DECLARE
/* Declare variables which will be used in SQL statements */
v_LastName VARCHAR2(10) := 'Pafumi';
v_NewMajor VARCHAR2(10) := 'Computer';
BEGIN
/* Update the students table. */
UPDATE students
    SET major = v_NewMajor
    WHERE last_name = v_LastName;

/* Check to see if the record was found. If not, then we need to insert this record. */
IF SQL%NOTFOUND THEN
    INSERT INTO students (ID, last_name, major)
      VALUES (10020, v_LastName, v_NewMajor);
END IF;
END;
/

--Another Example if trying to insert values in a table with PK:
   INSERT INTO RecognitionLog(MachineName,StartDateTime)
            values(p_MachineName,p_StartDateTime);
   p_RowsAffected := SQL%ROWCOUNT;
   COMMIT;
   RETURN 0;
EXCEPTION
When Dup_val_on_index then
    UPDATE RecognitionLog
          SET EndDateTime = p_EndDateTime,
              TotalRecognized = p_TotalRecognized,
              TotalRecognitionFailed = p_TotalRecognitionFailed
          WHERE MachineName = p_MachineName
            AND StartDateTime = p_StartDateTime;
    p_RowsAffected := SQL%ROWCOUNT;
    commit;
    RETURN 0;
when others then
    rollback;
    p_RowsAffected := 0;
    return 1

1. Avoid the use of NULL or IS NOT NULL.
Instead of:
Select * from clients where phone_number is null;
Use:
Select * from clients where phone_number = 0000000000000000;

2. Select the data that you need ONLY!!!
When selecting from a table, be sure to only select the data that you need.
For example, if you only need 1 column from a 50 column table, be sure to do a
'select fld from table' and only retrieve what you need. If you do a
'select * from table' you will be fetching ALL columns of the table which increases network traffic and causes the system to perform unnecessary work to retrieve data that is not being used

3. Wich table first?
If you specify 2 tables in the FROM clause of a SELECT statement, the parser will process the tables depending on the Optimizer Method that you are using (see below for more details). Basically:
-RBO chooses the driving order by taking the tables in the FROM clause RIGHT to LEFT.
-CBO determines join order from costs derived from gathered statistics. If there are no statistics then CBO chooses the driving order of tables from LEFT to RIGHT in the FROM clause. This is OPOSITE to the RBO.
Both the Rule and Cost based optimizers select a driving table for each query. If a decision cannot be made, the order of processing is from the end of the FROM clause to the start. Therefore, you should always place your driving table at the end of the FROM clause. Subsequent driven tables should be placed in order so that those retrieving the most rows are nearer to the start of the FROM clause. Confusingly, the WHERE clause should be writen in the opposite order, with the driving tables conditions first and the final driven table last.

4. If three tables are being joined, select the intersection tables as the driving table.
The intersection table is the table that has many tables dependent on it.

5. Always use table alias and prefix
The parse phase for statements can be decreased by efficient use of aliasing. This helps the speed of parsing the statements in two ways:

If an alias is not present, the engine must resolve which tables own the specified columns.
A short alias is parsed more quickly than a long table name or alias. If possible, reduce the alias to a single letter.

6. IN and EXISTS

Correlated Queries
A subquery is said to be Correlated when it is joined to the outer query within the Subquery. An example of this is:
Select last_name, first_name
From Customer
Where customer.city = ‘Chicago’
     and Exists
           (Select customer_id
            From Sales where
            sales.total_sales_amt > 10000
            and sales.customer_id = customer.customer_id);
Notice that the last line in the above query is a join of the outer Customer table and inner Sales tables. This join is in the subquery. Given the query above, the outer query is read and for each Customer in Chicago, the outer row is joined to the Subquery. Therefore, in the case of a subquery, the inner query is executed once for every row read in the outer query.
This is efficient where a relatively small number of rows are processed by the query, but considerable overhead is incurred when a large number of rows are read.

Uncorrelated Queries (sub-query executes first)
A subquery is said to be uncorrelated (aka non-correlated) when the two tables are not joined together in the inner query. In effect, the inner (sub) query is processed first and then the result set is joined to the outer query. An example of this is:
Select last_name, first_name
From Customer
Where customer_id IN
          (Select customer_id
            From Sales where
            sales.total_sales_amt > 10000);
The Sales table will be processed first and then all entries with a total_sales_amt > 10000 will be joined to the Customer table. This is efficient where a large number of rows is being processed.

IN and EXISTS
Subqueries can be written using the ‘IN’ and ‘EXISTS’ clauses. In general, the IN clause is used in the case of an uncorrelated subquery where the inner query is processed first and the temporary result set table that is created is joined to the outer table. This is very efficient for queries that return a large number of rows. An example of the use of the IN clause is shown below:
Select last_name, first_name
From Customer
Where customer_id IN
          (Select customer_id
            From Sales where
            sales.total_sales_amt > 10000);

The EXISTS clause on the other hand, is used in the case of a correlated subquery where outer query is processed first and as each row from the outer query is retrieved, it is joined to the inner query. Therefore the inner query is performed once for each result row in the outer query (as opposed to the ‘IN’ query shown above where the inner query is performed only one time). EXISTS often result in a FULL TABLE SCAN. An example of the use of the EXISTS clause is shown below:
Select last_name, first_name
From Customer
Where EXISTS
          (Select customer_id
            From Sales where
            customer.customer_id = sales.customer_id
            sales.total_sales_amt > 10000);
The optimizer is more likely to translate an IN into a join. It is important to understand the number of rows to be returned by a query and then decide which approach to use.

EXISTS vs. IN

use a join where possible
use IN over EXISTS (i.e. non-correlated subquery vs. correlated subquery).
The optimizer is more likely to translate IN into a Join than it is with EXISTS
IN executes a subquery once while Exists executes it once per outer row
In is similar to a merge-scan while Exists is similar to a nested-loop join.
There are some cases where EXISTS can outperform IN, but in more cases IN will dramatically out-perform EXISTS. In general, IN is better than EXISTS.
EXISTS tries to satisfy the subquery as quickly as possible and returns ‘true’ if the subquery returns 1 or more rows – it should be indexed. Optimize the execution of the subquery.

Not In vs. Not Exists
Subqueries may be written using NOT IN and NOT EXISTS clauses. The NOT EXISTS clause is sometimes more efficient since the database only needs to verify non-existence. With NOT IN the entire result set must be materialized. Another consideration when using NOT IN, is if the subquery returns NULLS, the results may not be returned (at all). With NOT EXISTS, a value in the outer query that has a NULL value in the inner will be returned. Not In performs very well as an anti-join using the cost-based optimizer and often performs Not Exists when this access path is used. Outer joins can also be a very fast way to accomplish this.

7. Use Joins in place of EXISTS.
       SELECT *
       FROM   emp e
       WHERE EXISTS (SELECT d.deptno
        FROM   dept d
                    WHERE e.deptno = d.deptno
                        AND    d.dname = 'RESEARCH');

To improve performance use the following:
      SELECT *
       FROM   emp e, dept d
       WHERE e.deptno = d.deptno
       AND    d.dname = ‘RESEARCH’;

8. EXISTS in place of DISTINCT.
       SELECT DISTINCT d.deptno, d.dname ,
       FROM     dept d, emp e
       WHERE   d.deptno = e.deptno;

The following SQL statement is a better alternative.
      SELECT   d.deptno , d.dname
       FROM     dept d
       WHERE EXISTS (SELECT 'X'
        FROM    emp e
          WHERE   d.deptno = e.deptno);

9. Math Expressions.
The optimizer fully evaluates expressions whenever possible and translates certain syntactic constructs into equivalent constructs. This is done either because Oracle can more quickly evaluate the resulting expression than the original expression or because the original expression is merely a syntactic equivalent of the resulting expression.
Any computation of constants is performed only once when the statement is optimized rather than each time the statement is executed. Consider these conditions that test for monthly salaries greater than 2000:

   sal > 24000/12
   sal > 2000
   sal*12 > 24000

If a SQL statement contains the first condition, the optimizer simplifies it into the second condition.
Note that the optimizer does not simplify expressions across comparison operators. The optimizer does not simplify the third expression into the second. For this reason, application developers should write conditions that compare columns with constants whenever possible, rather than conditions with expressions involving columns.

   The Optimizer does not use index for the following statement.
       SELECT *
       FROM   emp
   WHERE sal*12 > 24000;

   Instead use the following statement.
       SELECT *
      FROM emp
      WHERE sal > 24000/12;

10. Never use NOT in an indexed column. Whenever Oracle encounters a NOT in an index column, it will perform full-table scan.

     SELECT *
       FROM emp
   WHERE NOT deptno = 0;

Instead use the following.
     SELECT *
       FROM emp
       WHERE deptno > 0;

11. Never use a function / calculation on an indexed column (unless you are SURE that you are using an Index Function Based new in Oracle 8i). If there is any function is used on an index column, optimizer will not use index. Use some other alternative. If you don’t have another choice, keep functions on the right hand side of the equal sign. The Concatenate || symbol will also disable indexes. Examples:

      /** Do not use **/
       SELECT * FROM emp WHERE SUBSTR (ENAME, 1,3) = ‘MIL’;

     /** Suggested Alternative **/
      Note: Optimizer uses the index only when optimizer_goal is set to FIRST_ROWS.
       SELECT * FROM   emp WHERE ENAME LIKE 'MIL%’;

     /** Do not use **/
       SELECT * FROM emp WHERE sal! = 0;
     Note: Index can tell you what is there in a table but not what is not in a table.
     Note: Optimizer uses the index only when optimizer_goal = FIRST_ROWS.

     /** Suggested Alternative **/
       SELECT * FROM emp WHERE sal > 0;

     /** Do not use **/
       SELECT * FROM   emp WHERE ename || job = ‘MILLERCLERK’;
     Note: || is the concatenate function. Like other functions it disables index.

     /** Suggested Alternative **/
      Note: Optimizer uses the index only when optimizer_goal=FIRST_ROWS.
       SELECT *
         FROM   emp
       WHERE ename = 'MILLER'
AND    job   = ‘CLERK’;

12. Whenever possible try to use bind variables

In Dynamic SQL, this is a MUST!!!

The next example would always require a hard parse when it is submitted:
create or replace procedure dsal(p_empno in number) as
begin
execute immediate 'update emp set sal = sal*2 where empno = '||p_empno;
commit;
end;
/

Is more effective to use bind variables on the EXECUTE IMMEDIATE command as follows:
create or replace procedure dsal(p_empno in number) as
begin
execute immediate 'update emp set sal=sal*2 where empno = :x' using p_empno;
commit;
end;
/

The Performance Killer
Just to give you a tiny idea of how huge of a difference this can make performance wise, you only need to run a very small test:

Here is the Performance Killer ....
SQL> alter system flush shared_pool;
SQL> set serveroutput on;

declare
   type rc is ref cursor;
   l_rc rc;
   l_dummy all_objects.object_name%type;
   l_start number default dbms_utility.get_time;
begin
   for i in 1 .. 1000
   loop
      open l_rc for
          'select object_name from all_objects
            where object_id = ' || i;
      fetch l_rc into l_dummy;
      close l_rc;
      -- dbms_output.put_line(l_dummy);
      end loop;
      dbms_output.put_line (round((dbms_utility.get_time-l_start)/100, 2) || ' Seconds...' );
end;
/
101.71 Seconds...

... and here is the Performance Winner:
declare
   type rc is ref cursor;
   l_rc rc;
   l_dummy all_objects.object_name%type;
   l_start number default dbms_utility.get_time;
begin
   for i in 1 .. 1000
   loop
      open l_rc for
      'select object_name from all_objects where object_id = :x' using i;
      fetch l_rc into l_dummy;
   close l_rc;
   -- dbms_output.put_line(l_dummy);
   end loop;
   dbms_output.put_line (round((dbms_utility.get_time-l_start)/100, 2) || ' Seconds...' );
end;
/
1.9 Seconds...

That is pretty dramatic. The fact is that not only does this execute much faster (we spent more time PARSING our queries then actually EXECUTING them!) it will let more users use your system simultaneously.

13. Use the same convention for all your queries.
Oracle will put all your SQL or PL/SQL code in memory and will reuse statements that are the same (saving parse time). So remember that:
Select * from emp where dept = :dept_no
Is different than
Select * from EMP where dept = :dept_no

14. Tuning the WHERE Clause:

- When using AND Clauses in the WHERE Clause, put the most stringent AND Clause furthest from the WHERE.
            - When using OR Clauses in the WHERE Clause, put the most stringent OR Clause closest to the WHERE.

15. Do not use the keyword HAVING use the keyword WHERE instead

The HAVING clause filters selected rows only after all rows have been fetched. Using a WHERE clause helps reduce overheads in sorting, summing, etc. HAVING clauses should only be used when columns with summary operations applied to them are restricted by the clause.

Given Query	Alternative
SELECT d.dname, AVG (e.sal) FROM emp e, dept d WHERE e.deptno = d.deptno GROUP BY d.dname HAVING dname != 'RESEAECH' AND dname != 'SALES';	SELECT d.dname, AVG (e.sal) FROM emp e, dept d WHERE e.deptno = d.deptno AND dname != 'RESEAECH' AND dname != 'SALES' GROUP BY d.dname;

16. Avoid Multiple Sub queries where possible
Instead of this:
     Update emp set emp_cat = (select max (category) from emp_categories),
              sal_range = (select max(sal_range) from emp_categories);
Use:
     Update emp set (emp_cat, sal_range) = (Select max (category), max (sal_range) from emp_categories) ;

17. Use IN in place of OR
Least Efficient:
    Select ….
    From location
    Where loc_id = 10 or loc_id=20 or loc_id = 30

Most Efficient
    Select ….
    From location
    Where loc_id in (10,20,30)

18. Do not Commit inside a Loop
Do not use a commit or DDL statements inside a loop or cursor, because that will make the undo segments needed by the cursor unavailable.
Many applications commit more frequently than necessary, and their performance suffers as a result. In isolation a commit is not a very expensive operation, but lots of unnecessary commits can nevertheless cause severe performance problems. While a few extra commits may not be noticed, the cumulative effect of thousands of extra commits is very noticeable. Look at this test. Insert 1,000 rows into a test table -- first as a single transaction, and then committing after every row. Your mileage may vary, but these are my results, on an otherwise idle system show a performance blowout of more than 100% when committing after every row.

create table t (n number);

--BAD METHOD
declare
   start_time number;
begin
   start_time := dbms_utility.get_time;
   for i in 0..999 loop
      insert into t values (i);
      commit;
   end loop;
   dbms_output.put_line(dbms_utility.get_time - start_time || ' centiseconds');
end;
/
102 centiseconds

truncate table t;

--GOOD METHOD
declare
   start_time number;
begin
   start_time := dbms_utility.get_time;
   for i in 0..999 loop
      insert into t values (i);
   end loop;
   commit;
   dbms_output.put_line(dbms_utility.get_time - start_time || ' centiseconds');
end;
/
44 centiseconds

19. Use UNION ALL instead of UNION

The problem is that in a UNION, Oracle finds all the qualifying rows and then "deduplicates" them. To see what I mean, you can simply compare the following queries:
select * from dual
union
select * from dual;
D
---
X

select * from dual
union ALL
select * from dual;
D
---
X X

Note how the first query returns only one record and the second returns two. A UNION forces a big sort and deduplication—a removal of duplicate values. Most of the time, this is wholly unnecessary. To see how this might affect you, I'll use the data dictionary tables to run a WHERE EXISTS query using UNION and UNION ALL and compare the results with TKPROF. The results are dramatic.

First, I'll do the UNION query:

SQL> select *
  2  from dual
  3 where exists 
  4 (select null from all_objects
  5    union
  6   select null from dba_objects
  7    union
  8   select null from all_users);

call    cnt    cpu    ela   query  
----    ---   ----    ---  ------
Parse     1   0.01   0.00      0 
Execute   1   2.78   2.75 192234 
Fetch     2   0.00   0.00      3 
-----  ----   ----   ----  ------ 
total     4   2.79   2.76 192237

As you can see, that was a lot of work—more than 192,000 I/Os just to see if I should fetch that row from DUAL. Now I add a UNION ALL to the query:

SQL> select *
  2   from dual
  3  where exists 
  4 (select null from all_objects
  5    union all
  6   select null from dba_objects
  7    union all
  8   select null from all_users);

call     cnt    cpu  ela   query
------   ----  ----  ----   ----- 
Parse      1   0.00  0.00      0 
Execute    1   0.01  0.00      9 
Fetch      2   0.00  0.00      3 
------   ----  ----  ----   ----- 
total      4   0.01  0.00     12

Quite a change! What happened here was that the WHERE EXISTS stopped running the subquery when it got the first row back, and because the database did not have to bother with that deduplicate step, getting the first row back was very fast indeed.

The bottom line: If you can use UNION ALL, by all means use it over UNION to avoid a costly deduplication step—a step that is probably not even necessary most of the time.

20. Check that your application is using the existing indexes

This is a CRITICAL point. So make use of Explain Plan!!!

21. Recommendation to work with dates.

If you need to get all the data for today's date, instead of:
SELECT ImportedDate, State
FROM IssueData
WHERE TRUNC(ImportedDate ) = TRUNC(SYSDATE);

Use the following:
SELECT ImportedDate, State
FROM IssueData
WHERE ImportedDate between trunc(SYSDATE) and TRUNC(SYSDATE) + .99999;

22. Anti Joins

An anti-join is used to return rows from a table that that are present in another table. It might be used for example between DEPT and EMP to return only those rows in DEPT that didn't join to anything in EMP;

SELECT *
FROM dept
WHERE deptno NOT IN (SELECT deptno FROM EMP);

SELECT dept.*
FROM dept, emp
WHERE dept.deptno = emp.deptno (+)
AND emp.ROWID IS NULL;

SELECT *
FROM dept
WHERE NOT EXISTS (SELECT NULL FROM emp WHERE emp.deptno = dept.deptno);

23. Full Outer Joins

Normally, an outer join of table A to table B would return every record in table A, and if it had a mate in table B, that would be returned as well. Every row in table A would be output, but some rows of table B might not appear in the result set. A full outer join would return ebery row in table A, as well as every row in table B. The syntax for a full outer join is new in Oracle 9i, but it is a syntactic convenience, it is possible to produce full outer joins sets using conventional SQL.

update emp set deptno = 9 where deptno = 10;
commit;

Conventional SQL

New Syntax

SELECT empno, ename, dept.deptno, dname
FROM emp, dept
WHERE emp.deptno(+) = dept.deptno
UNION ALL
SELECT empno, ename, emp.deptno, NULL
FROM emp, dept
WHERE emp.deptno = dept.deptno(+)
AND dept.deptno IS NULL
ORDER BY 1,2,3,4;

EMPNO ENAME    DEPTNO DNAME
------ ------- ------- ----------
7369 SMITH        20 RESEARCH
7499 ALLEN        30 SALES
7521 WARD         30 SALES
7566 JONES        20 RESEARCH
7654 MARTIN       30 SALES
7698 BLAKE        30 SALES
7782 CLARK         9
7788 SCOTT        20 RESEARCH
7839 KING          9
7844 TURNER       30 SALES
7876 ADAMS        20 RESEARCH
7900 JAMES        30 SALES
7902 FORD         20 RESEARCH
7934 MILLER        9
                    10 ACCOUNTING
                    40 OPERATIONS

SELECT empno, ename,
       NVL(dept.deptno,emp.deptno) deptno, dname
FROM emp FULL OUTER JOIN dept ON
      (emp.deptno = dept.deptno)
ORDER BY 1,2,3,4;

EMPNO ENAME    DEPTNO DNAME
------ ------- ------- ----------
7369 SMITH        20 RESEARCH
7499 ALLEN        30 SALES
7521 WARD         30 SALES
7566 JONES        20 RESEARCH
7654 MARTIN       30 SALES
7698 BLAKE        30 SALES
7782 CLARK         9
7788 SCOTT        20 RESEARCH
7839 KING          9
7844 TURNER       30 SALES
7876 ADAMS        20 RESEARCH
7900 JAMES        30 SALES
7902 FORD         20 RESEARCH
7934 MILLER        9
                    10 ACCOUNTING
                    40 OPERATIONS

Explain Plan

The Explain Plan command uses a table to store information about the execution plan chosen by the optimizer. The PLAN_TABLE is created by $ORACLE_HOME/rdbs/admin/utlxplan.sql.

EXPLAIN PLAN COMMAND
Perform the following to check it:
EXPLAIN PLAN FOR
your query.

Example
EXPLAIN PLAN FOR
SELECT * FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';

Finally use the DBMS_XPLAN.DISPLAY function to display the execution plan:
SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY);

-------------------------------------------------------------------
| Id | Operation                  | Name    | Rows | Bytes | Cost |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT           |         | 1    | 57    | 3    |
| 1 | NESTED LOOPS               |         | 1    | 57    | 3    |
|* 2 | TABLE ACCESS FULL          | EMP     | 1    | 37    | 2    |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT    | 1    | 20    | 1    |
|* 4 | INDEX UNIQUE SCAN          | PK_DEPT | 1    |       |      |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."ENAME"='SMITH')
4 - access("E"."DEPTNO"="D"."DEPTNO")

The DBMS_XPLAN.DISPLAY function can accept 3 parameters:
1    table_name - Name of plan table, default value 'PLAN_TABLE'.
2    statement_id - Statement id of the plan to be displayed, default value NULL.
3    format - Controls the level of detail displayed, default value 'TYPICAL'. Other values include 'BASIC', 'ALL', 'SERIAL'.

EXPLAIN PLAN SET STATEMENT_ID='TSH' FOR
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';

SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','TSH','BASIC'));

---------------------------------------------
| Id | Operation                  | Name    |
---------------------------------------------
| 0 | SELECT STATEMENT           |         |
| 1 | NESTED LOOPS               |         |
| 2 | TABLE ACCESS FULL          | EMP     |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT    |
| 4 | INDEX UNIQUE SCAN          | PK_DEPT |
---------------------------------------------

Explain plan Hierarchy
Sample explain plan:

Query Plan
-----------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1234
TABLE ACCESS FULL TPAIS [:Q65001] [ANALYZED]

The rightmost uppermost operation of an explain plan is the first thing that the explain plan will execute. In this case TABLE ACCESS FULL TPAIS is the first operation. This statement means we are doing a full table scan of table TPAIS When this operation completes then the resultant row source is passed up to the next level of the query for processing. In this case it is the SELECT STATEMENT, which is the top of the query.

[CHOOSE] is an indication of the optimizer_goal for the query. This DOES NOT necessarily indicates that plan has actually used this goal. The only way to confirm this is to check the cost= part of the explain plan as well. For example the following query indicates that the CBO has been used because there is a cost in the cost field:

SELECT STATEMENT [CHOOSE] Cost=1234

However the explain plan below indicates the use of the RBO because the cost field is blank:

SELECT STATEMENT [CHOOSE] Cost=

The cost field is a comparative cost that is used internally to determine the best cost for particular plans. The costs of different statements are not really directly comparable.

[:Q65001] indicates that this particular part of the query is being executed in parallel. This number indicates that the operation will be processed by a parallel query slave as opposed to being executed serially.

[ANALYZED] indicates that the object in question has been analyzed and there are currently statistics available for the CBO to use. There is no indication of the 'level' of analysis done.

Explain Plan Operations Reference

Table Access Methods
FULL TABLE SCAN (FTS) - Read every row in the table, every block up to the high water mark. The HWM marks the last block in the table that has ever had data written to it. If you have deleted all the rows then you will still read up to the HWM. Truncate is the only way to reset the HWM back to the start of the table. Buffers from FTS operations are placed on the Least Recently Used (LRU) end of the buffer cache so will be quickly aged out. FTS is not recommended for large tables unless you are reading >5-10% of it (or so) or you intends to run in parallel.

CLUSTER - Access via an index cluster.

HASH - A hash key is issued to access one or more rows in a table with a matching hash value.

BY ROWID - Access a single row in a table by specifying its ROWID. This is the fastest method for accessing a row usually from an index.
This is the quickest access method available. Oracle simply retrieves the block specified and extracts the rows it is interested in. Access by rowid :
SQL> explain plan for select * from dept where rowid = ':x';

Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID DEPT [ANALYZED]

Table is accessed by rowid following index lookup:

SQL> explain plan for
select empno,ename from emp where empno=10;

Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX UNIQUE SCAN EMP_I1

INDEX LOOKUP - The data is accessed by looking up key values in an index and returning rowids. A rowid uniquely identifies an individual row in a particular data block. This block is read via single block I/O. In this example an index is used to find the relevant row(s) and then the table is accessed to lookup the ename column (which is not included in the index):

explain plan for
select empno,ename from emp where empno=10;

Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX UNIQUE SCAN EMP_I1

Note the 'TABLE ACCESS BY ROWID' section. This indicates that the table data is not being accessed via a FTS operation but rather by a rowid lookup. In this case looking up values in the index first has produced the rowid. The index is being accessed by an 'INDEX UNIQUE SCAN' operation. This is explained below. The index name in this case is EMP_I1. If all the required data resides in the index then a table lookup may be unnecessary and all you will see is an index access with no table access.
In the following example all the columns (empno) are in the index. Notice that no table access takes place:

explain plan for
select empno from emp where empno=10;

Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX UNIQUE SCAN EMP_I1

Indexes are presorted so sorting may be unnecessary if the sort order required is the same as the index. e.g.

explain plan for select empno,ename from emp
where empno > 7876 order by empno;

Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX RANGE SCAN EMP_I1 [ANALYZED]

In this case the index is sorted so the rows will be returned in the order of the index hence a sort is unnecessary.
explain plan for
select /*+ Full(emp) */ empno,ename from emp
where empno> 7876 order by empno;

Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=9
SORT ORDER BY
TABLE ACCESS FULL EMP [ANALYZED] Cost=1 Card=2 Bytes=66

Because we have forced a FTS the data is unsorted and so we must sort the data after it has been retrieved. There are 4 methods of index lookup:
b1.index unique scan
b2.index range scan
b3.index full scan
b4.index fast full scan

b1. Index unique scan
Method for looking up a single key value via a unique index. Always returns a single value. You must supply AT LEAST the leading column of the index to access data via the index, however this may return > 1 row as the uniqueness will not be guaranteed. Example explain plan:

SQL> explain plan for
select empno,ename from emp where empno=10;

Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX UNIQUE SCAN EMP_I1

b2.Index range scan
Method for accessing multiple column values. You must supply AT LEAST the leading column of the index to access data via the index. Can be used for range operations (e.g. > < <> >= <= between). e.g.

SQL> explain plan for select empno,ename from emp
where empno > 7876 order by empno;

Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX RANGE SCAN EMP_I1 [ANALYZED]

A non-unique index may return multiple values for the predicate col1 = 5 and will use an index range scan

SQL> explain plan for select mgr from emp where mgr = 5;

Query plan
--------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX RANGE SCAN EMP_I2 [ANALYZED]

b3.Index Full Scan
In certain circumstances it is possible for the whole index to be scanned as opposed to a range scan (i.e. where no constraining predicates are provided for a table). Full index scans are only available in the CBO as otherwise we are
unable to determine whether a full scan would be a good idea or not. We choose an index Full Scan when we have statistics that indicate that it is going to be more efficient than a Full table scan and a sort. For example we may do a Full index scan when we do an unbounded scan of an index and want the data to be ordered in the index order. The optimizer may decide that selecting all the information from the index and not sorting is more efficient than doing a FTS or a Fast Full Index Scan and then sorting. An Index full scan will perform single block i/o's and so it may prove to be inefficient. e.g. Index BE_IX is a concatenated index on big_emp (empno,ename)

SQL> explain plan for select empno,ename from big_emp order by empno,ename;

Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=26
INDEX FULL SCAN BE_IX [ANALYZED]

b4. Index Fast Full Scan (not very used)
Scans all the block in the index. Rows are not returned in sorted order Introduced in 7.3 and requires 733_PLANS_ENABLED=TRUE and CBO may be hinted using INDEX_FFS hint uses multiblock i/o can be executed in parallel can be used to access second column of concatenated indexes. This is because we are selecting the entire index.
Note that INDEX FAST FULL SCAN is the mechanism behind fast index create and recreate. E.g.
Index BE_IX is a concatenated index on big_emp (empno,ename)

SQL> explain plan for select empno,ename from big_emp;

Query Plan
------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX FAST FULL SCAN BE_IX [ANALYZED]

Selecting the 2nd column of concatenated index:

SQL> explain plan for select ename from big_emp;

Query Plan
------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX FAST FULL SCAN BE_IX [ANALYZED]

Index Access Methods
UNIQUE SCAN - A retrieval of a single ROWID from an index.
RANGE SCAN - A retrieval of one or more ROWIDs from an index. Indexed values are scanned in ascending order.
RANGE SCAN DESCENDING - A retrieval of one or more ROWIDs from an index. Indexed values are returned in descending order.
AND-EQUAL - An operation that accepts multiple sets of ROWIDs and returns the intersection of the sets, eliminating duplicates. This operation is user for single-column indexes access path.

Join Operations Techniques

There are three kinds of join conditions: nested loops, merge joins, and hash joins. Each has specific performance implications, and each should be used in different circumstances.
- Nested loops work from one table (preferably the smaller of the two), looking up the join criteria in the larger table. It’s helpful if the join column is indexed from the larger table. Nested loops are useful when joining a smaller table to a larger table and performs very well on smaller amounts of data.
- Merge joins work by selecting the result set from each table, and then merging these two (or more) results together. Merge joins are useful when joining two relatively large tables of about the same size together, the merge join starts out with more overhead but remains rather consistent.
- Hash joins read the smaller tables into a hash table in memory so the referenced records can be quickly accessed by the hash key. Hash joins are great in data warehouse scenarios where several smaller tables (with referential integrity defined) are being referenced in the same SQL query as a single larger or very large table.
The hash join has initial overhead (of creating the hash tables) but performs rather well no matter how many rows are involved.

NESTED LOOPS - An operation that accepts two sets of rows, an outer set and an inner set. Oracle compares each row of the outer set with each row of the inner set and returns those rows that satisfy a condition. Take the example of 2 tables joined as follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
In the case of the Nested Loop Join, the rows will be accessed with an outer table being chosen (say Table1 in this case) and for each row in the outer table, the inner table (Table2) will be accessed with an index to retrieve the matching rows. Once all matching rows from Table2 are found, then the next row on Table1 is retrieved and the matching to Table2 is performed again
It's important that efficient index access is used on the inner table (Table2 in this example) or that the inner table be a very small table. This is critical to prevent table scans from being performed on the inner table for each row of the outer table that was retrieved. Nested Loop joins are usually used in cases where relatively few rows are being accessed in the outer table and joined to the inner table using efficient index access. It is the most common join performed by transactional (OLTP) systems
OUTER - A nested loops operation to perform an outer join statement.

SORT MERGE JOIN or MERGE JOIN or Merge Scan - An operation that accepts two sets of rows, each sorted by a specific value, combines each row from one set with the matching rows from the other. Take an example of 2 tables being joined and returning a large number of rows (say, thousands) as follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
The Merge Scan join will be chosen because the database has detected that a large number of rows need to be processed and it may also notice that index access to the rows are not efficient since the data is not clustered (ordered) efficiently for this join. The steps followed to perform this type of join are as follows:
1) Pick an inner and outer table
2) Access the inner table, choose the rows that match the predicates in the Where clause of the SQL statement.
3) Sort the rows retrieved from the inner table by the joining columns and store these as a Temporary table. This step may not be performed if data is ordered by the keys and efficient index access can be performed.
4) The outer table may also need to be sorted by the joining columns so that both tables to be joined are sorted in the same manner. This step is also optional and dependent on whether the outer table is already well ordered by the keys and whether efficient index access can be used.
5) Read both outer and inner tables (these may be the sorted temporary tables created in previous steps), choosing rows that match the join criteria. This operation is very quick since both tables are sorted in the same manner and Database Prefetch can be used.
6) Optionally sort the data one more time if a Sort was performed (e.g. an 'Order By' clause) using columns that are not the same as were used to perform the join.
The Merge Join can be deceivingly fast due to database multi-block fetch (helped by initialization parameter db_file_multiblock_read_count) capabilities and the fact that each table is accessed only one time each. These are only used for equi-joins. The other init.ora parm that can be tuned to help performance is sort_area_size.
OUTER - A merge join operation to perform an outer join statement.
        -ANTI - A merge anti-join.
    -SEMI - A merge semi-join.

HASH JOIN - An operation that joins two sets of rows and returns the same result.
-ANTI - A hash anti-join.
-SEMI - A hash semi-join.
The Hash Join is is a very efficient join when used in the right situation. With the hash join, one Table is chosen as the Outer table. This is the larger of the two tables in the Join - and the other is chosen as the Inner Table. Both tables are broken into sections and the inner Tables join columns are stored in memory (if hash_area_size is large enough) and 'hashed'. This hashing provides an algorithmic pointer that makes data access very efficient. Oracle attempts to keep the inner table in memory since it will be 'scanned' many times. The Outer rows that match the query predicates are then selected and for each Outer table row chosen, hashing is performed on the key and the hash value is used to quickly find the matching row in the Inner Table. This join can often outperform a Sort Merge join, particularly when 1 table is much larger than another. No sorting is performed and index access can be avoided since the hash algorithm is used to locate the block where the inner row is stored. Hash-joins are also only used for equi-joins. Other important init.ora parms are: hash_join_enabled, sort_area_size and hash_multiblock_io_count.

Operations
Operations that show up in explain plans
a. Sort
b. filter
c. view

a. Sorts
There are a number of different operations that promote sorts

Order by clauses
Group by
Sort merge join

Sorts are expensive operations especially on large tables where the rows do not fit in memory and spill to disk. By default sort blocks are placed into the buffer cache. This may result in aging out of other blocks that may be reread by other processes. To avoid this you can use the parameter:
<Parameter:SORT_DIRECT_WRITES> which does not place sort blocks into the buffer cache.

b. Filter
Has a number of different meanings used to indicate partition elimination may also indicate an actual filter step where one row source is filtering another functions such as min may introduce filter steps into query plans
In this example there are 2 filter steps. The first is effectively like a NL except that it stops when it gets something that it doesn't like (i.e. a bounded NL). This is there because of the not in. The second is filtering out the min value:

SQL> explain plan for
select * from emp
where empno not in (select min(empno) from big_emp group by empno);

Query Plan
------------------
SELECT STATEMENT [CHOOSE] Cost=1
FILTER **** This is like a bounded nested loops
TABLE ACCESS FULL EMP [ANALYZED]
FILTER **** This filter is introduced by the min
SORT GROUP BY NOSORT
INDEX FULL SCAN BE_IX

This example is also interesting in that it has a NOSORT function. The group by does not need to sort because the index row source is already pre sorted.

c. Views
When a view cannot be merged into the main query you will often see a projection view operation. This indicates that the 'view' will be selected from directly as opposed to being broken down into joins on the base tables. A number of constructs make a view non mergeable. Inline views are also non mergeable.
In the following example the select contains an inline view that cannot be merged:

SQL> explain plan for
select ename,tot
from emp,
(select empno,sum(empno) tot from big_emp group by empno) tmp
where emp.empno = tmp.empno;

Query Plan
------------------------
SELECT STATEMENT [CHOOSE]
HASH JOIN
TABLE ACCESS FULL EMP [ANALYZED]
VIEW
SORT GROUP BY
INDEX FULL SCAN BE_IX

In this case the inline view tmp that contains an aggregate function cannot be merged into the main query. The explain plan shows this as a view step

AUTOTRACE

To use the autotrace facility
As sys run the following:

@$ORACLE_HOME/sqlplus/admin/plustrce;
@$ORACLE_HOME/rdbms/admin/utlxplan;
grant plustrace to USER;

Then each user must run

@$ORACLE_HOME/rdbms/admin/utlxplan;

and then they can use the Sentence

set autotrace on

Workarounds:
select * from dba_role_privs where grantee='ATLANTIS';
drop role plustrace;
create role plustrace;
grant select on v_$sesstat to plustrace;
grant select on v_$statname to plustrace;
grant select on v_$session to plustrace;
grant plustrace to dba with admin option;
grant connect to
grant resource to
grant plustrace to

After connection to xxxx user perform select from session_roles and make sure plustrace is listed. If it isn't, issue a SET ROLE command and see if that works.

select granted_role from user_role_privs;
GRANTED_ROLE
------------
CONNECT
PLUSTRACE
RESOURCE

set autotrace on explain;

grant SELECT ANY DICTIONARY to atlantis;
SET ROLE PLUSTRACE

We can obtain the execution plan and some additional statistics on running a SQL command automatically using AUTOTRACE.
SET AUTOTRACE <OPTIONS> <EXPLAIN or STATISTICS>

Options
OFF - Disables autotracing SQL statements
ON - Enables autotracing SQL Statements
TRACEONLY - Enables auto tracing SQL Statements, and Suppresses Statement Output
EXPLAIN - Displays execution plans, but does not display statistics

To use the EXPLAIN option, we must first create the PLAN_TABLE in our schema before using the EXPLAIN by running utlxplan.sql or manually.

SET AUTOTRACE TRACEONLY STATISTICS
set autotrace traceonly explain
select * from dual;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'DUAL'

STATISTICS Displays statistics, but does not display execution plans.
To access STATISTICS data, we must have access to dynamic performance tables.
This can be granted by granting the role plustrace created in plustrce.sql script. The DBA can run the script as internal or sys and then grant the role to the users using the AUTOTRACE option.

SET AUTOTRACE TRACEONLY STATISTICS

The most important results are the db block gets, consistent gets, physical reads, redo size, sorts (memory) and sorts (disk). Details about these statiscics are HERE
NOTE: If both Explain and Statistics are omitted, execution plans and statistics will be displayed by default.

Contents

23. Full Outer Joins

Explain Plan Operations Reference

AUTOTRACE