Tips to Write Effective Queries and Explain Plan
SQL
Statement Processing
Phases
The four statement processing phases in SQL are parsing binding,
executing and fetching.
BIND: It scans the statement for bind variables and assigns a value to each variable.
EXECUTE: The Server applies the parse tree to the data buffers, performs necessary I/O and sorts for DML statements.
FETCH: Retrieves rows for a SELECT statement during the fetch phase. Each fetch retrieves multiple rows, using an array fetch.
Optimizer Method and how to
know the Driving Table.
A small "golden rule" is that your driving table should
be the one that eliminates the must rows and you should choose that one
first. But…. Where to specify the
driving Table?
Oracle processes result sets a table at a time. It starts by retrieving
all the data for the first (driving) table. Once this data is
retrieved it is used to limit the number of rows processed for
subsequent (driven) tables. In the case of multiple table joins, the
driving table limits
the rows processed for the first driven table. Once processed, this
combined set of data is the driving set for the second driven table
etc. Roughly translated into English, this means that it is best to
process tables that will retrieve a small number of rows
first.
The optimizer will do this to the best of its ability regardless of
the structure of the DML, but the following factors may help.
Both the Rule and Cost based optimizers select a driving table for each
query. If a decision cannot be made, the order of
processing is from the end of the FROM clause to the start. Therefore,
you should always place your driving table at the end of the FROM
clause.
Subsequent driven tables should be placed in order so that those
retrieving the most rows are nearer to the start of the FROM clause.
Confusingly, the WHERE
clause should be writen in the opposite order, with the driving tables
conditions first and the final driven table last. ie.
RBO (Rule Based Optimizer) chooses the driving
order by
taking the tables in the FROM clause RIGHT to LEFT.
CBO (Cost Based Optimizer)
determines join order from costs derived from gathered statistics. If
there are
no statistics, if the optimizer_mode IS COST then CBO chooses the
driving order
of tables from LEFT to RIGHT in the FROM clause, IF the optimizer_mode
IS
CHOOSE then RBO method will be used
(RIGHT to LEFT).
SELECT something
FROM detail, master
WHERE detail.key = master.key
Now, there are usually other conditions in the
WHERE clause
and the optimizer (RULE - based) will decide which table to use as a
driving
table depending on what conditions is present and what indexes is
available.
FROM detail,
master
WHERE
detail.key = master.key
AND
detail.key = 'some value'
While in this case the master table will be
driving table
FROM detail,
master
WHERE
detail.key = master.key
AND
master.key = 'some value'
So you have to look at whole statement and of course this is relevant for RULE based approach ONLY!
The case for the Cost Based Optimizer (CBO) is
that if it
has enough statistical information about the tables, it can construct
the plan
based on data, not just structure and might use another access path for
the
same statement as the RULE based approach would use.
The WHERE clause is the main decision maker about
which
indexes to use. You should always try to
use your unique indexes first, and then if that is not possible then
use a
non-unique index. For a query to use an index, one or more fields from
that
index need to be mentioned in the WHERE clause. On
concatenated indexes the index will only be used if the first
field
in the index is mentioned.On 10g that in not needed any
more!!!
The more of its
fields are mentioned in the where clause,
the better an index is used.
The TIPS to write better queries
Although two SQL statements may produce the same result,
Oracle may process one faster than the other. You can use the results
of the
EXPLAIN PLAN statement to compare the execution plans and costs of the
two
statements and determine which is more efficient. Following
are some tips that help in writing
efficient queries.
0.
Existence
of a row
Do not use ‘Select count(*)…’ to test the existence of a
row. Instead, open an explicit cursor, fetch once, and then check
cursor%NOTFOUND :
1.
Avoid the use of NULL or IS NOT NULL.
Instead of:
Select * from clients where phone_number is null;
Use:
Select * from
clients where phone_number =
0000000000000000;
2. Select the data
that you need ONLY!!!
When selecting from a table, be sure to only select the data that you
need.
For example, if you only need 1 column from a 50 column table, be sure
to do a
'select fld from table' and
only retrieve what you need. If you do a
'select * from table' you
will be
fetching ALL columns of the table which increases network traffic and
causes the system to perform unnecessary work to retrieve data that is
not being used
3. Wich table first?
If you specify 2 tables in the FROM clause of a SELECT
statement, the parser will process the tables depending on the
Optimizer Method
that you are using (see below for more details). Basically:
-RBO chooses the driving order by
taking the tables in the FROM clause RIGHT to LEFT.
-CBO determines join order from
costs derived from gathered statistics. If there are no statistics then
CBO
chooses the driving order of tables from LEFT to RIGHT in the FROM
clause. This is OPOSITE to the RBO.
Both the Rule and Cost based optimizers select a driving table for each
query. If a decision cannot be made, the order of
processing is from the end of the FROM clause to the start. Therefore,
you should always place your driving table at the end of the FROM
clause.
Subsequent driven tables should be placed in order so that those
retrieving the most rows are nearer to the start of the FROM clause.
Confusingly, the WHERE
clause should be writen in the opposite order, with the driving tables
conditions first and the final driven table last.
4. If three tables are being joined, select the intersection tables as
the
driving table.
The intersection
table is the table that has many tables dependent on it.
5. Always use
table alias and prefix
The parse phase for statements can be decreased by efficient use of
aliasing. This helps the speed of parsing the statements in two ways:
6. IN and EXISTS
7.
Use Joins in place of EXISTS.
SELECT *
FROM emp e
WHERE
EXISTS
(SELECT d.deptno
FROM
dept d
WHERE
e.deptno = d.deptno
AND d.dname =
'RESEARCH');
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND d.dname
= ‘RESEARCH’;
8.
EXISTS in place
of DISTINCT.
SELECT
DISTINCT d.deptno, d.dname ,
FROM dept d, emp e
WHERE d.deptno
= e.deptno;
SELECT d.deptno , d.dname
FROM dept d
WHERE
EXISTS
(SELECT 'X'
FROM emp e
WHERE d.deptno = e.deptno);
9.
Math Expressions.
The optimizer fully evaluates expressions whenever possible
and translates certain syntactic constructs into equivalent constructs.
This is
done either because Oracle can more quickly evaluate the resulting
expression
than the original expression or because the original expression is
merely a syntactic
equivalent of the resulting expression.
Any computation of constants is performed only once when the
statement is optimized rather than each time the statement is executed.
Consider these conditions that test for monthly salaries greater than
2000:
sal > 24000/12
sal > 2000
sal*12 > 24000
If a SQL statement
contains the first condition, the optimizer simplifies it into the
second
condition.
Note that the
optimizer does not simplify expressions across comparison operators.
The optimizer
does not simplify the third expression into the second.
For this reason, application developers
should write conditions that compare columns with constants whenever
possible,
rather than conditions with expressions involving columns.
The Optimizer
does
not use index for the following statement.
SELECT
*
FROM emp
WHERE sal*12
> 24000;
Instead use
the
following statement.
SELECT
*
FROM emp
WHERE sal
> 24000/12;
10. Never use NOT in an
indexed column. Whenever Oracle encounters a NOT in an index
column, it will perform full-table scan.
SELECT *
FROM emp
WHERE NOT
deptno = 0;
SELECT
*
FROM emp
WHERE deptno
> 0;
11. Never use a function / calculation on an indexed column (unless you are SURE that you are using an Index Function Based new in Oracle 8i). If there is any function is used on an index column, optimizer will not use index. Use some other alternative. If you don’t have another choice, keep functions on the right hand side of the equal sign. The Concatenate || symbol will also disable indexes. Examples:
/**
Do not use
**/
SELECT * FROM
emp WHERE SUBSTR (ENAME, 1,3) = ‘MIL’;
/** Suggested Alternative
**/
Note: Optimizer
uses the index only when optimizer_goal is set to FIRST_ROWS.
SELECT *
FROM emp WHERE ENAME LIKE 'MIL%’;
/**
Do not use
**/
SELECT *
FROM emp WHERE sal! = 0;
Note: Index can
tell you what is there in a table but not what is not in a table.
Note: Optimizer
uses the index only when optimizer_goal = FIRST_ROWS.
/** Suggested
Alternative **/
SELECT *
FROM emp WHERE sal > 0;
/**
Do not use
**/
SELECT *
FROM emp WHERE ename || job =
‘MILLERCLERK’;
Note: || is the
concatenate function. Like other
functions it disables index.
/** Suggested
Alternative **/
Note: Optimizer
uses the index only when optimizer_goal=FIRST_ROWS.
SELECT *
FROM emp
WHERE ename
= 'MILLER'
AND job
= ‘CLERK’;
12. Whenever possible try to use
bind variables
create or replace procedure
dsal(p_empno in number) as
begin
execute immediate 'update
emp set sal = sal*2 where empno =
'||p_empno;
commit;
end;
/
create or replace procedure
dsal(p_empno in number) as
begin
execute immediate 'update
emp set sal=sal*2 where empno =
:x' using p_empno;
commit;
end;
/
Just to give you a tiny idea of how huge of a difference this can make
performance wise, you only need to run a very small test:
SQL> alter system flush shared_pool;
SQL> set serveroutput on;
declare
type rc is ref
cursor;
l_rc rc;
l_dummy
all_objects.object_name%type;
l_start number
default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name from all_objects
where object_id = ' || i;
fetch l_rc into l_dummy;
close l_rc;
--
dbms_output.put_line(l_dummy);
dbms_output.put_line (round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
101.71 Seconds...
... and here is the Performance Winner:
declare
type rc is ref
cursor;
begin
end;
/
1.9 Seconds...
13.
Use the same convention for all your queries.
Oracle will put all your SQL or PL/SQL code in memory and will
reuse statements that are the same (saving parse time). So remember that:
Select * from emp
where dept = :dept_no
Is different than
Select * from EMP
where dept = :dept_no
14.
Tuning the WHERE Clause:
- When using AND
Clauses in the
WHERE Clause, put the most stringent AND Clause furthest from the WHERE.
- When using
OR Clauses in the
WHERE Clause, put the most stringent OR Clause closest to the WHERE.
The HAVING clause filters selected rows only after all rows have been fetched. Using a WHERE clause helps reduce overheads in sorting, summing, etc. HAVING clauses should only be used when columns with summary operations applied to them are restricted by the clause.
Given Query | Alternative |
SELECT d.dname, AVG
(e.sal) |
SELECT d.dname, AVG
(e.sal) |
Instead of this:
Update emp set
emp_cat =
(select max (category) from
emp_categories),
sal_range = (select max(sal_range) from
emp_categories);
Use:
Update emp set (emp_cat,
sal_range) = (Select max
(category), max (sal_range) from emp_categories) ;
Least Efficient:
Select ….
From location
Where loc_id =
10 or loc_id=20 or loc_id = 30
Most Efficient
Select ….
From location
Where loc_id
in (10,20,30)
18. Do not Commit inside a
Do not use a commit or DDL statements inside a loop or cursor, because
that will make the undo segments needed by the cursor unavailable.
Many applications commit more frequently than necessary, and their
performance suffers as a result. In isolation a commit is not a very
expensive operation, but lots of unnecessary commits can nevertheless
cause severe performance problems. While a few extra commits may not be
noticed, the cumulative effect of thousands of extra commits is very
noticeable. Look at this test. Insert 1,000 rows into a test table --
first as a single transaction, and then committing after every row.
Your mileage may vary, but these are my results, on an otherwise idle
system show a performance blowout of more than 100% when committing
after every row.
create table
t (n number);
--BAD METHOD
declare
start_time number;
begin
start_time := dbms_utility.get_time;
for i in 0..999 loop
insert into t values (i);
commit;
end loop;
dbms_output.put_line(dbms_utility.get_time - start_time ||
' centiseconds');
end;
/
102 centiseconds
truncate
table t;
--GOOD METHOD
declare
start_time number;
begin
start_time := dbms_utility.get_time;
for i in 0..999 loop
insert into t values (i);
end loop;
commit;
dbms_output.put_line(dbms_utility.get_time - start_time ||
' centiseconds');
end;
/
44 centiseconds
19. Use UNION ALL instead of UNION
The problem is that in a UNION, Oracle finds all the qualifying rows and then
"deduplicates" them. To see what I mean, you can simply compare the
following queries:
select
* from
dual
union
select * from dual;
D
---
X
select * from dual
union ALL
select * from dual;
D
---
X
X
Note how the first query returns only one record and the second returns two. A UNION forces a big sort and deduplication—a removal of duplicate values. Most of the time, this is wholly unnecessary. To see how this might affect you, I'll use the data dictionary tables to run a WHERE EXISTS query using UNION and UNION ALL and compare the results with TKPROF. The results are dramatic.
First, I'll do the UNION query:
SQL> select *
2 from dual
3 where exists
4 (select null from all_objects
5 union
6 select null from dba_objects
7 union
8 select null from all_users);
call cnt cpu ela query
---- --- ---- --- ------
Parse 1 0.01 0.00 0
Execute 1 2.78 2.75 192234
Fetch 2 0.00 0.00 3
----- ---- ---- ---- ------
total 4 2.79 2.76 192237
As you can see, that was a lot of work—more than 192,000 I/Os just to see if I should fetch that row from DUAL. Now I add a UNION ALL to the query:
SQL> select *
2 from dual
3 where exists
4 (select null from all_objects
5 union all
6 select null from dba_objects
7 union all
8 select null from all_users);
call cnt cpu ela query
------ ---- ---- ---- -----
Parse 1 0.00 0.00 0
Execute 1 0.01 0.00 9
Fetch 2 0.00 0.00 3
------ ---- ---- ---- -----
total 4 0.01 0.00 12
Quite a change! What happened here was that the WHERE EXISTS stopped running the subquery when it got the first row back, and because the database did not have to bother with that deduplicate step, getting the first row back was very fast indeed.
The bottom line: If you can use UNION
ALL, by all means use it over UNION to avoid a costly deduplication step—a step that is
probably not even necessary most of the time.
20. Check
that your application is using the existing indexes
This is a CRITICAL point. So make use of
Explain Plan!!!
SELECT ImportedDate, State
FROM IssueData
WHERE
TRUNC(ImportedDate ) = TRUNC(SYSDATE);
SELECT ImportedDate, State
FROM IssueData
WHERE
ImportedDate between trunc(SYSDATE) and TRUNC(SYSDATE) + .99999;
SELECT *
FROM dept
WHERE deptno NOT IN (SELECT deptno FROM EMP);SELECT dept.*
FROM dept, emp
WHERE dept.deptno = emp.deptno (+)
AND emp.ROWID IS NULL;SELECT *
FROM dept
WHERE NOT EXISTS (SELECT NULL FROM emp WHERE emp.deptno = dept.deptno);
Normally, an outer join of table A to table B would return every record in table A, and if it had a mate in table B, that would be returned as well. Every row in table A would be output, but some rows of table B might not appear in the result set. A full outer join would return ebery row in table A, as well as every row in table B. The syntax for a full outer join is new in Oracle 9i, but it is a syntactic convenience, it is possible to produce full outer joins sets using conventional SQL.
update
emp set deptno = 9 where deptno = 10;
commit;
Conventional SQL | New Syntax |
SELECT empno, ename, dept.deptno, dname EMPNO ENAME DEPTNO DNAME |
SELECT empno, ename, |
The Explain Plan command uses a table to store information about the execution plan chosen by the optimizer. The PLAN_TABLE is created by $ORACLE_HOME/rdbs/admin/utlxplan.sql.
Perform the following to check it:
EXPLAIN PLAN FOR
your query.
Example
EXPLAIN PLAN FOR
SELECT
* FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';
Finally use the DBMS_XPLAN.DISPLAY function to display the execution
plan:
SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY);
-------------------------------------------------------------------
| Id |
Operation
| Name | Rows | Bytes | Cost |
-------------------------------------------------------------------
| 0 | SELECT
STATEMENT
| | 1
| 57 | 3 |
| 1 | NESTED
LOOPS
| | 1
| 57 | 3 |
|* 2 | TABLE ACCESS
FULL |
EMP | 1 |
37 | 2 |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT |
1 | 20 | 1 |
|* 4 | INDEX UNIQUE
SCAN | PK_DEPT |
1 |
| |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."ENAME"='SMITH')
4 - access("E"."DEPTNO"="D"."DEPTNO")
The DBMS_XPLAN.DISPLAY function can accept 3 parameters:
1 table_name - Name of plan table, default value
'PLAN_TABLE'.
2 statement_id - Statement id of the plan to be
displayed, default value NULL.
3 format - Controls the level of detail displayed,
default value 'TYPICAL'. Other values include 'BASIC', 'ALL', 'SERIAL'.
EXPLAIN
PLAN SET STATEMENT_ID='TSH' FOR
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';
SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM
TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','TSH','BASIC'));
---------------------------------------------
| Id |
Operation
| Name |
---------------------------------------------
| 0 | SELECT
STATEMENT
| |
| 1 | NESTED
LOOPS
| |
| 2 | TABLE ACCESS
FULL |
EMP |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT |
| 4 | INDEX UNIQUE
SCAN | PK_DEPT |
---------------------------------------------
Query Plan
-----------------------------------------
SELECT STATEMENT [CHOOSE]
Cost=1234
TABLE ACCESS FULL TPAIS [:Q65001]
[ANALYZED]
The rightmost uppermost operation of an explain plan is the first thing that the explain plan will execute. In this case TABLE ACCESS FULL TPAIS is the first operation. This statement means we are doing a full table scan of table TPAIS When this operation completes then the resultant row source is passed up to the next level of the query for processing. In this case it is the SELECT STATEMENT, which is the top of the query.
[CHOOSE] is an indication of the optimizer_goal for the query. This DOES NOT necessarily indicates that plan has actually used this goal. The only way to confirm this is to check the cost= part of the explain plan as well. For example the following query indicates that the CBO has been used because there is a cost in the cost field:
SELECT STATEMENT [CHOOSE] Cost=1234
However the explain plan below indicates the use of the RBO because the cost field is blank:
SELECT STATEMENT [CHOOSE] Cost=
The cost field is a comparative cost that is used internally to determine the best cost for particular plans. The costs of different statements are not really directly comparable.
[:Q65001] indicates that this particular part of
the query is being executed in parallel. This number indicates that the
operation will be processed by a parallel query slave as opposed to
being executed serially.
[ANALYZED] indicates that the object in question has been analyzed and there are currently statistics available for the CBO to use. There is no indication of the 'level' of analysis done.
FULL TABLE SCAN (FTS) -
Read every row in the table, every block up
to the high water mark. The HWM marks the last block in the table that
has ever had data
written
to it. If you have deleted all the rows then you will still read up to
the HWM.
Truncate is the only way to reset the HWM back to the start of the
table. Buffers from FTS operations are placed on the
Least
Recently Used (LRU) end of the buffer cache so will be quickly aged
out. FTS is
not recommended for large tables unless you are reading >5-10% of it
(or so)
or you intends to run in parallel.
CLUSTER - Access via an index cluster.
HASH - A hash key is issued to access one or more
rows in a table with a matching hash value.
BY ROWID - Access a single row in a table
by
specifying its ROWID. This is the fastest method for accessing a row
usually
from an index.
This is the quickest access method available. Oracle simply
retrieves
the block specified and extracts the rows it is interested in. Access
by rowid
:
SQL>
explain plan for select * from dept where rowid =
':x';
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID DEPT [ANALYZED]
Table is accessed by rowid following index lookup:
SQL> explain plan for
select empno,ename from emp where
empno=10;
INDEX LOOKUP - The
data is accessed by looking up key values in an
index and
returning rowids. A rowid uniquely identifies an individual row in a
particular
data block. This block is read via single block I/O. In this example an
index
is used to find the relevant row(s) and then the table is accessed to
lookup
the ename column (which is not included in the index):
explain plan for
select empno,ename from emp where
empno=10;
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
INDEX UNIQUE SCAN EMP_I1
Note the 'TABLE ACCESS BY ROWID' section. This
indicates
that the table data is not being accessed via a FTS operation but
rather by a
rowid lookup. In this case looking up values in the index first has
produced
the rowid. The index is being accessed by an 'INDEX UNIQUE SCAN'
operation. This
is
explained below. The index name in this case is EMP_I1. If all the
required
data resides in the index then a table lookup may be unnecessary and
all you
will see is an index access with no table access.
In the following example all the columns (empno) are in the index.
Notice that
no table access takes place:
explain
plan for
select empno from emp where empno=10;
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX UNIQUE SCAN EMP_I1
Indexes are presorted so sorting may be unnecessary if the sort order required is the same as the index. e.g.
explain plan for select
empno,ename from emp
where empno > 7876 order by empno;
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
In this case the index is sorted so the rows will
be
returned in the order of the index hence a sort is unnecessary.
explain plan for
select /*+ Full(emp) */
empno,ename from emp
where empno> 7876 order by
empno;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=9
SORT ORDER BY
TABLE ACCESS FULL EMP [ANALYZED] Cost=1 Card=2 Bytes=66
b1. Index unique scan
Method for looking up a single key value via a unique index.
Always
returns a single value. You must supply AT LEAST the leading column of
the
index to access data via the index, however this may return > 1 row
as the
uniqueness will not be guaranteed. Example explain plan:
SQL> explain plan for
select empno,ename from emp where
empno=10;
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
INDEX UNIQUE SCAN EMP_I1
b2.Index range scan
Method for accessing multiple column values. You must supply AT LEAST
the
leading column of the index to access data via the index. Can be used
for range
operations (e.g. > < <> >= <= between). e.g.
SQL> explain plan for select
empno,ename from emp
where empno > 7876 order by
empno;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP [ANALYZED]
A non-unique index may return multiple values for the predicate col1 = 5 and will use an index range scan
SQL> explain plan for select mgr from emp where mgr = 5;
Query plan
--------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX RANGE SCAN EMP_I2 [ANALYZED]
b3.Index Full Scan
In certain circumstances it is possible for the whole index to
be
scanned as opposed to a range scan (i.e. where no constraining
predicates
are provided for a table). Full index scans are only available in the
CBO as
otherwise we are
unable to determine whether a full scan would be a good idea or not. We
choose
an index Full Scan when we have statistics that indicate that it is
going to be
more efficient than a Full table scan and a sort. For example we may do
a Full
index scan when we do an unbounded scan of an index and want the data
to be
ordered in the index order. The optimizer may decide that selecting all
the
information from the index and not sorting is more efficient than doing
a FTS
or a Fast Full Index Scan and then sorting. An Index full scan
will perform single block i/o's and so it
may prove to be inefficient. e.g. Index BE_IX is a concatenated index
on big_emp (empno,ename)
SQL> explain plan for select empno,ename from big_emp order by empno,ename;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=26
INDEX FULL SCAN BE_IX [ANALYZED]
b4. Index Fast Full Scan (not very used)
Scans all the block in the index. Rows are not returned in
sorted order
Introduced in 7.3 and requires 733_PLANS_ENABLED=TRUE and CBO may be
hinted
using INDEX_FFS hint uses multiblock i/o can be executed in parallel
can be
used to access second column of concatenated indexes. This is because
we are
selecting the entire index.
Note that INDEX FAST FULL SCAN is the mechanism behind fast index
create and
recreate. E.g.
Index BE_IX is a concatenated index on big_emp (empno,ename)
SQL> explain plan for select empno,ename from big_emp;
Query Plan
------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX FAST FULL SCAN BE_IX [ANALYZED]
Selecting the 2nd column of concatenated index:
SQL> explain plan for select ename from big_emp;
Query Plan
Index Access
Methods
UNIQUE SCAN - A retrieval of a single ROWID from an
index.
RANGE SCAN - A retrieval of one or more ROWIDs from
an index. Indexed values are scanned in ascending order.
RANGE SCAN DESCENDING - A retrieval of one or more
ROWIDs from an index. Indexed values are returned in descending order.
AND-EQUAL - An operation that accepts multiple sets
of ROWIDs and returns the intersection of the sets, eliminating
duplicates. This operation is user for
single-column indexes access path.
There are three kinds of join conditions: nested
loops, merge joins, and hash joins. Each has specific performance
implications, and each should be used in different circumstances.
- Nested loops work from one table (preferably the smaller of the two),
looking up the join criteria in the larger table. It’s helpful if the
join column is indexed from the larger table. Nested loops are useful
when joining a smaller table to a larger table and performs very well
on smaller amounts of data.
- Merge joins work by selecting the result set from each table,
and then merging these two (or more) results together. Merge
joins are useful when joining two relatively large tables of about the
same size together, the merge join starts out with more overhead but
remains rather consistent.
- Hash joins read the smaller tables into a hash table in memory so the
referenced records can be quickly accessed by the hash key. Hash joins
are great in data warehouse scenarios where several smaller tables
(with referential integrity defined) are being referenced in the same
SQL query as a single larger or very large table.
The hash join has initial overhead (of creating the hash tables) but
performs rather well no matter how many rows are involved.
NESTED LOOPS - An operation that accepts
two sets of
rows, an outer set and an inner set. Oracle compares each row of the
outer set
with each row of the inner set and returns those rows that satisfy a
condition. Take the example of 2 tables joined as follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
In the case of the Nested Loop Join, the rows will be accessed with an
outer table being chosen (say Table1 in this case) and for each row in
the outer table, the inner table (Table2) will be accessed with an
index to retrieve the matching rows. Once all matching rows from Table2
are found, then the next row on Table1 is retrieved and the matching to
Table2 is performed again
It's important that efficient index access is used on the inner table
(Table2 in this example) or that the inner table be a very small table.
This is critical to prevent table scans from being performed on the
inner table for each row of the outer table that was retrieved. Nested
Loop joins are usually used in cases where relatively few rows are
being accessed in the outer table and joined to the inner table using
efficient index access. It is the most common join performed by
transactional (OLTP) systems
OUTER - A nested loops operation to perform
an outer
join statement.
SORT MERGE JOIN or MERGE JOIN or Merge Scan
- An operation that accepts two sets of
rows, each sorted by a specific value, combines each row from one set
with the
matching rows from the other. Take an example of 2
tables being joined and returning a large number of rows (say,
thousands) as follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
The Merge Scan join will be chosen because the database has detected
that a large number of rows need to be processed and it may also notice
that index access to the rows are not efficient since the data is not
clustered (ordered) efficiently for this join. The steps followed to
perform this type of join are as follows:
1) Pick an inner and outer table
2) Access the inner table, choose the rows that match the predicates in
the Where clause of the SQL statement.
3) Sort the rows retrieved from the inner table by the joining columns
and store these as a Temporary table. This step may not be performed if
data is ordered by the keys and efficient index access can be
performed.
4) The outer table may also need to be sorted by the joining columns so
that both tables to be joined are sorted in the same manner. This step
is also optional and dependent on whether the outer table is already
well ordered by the keys and whether efficient index access can be
used.
5) Read both outer and inner tables (these may be the sorted temporary
tables created in previous steps), choosing rows that match the join
criteria. This operation is very quick since both tables are sorted in
the same manner and Database Prefetch can be used.
6) Optionally sort the data one more time if a Sort was performed (e.g.
an 'Order By' clause) using columns that are not the same as were used
to perform the join.
The Merge Join can be deceivingly fast due to database multi-block
fetch (helped by initialization parameter
db_file_multiblock_read_count) capabilities and the fact that each
table is accessed only one time each. These are only used for
equi-joins. The other init.ora parm that can be tuned to help
performance is sort_area_size.
OUTER - A merge join operation to perform an outer
join statement.
-ANTI - A merge anti-join.
-SEMI - A merge semi-join.
HASH JOIN - An operation that joins
two sets of rows
and returns the same result.
-ANTI - A hash anti-join.
-SEMI - A hash semi-join.
The Hash Join is is a very efficient join when used in the right
situation. With the hash join, one Table is chosen as the Outer table.
This is the larger of the two tables in the Join - and the other is
chosen as the Inner Table. Both tables are broken into sections and the
inner Tables join columns are stored in memory (if hash_area_size is
large enough) and 'hashed'. This hashing provides an algorithmic
pointer that makes data access very efficient. Oracle attempts to keep
the inner table in memory since it will be 'scanned' many times. The
Outer rows that match the query predicates are then selected and for
each Outer table row chosen, hashing is performed on the key and the
hash value is used to quickly find the matching row in the Inner Table.
This join can often outperform a Sort Merge join, particularly when 1
table is much larger than another. No sorting is performed and index
access can be avoided since the hash algorithm is used to locate the
block where the inner row is stored. Hash-joins are also only used for
equi-joins. Other important init.ora parms are: hash_join_enabled,
sort_area_size and hash_multiblock_io_count.
a. Sorts
There are a number of different operations that promote sorts
Sorts are expensive operations especially on large
tables
where the rows do not fit in memory and spill to disk. By default sort
blocks
are placed into the buffer cache. This may result in aging out of other
blocks
that may be reread by other processes. To avoid this you can use the
parameter:
<Parameter:SORT_DIRECT_WRITES> which does not place sort blocks
into the
buffer cache.
b. Filter
Has a number of different meanings used to indicate partition
elimination may also indicate an actual filter step where one row
source is
filtering another functions such as min may introduce filter steps into
query
plans
In this example there are 2 filter steps. The first is effectively like
a NL
except that it stops when it gets something that it doesn't like (i.e.
a
bounded NL). This is there because of the not in. The second is
filtering out
the min value:
SQL> explain plan for
select * from emp
where empno not in (select
min(empno) from big_emp group by empno);
Query Plan
------------------
SELECT STATEMENT [CHOOSE] Cost=1
FILTER **** This is like a bounded nested loops
TABLE ACCESS FULL EMP [ANALYZED]
FILTER **** This filter is introduced by the min
SORT GROUP BY NOSORT
INDEX FULL SCAN BE_IX
This example is also interesting in that it has a
NOSORT
function. The group by does not need to sort because the index row
source is
already pre sorted.
c. Views
When a view cannot be merged into the main query you will often see a
projection view operation. This indicates that the 'view' will be
selected from
directly as opposed to being broken down into joins on the base tables.
A
number of constructs make a view non mergeable. Inline views are also
non
mergeable.
In the following example the select contains an inline view that cannot
be
merged:
SQL> explain plan for
select ename,tot
from emp,
(select empno,sum(empno) tot from
big_emp group by empno) tmp
where emp.empno = tmp.empno;
Query Plan
------------------------
SELECT STATEMENT [CHOOSE]
HASH JOIN
TABLE ACCESS FULL EMP [ANALYZED]
VIEW
SORT GROUP BY
INDEX FULL SCAN BE_IX
In this case the inline view tmp that contains an aggregate function cannot be merged into the main query. The explain plan shows this as a view step
To use
the autotrace facility
As sys run the following:
To use the EXPLAIN option, we must first create the PLAN_TABLE in our schema before using the EXPLAIN by running utlxplan.sql or manually.
SET AUTOTRACE
TRACEONLY STATISTICS
set autotrace traceonly explain
select * from dual;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'DUAL'
STATISTICS Displays statistics, but does not
display
execution plans.
To access STATISTICS data, we must have access to dynamic
performance tables.
This can be granted by granting the role plustrace created
in plustrce.sql script. The DBA can run the script as internal or sys
and then
grant the role to the users using the AUTOTRACE option.
SET
AUTOTRACE TRACEONLY STATISTICS
The most important results are the db block gets, consistent
gets, physical reads, redo size, sorts (memory) and sorts (disk).
Details about these statiscics are HERE
NOTE: If both Explain and Statistics are omitted, execution
plans and statistics will be displayed by default.