Working with CBO and the DBMS_STATS Package



The CBO makes its execution-plan decisions based on statistics; it tries to build the plan that results in the lowest amount of physical I/O.
Statistics provide the critical input the CBO needs in order to work properly. They are generated for data-storing objects and include information such as the number of rows in a table, the number of distinct values in a column, the number of leaf blocks in an index, and so on. The more accurate the statistics, the more efficient the plans produced by the optimizer.
Statistics may be exact or estimated. Statistics generated with the COMPUTE clause analyze all of the data in the object, giving the optimizer accurate information from which to arrive at a good execution plan.
Statistics generated with the ESTIMATE clause analyze only a sample of the data, up to the sample size specified. The sample size may be given as a number of rows or as a percentage of rows to be randomly analyzed; optionally, block sampling may be specified instead. Estimation saves time when the system contains many huge tables, but the quality of the resulting execution plans depends on how close the estimated values are to the exact ones.
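For illustration, both styles with the ANALYZE command (the table name emp is made up):

 analyze table emp compute statistics;
 analyze table emp estimate statistics sample 10 percent;
 analyze table emp estimate statistics sample 1000 rows;
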
The CBO does not use the rules of the RBO, but it does use the object order in the FROM clause and/or the object_id to break ties. The CBO considers execution-plan variations up to the limit set by the init.ora parameter OPTIMIZER_MAX_PERMUTATIONS.
Statistics are stored in the data dictionary, in tables owned by the SYS user. The following views display the statistics collected for tables, columns and indexes.

For Tables (DBA_TABLES)

NUM_ROWS - Number of rows
BLOCKS - Number of blocks below the high-water mark (used or not)
EMPTY_BLOCKS - Number of empty blocks
AVG_SPACE - Average free space (in bytes) in blocks allocated to the table. All empty and free blocks are considered for this
CHAIN_CNT - Number of chained rows (populated by the ANALYZE command only)
AVG_ROW_LEN - Average row length in bytes
LAST_ANALYZED - Date when the table was last analyzed
SAMPLE_SIZE - Sample size used for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE was used
GLOBAL_STATS - For partitioned tables, YES means statistics were collected for the table as a whole, NO means they were estimated from partition statistics
USER_STATS - Set to YES if the user has explicitly set the statistics for the table

Statistics for individual partitions of a table can be seen in DBA_TAB_PARTITIONS. Cluster statistics are available from DBA_CLUSTERS.
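
These figures can be queried directly; a sketch against DBA_TABLES (the owner and table names are made up):

 select num_rows, blocks, empty_blocks, avg_row_len, sample_size, last_analyzed
   from dba_tables
  where owner = 'SCOTT'
    and table_name = 'EMP';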

For Columns (DBA_TAB_COLUMNS)

NUM_DISTINCT - Number of distinct values
LOW_VALUE - Lowest value
HIGH_VALUE - Highest value
DENSITY - Density of the column
NUM_NULLS - Number of records with a null value in the column concerned
NUM_BUCKETS - Number of buckets in the column's histogram. Refer to the Histograms section
SAMPLE_SIZE - Sample size used for ESTIMATE statistics. Equal to the total number of rows if COMPUTE was used
LAST_ANALYZED - Date when the table was last analyzed
DBA_TAB_COL_STATISTICS shows similar data. Column statistics for partitioned tables can be viewed in DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS.

For Indexes (DBA_INDEXES)

BLEVEL - Depth of the index, from root block to leaf blocks
LEAF_BLOCKS - Number of leaf blocks
DISTINCT_KEYS - Number of distinct index values
AVG_LEAF_BLOCKS_PER_KEY - Average number of leaf blocks in which each distinct key appears; should be 1 for unique indexes
AVG_DATA_BLOCKS_PER_KEY - Average number of blocks in the table that are pointed to by a distinct key
CLUSTERING_FACTOR - A count that indicates how well ordered the table is with respect to the index. If the count is close to the number of blocks, the table is well ordered: entries in a single leaf block tend to point to rows in the same table blocks. If it is close to the number of rows, the table is randomly ordered: entries in a single leaf block point to rows spread across many blocks
NUM_ROWS - Number of rows indexed
SAMPLE_SIZE - Sample size used for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE was used
LAST_ANALYZED - Date when the index was last analyzed
GLOBAL_STATS - For partitioned indexes, YES means statistics were collected for the index as a whole, NO means they were estimated from partition statistics
USER_STATS - Set to YES if the user has explicitly set the statistics for the index
PCT_DIRECT_ACCESS - For secondary indexes on IOTs, the percentage of rows with a valid guess

Statistics for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS.
Dictionary tables related to Histogram information are discussed later.

Available CBO Modes

The CBO has two available modes: ALL_ROWS (the default) and FIRST_ROWS.
FIRST_ROWS aims at returning the first row(s) of the statement as soon as possible. This mode tells the optimizer to give response time prime importance, and it prefers nested-loop joins. FIRST_ROWS uses cost as well as some rules of thumb to process the first set of rows; for example, plans using indexes are preferred over plans using full table scans as the access path, and an ORDER BY clause can induce index access. As of release 9i, the number of rows to be returned in the first hit can also be specified with FIRST_ROWS_n (where n is 1, 10, 100 or 1000), set as per the application's requirements.

ALL_ROWS processes all rows for a given query before returning the output. It forces optimizer to consider minimal use of resources and best throughput. ALL_ROWS prefers sort-merge joins.

For an OLTP system, FIRST_ROWS is the ideal option for fast response time; ALL_ROWS is meant for batch-oriented applications. Note that a plan that produces the first n rows with the fastest response time might not be optimal if the requirement is to obtain the entire result set, so decide as per the needs of the application.

Basic CBO Terms

The following terms will be used quite often when analyzing statements in CBO.

Cost
The COST computed by the CBO is a unit of expense associated with each operation. The logic by which the cost is actually derived is not documented or made external; moreover, it may change across releases.

Cardinality
The number of rows in a table, or the number of distinct keys in an index. The cardinality of a query is the number of rows it is expected to return.

Selectivity
The proportion of distinct values. The ratio of distinct values to total rows in a column being indexed is known as the selectivity of the index. For example, if a table has 10000 rows and an index is created on a column having 4000 distinct values, the selectivity of the index is (4000/10000) * 100 = 40%. Unique indexes on NOT NULL columns have a selectivity of 100%.
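
Once statistics exist, selectivity can be derived from the dictionary; a sketch (the owner and table names are made up):

 select c.column_name,
        round(c.num_distinct / greatest(t.num_rows, 1) * 100, 2) selectivity_pct
   from dba_tab_columns c, dba_tables t
  where c.owner = t.owner
    and c.table_name = t.table_name
    and t.owner = 'SCOTT'
    and t.table_name = 'EMP';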

Transitivity
The process by which the CBO generates additional predicates for a query, enabling the optimizer to consider additional execution paths. For example, if the query contains predicates of the form A=B and B=C, the optimizer may add the additional predicate A=C.

Statistics
The information gathered about the various data-holding objects. This information is vital for the CBO when deciding on execution plans.

Join Methods
Oracle uses join methods such as hash, sort-merge and nested-loop joins. A query may run faster with one join method than with the others; this should be evaluated for individual queries.

FTS, or Full Table Scan, refers to a query sequentially scanning a table from the first block up to the high-water mark. This can be very expensive for big tables and should be avoided.

Index scan
Relates to random access of a table by use of one or more indexes on the table.

Initialization parameters that affect the CBO
Apart from generating statistics, the setup of the parameters mentioned below plays a very crucial role in dictating how your system will work. Setting these depends a lot on what kind of environment you want to create: online, batch processing, data warehousing or a combination of more than one.

OPTIMIZER_MODE
This decides the mode in which the optimizer engine runs. Valid values are RULE, CHOOSE, ALL_ROWS and FIRST_ROWS(_n).
Optionally, OPTIMIZER_MODE can be set to CHOOSE, a kind of intermediate option between the RBO and the CBO: the query is run under either the CBO or the RBO depending on the availability of statistics. If the tables in the query have statistics generated on them, the CBO (ALL_ROWS only) is used; otherwise the RBO is taken up.
e.g.: optimizer_mode = first_rows

OPTIMIZER_FEATURES_ENABLE
This is set to a version number, such as 8.1.5, 8.1.7 or 9.0.0. Since new features and functionality are added to the CBO in every release, its behavior may change and result in different execution plans. You can set this to the version number for which your application is tuned. Note that setting it to a lower version prevents the use of new features introduced in later versions.
e.g.: optimizer_features_enable = 8.1.7

OPTIMIZER_MAX_PERMUTATIONS
This parameter specifies the maximum number of permutations to be considered when choosing an execution plan for a query with joins. It influences the parse time of queries, so it should be set to a reasonably low value; make sure the other parameters mentioned in this section are set properly so that the optimizer finds an optimal execution plan within the specified limit. It defaults to 80000 in Oracle 8, which effectively means no limit; in Oracle 9i it defaults to 2000.
e.g.: optimizer_max_permutations = 2000

OPTIMIZER_INDEX_COST_ADJ
OPTIMIZER_INDEX_COST_ADJ takes a value between 1 and 10000 (default 100). Setting this value makes indexes look more or less attractive to the optimizer. For example, setting it to 50 tells the optimizer that an execution plan using indexes is half as expensive as one that does not use them. The lower the value (below 100), the fewer full table scans will take place in the system.
Use of indexes in OLTP is a crucial factor in response time, but sometimes full table scans provide better throughput. Nested-loop joins are influenced by this parameter's value. Set it to a value that is optimal for your application; I recommend setting it between 10 and 20 for OLTP and around 50 for DSS systems.
If you are seeing slow performance because the CBO's FIRST_ROWS optimizer mode is favoring too many full-table scans, you can reset OPTIMIZER_INDEX_COST_ADJ to immediately tune all of the SQL in your database to favor index scans over full-table scans. This is a "silver bullet" that can improve the performance of an entire database in cases where the database is OLTP and you have verified that the full-table scan costing is too low.
e.g.: optimizer_index_cost_adj = 20

OPTIMIZER_INDEX_CACHING
This tells the optimizer to favor nested loops and IN-list iterators over sort-merge and hash joins. The value indicates the percentage of index blocks the optimizer should assume are present in the cache; the higher it is set (the default is 0), the less expensive and more appealing nested loops and IN-list iterators look to the optimizer.
I recommend setting this parameter to 90.
e.g.: optimizer_index_caching = 90

OPTIMIZER_PERCENT_PARALLEL
OPTIMIZER_PERCENT_PARALLEL takes a value between 0 and 100. A low value favors indexes, and a higher value favors full table scans; the optimizer uses this parameter when working out the cost of a full table scan. A value of 100 makes use of the degree of parallelism set at the object level. I prefer setting it to zero, to favor the use of indexes and prevent parallel query from being considered in the costing.
It is renamed to _OPTIMIZER_PERCENT_PARALLEL in Oracle 9i, and its value should not be altered unless recommended by Oracle Support.
e.g.: optimizer_percent_parallel = 0

COMPATIBLE
This parameter provides backward compatibility with earlier releases; it may also restrict the use of some new features. The CBO has undergone a lot of changes in release 8, so it is advisable to set this parameter to 8.1.0 or higher. Only three digits are required, though you can specify more for record purposes.
e.g.: compatible = 8.1.0

DB_FILE_MULTIBLOCK_READ_COUNT
This parameter determines the number of database blocks read in one input/output operation during a full table scan; the value set is used in computing the cost of full table scans. OLTP systems will not benefit from a high value, as it would reduce the computed cost of full-table-scan execution plans. The maximum size is operating-system dependent.
e.g.: db_file_multiblock_read_count = 1 (may be set to a higher value)

SORT_AREA_SIZE
This parameter defines the maximum memory allocated per user process for sorting data and for inserts and updates to bitmap indexes. The optimizer uses the set value in determining the cost of sorting in a query. It defaults to 64K; normally a size of 64K to 1M is appropriate for OLTP systems.
Sort-merge joins are influenced by this parameter's value: the bigger the size, the more appealing sort-merge joins become compared with nested-loop and hash joins. The smaller the size, the more sorting takes place in temporary segments. So assign a value that has been evaluated for your setup.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i onwards.
e.g: sort_area_size = 1048576

SORT_MULTIBLOCK_READ_COUNT
This specifies the number of database blocks read each time a sort performs a read from a temporary segment. The default value is 2, and the general recommendation is to keep it. However, if you intend to raise SORT_AREA_SIZE considerably, you may evaluate whether changing this parameter brings an improvement.
e.g.: sort_multiblock_read_count = 2

HASH_JOIN_ENABLED
Hash joins are available only in the CBO. In a hash join, a hash table is built on the join key of the smaller table, and the other tables are then probed against it to find matches. Hash joins may prove faster than other join types in some conditions, especially when an index is missing or the search criteria are not very selective. They require a large amount of memory, as the hash tables are retained in memory; this may sometimes result in memory swapping.
Nested-loop joins return the first row faster than sort-merge and hash joins and are preferred for OLTP, but the other join types cannot be ignored for other aspects of the application.
e.g.: hash_join_enabled = true

HASH_AREA_SIZE
This specifies the maximum amount of memory, in bytes, to be used per process for a hash join. It defaults to 2 times SORT_AREA_SIZE.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i onwards.
e.g.: hash_area_size = 2097152
Setting this to a very low number may sometimes result in the following error.
ORA-6580: Hash Join ran out of memory while keeping large rows in memory.

HASH_MULTIBLOCK_IO_COUNT
This specifies how many sequential blocks a hash join reads and writes in one input/output operation. Oracle recommends not assigning a value to this parameter, which lets Oracle decide the appropriate value for each individual query; in that case the parameter appears as 0 in the V$PARAMETER view.
This parameter is renamed to _HASH_MULTIBLOCK_IO_COUNT as of Oracle 9i.
e.g.: hash_multi_block_io_count = 0

BITMAP_MERGE_AREA_SIZE
This parameter is relevant for systems using bitmap indexes. It specifies the amount of memory Oracle uses to merge bitmaps retrieved from a range scan of a bitmap index. The default value is 1 MB, which is considered sufficient for most setups.
Oracle recommends use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g.: bitmap_merge_area_size = 1048576

QUERY_REWRITE_ENABLED
This parameter is relevant for systems using materialized views, function-based indexes and stored outlines. Setting it enables query rewrite for the database. The materialized views themselves must also be query-rewrite enabled to allow the optimizer to redirect a query to them rather than to the actual tables listed in the FROM clause. Query rewriting is done internally by the optimizer, based on what results are expected and whether they are already present in materialized-view form.
e.g.: query_rewrite_enabled = true
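
A sketch of a rewrite-enabled materialized view (the object names are made up):

 create materialized view mv_monthly_qty
   enable query rewrite
   as select calendar_month_id, sum(qty) qty
        from ra_transactions
       group by calendar_month_id;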

QUERY_REWRITE_INTEGRITY
This parameter determines the extent to which the optimizer enforces integrity when rewriting queries to materialized views, and thus the accuracy of the rewrite. It can be set to ENFORCED, TRUSTED or STALE_TOLERATED. ENFORCED uses materialized views only if they contain fresh data, thus guaranteeing the integrity and consistency of the results. TRUSTED also uses relationships that have been declared to the system when rewriting the query. STALE_TOLERATED allows the use of relationships that are not enforced, and may use stale data. For OLTP systems, ENFORCED is the ideal setup, as reports need to be up to date.
e.g.: query_rewrite_integrity = enforced

ALWAYS_ANTI_JOIN
This parameter specifies the join method for anti-joins, for example when a NOT IN operator is present in your query. It can be set to NESTED_LOOPS, MERGE or HASH. It defaults to NESTED_LOOPS in Oracle 8i and CHOOSE in Oracle 9i.
This parameter is renamed to _ALWAYS_ANTI_JOIN as of Oracle 9i.
e.g.: always_anti_join = nested_loops

ALWAYS_SEMI_JOIN
This parameter specifies the join method for semi-joins. These joins are carried out by the optimizer after transforming a query: duplicate values from the inner table are removed, and then the join type specified in the parameter is used to perform the semi-join. It can be set to NESTED_LOOPS, MERGE or HASH. In Oracle 8i it defaults to STANDARD, and in Oracle 9i to CHOOSE, to pick an appropriate join.
This parameter is renamed to _ALWAYS_SEMI_JOIN as of Oracle 9i.
e.g.: always_semi_join = nested_loops

STAR_TRANSFORMATION_ENABLED
This specifies whether query transformation will be applied to star queries. It can be set to TRUE, FALSE or TEMP_DISABLE (transformation takes place but does not use temporary tables). I presently set it to FALSE because of some known issues with sub-optimal queries being generated. If you intend to use this feature, first make sure you are on a release in which those issues are fixed.
e.g.: star_transformation_enabled = false

PARALLEL_BROADCAST_ENABLED
This parameter refers to parallel executions in cluster databases. It is meant to improve hash and sort-merge join operations in which a very large result set is joined with a very small one. When this option is enabled, the optimizer broadcasts a copy of all rows in the smaller result set to all cluster databases that are processing rows of the larger result set.
It is obsolete in release 9.2.0.
e.g.: parallel_broadcast_enabled = false

OPTIMIZER_DYNAMIC_SAMPLING
This parameter was introduced in release 9i. It is meant for situations where tables have not been analyzed. As the CBO depends heavily on statistics, this parameter tells the optimizer to sample any unanalyzed tables used in a query. A level of 0 to 10 can be specified; the higher the value, the more time the optimizer spends sampling.
e.g.: optimizer_dynamic_sampling = 1

PARTITION_VIEW_ENABLED
This parameter is meant for backward compatibility, to support partition views. Oracle recommends the use of partitioned tables rather than partition views. If you are migrating to the CBO, chances are that you are not using partition views.
e.g.: partition_view_enabled = false

CURSOR_SHARING
This parameter determines which kinds of SQL statement can share the same cursor. It can be set to FORCE, SIMILAR or EXACT. FORCE tries to make statements that differ only in their literals share the same cursor. SIMILAR does much the same, but only where sharing does not degrade the execution plan. EXACT allows only statements with identical text to share a cursor.
Using FORCE may sometimes produce unexpected results.
e.g.: cursor_sharing = exact

PGA_AGGREGATE_TARGET
Introduced in Oracle 9i, this parameter specifies the aggregate PGA memory available to all server processes attached to the instance. It enables automatic sizing of SQL work areas and replaces parameters such as SORT_AREA_SIZE, BITMAP_MERGE_AREA_SIZE and HASH_AREA_SIZE.
It can be set to a value between 10 MB and 4000 GB, depending on the requirements of the setup.

Coding habits and Technical Guidelines

In the CBO, adopt left-to-right ordering in the FROM clause, the leftmost table being the driving table; the ORDERED hint makes the CBO pick up tables left-to-right for processing. Avoid RBO-style coding techniques: the CBO has advanced features such as function-based and bitmap indexes, and query processing can be controlled with proper WHERE clauses and hints.

Use bind variables

For example, below is existing legacy-system code that always uses a predicate with the hard-coded value 1666. Though this may not change in the lifetime of the code, best practice is to avoid such coding and to define the value as a variable.

Original code:
 select sum(qty) qty into rec.qty
   from jncdm_ra_transactions a, ra_customer_trx_all b
   where a.calendar_month_id = pi_calendar_month_id
     and b.customer_trx_id = a.customer_trx_id
     and b.cust_trx_type_id != 1666
     and b.type = 'INV';

Change it to use variables:
 l_exclude_trx_type_id number := 1666;
 l_mod_type varchar2(3) := 'INV';
 select sum(qty) qty into rec.qty
   from jncdm_ra_transactions a, ra_customer_trx_all b
   where a.calendar_month_id = pi_calendar_month_id
     and b.customer_trx_id = a.customer_trx_id
     and b.cust_trx_type_id != l_exclude_trx_type_id
     and b.type = l_mod_type;


Oracle provides more than one way of generating statistics:

  1. DBMS_STATS package
  2. ANALYZE command

NOTE: Oracle strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine-tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS.

However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:
    * To use the VALIDATE or LIST CHAINED ROWS clauses
    * To collect information on freelist blocks

To use the DBMS_STATS package, a user needs the EXECUTE privilege on it. Run the following as SYS:
    grant execute on DBMS_STATS to user_xyz;

Generating Statistics

Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy. AUTO_SAMPLE_SIZE lets Oracle determine the best sample size for good statistics. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling:
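A sketch of such a call (the OE schema name comes from the sentence above):

 exec dbms_stats.gather_schema_stats('OE', estimate_percent => dbms_stats.auto_sample_size);
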
By default, DBMS_STATS generates statistics for the table but not its indexes (CASCADE => FALSE).

How much to estimate?

If you intend to ESTIMATE statistics because of time constraints, try to arrive at an optimal sample size that yields good results for your database. In general, a sample size of 5% to 10% generates adequate statistics.

Gathering statistics on tables requires sorting, which takes up resources; gathering statistics on indexes does not. Given this, you may COMPUTE statistics on indexes to obtain accurate figures at a reasonable cost.

You may also consider generating statistics in Parallel. Below is an example of arriving at an appropriate ESTIMATE sample size.


The table INV.MTL_MATERIAL_TRX has around 4.5 million records. Below are the statistics gathered at various sample sizes.

Commands used:

dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX'); -- compute
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 5);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 10);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 20);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 51);

Final statistics:

Sample  | NUM_ROWS  | BLOCKS | AVG_ROW_LEN | SAMPLE_SIZE | LAST_ANALYZED | Time taken
--------|-----------|--------|-------------|-------------|---------------|------------
compute | 4591474   | 193230 | 251         | 4591474     | 27-JUL-03     | 2 hr
at 5%   | 4582460   | 193230 | 247         | 229123      | 27-JUL-03     | 8 min
at 10%  | 4587520   | 193230 | 249         | 458752      | 27-JUL-03     | 17 min
at 20%  | 4591635   | 193230 | 250         | 918327      | 27-JUL-03     | 32 min
at 51%  | 4590890.2 | 193230 | 250         | 2341354     | 27-JUL-03     | 1 hr 56 min

What should the time interval be?

When there is a 10-20% change in the data, the general convention is to generate fresh statistics. You can start off with a general rule of estimating statistics on a weekly basis. If particular tables are suffering because of heavy activity, consider using the DML monitoring option to update their statistics every few hours. Statistics are not incremental; they are regenerated in full every time, so if the data has not changed considerably there is no advantage in generating statistics too frequently.

Statistics locks?

A table that is in the process of being analyzed cannot undergo DDL changes, but DML activity can carry on. Analyzing an index puts a shared lock on the related table, so neither DML nor DDL can be performed. Preferably, avoid all activity on the objects concerned during the statistics-generation phase.

Generate adequate statistics at proper intervals

Use DBMS_STATS for generating statistics on a periodic basis. You may also categorize objects under various groups depending on their frequency of data change. For example, a daily transaction table will always change and statistics may be generated more often. Master tables change subtly and statistics could be generated less often.
In the case of ESTIMATE statistics, arrive at an optimal sample size. Oracle recommends a percentage of 5%. Consider using COMPUTE statistics for indexes and index-organized tables.
If queries on a particular set of tables always require more accurate statistics to behave properly, and otherwise fail to perform, consider using hints to direct the optimizer and avoid dependency on statistics generation.
Whenever a heavy upload of data is done, consider explicit generation of statistics on the concerned tables.

Statistics for SYS schema

The general rule is not to generate statistics for the SYS schema if you are on Oracle 8i. In Oracle 9i, you will have to test this in your own setup to reach a conclusion. The dictionary will need to be analyzed in Oracle 10g, as the RBO will be unsupported then.


DBMS_APPLICATION_INFO

This package allows you to record application-specific information in the dynamic performance views. It is very handy for implementing certain logic and for analysis and tuning; I mention it here mainly for the latter. By using it, you can integrate your application with the database and find vital information at any point in time, such as what is presently running in the database, which part of the application a particular query is being executed from, and what the user sessions are doing.
Any time a performance issue arises, look at the application specific information to identify the problematic area instantly! Application specific information is set in dictionary tables such as V$SESSION (MODULE, ACTION, CLIENT_INFO columns).
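
A sketch of how an application might tag its sessions (the module and action names are made up):

 begin
   dbms_application_info.set_module(module_name => 'AR_BILLING',
                                    action_name => 'MONTH_END_RUN');
   dbms_application_info.set_client_info('batch run for JUL-03');
 end;
 /

The values then appear in the MODULE, ACTION and CLIENT_INFO columns of V$SESSION for that session.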

Gathering statistics with DBMS_STATS

The following are the procedures in the DBMS_STATS package for gathering statistics:

 Statistics Gathering Procedures in the DBMS_STATS Package

 Procedure              Collects
 GATHER_INDEX_STATS     Index statistics
 GATHER_TABLE_STATS     Table, column, and index statistics
 GATHER_SCHEMA_STATS    Statistics for all objects in a schema
 GATHER_DATABASE_STATS  Statistics for all objects in a database
 GATHER_SYSTEM_STATS    CPU and I/O statistics for the system

Gathering System Statistics

System statistics enable the optimizer to consider a system's I/O and CPU performance and utilization. For each plan candidate, the optimizer computes estimates for I/O and CPU costs. It is important to know the system characteristics to pick the most efficient plan with optimal proportion between I/O and CPU cost.

System I/O characteristics depend on many factors and do not stay constant all the time. Using system statistics management routines, database administrators can capture statistics in the interval of time when the system has the most common workload. For example, database applications can process OLTP transactions during the day and run OLAP reports at night. Administrators can gather statistics for both states and activate appropriate OLTP or OLAP statistics when needed. This enables the optimizer to generate relevant costs with respect to available system resource plans.

When Oracle generates system statistics, it analyzes system activity in a specified period of time. Unlike table, index, or column statistics, Oracle does not invalidate already parsed SQL statements when system statistics get updated. All new SQL statements are parsed using new statistics. Oracle Corporation highly recommends that you gather system statistics.

The DBMS_STATS.GATHER_SYSTEM_STATS routine collects system statistics in a user-defined timeframe. You can also set system statistics values explicitly using DBMS_STATS.SET_SYSTEM_STATS. Use DBMS_STATS.GET_SYSTEM_STATS to verify system statistics.

The following example shows database applications processing OLTP transactions during the day and running reports at night. First, system statistics must be collected. The values in this example are user-defined; in other words, you must determine an appropriate time interval and name for your environment.

Generating System Statistics

Gather statistics during the day. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:

BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS(
    gathering_mode => 'interval',
    interval       => 720,
    stattab        => 'mystats',
    statid         => 'OLTP');
END;
/

Gather statistics during the night. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:

BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS(
    gathering_mode => 'interval',
    interval       => 720,
    stattab        => 'mystats',
    statid         => 'OLAP');
END;
/

If appropriate, you can switch between the sets of statistics gathered. It is possible to automate this process by submitting a job to update the dictionary with the appropriate statistics. During the day, the following job imports the OLTP statistics for the daytime run:

VARIABLE jobno number;
BEGIN
  DBMS_JOB.SUBMIT(:jobno,
    'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'', ''OLTP'');',
    SYSDATE, 'SYSDATE + 1');
  COMMIT;
END;
/

During the night, the following job imports the OLAP statistics for the nighttime run:

BEGIN
  DBMS_JOB.SUBMIT(:jobno,
    'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'', ''OLAP'');',
    SYSDATE + 0.5, 'SYSDATE + 1');
  COMMIT;
END;
/

GATHER_DATABASE_STATS

This procedure can gather statistics on all the tables and indexes in a database. Some of its parameters:

- estimate_percent
o NULL means compute.
o The valid range is [0.000001, 100].
o DBMS_STATS.AUTO_SAMPLE_SIZE lets Oracle determine the best sample size for good statistics (recommended value in 9i).

- block_sample
o Only pertinent when estimating statistics.
o Determines whether random block sampling is used instead of random row sampling.
o Random block sampling is more efficient, but if the data is not randomly distributed on disk, the sample values may be somewhat correlated.

- method_opt
o FOR COLUMNS [size_clause] column|attribute [size_clause] [, column|attribute [size_clause] ...]
o size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
. integer - Number of histogram buckets. Valid range [1,254].
. REPEAT - Collects histograms only on the columns that already have histograms (9i).
. AUTO - Oracle determines the columns on which to collect histograms based on data distribution and the workload of the columns (9i).
. SKEWONLY - Oracle determines the columns on which to collect histograms based on the data distribution of the columns (9i).

- degree
o Degree of parallelism.
o NULL means use the table default value
o DBMS_STATS.DEFAULT_DEGREE means value based on the init parameters (9i).

- granularity
o Only pertinent if the table is partitioned.
o DEFAULT: Gather global- and partition-level statistics.
o GLOBAL: Gather global statistics.
o PARTITION: Gather partition-level statistics.
o SUBPARTITION: Gather subpartition-level statistics.
o ALL: Gather all (subpartition, partition, and global) statistics.

- cascade
o FALSE (the default): no index statistics are gathered.
o TRUE: gathers statistics on the indexes as well.
o Index statistics gathering is not parallelized except in 9iR2.
o TRUE is equivalent to running the gather_index_stats procedure on each of the indexes in the database, in addition to gathering table and column statistics.

- options
o GATHER: Default. Gathers statistics on all objects.
o GATHER AUTO: Gathers all necessary statistics automatically.
Oracle implicitly determines which objects need new statistics, and determines how to gather those statistics (9i).
o GATHER STALE: Gathers statistics on stale objects as determined by looking at the *_tab_modifications views.
o GATHER EMPTY: Gathers statistics on objects which currently have no statistics.

- gather_sys
o Gathers statistics on the objects owned by the 'SYS' user (TRUE)

SQL Source - Dynamic Method for TABLES

DECLARE
  sql_stmt VARCHAR2(1024);
BEGIN
  FOR tab_rec IN (SELECT owner, table_name
                    FROM all_tables
                   WHERE owner LIKE UPPER('&1')) LOOP
    sql_stmt := 'BEGIN dbms_stats.gather_table_stats
                   (ownname          => :1,
                    tabname          => :2,
                    partname         => null,
                    estimate_percent => 10,
                    degree           => 3,
                    cascade          => true); END;';
    EXECUTE IMMEDIATE sql_stmt USING tab_rec.owner, tab_rec.table_name;
  END LOOP;
END;
/


Gathering Index Statistics

Oracle can gather some statistics automatically while creating or rebuilding a B-tree or bitmap index. The COMPUTE STATISTICS option of CREATE INDEX or ALTER INDEX ... REBUILD enables this gathering of statistics.

The statistics that Oracle gathers for the COMPUTE STATISTICS option depend on whether the index is partitioned or nonpartitioned.

To ensure correctness of the statistics, Oracle always uses base tables when creating an index with the COMPUTE STATISTICS option, even if another index is available that could be used to create the index.
If you do not use the COMPUTE STATISTICS clause, or if you have made significant changes to the data, then use the DBMS_STATS.GATHER_INDEX_STATS procedure to collect index statistics.
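Both routes can be sketched as follows; the index and column names are hypothetical:

```sql
-- Gather index statistics as a side effect of the build or rebuild:
create index emp_deptno_ix on emp (deptno) compute statistics;
alter index emp_deptno_ix rebuild compute statistics;

-- Or gather them afterwards with DBMS_STATS:
exec dbms_stats.gather_index_stats(ownname => 'SCOTT', indname => 'EMP_DEPTNO_IX');
```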

Gathering Statistics for Function-Based Indexes
You should analyze the table after creating a function-based index, to allow Oracle to collect column statistics (and equivalent information) for the expression. Optionally, you can collect histograms for the index expressions by specifying FOR ALL HIDDEN COLUMNS SIZE number_of_buckets in the METHOD_OPT argument to the DBMS_STATS procedures.
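A sketch of the sequence, with a hypothetical function-based index on emp:

```sql
-- The expression's statistics live in a hidden column, hence FOR ALL HIDDEN COLUMNS:
create index emp_upper_ename_ix on emp (upper(ename));
exec dbms_stats.gather_table_stats('SCOTT', 'EMP', method_opt => 'FOR ALL HIDDEN COLUMNS SIZE 10');
```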

Gathering New Optimizer Statistics

Before gathering new statistics for a particular schema, use the DBMS_STATS.EXPORT_SCHEMA_STATS procedure to extract and save existing statistics. Then use DBMS_STATS.GATHER_SCHEMA_STATS to gather new statistics. You can implement both of these with a single call to the GATHER_SCHEMA_STATS procedure, by specifying additional parameters.
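The save-then-regather sequence can be sketched as below; the statistics-table name and statid are assumptions:

```sql
-- Create a user statistics table, save the current statistics into it,
-- then gather fresh statistics:
exec dbms_stats.create_stat_table('SCOTT', 'STATS_TABLE');
exec dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'STATS_TABLE', statid => 'PRE_REGATHER');
exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', estimate_percent => 10, cascade => true);

-- If the new plans regress, the saved statistics can be restored:
exec dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'STATS_TABLE', statid => 'PRE_REGATHER');
```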

Designating Tables for Monitoring and Automated Statistics Gathering

Before you can utilize automated statistics gathering for a particular table, you must bring either the tables of a specific schema or a complete database into the monitoring mode. Do this with the DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING or DBMS_STATS.ALTER_DATABASE_TAB_MONITORING procedures. Alternatively, you can enable the monitoring attribute using the MONITORING keyword. This keyword is part of the CREATE TABLE and ALTER TABLE statement syntax. Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table since the last time statistics were gathered. Oracle uses this data to identify tables with stale statistics. Then, you can enable automated statistics gathering by setting up a recurring job (perhaps by using job queues) that invokes DBMS_STATS.GATHER_TABLE_STATS with the GATHER STALE option at an appropriate interval for your application.
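Such a recurring job could be set up with DBMS_JOB along these lines; the schema name and the nightly 02:00 schedule are assumptions:

```sql
declare
  l_job number;
begin
  dbms_job.submit(
    job       => l_job,
    what      => 'dbms_stats.gather_schema_stats(ownname => ''SCOTT'', ' ||
                 'options => ''GATHER STALE'', cascade => TRUE);',
    next_date => trunc(sysdate) + 1 + 2/24,     -- tomorrow at 02:00
    interval  => 'trunc(sysdate) + 1 + 2/24');  -- every night at 02:00
  commit;
end;
/
```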

Objects are considered stale when 10% of the total rows have been changed. When you issue GATHER_TABLE_STATS with GATHER STALE, the procedure checks the USER_TAB_MODIFICATIONS view. If a monitored table has been modified more than 10%, then statistics are gathered again. The information about changes of tables, as shown in the USER_TAB_MODIFICATIONS view, can be flushed from the SGA into the data dictionary with the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure.


There can be a delay of a few minutes while Oracle propagates information to this view.

To disable monitoring, use the DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING or DBMS_STATS.ALTER_DATABASE_TAB_MONITORING procedures, or use the NOMONITORING keyword. Example:

SQL> exec dbms_stats.ALTER_SCHEMA_TAB_MONITORING('<owner>',TRUE);


Oracle9i Examples:
In Oracle9i, only one statement is required to gather statistics on objects that are new or stale, because the GATHER AUTO option is now available.

Oracle9i - Gathering statistics for objects in a schema:
begin
  dbms_stats.gather_schema_stats(
    ownname          => NULL,
    estimate_percent => 100,  -- in 9i this can be automatic: estimate_percent => dbms_stats.auto_sample_size
    granularity      => 'ALL',
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    options          => 'GATHER AUTO',
    cascade          => TRUE);
end;
/

Oracle9i - Gathering statistics for objects in the database:
begin
  dbms_stats.gather_database_stats(
    estimate_percent => 100,  -- in 9i this can be automatic: estimate_percent => dbms_stats.auto_sample_size
    granularity      => 'ALL',
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    options          => 'GATHER AUTO',
    cascade          => TRUE);
end;
/

By using the constant DBMS_STATS.AUTO_SAMPLE_SIZE, Oracle automatically determines the best sample size for good statistics.

Enabling Automated Statistics Gathering

The GATHER STALE option gathers statistics only for tables that have stale statistics and for which you have enabled monitoring.

The GATHER STALE option maintains up-to-date statistics for the cost-based optimizer. Using this option at regular intervals also avoids the overhead associated with gathering statistics on all tables at one time. The GATHER option can incur much more overhead, because this option generally gathers statistics for a greater number of tables than GATHER STALE.

Use a script or job scheduling tool for the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures to establish a frequency of statistics collection that is appropriate for the application. The frequency of collection intervals should balance the task of providing accurate statistics for the optimizer against the processing overhead incurred by the statistics collection process

Verifying Index Statistics


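The statistics below come from the index-statistics columns of USER_INDEXES; a query of roughly this shape (column list inferred from the output) produces the listing:

```sql
select index_name, num_rows, distinct_keys, leaf_blocks,
       clustering_factor, blevel, avg_leaf_blocks_per_key
  from user_indexes
 order by index_name;
```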
Typical output is:

INDEX_NAME                 NUM_ROWS DIST_KEYS LEAF_BLKS CLUST_FACTOR BLEVEL AVG_LEAF_BLKS_PER_KEY
-------------------------- -------- --------- --------- ------------ ------ ---------------------
CUSTOMERS_PK                  50000     50000       454         4405      2                     1
PRODUCTS_PK                   10000     10000        90         1552      1                     1
PRODUCTS_PROD_CAT_IX          10000         4        99         4422      1                    24
PRODUCTS_PROD_SUBCAT_IX       10000        37       170         6148      2                     4
SALES_PROD_BIX                 6287       909      1480         6287      1                     1
SALES_PROMO_BIX                4727       459       570         4727      1                     1

Verifying Column Statistics


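Column statistics can be read from USER_TAB_COL_STATISTICS with a query like the one below; the table name is an assumption, and the column list matches the output that follows:

```sql
select column_name, num_distinct, num_nulls, num_buckets, density
  from user_tab_col_statistics
 where table_name = 'PA_EXPENDITURE_ITEMS_ALL'  -- hypothetical table name
 order by column_name;
```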
This returns the following data:

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS NUM_BUCKETS    DENSITY
------------------------------ ------------ ---------- ----------- ----------
BURDEN_COST                            4300      71957           1 .000232558
BURDEN_COST_RATE                        675    7376401           1 .001481481
CONVERTED_FLAG                            1   16793903           1          1 poor
COST_IND_COMPILED_SET_ID                 87    6153143           1 .011494253
EXPENDITURE_ID                      1171831          0           1 8.5337E-07 GOOD
TASK_ID                                8648          0           1 .000115634
TRANSFERRED_FROM_EXP_ITEM_ID        1233787   15568891           1 8.1051E-07 GOOD

Using Histograms

The cost-based optimizer can use data value histograms to get accurate estimates of the distribution of column data. A histogram partitions the values in the column into bands, so that all column values in a band fall within the same range. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions.

One of the fundamental tasks of the cost-based optimizer is determining the selectivity of predicates that appear in queries. Selectivity estimates are used to decide when to use an index and the order in which to join tables. Some attribute domains (a table's columns) are not uniformly distributed.

The cost-based optimizer uses height-based histograms on specified attributes to describe the distributions of nonuniform domains. In a height-based histogram, the column values are divided into bands so that each band contains approximately the same number of values. The useful information that the histogram provides, then, is where in the range of values the endpoints fall.
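The stored bucket endpoints can be inspected directly; a sketch with hypothetical table and column names, where each row returned is one bucket endpoint:

```sql
select endpoint_number, endpoint_value
  from user_tab_histograms
 where table_name = 'EMP' and column_name = 'SAL'
 order by endpoint_number;
```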

When to Use Histograms

Histograms can affect performance and should be used only when they substantially improve query plans. Histogram statistics data is persistent, so the space required to save the data depends on the sample size. In general, create histograms on columns that are used frequently in WHERE clauses of queries and have a highly skewed data distribution. For uniformly distributed data, the cost-based optimizer can make fairly accurate guesses about the cost of executing a particular statement without the use of histograms.

Histograms, like all other optimizer statistics, are static. They are useful only when they reflect the current data distribution of a given column. (The data in the column can change as long as the distribution remains constant.) If the data distribution of a column changes frequently, you must recompute its histogram frequently.

Histograms are not useful for columns with the following characteristics:

- The column data is uniformly distributed.
- The column does not appear in WHERE clause predicates.
- The column is unique and is used only with equality predicates.
- All predicates on the column use bind variables.

Creating Histograms

You generate histograms by using the DBMS_STATS package. You can generate histograms for columns of a table or partition. For example, to create a 10-bucket histogram on the SAL column of the emp table, issue the following statement:

exec dbms_stats.gather_table_stats('scott', 'emp', method_opt => 'FOR COLUMNS SIZE 10 sal');

The SIZE keyword declares the maximum number of buckets for the histogram. You would create a histogram on the SAL column if there were an unusually high number of employees with the same salary and few employees with other salaries. You can also collect histograms for a single partition of a table.

Oracle Corporation recommends using the DBMS_STATS package to have the database automatically decide which columns need histograms. This is done by specifying SIZE AUTO.
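Both variants can be sketched as follows; the partition, table, and column names are hypothetical:

```sql
-- Histogram on a single partition of a table:
exec dbms_stats.gather_table_stats('SH', 'SALES', partname => 'SALES_Q1_2002', method_opt => 'FOR COLUMNS SIZE 10 amount_sold');

-- Let Oracle decide which columns need histograms:
exec dbms_stats.gather_table_stats('SCOTT', 'EMP', method_opt => 'FOR ALL COLUMNS SIZE AUTO');
```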

Example Using a Histogram to Improve an Execution Plan

UPDATE so_lines l
SET open_flag = null
WHERE l.line_type_code IN ('REGULAR', 'DETAIL', 'RETURN') AND
      l.open_flag = 'Y' AND
      (NVL(l.shipped_quantity, 0) = 0 OR
       (NVL(l.shipped_quantity, 0) != 0 AND
        l.shipped_quantity + NVL(l.cancelled_quantity, 0) = l.ordered_quantity)) AND
      l.s6 = 18;

This query shows the skewed distribution of data values for s6. In this case, there are two distinct non-null values: 10 and 18. The majority of the rows consists of s6 = 10 (1,589,464), while a small number of rows consist of s6 = 18 (13,091).

S6           COUNT(*)
---------- ----------
10          1,589,464
18             13,091
NULL           21,889

The selectivity of column s6, where s6 = 18:

S = 13,091 / (13,091 + 1,589,464) = 0.008

If No Histogram is Used: The selectivity of column s6 is assumed to be 50%, uniformly distributed across 10 and 18. This is not selective; therefore, s6 is not an ideal choice for use as an index.

If a Histogram is Used: The data distribution information is stored in the dictionary. This allows the optimizer to use this information and compute the correct selectivity based on the data distribution. In this example, the selectivity, based on the histogram data, is 0.008. This is a relatively high, or good, selectivity, which leads the optimizer to use an index on column s6 in the execution plan.


DBMS_STATS.GATHER_TABLE_STATS gathers statistics for a table and its columns, and optionally the associated indexes. Syntax:
dbms_stats.gather_table_stats(ownname, tabname, partname,
                                              estimate_percent, block_sample,
                                              method_opt, degree, granularity,
                                              cascade, stattab, statid, statown);

The first two parameters are mandatory; the rest default to a value.

e.g.: Estimate statistics for a table and its columns
exec dbms_stats.gather_table_stats (ownname => 'INV', tabname => 'MTL_MATERIAL_TRX', estimate_percent => 5);

e.g.: Estimate statistics for a table, its columns and indexes.
exec dbms_stats.gather_table_stats(ownname => 'APPS', tabname => 'AM21', estimate_percent => 5, cascade => true);

e.g.: Estimate statistics in parallel; the following uses 8 parallel query processes to complete the task

Session - A
exec dbms_stats.gather_table_stats(ownname => 'INV', tabname => 'MTL_MATERIAL_TRX', estimate_percent => 5, degree => 8);

Session - B (When the above process is running)
select * from v$px_process;

SERV STATUS           PID SPID            SID   SERIAL#
---- --------- --------- --------- --------- ---------
P000 IN USE           50 9684              7     50586
P001 IN USE           65 9686             60     51561
P002 IN USE           66 9688             17      2694
P003 IN USE           67 9690             30     39243
P004 IN USE           68 9692             74     11017
P005 IN USE           69 9694             48      4253
P006 IN USE           70 9696             76        17
P007 IN USE           71 9698             68      1285

e.g.: Estimate statistics for the columns in a table; this also generates table statistics.
exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM01', estimate_percent => 5, method_opt => 'FOR ALL COLUMNS');

The example below generates column statistics in parallel. The degree is set to 8, and "SIZE 1" makes use of this. Refer to the Using Histograms section for details on the SIZE clause.
exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM01',estimate_percent => 5, method_opt => 'FOR ALL COLUMNS SIZE 1');

Statistics for Global Temporary tables

No statistics are collected for Global Temporary Tables; handle these with care. Make use of hints to drive queries on these. Provide explicit statistics by using DBMS_STATS.SET_TABLE_STATS if your temporary table is used for processing huge amounts of data.
select num_rows, blocks, avg_row_len, temporary, user_stats
from dba_tables
where table_name = 'AM21';
  NUM_ROWS     BLOCKS AVG_ROW_LEN T USE
---------- ---------- ----------- - ---
                                  Y NO

exec dbms_stats.gather_table_stats('DATASTAGE', 'TMP_GL_VALUE_DIFF');
ERROR at line 1:
ORA-20000: Gathering statistics for a temporary table is not supported

exec dbms_stats.set_table_stats(ownname => 'SYS', tabname => 'AM21', numrows => 3000, numblks => 300, avgrlen => 50);

select num_rows, blocks, avg_row_len, temporary, user_stats
from dba_tables
where table_name = 'AM21';

  NUM_ROWS     BLOCKS AVG_ROW_LEN T USE
---------- ---------- ----------- - ---
      3000        300          50 Y YES


DBMS_STATS.GATHER_INDEX_STATS gathers statistics for indexes. Index statistics gathering cannot be parallelized (except in 9iR2). In addition, the block sampling option available for tables is not available for indexes. Syntax:
dbms_stats.gather_index_stats(ownname, indname, partname, estimate_percent, stattab, statid, statown);

exec dbms_stats.gather_index_stats (ownname => 'INV', indname => 'MTL_SYSTEM_ITEMS_JHN99');


DBMS_STATS.GATHER_SCHEMA_STATS gathers statistics for a given schema. Syntax:
dbms_stats.gather_schema_stats(ownname, estimate_percent, block_sample,
                                                             method_opt, degree, granularity, cascade, stattab, statid, options, objlist, statown);

e.g.: Gather schema statistics, for tables and indexes at 5% estimate.
exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', estimate_percent => 5, cascade => true, options => 'GATHER');

e.g.: Gather statistics for objects with no statistics. The cascade option below makes no difference, as the GATHER EMPTY option gathers statistics for all objects (tables and indexes) that have none.

exec dbms_stats.gather_schema_stats(ownname => 'QP', estimate_percent => 5, cascade => true, options => 'GATHER EMPTY');

e.g.: To identify a list of objects without any statistics.
declare
  l_owner    varchar2(30) := 'QP';
  l_emptylst dbms_stats.objecttab;
begin
  dbms_stats.gather_schema_stats(ownname => l_owner,
                                 options => 'LIST EMPTY',
                                 objlist => l_emptylst);
  for i in nvl(l_emptylst.first, 0) .. nvl(l_emptylst.last, 0) loop
    dbms_output.put_line(l_emptylst(i).objtype || '/' || l_emptylst(i).objname);
  end loop;
end;
/


DBMS_STATS.GATHER_DATABASE_STATS gathers statistics for the complete database. In 8i, this will generate statistics for the SYS schema also. This has been rectified in Oracle 9i. For Oracle 8i, an alternative is to generate statistics for individual schemas or delete SYS schema statistics after generating statistics at the database level. Syntax:
dbms_stats.gather_database_stats(estimate_percent, block_sample, method_opt, degree, granularity,
                                 cascade, stattab, statid, options, objlist, statown);

Retrieving statistics with DBMS_STATS


Get table statistics. Syntax:

dbms_stats.get_table_stats(ownname, tabname, partname, stattab, statid, numrows,
                                                 numblks, avgrlen, statown);

e.g.: getting table statistics data.

declare
  l_numrows number;
  l_numblks number;
  l_avgrlen number;
begin
  dbms_stats.get_table_stats(ownname => 'SYS', tabname => 'AM01',
                             numrows => l_numrows, numblks => l_numblks,
                             avgrlen => l_avgrlen);
  dbms_output.put_line('No. of rows: ' || l_numrows);
  dbms_output.put_line('No. of blks: ' || l_numblks);
  dbms_output.put_line('Avg row length: ' || l_avgrlen);
end;
/

No. of rows: 4106860
No. of blks: 6219
Avg row length: 3


Get column statistics present in the dictionary. Syntax:

dbms_stats.get_column_stats(ownname, tabname, colname, partname, stattab, statid,
                                                     distcnt, density, nullcnt, srec, avgclen, statown);

e.g.: getting statistics for a column.

declare
  l_distcnt number;
  l_density number;
  l_nullcnt number;
  l_srec    dbms_stats.statrec;
  l_avgclen number;
begin
  dbms_stats.get_column_stats(ownname => 'SYS', tabname => 'AM01',
                              colname => 'COL1', distcnt => l_distcnt,
                              density => l_density, nullcnt => l_nullcnt,
                              srec => l_srec, avgclen => l_avgclen);
  dbms_output.put_line('No. of distinct values: ' || l_distcnt);
  dbms_output.put_line('Density: ' || l_density);
  dbms_output.put_line('Count of nulls: ' || l_nullcnt);
  dbms_output.put_line('Avg. column length: ' || l_avgclen);
end;
/

No. of distinct values: 2
Density: .5
Count of nulls: 0
Avg. column length: 3


Get index statistics. Syntax

dbms_stats.get_index_stats(ownname, indname, partname, stattab, statid, 
numrows, numlblks, numdist, avglblk, avgdblk,
clstfct, indlevel, statown);

e.g.: getting index statistics.

declare
  l_numrows  number;
  l_numlblks number;
  l_numdist  number;
  l_avglblk  number;
  l_avgdblk  number;
  l_clstfct  number;
  l_indlevel number;
begin
  dbms_stats.get_index_stats(ownname => 'SYS', indname => 'AM01_N1',
                             numrows => l_numrows, numlblks => l_numlblks,
                             numdist => l_numdist, avglblk => l_avglblk,
                             avgdblk => l_avgdblk, clstfct => l_clstfct,
                             indlevel => l_indlevel);
  dbms_output.put_line('No. of rows: ' || l_numrows);
  dbms_output.put_line('No. of blks: ' || l_numlblks);
  dbms_output.put_line('No. of distinct values: ' || l_numdist);
  dbms_output.put_line('Avg leaf blocks for distinct keys: ' || l_avglblk);
  dbms_output.put_line('Avg data blocks pointed to in the table: ' || l_avgdblk);
  dbms_output.put_line('Clustering factor: ' || l_clstfct);
  dbms_output.put_line('Index height: ' || l_indlevel);
end;
/

No. of rows: 3819711
No. of blks: 11092
No. of distinct values: 1
Avg leaf blocks for distinct keys: 11092
Avg data blocks pointed to in the table: 14616
Clustering factor: 14616
Index height: 2

Exporting and importing statistics with DBMS_STATS

DBMS_STATS also includes routines for gathering statistics and storing them outside the dictionary; statistics stored this way do not influence the optimizer. Most of the procedures in this package, as mentioned above, have three common parameters (STATID, STATTAB, and STATOWN) related to user-managed statistics tables.

Advantages of this feature:

1. Estimated statistics at different percentages could be stored and used for testing.

2. Statistics generated on one database could be transferred to another database.
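The second advantage can be sketched as follows; the schema and statistics-table names are assumptions:

```sql
-- On the source database:
exec dbms_stats.create_stat_table('SCOTT', 'STATTAB');
exec dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'STATTAB');

-- Move the STATTAB table to the target database (e.g. with exp/imp), then:
exec dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'STATTAB');
```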

Automated Implementation Example

The following example uses the scott/tiger tables to demonstrate how to implement automated table monitoring and stale statistics gathering.

-- Perform a quick analyze to load in base statistics
-- (for illustrative purposes only!)
begin
  dbms_stats.gather_table_stats(
     ownname          => 'scott',
     tabname          => 'emp',
     estimate_percent => null,            -- This table is small, we'll compute
     method_opt       => 'FOR ALL COLUMNS',
     degree           => null);           -- No parallelism used in this example
end;
/


-- Examine the current statistics
select table_name, num_rows, blocks, avg_row_len   
     from user_tables where table_name='EMP';

TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
EMP                                    14          1          37


-- Turn monitoring on for the EMP table
alter table emp monitoring;

Table altered.

-- I could also choose to turn on monitoring for all of the tables within
-- Scott's schema by simply calling:

-- begin
--  dbms_stats.ALTER_SCHEMA_TAB_MONITORING('scott',true);
-- end;
-- Or even at the database level with:
-- begin
--   dbms_stats.alter_database_tab_monitoring(
--   monitoring=>true,
--   sysobjs=>false);      -- Don't set to true, see note below!
-- end;
-- Note: Although the option to collect statistics for SYS objects is available
-- via ALTER_DATABASE_TAB_MONITORING, Oracle continues to recommend
-- against this practice until the next major release after 9i rel 2


-- Verify that monitoring is turned on
-- (noting the fact that we didn't actually run the other
-- two monitoring statements in the notes above (commented out))

select table_name, monitoring
  from user_tables
 order by 2;

TABLE_NAME                     MON
------------------------------ ---
BONUS                          NO
DEPT                           NO
DUMMY                          NO
SALGRADE                       NO
EMP                            YES

5 rows selected.

-- Add some rows
insert into emp select * from emp;

14 rows created.

commit;

Commit complete.


-- Wait until the monitoring data is flushed (after a timeout or database shutdown),
-- or, if you're really impatient, run the following (as a DBA-privileged account):
exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure successfully completed.

-- As scott/tiger, let's check USER_TAB_MODIFICATIONS to see what it has collected:
select * from user_tab_modifications;

TABLE_NAME PARTITION_NAME SUBPARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRUNCATED
---------- -------------- ----------------- ------- ------- ------- --------- ---------
EMP                                              14       0       0 05-JUN-02 NO
-- Execute DBMS_STATS to gather stats on all stale tables
begin
  dbms_stats.gather_schema_stats(
     ownname          => 'scott',
     estimate_percent => null,            -- This table is small, we'll compute
     method_opt       => 'FOR ALL COLUMNS',
     degree           => null,            -- No parallelism used in this example
     options          => 'GATHER STALE');
end;
/

-- Verify that the table is no longer listed in USER_TAB_MODIFICATIONS
select * from user_tab_modifications;

no rows selected.

-- Examine some of new statistics collected
select table_name, num_rows, blocks, avg_row_len   
     from user_tables where table_name='EMP';

TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
EMP                                    28          1          37