Working with CBO and the DBMS_STATS Package


Statistics

The CBO makes its execution-plan decisions based on statistics; it tries to arrive at the plan that results in the lowest amount of physical I/O.
Statistics provide critical input for the CBO to work properly. They are generated for data-storing objects and include information such as the number of rows in a table, the number of distinct values in a column, the number of leaf blocks in an index, and so on. The more accurate the statistics, the more efficient the plans produced by the optimizer.
Statistics may be exact or estimated. Statistics generated with the COMPUTE clause analyze all of the data in the object, giving the optimizer accurate information from which to arrive at a good execution plan.
Statistics generated with the ESTIMATE clause analyze only a sample of the data. The sample size may be specified as a number of rows or as a percentage of rows to be randomly analyzed; optionally, block sampling may be used instead. This saves time when the system contains many huge tables, but the quality of the resulting execution plans depends on how close the estimated values are to the exact values.
The CBO does not use the rules of the RBO, but it does use the object order in the FROM clause and/or the object_id to break ties. The CBO considers at most as many execution-plan permutations as the init.ora parameter OPTIMIZER_MAX_PERMUTATIONS allows.
Statistics are stored in the data dictionary, in tables owned by the SYS user. The following views display the statistics collected for tables, columns and indexes.
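As a quick illustration of the two approaches, the COMPUTE and ESTIMATE clauses of the ANALYZE command look like this (the table name is only an example):

```sql
-- Exact statistics: reads every row (can be slow on large tables)
ANALYZE TABLE emp COMPUTE STATISTICS;

-- Estimated statistics: randomly samples 10% of the rows
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 10 PERCENT;

-- Estimated statistics: samples 5000 rows
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 5000 ROWS;
```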

For Tables

DBA_TABLES
NUM_ROWS - Number of rows
BLOCKS - Number of blocks below the high-water mark (used or not)
EMPTY_BLOCKS - Number of empty blocks
AVG_SPACE - Average free space (in bytes) in blocks allocated to the table; all empty and free blocks are considered
CHAIN_CNT - Number of chained rows (populated by the ANALYZE command only)
AVG_ROW_LEN - Average row length in bytes
LAST_ANALYZED - Date when the table was last analyzed
SAMPLE_SIZE - Sample size used for ESTIMATE statistics; equal to NUM_ROWS if COMPUTE was used
GLOBAL_STATS - For partitioned tables: YES means statistics were collected for the table as a whole, NO means they were estimated from partition statistics
USER_STATS - Set to YES if the user has explicitly set the statistics for the table

Statistics for individual partitions of a table can be seen in DBA_TAB_PARTITIONS. Cluster statistics are available from DBA_CLUSTERS.
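A quick way to review these table statistics is to query the view directly (the schema name here is only illustrative):

```sql
SELECT table_name, num_rows, blocks, empty_blocks,
       avg_row_len, sample_size, last_analyzed
  FROM dba_tables
 WHERE owner = 'SCOTT'
 ORDER BY table_name;
```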

For Columns

DBA_TAB_COLUMNS
NUM_DISTINCT - Number of distinct values
LOW_VALUE - Lowest value
HIGH_VALUE - Highest value
DENSITY - Density of the column
NUM_NULLS - Number of records with a null value in the column
NUM_BUCKETS - Number of buckets in the column's histogram (see the Histograms section)
SAMPLE_SIZE - Sample size used for ESTIMATE statistics; equal to the total number of rows if COMPUTE was used
LAST_ANALYZED - Date when the table was last analyzed
DBA_TAB_COL_STATISTICS shows similar data. Partitioned-table column statistics can be viewed in DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS.
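For example, to inspect these column statistics for one table (owner and table name are illustrative):

```sql
SELECT column_name, num_distinct, num_nulls, density,
       num_buckets, last_analyzed
  FROM dba_tab_columns
 WHERE owner = 'SCOTT'
   AND table_name = 'EMP';
```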

For Indexes

DBA_INDEXES
BLEVEL - Depth of the index, from root to leaf
LEAF_BLOCKS - Number of leaf blocks
DISTINCT_KEYS - Number of distinct index values
AVG_LEAF_BLOCKS_PER_KEY - Average number of leaf blocks in which each distinct key appears; should be 1 for unique indexes
AVG_DATA_BLOCKS_PER_KEY - Average number of table blocks pointed to by a distinct key
CLUSTERING_FACTOR - A count that indicates how well the table is ordered with respect to the index. The index is well ordered if the count is close to the number of blocks (entries in a single leaf tend to point to rows in the same table blocks); it is randomly ordered if the count is close to the number of rows (entries in a single leaf point to rows spread across many blocks)
NUM_ROWS - Number of rows indexed
SAMPLE_SIZE - Sample size used for ESTIMATE statistics; equal to NUM_ROWS if COMPUTE was used
LAST_ANALYZED - Date when the index was last analyzed
GLOBAL_STATS - For partitioned indexes: YES means statistics were collected for the index as a whole, NO means they were estimated from partition statistics
USER_STATS - Set to YES if the user has explicitly set the statistics for the index
PCT_DIRECT_ACCESS - For secondary indexes on IOTs, the percentage of rows with a valid guess

Statistics for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS.
Dictionary tables related to Histogram information are discussed later.
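To see how CLUSTERING_FACTOR compares with the table's block and row counts, as described above, a query of the following shape can be used (owner and table name are illustrative):

```sql
SELECT i.index_name, i.clustering_factor,
       t.blocks, t.num_rows
  FROM dba_indexes i, dba_tables t
 WHERE i.table_owner = t.owner
   AND i.table_name  = t.table_name
   AND t.owner       = 'SCOTT'
   AND t.table_name  = 'EMP';
-- A clustering factor close to t.blocks suggests a well-ordered index;
-- a value close to t.num_rows suggests randomly scattered rows.
```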


Available CBO Modes

CBO has two available modes in which to run: ALL_ROWS (the default) and FIRST_ROWS.
FIRST_ROWS aims at returning the first row(s) of the statement as soon as possible. This mode tells the optimizer to give response time prime importance, and it prefers nested-loop joins. FIRST_ROWS uses cost as well as some rules of thumb to return the first set of rows quickly; for example, plans using indexes are preferred over plans using full table scans as the access path, and an ORDER BY clause can induce index access. As of release 9i, the number of rows to be returned in the first hit can also be specified with the FIRST_ROWS_n modes (n can be 1, 10, 100 or 1000), set as per the application requirements.

ALL_ROWS processes all rows for a given query before returning the output. It forces the optimizer to consider minimal use of resources and best throughput, and it prefers sort-merge joins.

For an OLTP system, FIRST_ROWS is the ideal option for fast response time; ALL_ROWS is meant for batch-oriented applications. Note that a plan producing the first n rows with the fastest response time might not be optimal if the requirement is to obtain the entire result, so decide as per the needs of the application.
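The optimizer mode can also be switched per session, which is handy for trying both behaviors against the same query:

```sql
-- Favor response time (first rows back quickly)
ALTER SESSION SET optimizer_mode = FIRST_ROWS;

-- Favor total throughput (batch style)
ALTER SESSION SET optimizer_mode = ALL_ROWS;
```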


Basic CBO Terms

The following terms will be used quite often when analyzing statements in CBO.

Cost
The COST computed by the CBO is a unit of expense associated with each operation. The logic by which the cost is actually derived is not documented or externalized; moreover, it may change across releases.

Cardinality
The number of rows in a table, or the number of distinct keys in an index. The cardinality of a query is the number of rows it is expected to return.

Selectivity
A measure of the number of distinct values. The proportion of distinct values in an indexed column is known as the index's selectivity. For example, if a table has 10000 rows and an index is created on a column having 4000 distinct values, the selectivity of the index is (4000/10000) * 100 = 40%. A unique index on a NOT NULL column has a selectivity of 100%.

Transitivity
The process by which the CBO generates additional predicates for a query, enabling the optimizer to consider additional execution paths. For example, if a query supplies predicates of the form A=B and B=C, the optimizer may add the additional predicate A=C.
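A sketch with hypothetical tables: given the first query below, the optimizer may internally treat it like the second, opening up an access path on an index of table c:

```sql
-- As written: A=B and B=C
SELECT *
  FROM a, b, c
 WHERE a.id = b.id
   AND b.id = c.id;

-- After transitive predicate generation: A=C is added
SELECT *
  FROM a, b, c
 WHERE a.id = b.id
   AND b.id = c.id
   AND a.id = c.id;
```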

Statistics
Much required information gathered for various data holding objects. This information is vital for the CBO to decide on execution plans.

Join Methods
Oracle uses join methods such as hash, sort-merge and nested-loop joins. A query may run faster using one type of join than another; this should be evaluated for individual queries.

FTS
FTS, or Full Table Scan, refers to a query sequentially scanning a table from the first block up to the last allocated block. This can be very expensive for big tables and should be avoided where possible.

Index scan
Refers to access of a table by way of one or more of its indexes.


Initialization parameters that affect the CBO
Apart from generating statistics, the setup of the parameters below plays a crucial role in dictating how your system will behave. Their settings depend largely on the kind of environment you want to support: online transaction processing, batch processing, data warehousing, or a combination of these.

1) OPTIMIZER_MODE
This decides the mode in which the optimizer engine runs. Valid values are RULE, CHOOSE, ALL_ROWS and FIRST_ROWS(_n).
Optionally, OPTIMIZER_MODE can be set to CHOOSE, a kind of intermediate option between the RBO and the CBO: the query is run under either optimizer depending on the availability of statistics. If the tables present in the query have statistics generated on them, the CBO (ALL_ROWS only) is used; otherwise the RBO is used.
e.g.: optimizer_mode = first_rows

2) OPTIMIZER_FEATURES_ENABLE
This is set to a version number such as 8.1.5, 8.1.7 or 9.0.0. Since new features and functionality are added to the CBO in every release, its behavior may change and result in different execution plans. You can set this to the version number for which your application is tuned. Note that setting it to a lower version prevents the use of features introduced in later versions.
e.g.: optimizer_features_enable = 8.1.7

3) OPTIMIZER_MAX_PERMUTATIONS
This parameter specifies the maximum number of join-order permutations the optimizer considers when choosing an execution plan for a query with joins. This influences the parse time of queries, so the parameter should be set to a low value. Make sure the other parameters mentioned in this section are set properly so that the optimizer finds an optimal execution plan within the specified limit. It defaults to 80000 in Oracle 8, which effectively means no limit; in Oracle 9i it defaults to 2000.
e.g.: optimizer_max_permutations = 2000

4) OPTIMIZER_INDEX_COST_ADJ
OPTIMIZER_INDEX_COST_ADJ takes a value between 1 and 10000 (default 100). Setting this value makes indexes more or less attractive to the optimizer. For example, setting it to 50 tells the optimizer that an execution plan using indexes is half as expensive as one that does not. The lower the value (below 100), the fewer full table scans will take place in the system.
Use of indexes in OLTP is a crucial factor in response time, but sometimes full table scans provide better throughput. Nested-loop joins are influenced by this parameter. Set it to a value that is optimal for your application; I recommend setting it between 10 and 20 for OLTP systems and around 50 for DSS systems.
If you are seeing slow performance because the CBO FIRST_ROWS optimizer mode is favoring too many full table scans, you can reset OPTIMIZER_INDEX_COST_ADJ to immediately tune all of the SQL in your database to favor index scans over full table scans. This is a "silver bullet" that can improve the performance of an entire database in cases where the database is OLTP and you have verified that the full table scan costing is too low.

5) OPTIMIZER_INDEX_CACHING
This tells the optimizer to favor nested loops and IN-list iterators over sort-merge and hash joins. The value indicates the percentage of index blocks the optimizer should assume are present in the buffer cache. It defaults to 0; higher values make nested loops and IN-list iterators look less expensive and more appealing to the optimizer.
I recommend setting this parameter to 90.

6) OPTIMIZER_PERCENT_PARALLEL
OPTIMIZER_PERCENT_PARALLEL takes a value between 0 and 100. A low value favors indexes and a higher value favors full table scans; the optimizer uses this parameter in working out the cost of a full table scan. A value of 100 makes use of the degree of parallelism set at the object level. I prefer setting it to zero to favor use of indexes and prevent parallel query from affecting the costing.
It was renamed to _OPTIMIZER_PERCENT_PARALLEL in Oracle 9i, and its value should not be altered unless recommended by Oracle Support.
e.g.: optimizer_percent_parallel = 0

7) COMPATIBLE
This parameter provides backward compatibility with earlier releases and may also restrict the use of some new features. The CBO underwent a lot of changes in release 8, so it is advisable to set this parameter to 8.1.0 or higher. Only three digits are required; you can specify more for record purposes.
e.g.: compatible = 8.1.0

8) DB_FILE_MULTIBLOCK_READ_COUNT
This parameter determines the number of database blocks read in one I/O operation during a full table scan; the value is used in computing the cost of full table scans. OLTP systems will not benefit from a high value, as that reduces the computed cost of full table scan execution plans and so favors them. The maximum value is operating-system dependent.
e.g.: db_file_multiblock_read_count = 1 (may be set to a higher value)

9) SORT_AREA_SIZE
This parameter defines the maximum memory allocated per user process for sorting data and for inserts and updates to bitmap indexes. The optimizer uses the set value in determining the cost of sorting in a query. It defaults to 64K; normally a size of 64K to 1M is appropriate for OLTP systems.
Sort-merge joins are influenced by this parameter value. The bigger the size the more appealing will be sort-merge joins over nested-loop and hash joins. The lower the size the more sorting will take place in temporary segments. So assign a value that has been evaluated for your setup.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g: sort_area_size = 1048576

10) SORT_MULTIBLOCK_READ_COUNT
This specifies the number of database blocks to be read each time a sort performs a read from a temporary segment. The default value is 2. General recommendation is to keep the default value. However, if you intend to hike up the SORT_AREA_SIZE considerably, you may evaluate the improvement by changing this parameter.
e.g.: sort_multiblock_read_count = 2

11) HASH_JOIN_ENABLED
Hash joins are available only in the CBO. In a hash join, a hash table is created on the join key of the smaller table, and the other table is then probed against it to find matches. Hash joins may prove faster than other join types in some conditions, especially when an index is missing or the search criteria are not very selective. Hash joins require a large amount of memory as the hash tables are retained in memory; this may sometimes result in memory swapping.
Nested-loop joins return the first row faster than sort-merge and hash joins and are preferred for OLTP, but the other join types cannot be ignored for running other aspects of the applications.
e.g.: hash_join_enabled = true

12) HASH_AREA_SIZE
This specifies the maximum amount of memory in bytes to be used for a hash join per process. It is defaulted to 2 times SORT_AREA_SIZE.
Oracle recommends the use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g.: hash_area_size = 2097152
Setting this to a very low number may sometimes result in the following error.
ORA-6580: Hash Join ran out of memory while keeping large rows in memory.

13) HASH_MULTIBLOCK_IO_COUNT
This specifies how many sequential blocks a hash join reads and writes in one I/O operation. Oracle recommends not changing or assigning a value to this parameter; this lets Oracle decide on the appropriate value for each individual query. In such cases, the value of the parameter appears as 0 in the V$PARAMETER view.
This parameter is renamed to _HASH_MULTIBLOCK_IO_COUNT as of Oracle 9i.
e.g.: hash_multi_block_io_count = 0

14) BITMAP_MERGE_AREA_SIZE
This parameter is relevant for systems using bitmap indexes. It specifies the amount of memory Oracle uses to merge bitmaps retrieved from a range scan of a bitmap index. The default value is 1 MB, which is considered sufficient for most setups.
Oracle recommends use of PGA_AGGREGATE_TARGET instead of this parameter from Oracle 9i.
e.g.: bitmap_merge_area_size = 1048576

15) QUERY_REWRITE_ENABLED
This parameter is relevant for systems using Materialized views, Function based indexes and stored outlines. Setting this parameter enables query rewrite for the database. The materialized views should also be query rewrite enabled to allow the optimizer to redirect a query to it rather than the actual tables listed in the FROM clause. Query rewriting is internally done by the optimizer based on what results are expected and whether these are already present in materialized view form.
e.g.: query_rewrite_enabled = true

16) QUERY_REWRITE_INTEGRITY
This parameter determines the extent to which the optimizer must enforce query rewriting to Materialized views. This determines the accuracy of the query rewrite. It can be set to ENFORCED, TRUSTED or STALE_TOLERATED. ENFORCED option uses Materialized views if they contain fresh data, thus guarantying the integrity and consistency of data. TRUSTED uses relationships that have been defined in the system for rewriting the query. STALE_TOLERATED allows the use of relationships that are not enforced and may use stale data. For OLTP systems, ENFORCED would be the ideal setup, as reports need to be up-to-date.
e.g.: query_rewrite_integrity = enforced

17) ALWAYS_ANTI_JOIN
This parameter specifies the join method for anti-joins, for example when a NOT IN operator is present in your query. It can be set to NESTED_LOOPS, MERGE or HASH. It is defaulted to NESTED_LOOPS in Oracle 8i and CHOOSE in Oracle 9i.
This parameter is renamed to _ALWAYS_ANTI_JOIN as of Oracle 9i.
e.g.: always_anti_join = nested_loops

18) ALWAYS_SEMI_JOIN
This parameter specifies the join method for semi-joins. These joins are carried out by the optimizer after transforming a query: duplicate values from the inner table are removed and then the join type specified in the parameter is used to perform the semi-join. It can be set to NESTED_LOOPS, MERGE or HASH. In Oracle 8i it defaults to STANDARD, and in Oracle 9i to CHOOSE, to pick an appropriate join.
This parameter is renamed to _ALWAYS_SEMI_JOIN as of Oracle 9i.
e.g.: always_semi_join = nested_loops

19) STAR_TRANSFORMATION_ENABLED
This specifies whether query transformation will be applied to star queries. It can be set to TRUE, FALSE or TEMP_DISABLE (transformation will take place but will not use temporary tables). I presently set it to FALSE due to some known issues of sub-optimal queries being generated. If you intend to use this, please upgrade your version to 8.1.7.4 and above.
e.g.: star_transformation_enabled = false

20) PARALLEL_BROADCAST_ENABLED
This parameter refers to parallel executions in cluster databases. It is meant for improving hash and sort-merge join operations where a very large result set is joined with a very small result set. When this option is enabled, the optimizer broadcasts a copy of all rows in the smaller result set to all cluster databases that are processing some rows of the larger result set. 
It is obsolete in release 9.2.0.
e.g.: parallel_broadcast_enabled = false

21) OPTIMIZER_DYNAMIC_SAMPLING
This parameter was introduced in release 9i for situations where tables are not analyzed. Since the CBO depends heavily on statistics, this parameter tells the optimizer to sample unanalyzed tables used in a query. A level of 0 to 10 can be specified; the higher the value, the more time the optimizer spends sampling.
e.g.: optimizer_dynamic_sampling = 1

22) PARTITION_VIEW_ENABLED
This parameter is meant for backward compatibility to support partition views. Oracle recommends use of partition tables rather than partition views. If you are migrating to CBO, chances are that you may not be using partition views.
e.g.: partition_view_enabled = false

23) CURSOR_SHARING
This parameter determines what kinds of SQL statements can share the same cursor. It can be set to FORCE, SIMILAR or EXACT. FORCE squeezes statements that differ only in some literals into the same cursor. SIMILAR does much the same but tries to maintain plan optimization for identical statements. EXACT allows only statements with exactly identical text to share a cursor.
Using FORCE may sometimes produce unexpected results.
e.g.: cursor_sharing = exact

24) PGA_AGGREGATE_TARGET
Introduced in Oracle 9i, this parameter specifies the aggregate PGA memory available to all server processes attached to an instance. This parameter can be set for automatic sizing of SQL working areas. It replaces other existing parameters like SORT_AREA_SIZE, BITMAP_MERGE_AREA_SIZE and HASH_AREA_SIZE.
It can be set to a value between 10 MB and 4000 GB, depending on the setup requirements.

Coding habits and Technical Guidelines

In the CBO, adopt left-to-right ordering in the FROM clause, the leftmost object being the driving table; the ORDERED hint used in the CBO picks up tables left-to-right for processing. Avoid RBO-style coding techniques: the CBO has advanced features such as function-based and bitmap indexes, and lets you control the processing of queries with proper WHERE clauses and hints.
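A minimal sketch of the ORDERED hint (table names are the classic demo tables, used here only for illustration): the tables are joined in the order they appear in the FROM clause, so the leftmost table drives the join:

```sql
SELECT /*+ ORDERED */ d.dname, e.ename
  FROM dept d, emp e          -- dept is the driving table
 WHERE e.deptno = d.deptno;
```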

Use bind variables

For example, below is an existing legacy system code that always uses a predicate with a hard coded value 1666. Though this may not change in the lifetime of the code, best practice is to avoid such coding and define this as a variable.

Original code:
begin
 ...
 select sum(qty) qty into rec.qty
   from jncdm_ra_transactions a, ra_customer_trx_all b
   where a.calendar_month_id = pi_calendar_month_id
     and b.customer_trx_id = a.customer_trx_id
     and b.cust_trx_type_id != 1666
     and b.type = 'INV';
 ...
end;

Change it to use a variable:
declare
 ...
 l_exclude_trx_type_id number := 1666;
 l_mod_type varchar2(3) := 'INV';
begin
 ...
 select sum(qty) qty into rec.qty
   from jncdm_ra_transactions a, ra_customer_trx_all b
   where a.calendar_month_id = pi_calendar_month_id
     and b.customer_trx_id = a.customer_trx_id
     and b.cust_trx_type_id != l_exclude_trx_type_id
     and b.type = l_mod_type;
 ...
end;

DBMS_STATS

Oracle provides more than one way of generating statistics.

  1. DBMS_UTILITY
  2. ANALYZE command
  3. DBMS_DDL
  4. DBMS_STATS
NOTE: Oracle strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine-tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS.

However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:
    * To use the VALIDATE or LIST CHAINED ROWS clauses
    * To collect information on freelist blocks

In order to use the DBMS_STATS Package, the user needs to get the execute privilege. Run the following as SYS:
    grant execute on DBMS_STATS to user_xyz;

Generating Statistics

Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy. AUTO_SAMPLE_SIZE lets Oracle determine the best sample size for good statistics. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling:
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('OE',DBMS_STATS.AUTO_SAMPLE_SIZE);
By default, DBMS_STATS generates statistics for the table but not its indexes (CASCADE => FALSE).
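To gather index statistics along with the table in one call, pass CASCADE explicitly (schema and table names here are illustrative):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OE',
    tabname          => 'ORDERS',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);   -- also gathers stats on the table's indexes
END;
/
```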

How much to estimate?

If you intend to ESTIMATE statistics because of time constraints, try to arrive at an optimal sample size that yields good results for your database. In general, a sample size of 5% to 10% generates adequate statistics.

Gathering statistics on tables requires sorting, which takes up resources; gathering statistics on indexes does not. Considering this, you may COMPUTE statistics on indexes for accurate data.

You may also consider generating statistics in Parallel. Below is an example of arriving at an appropriate ESTIMATE sample size.

Example

The table INV.MTL_MATERIAL_TRX has around 4.5 million records. Below are the statistics gathered at various sample sizes.

Commands used:

dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX'); -- compute
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 5);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 10);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 20);
dbms_stats.gather_table_stats('INV', 'MTL_MATERIAL_TRX', estimate_percent => 51);

Final statistics:

 MODE   |  NUM_ROWS | BLOCKS | AVG_ROW_LEN | SAMPLE_SIZE | LAST_ANAL | Time taken
 -------|-----------|--------|-------------|-------------|-----------|------------
 compute|   4591474 | 193230 |         251 |     4591474 | 27-JUL-03 | 2 hr
 at 5%  |   4582460 | 193230 |         247 |      229123 | 27-JUL-03 | 8 min
 at 10% |   4587520 | 193230 |         249 |      458752 | 27-JUL-03 | 17 min
 at 20% |   4591635 | 193230 |         250 |      918327 | 27-JUL-03 | 32 min
 at 51% | 4590890.2 | 193230 |         250 |     2341354 | 27-JUL-03 | 1 hr 56 min


What should the time interval be?

When 10-20% of the data has changed, the general convention is to generate fresh statistics. You can start off with a general rule of estimating statistics on a weekly basis. If queries against heavily modified tables are performing badly, consider using the DML monitoring option to update statistics every few hours for such tables. Statistics are not incremental; they are regenerated in full each time, so if the data has not changed considerably there is no advantage in generating statistics too frequently.
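A sketch of the DML monitoring approach just mentioned (the table name is illustrative): enable monitoring on the volatile table, then let the GATHER STALE option refresh only the tables whose tracked modifications exceed Oracle's staleness threshold:

```sql
-- Track inserts/updates/deletes on the busy table
ALTER TABLE inv.mtl_material_trx MONITORING;

-- Periodically refresh statistics only where enough data has changed
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'INV',
    options => 'GATHER STALE');
END;
/
```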

Statistics locks?

A table that is being analyzed cannot undergo DDL changes, but DML activity can continue. Analyzing an index places a shared lock on the related table, so neither DML nor DDL can be performed on it. Preferably avoid all such activity during the statistics generation phase.

Generate adequate statistics at proper intervals

Use DBMS_STATS for generating statistics on a periodic basis. You may also categorize objects under various groups depending on their frequency of data change. For example, a daily transaction table will always change and statistics may be generated more often. Master tables change subtly and statistics could be generated less often.
In the case of ESTIMATE statistics, arrive at an optimal sample size. Oracle recommends a percentage of 5%. Consider using COMPUTE statistics for indexes and index-organized tables.
If queries on a particular set of tables always require more accurate statistics to behave properly, and otherwise fail to perform, consider using hints to direct the optimizer and avoid dependency on statistics generation.
Whenever a heavy upload of data is done, consider explicit generation of statistics on the concerned tables.

Statistics for SYS schema

The general rule is to not generate statistics for the SYS schema if you are on Oracle 8i. In Oracle 9i, you will have to test this in your own setup to arrive at a conclusion. The dictionary will need to be analyzed in Oracle 10g, as the RBO will be unsupported then.

Using DBMS_APPLICATION_INFO

This package allows you to record application-specific information in the dynamic performance views. It is very handy for implementing certain logic and for analysis and tuning; I mention it here mainly for analysis and tuning purposes. With this feature you can integrate your application into the database and find vital information at any point in time, such as what is presently running in the database, from which part of the application a particular query is being executed, and what the user sessions are doing.
Any time a performance issue arises, look at the application-specific information to identify the problematic area instantly. Application-specific information appears in views such as V$SESSION (MODULE, ACTION and CLIENT_INFO columns).
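A minimal sketch of such instrumentation (module and action names are invented for the example):

```sql
-- In the application code, label the current unit of work
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE(
    module_name => 'AR_BILLING',
    action_name => 'MONTH_END_ROLLUP');
END;
/

-- Later, from another session, see what each session is doing
SELECT sid, username, module, action, client_info
  FROM v$session
 WHERE module IS NOT NULL;
```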

Gathering statistics with DBMS_STATS

The following lists the procedures in the DBMS_STATS package for gathering statistics:

GATHER_INDEX_STATS - Index statistics
GATHER_TABLE_STATS - Table, column, and index statistics
GATHER_SCHEMA_STATS - Statistics for all objects in a schema
GATHER_DATABASE_STATS - Statistics for all objects in a database
GATHER_SYSTEM_STATS - CPU and I/O statistics for the system


Gathering System Statistics

System statistics enable the optimizer to consider a system's I/O and CPU performance and utilization. For each plan candidate, the optimizer computes estimates for I/O and CPU costs. It is important to know the system characteristics to pick the most efficient plan with optimal proportion between I/O and CPU cost.

System I/O characteristics depend on many factors and do not stay constant all the time. Using system statistics management routines, database administrators can capture statistics in the interval of time when the system has the most common workload. For example, database applications can process OLTP transactions during the day and run OLAP reports at night. Administrators can gather statistics for both states and activate appropriate OLTP or OLAP statistics when needed. This enables the optimizer to generate relevant costs with respect to available system resource plans.

When Oracle generates system statistics, it analyzes system activity in a specified period of time. Unlike table, index, or column statistics, Oracle does not invalidate already parsed SQL statements when system statistics get updated. All new SQL statements are parsed using new statistics. Oracle Corporation highly recommends that you gather system statistics.

The DBMS_STATS.GATHER_SYSTEM_STATS routine collects system statistics in a user-defined timeframe. You can also set system statistics values explicitly using DBMS_STATS.SET_SYSTEM_STATS. Use DBMS_STATS.GET_SYSTEM_STATS to verify system statistics.

The following example shows database applications processing OLTP transactions during the day and running reports at night. First, system statistics must be collected. The values in this example are user-defined; in other words, you must determine an appropriate time interval and name for your environment.

Generating System Statistics

Gather statistics during the day. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:
BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS(
    gathering_mode => 'interval',
    interval       => 720,
    stattab        => 'mystats',
    statid         => 'OLTP');
END;
/

Gather statistics during the night. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:

BEGIN
  DBMS_STATS.GATHER_SYSTEM_STATS(
    gathering_mode => 'interval',
    interval       => 720,
    stattab        => 'mystats',
    statid         => 'OLAP');
END;
/

If appropriate, you can switch between the gathered sets of statistics. It is possible to automate this process by submitting jobs to update the dictionary with the appropriate statistics. During the day, the following job imports the OLTP statistics for the daytime run:

VARIABLE jobno number;
BEGIN
  DBMS_JOB.SUBMIT(:jobno,
    'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLTP'');',
    SYSDATE, 'SYSDATE + 1');
  COMMIT;
END;
/

During the night, the following job imports the OLAP statistics for the nighttime run:

BEGIN
  DBMS_JOB.SUBMIT(:jobno,
    'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLAP'');',
    SYSDATE + 0.5, 'SYSDATE + 1');
  COMMIT;
END;
/



DBMS_STATS.GATHER_DATABASE_STATS
Gathers statistics on all the tables and indexes in a database.
Some of its parameters:

- estimate_percent
o NULL means compute.
o The valid range is [0.000001, 100].
o DBMS_STATS.AUTO_SAMPLE_SIZE lets Oracle determine the best sample size for good statistics (recommended value in 9i).

- block_sample
o Only pertinent when estimating statistics.
o Determines whether to use random block sampling instead of random row sampling.
o Random block sampling is more efficient, but if the data is not randomly distributed on disk, the sample values may be somewhat correlated.

- method_opt
o Default: 'FOR ALL COLUMNS SIZE 1'
o FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
o FOR COLUMNS [size_clause] column|attribute [size_clause] [, column|attribute [size_clause] ...]
o size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
. integer - Number of histogram buckets; valid range [1,254]
. REPEAT - Collects histograms only on the columns that already have histograms (9i)
. AUTO - Oracle determines the columns on which to collect histograms based on data distribution and the workload of the columns (9i)
. SKEWONLY - Oracle determines the columns on which to collect histograms based on the data distribution of the columns (9i)

- degree
o Degree of parallelism.
o NULL means use the table default value
o DBMS_STATS.DEFAULT_DEGREE means value based on the init parameters (9i).

- granularity
o Only pertinent if the table is partitioned.
o DEFAULT: Gather global- and partition-level statistics.
o GLOBAL: Gather global statistics.
o PARTITION: Gather partition-level statistics.
o SUBPARTITION: Gather subpartition-level statistics.
o ALL: Gather all (subpartition, partition, and global)stats.

- cascade
o Default (CASCADE => FALSE): no index statistics are gathered.
o Gathers statistics on the indexes as well (TRUE).
o Index statistics gathering is not parallelized except in 9iR2
o Equivalent to gather_index_stats procedure on each of the indexes in the database in addition to gathering table and column statistics.

- options
o GATHER: Default. Gathers statistics on all objects.
o GATHER AUTO: Gathers all necessary statistics automatically.
Oracle implicitly determines which objects need new statistics, and determines how to gather those statistics (9i).
o GATHER STALE: Gathers statistics on stale objects as determined by looking at the *_tab_modifications views.
o GATHER EMPTY: Gathers statistics on objects which currently have no statistics.

- gather_sys
o Gathers statistics on the objects owned by the 'SYS' user (TRUE)
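For example, the parameters above can be combined in a single call (values illustrative, assuming 9i):

exec dbms_stats.gather_database_stats(
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
degree => dbms_stats.default_degree,
granularity => 'DEFAULT',
cascade => true,
options => 'GATHER STALE',
gather_sys => false
);

With options => 'GATHER STALE', only objects flagged in the *_tab_modifications views are re-analyzed.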



SQL Source - Dynamic Method for TABLES

DECLARE
sql_stmt    VARCHAR2(1024);
BEGIN
  FOR tab_rec IN (SELECT owner,table_name
                    FROM all_tables WHERE owner like UPPER('&1') ) LOOP
        sql_stmt := 'BEGIN dbms_stats.gather_table_stats
                     (ownname => :1,
                      tabname => :2,
                      partname => null,
                      estimate_percent => 10,
                      degree => 3,
                      cascade => true); END;';
      EXECUTE IMMEDIATE sql_stmt USING tab_rec.owner, tab_rec.table_name ;

  END LOOP;
END;
/


Gathering Index Statistics

Oracle can gather some statistics automatically while creating or rebuilding a B-tree or bitmap index. The COMPUTE STATISTICS option of CREATE INDEX or ALTER INDEX ... REBUILD enables this gathering of statistics.

The statistics that Oracle gathers for the COMPUTE STATISTICS option depend on whether the index is partitioned or nonpartitioned.

To ensure correctness of the statistics, Oracle always uses base tables when creating an index with the COMPUTE STATISTICS option, even if another index is available that could be used to create the index.
If you do not use the COMPUTE STATISTICS clause, or if you have made significant changes to the data, then use the DBMS_STATS.GATHER_INDEX_STATS procedure to collect index statistics.
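For example, the COMPUTE STATISTICS clause can be given at creation or rebuild time (table and index names illustrative):

CREATE INDEX emp_deptno_idx ON emp (deptno) COMPUTE STATISTICS;

ALTER INDEX emp_deptno_idx REBUILD COMPUTE STATISTICS;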


Gathering Statistics for Function-Based Indexes
You should analyze the table after creating a function-based index, so that Oracle can collect the equivalent of column statistics for the expression. Optionally, you can collect histograms for the index expressions by specifying FOR ALL HIDDEN COLUMNS SIZE number_of_buckets in the METHOD_OPT argument to the DBMS_STATS procedures.
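For example, after creating a function-based index on UPPER(ename) (schema, table and bucket count illustrative), histograms can be requested on the hidden expression column:

exec dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'EMP', method_opt => 'FOR ALL HIDDEN COLUMNS SIZE 10');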

Gathering New Optimizer Statistics

Before gathering new statistics for a particular schema, use the DBMS_STATS.EXPORT_SCHEMA_STATS procedure to extract and save existing statistics. Then use DBMS_STATS.GATHER_SCHEMA_STATS to gather new statistics. You can implement both of these with a single call to the GATHER_SCHEMA_STATS procedure, by specifying additional parameters.
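A sketch of this save-then-gather sequence (statistics table name and statid illustrative):

exec dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'SAVED_STATS');
exec dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'SAVED_STATS', statid => 'BEFORE_GATHER');
exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', estimate_percent => 10, cascade => true);

If the new statistics cause plan regressions, the saved set can be restored with DBMS_STATS.IMPORT_SCHEMA_STATS.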

Designating Tables for Monitoring and Automated Statistics Gathering

Before you can utilize automated statistics gathering for a particular table, you must bring either the tables of a specific schema or a complete database into the monitoring mode. Do this with the DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING or DBMS_STATS.ALTER_DATABASE_TAB_MONITORING procedures. Alternatively, you can enable the monitoring attribute using the MONITORING keyword. This keyword is part of the CREATE TABLE and ALTER TABLE statement syntax. Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table since the last time statistics were gathered. Oracle uses this data to identify tables with stale statistics. Then, you can enable automated statistics gathering by setting up a recurring job (perhaps by using job queues) that invokes DBMS_STATS.GATHER_TABLE_STATS with the GATHER STALE option at an appropriate interval for your application.

Objects are considered stale when 10% of the total rows have been changed. When you issue GATHER_TABLE_STATS with GATHER STALE, the procedure checks the USER_TAB_MODIFICATIONS view. If a monitored table has been modified more than 10%, then statistics are gathered again. The information about changes of tables, as shown in the USER_TAB_MODIFICATIONS view, can be flushed from the SGA into the data dictionary with the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure.


Note:

There can be a delay of a few minutes while Oracle propagates information to this view.


To disable monitoring, use the DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING or DBMS_STATS.ALTER_DATABASE_TAB_MONITORING procedures, or use the NOMONITORING keyword. Example:

SQL> exec dbms_stats.ALTER_SCHEMA_TAB_MONITORING('<owner>',TRUE);

SQL> exec dbms_stats.ALTER_DATABASE_TAB_MONITORING(TRUE);


==================
Oracle9i Examples:
==================
In Oracle9i, only one statement is required to gather statistics on objects that are new or stale, because the GATHER AUTO option implicitly determines which objects need new statistics and gathers them.

Oracle9i - Gathering statistics for objects in a schema:
--------------------------------------------------------
exec dbms_stats.gather_schema_stats(
ownname => NULL,
estimate_percent => 100,
granularity => 'ALL',
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
degree => DBMS_STATS.DEFAULT_DEGREE,
options => 'GATHER AUTO',
cascade => TRUE
);

In 9i, the estimate_percent can be determined automatically using: estimate_percent => dbms_stats.auto_sample_size.

Oracle9i - Gathering statistics for objects in the database:
------------------------------------------------------------
exec dbms_stats.gather_database_stats(
estimate_percent => 100,
granularity => 'ALL',
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
degree => DBMS_STATS.DEFAULT_DEGREE,
options => 'GATHER AUTO',
cascade => TRUE
);

In 9i, the estimate_percent can be determined automatically using: estimate_percent => dbms_stats.auto_sample_size.

By using the constant DBMS_STATS.AUTO_SAMPLE_SIZE, Oracle automatically determines the best sample size for good statistics.

Enabling Automated Statistics Gathering

The GATHER STALE option gathers statistics only for tables that have stale statistics and for which you have enabled monitoring.

The GATHER STALE option maintains up-to-date statistics for the cost-based optimizer. Using this option at regular intervals also avoids the overhead associated with gathering statistics on all tables at one time. The GATHER option can incur much more overhead, because this option generally gathers statistics for a greater number of tables than GATHER STALE.

Use a script or job scheduling tool for the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures to establish a frequency of statistics collection that is appropriate for the application. The frequency of collection intervals should balance the task of providing accurate statistics for the optimizer against the processing overhead incurred by the statistics collection process.
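Following the DBMS_JOB pattern shown earlier, a nightly GATHER STALE run could be scheduled along these lines (schema name and interval illustrative):

VARIABLE jobno number;
BEGIN
DBMS_JOB.SUBMIT(:jobno,
'DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''SCOTT'', options => ''GATHER STALE'', cascade => TRUE);',
SYSDATE, 'SYSDATE + 1');
COMMIT;
END;
/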

Verifying Index Statistics

SELECT INDEX_NAME "NAME", NUM_ROWS, DISTINCT_KEYS "DISTINCT",
LEAF_BLOCKS, CLUSTERING_FACTOR "CF", BLEVEL "LEVEL",
AVG_LEAF_BLOCKS_PER_KEY "ALFBPKEY"
FROM DBA_INDEXES
WHERE OWNER = 'SH'
ORDER BY INDEX_NAME;

Typical output is:

NAME                       NUM_ROWS DISTINCT LEAF_BLOCKS      CF   LEVEL   ALFBPKEY
-------------------------- -------- -------- ----------- ------- ------- ----------
CUSTOMERS_PK                  50000    50000         454    4405       2          1
PRODUCTS_PK                   10000    10000          90    1552       1          1
PRODUCTS_PROD_CAT_IX          10000        4          99    4422       1         24
PRODUCTS_PROD_SUBCAT_IX       10000       37         170    6148       2          4
SALES_PROD_BIX                 6287      909        1480    6287       1          1
SALES_PROMO_BIX                4727      459         570    4727       1          1


Verifying Column Statistics

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, DENSITY
FROM DBA_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'PA_EXPENDITURE_ITEMS_ALL'
ORDER BY COLUMN_NAME;

This returns the following data:

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS NUM_BUCKETS    DENSITY
------------------------------ ------------ ---------- ----------- ----------
BURDEN_COST                            4300      71957           1 .000232558
BURDEN_COST_RATE                        675    7376401           1 .001481481
CONVERTED_FLAG                            1   16793903           1          1 poor
COST_BURDEN_DISTRIBUTED_FLAG              2      15796           1         .5 poor
COST_DISTRIBUTED_FLAG                     2          0           1         .5 poor
COST_IND_COMPILED_SET_ID                 87    6153143           1 .011494253
EXPENDITURE_ID                      1171831          0           1 8.5337E-07 GOOD
TASK_ID                                8648          0           1 .000115634
TRANSFERRED_FROM_EXP_ITEM_ID        1233787   15568891           1 8.1051E-07 GOOD



Using Histograms

The cost-based optimizer can use data value histograms to get accurate estimates of the distribution of column data. A histogram partitions the values in the column into bands, so that all column values in a band fall within the same range. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions.

One of the fundamental tasks of the cost-based optimizer is determining the selectivity of predicates that appear in queries. Selectivity estimates are used to decide when to use an index and the order in which to join tables. Some attribute domains (a table's columns) are not uniformly distributed.

The cost-based optimizer uses height-based histograms on specified attributes to describe the distributions of nonuniform domains. In a height-based histogram, the column values are divided into bands so that each band contains approximately the same number of values. The useful information that the histogram provides, then, is where in the range of values the endpoints fall.

When to Use Histograms

Histograms can affect performance and should be used only when they substantially improve query plans. Histogram statistics data is persistent, so the space required to save the data depends on the sample size. In general, create histograms on columns that are used frequently in WHERE clauses of queries and have a highly skewed data distribution. For uniformly distributed data, the cost-based optimizer can make fairly accurate guesses about the cost of executing a particular statement without the use of histograms.

Histograms, like all other optimizer statistics, are static. They are useful only when they reflect the current data distribution of a given column. (The data in the column can change as long as the distribution remains constant.) If the data distribution of a column changes frequently, you must recompute its histogram frequently.

Histograms are not useful for columns with the following characteristics:

- All predicates on the column use bind variables.
- The column data is uniformly distributed.
- The column is unique and is used only with equality predicates.

Creating Histograms

You generate histograms by using the DBMS_STATS package. You can generate histograms for columns of a table or partition. For example, to create a 10-bucket histogram on the SAL column of the emp table, issue the following statement:

EXECUTE DBMS_STATS.GATHER_TABLE_STATS 
('scott','emp', METHOD_OPT => 'FOR COLUMNS SIZE 10 sal');

The SIZE keyword declares the maximum number of buckets for the histogram. You would create a histogram on the SAL column if there were an unusually high number of employees with the same salary and few employees with other salaries. You can also collect histograms for a single partition of a table.

Oracle Corporation recommends using the DBMS_STATS package to have the database automatically decide which columns need histograms. This is done by specifying SIZE AUTO.

Example Using a Histogram to Improve an Execution Plan

UPDATE so_lines l
SET open_flag = null,
s6 = 10,
s6_date = sysdate
WHERE l.line_type_code IN ('REGULAR','DETAIL','RETURN')
AND l.open_flag = 'Y'
AND (NVL(l.shipped_quantity, 0) = 0
OR (NVL(l.shipped_quantity, 0) != 0
AND l.shipped_quantity + NVL(l.cancelled_quantity, 0) = l.ordered_quantity))
AND l.s6 = 18;

This query shows the skewed distribution of data values for s6. In this case, there are two distinct non-null values: 10 and 18. The majority of the rows consists of s6 = 10 (1,589,464), while a small number of rows consist of s6 = 18 (13,091).

S6            COUNT(*)
======================
10           1,589,464
18              13,091
NULL            21,889

The selectivity of column s6, where s6 = 18:

S = 13,091 / (13,091 + 1,589,464) = 0.008

If No Histogram is Used: The selectivity of column s6 is assumed to be 50%, uniformly distributed across 10 and 18. This is not selective; therefore, s6 is not an ideal choice for use as an index.

If a Histogram is Used: The data distribution information is stored in the dictionary. This allows the optimizer to use this information and compute the correct selectivity based on the data distribution. In this example, the selectivity, based on the histogram data, is 0.008. This is a relatively high, or good, selectivity, which leads the optimizer to use an index on column s6 in the execution plan.


- DBMS_STATS.GATHER_TABLE_STATS

DBMS_STATS.GATHER_TABLE_STATS gathers statistics for a table and its columns, and optionally the associated indexes. Syntax:
dbms_stats.gather_table_stats(ownname, tabname, partname,
                                              estimate_percent, block_sample,
                                              method_opt, degree, granularity,
                                              cascade, stattab, statid, statown);

The first two parameters are mandatory; the rest default to a value.

e.g.: Estimate statistics for a table and its columns
exec dbms_stats.gather_table_stats (ownname => 'INV', tabname => 'MTL_MATERIAL_TRX', estimate_percent => 5);

e.g.: Estimate statistics for a table, its columns and indexes.
exec dbms_stats.gather_table_stats(ownname => 'APPS', tabname => 'AM21', estimate_percent => 5, cascade => true);

e.g.: Estimate statistics in parallel, the following uses 8 threads to complete the task

Session - A
exec dbms_stats.gather_table_stats(ownname => 'INV', tabname => 'MTL_MATERIAL_TRX', estimate_percent => 5, degree => 8);

Session - B (When the above process is running)
select * from v$px_process;

SERV|STATUS   |       PID|SPID     |       SID|  SERIAL#
____|_________|__________|_________|__________|_________
P000|IN USE   |        50|9684     |         7|    50586
P001|IN USE   |        65|9686     |        60|    51561
P002|IN USE   |        66|9688     |        17|     2694
P003|IN USE   |        67|9690     |        30|    39243
P004|IN USE   |        68|9692     |        74|    11017
P005|IN USE   |        69|9694     |        48|     4253
P006|IN USE   |        70|9696     |        76|       17
P007|IN USE   |        71|9698     |        68|     1285



e.g.:
Estimate statistics for the columns of a table; this also generates statistics for the table.
exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM01', estimate_percent => 5, method_opt => 'FOR ALL COLUMNS');


e.g.:
The example below generates column statistics in parallel. The degree of the table is initially set to 8 and the "SIZE 1" makes use of this. Refer to the Histogram section below to find out about the SIZE option.
exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM01',estimate_percent => 5, method_opt => 'FOR ALL COLUMNS SIZE 1');

Statistics for Global Temporary tables

No statistics are collected for Global Temporary Tables; handle these with care. Make use of hints to drive queries on these. Provide explicit statistics by using DBMS_STATS.SET_TABLE_STATS if your temporary table is used for processing huge amounts of data.
select num_rows, blocks, avg_row_len, temporary, user_stats
from dba_tables
where table_name = 'AM21';
  NUM_ROWS     BLOCKS AVG_ROW_LEN T USE
---------- ---------- ----------- - ---
                                  Y NO


exec dbms_stats.gather_table_stats('DATASTAGE', 'TMP_GL_VALUE_DIFF');
*
ERROR at line 1:
ORA-20000: Gathering statistics for a temporary table is not supported

exec dbms_stats.set_table_stats(ownname => 'SYS', tabname => 'AM21', numrows => 3000, numblks => 300, avgrlen => 50);

select num_rows, blocks, avg_row_len, temporary, user_stats
from dba_tables
where table_name = 'AM21';

  NUM_ROWS     BLOCKS AVG_ROW_LEN T USE
---------- ---------- ----------- - ---
      3000        300          50 Y YES



- DBMS_STATS.GATHER_INDEX_STATS

DBMS_STATS.GATHER_INDEX_STATS gathers statistics for indexes. Index statistics cannot be generated in parallel. In addition, the block sampling option available in tables is not available for indexes. Syntax:
dbms_stats.gather_index_stats(ownname, indname, partname, estimate_percent, stattab, statid, statown);

e.g.:
exec dbms_stats.gather_index_stats (ownname => 'INV', indname => 'MTL_SYSTEM_ITEMS_JHN99');



DBMS_STATS.GATHER_SCHEMA_STATS

DBMS_STATS.GATHER_SCHEMA_STATS gathers statistics for a given schema. Syntax:
dbms_stats.gather_schema_stats(ownname, estimate_percent, block_sample,
                               method_opt, degree, granularity, cascade,
                               stattab, statid, options, objlist, statown);

e.g.: Gather schema statistics, for tables and indexes at 5% estimate.
exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', estimate_percent => 5, cascade => true, options => 'GATHER');

e.g.: Gather statistics for objects with no statistics. The cascade option given below makes no difference, as the GATHER EMPTY option generates statistics for all objects without any statistics.

exec dbms_stats.gather_schema_stats(ownname => 'QP', estimate_percent => 5, cascade => true, options => 'GATHER EMPTY');

e.g.: To identify a list of objects without any statistics.
declare
l_owner varchar2(30) := 'QP';
l_emptylst dbms_stats.objecttab;
begin
dbms_stats.gather_schema_stats(ownname => l_owner,
options => 'LIST EMPTY', objlist => l_emptylst);
for i in nvl(l_emptylst.first, 0) .. nvl(l_emptylst.last, 0) loop
dbms_output.put_line(l_emptylst(i).objtype || '/' || l_emptylst(i).objname);
end loop;
end;
/


DBMS_STATS.GATHER_DATABASE_STATS

DBMS_STATS.GATHER_DATABASE_STATS gathers statistics for the complete database. In 8i, this will generate statistics for the SYS schema also. This has been rectified in Oracle 9i. For Oracle 8i, an alternative is to generate statistics for individual schemas or delete SYS schema statistics after generating statistics at the database level. Syntax:
dbms_stats.gather_database_stats(estimate_percent, block_sample, method_opt, degree, granularity,
                                 cascade, stattab, statid, options, objlist, statown);

Retrieving statistics with DBMS_STATS

- DBMS_STATS.GET_TABLE_STATS

Get table statistics. Syntax:

dbms_stats.get_table_stats(ownname, tabname, partname, stattab, statid,
                           numrows, numblks, avgrlen, statown);

e.g.: getting table statistics data.

declare
l_numrows number;
l_numblks number;
l_avgrlen number;
begin
dbms_stats.get_table_stats(ownname => 'SYS', tabname => 'AM01',
numrows => l_numrows, numblks => l_numblks, avgrlen => l_avgrlen);
dbms_output.put_line('No. of rows: ' || l_numrows);
dbms_output.put_line('No. of blks: ' || l_numblks);
dbms_output.put_line('Avg row length: ' || l_avgrlen);
end;
/

No. of rows: 4106860
No. of blks: 6219
Avg row length: 3

- DBMS_STATS.GET_COLUMN_STATS

Get column statistics present in the dictionary. Syntax:

dbms_stats.get_column_stats(ownname, tabname, colname, partname, stattab, statid,
                                                     distcnt, density, nullcnt, srec, avgclen, statown);

e.g.: getting statistics for a column.

 declare
l_distcnt number;
l_density number;
l_nullcnt number;
l_srec dbms_stats.statrec;
l_avgclen number;
begin
dbms_stats.get_column_stats(ownname => 'SYS', tabname => 'AM01',
colname => 'COL1', distcnt => l_distcnt, density => l_density,
nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen);
dbms_output.put_line('No. of distinct values: ' || l_distcnt);
dbms_output.put_line('Density: ' || l_density);
dbms_output.put_line('Count of nulls: ' || l_nullcnt);
dbms_output.put_line('Avg. column length: ' || l_avgclen);
end;
/

No. of distinct values: 2
Density: .5
Count of nulls: 0
Avg. column length: 3


- DBMS_STATS.GET_INDEX_STATS

Get index statistics. Syntax

dbms_stats.get_index_stats(ownname, indname, partname, stattab, statid, 
numrows, numlblks, numdist, avglblk, avgdblk,
clstfct, indlevel, statown);

e.g.: getting an index statistics.

 declare
l_numrows number;
l_numlblks number;
l_numdist number;
l_avglblk number;
l_avgdblk number;
l_clstfct number;
l_indlevel number;
begin
dbms_stats.get_index_stats(ownname => 'SYS', indname => 'AM01_N1',
numrows => l_numrows, numlblks => l_numlblks,
numdist => l_numdist, avglblk => l_avglblk,
avgdblk => l_avgdblk, clstfct => l_clstfct,
indlevel => l_indlevel);
dbms_output.put_line('No. of rows: ' || l_numrows);
dbms_output.put_line('No. of blks: ' || l_numlblks);
dbms_output.put_line('No. of distinct values: ' || l_numdist);
dbms_output.put_line('Avg leaf blocks for distinct keys: ' || l_avglblk);
dbms_output.put_line('Avg data blocks pointed to in the table: ' || l_avgdblk);
dbms_output.put_line('Clustering factor: ' || l_clstfct);
dbms_output.put_line('Index height: ' || l_indlevel);
end;
/

No. of rows: 3819711
No. of blks: 11092
No. of distinct values: 1
Avg leaf blocks for distinct keys: 11092
Avg data blocks pointed to in the table: 14616
Clustering factor: 14616
Index height: 2


Exporting and importing statistics with DBMS_STATS

DBMS_STATS also includes routines for gathering statistics and storing them outside the dictionary; statistics stored this way do not influence the optimizer. Most of the procedures in this package, as mentioned above, have three common parameters - STATID, STATTAB and STATOWN - that relate to user processing of statistics.

Advantages of this feature:

1. Estimated statistics at different percentages could be stored and used for testing.

2. Statistics generated on one database could be transferred to another database.
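For example, statistics could be moved from one database to another along these lines (schema, statistics table and statid names illustrative):

-- On the source database
exec dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'STATS_XFER');
exec dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'STATS_XFER', statid => 'PROD_STATS');

-- Transport SCOTT.STATS_XFER with export/import, then on the target database:
exec dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'STATS_XFER', statid => 'PROD_STATS');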



Automated Implementation Example

The following example uses the scott/tiger tables to demonstrate how to implement automated table monitoring and stale statistics gathering.

-- Perform a quick analyze to load in base statistics
-- (for illustrative purposes only!)
begin
          DBMS_STATS.GATHER_SCHEMA_STATS(
          ownname=>'scott',
          estimate_percent=>null,   -- This table is small, we'll compute
          block_sample=>false,
          method_opt=>'FOR ALL COLUMNS',
          degree=>null,                        -- No parallelism used in this example
          granularity=>'ALL',
          cascade=>true,
          options=>'GATHER'
          );
end;
/

 

-- Examine the current statistics
select table_name, num_rows, blocks, avg_row_len   
     from user_tables where table_name='EMP';

TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
EMP                                    14          1          37

 

-- Turn monitoring on for the EMP table
alter table emp monitoring;

Table altered.
 

-- EXTRA NOTES:
--
-- I could also choose to turn on monitoring for all of the tables within
-- Scott's schema by simply calling:

-- begin
--  dbms_stats.ALTER_SCHEMA_TAB_MONITORING('scott',true);
-- end;
--
-- Or even at the database level with:
-- begin
--   dbms_stats.alter_database_tab_monitoring(
--   monitoring=>true,
--   sysobjs=>false);      -- Don't set to true, see note below!
-- end;
--
-- Note: Although the option to collect statistics for SYS objects is available
-- via ALTER_DATABASE_TAB_MONITORING, Oracle continues to recommend
-- against this practice until the next major release after 9i rel 2


-- Verify that monitoring is turned on
-- (noting the fact that we didn't actually run the other
-- two monitoring statements in the notes above (commented out))

select table_name, monitoring from user_tables
order by 2;


TABLE_NAME                     MON
------------------------------ ---
BONUS                          NO
DEPT                           NO
DUMMY                          NO
SALGRADE                       NO
EMP                            YES

5 rows selected.
 

-- Add some rows
insert into emp select * from emp;

14 rows created.

commit;
Commit complete.

 

-- Wait until the monitoring data is flushed (after timeout or db shutdown)
-- or if you're really impatient, run the following (as a DBA-privileged account):
exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure successfully completed.
 

-- As scott/tiger, let's check USER_TAB_MODIFICATIONS to see what it has collected:
select * from user_tab_modifications;


TABLE_NAME PARTITION_NAME SUBPARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRUNCATED
---------- -------------- ----------------- ------- ------- ------- --------- ---------
EMP                                              14       0       0 05-JUN-02 NO

-- Execute DBMS_STATS to gather stats on all stale tables
begin
DBMS_STATS.GATHER_SCHEMA_STATS(
ownname=>'scott',
estimate_percent=>null, -- This table is small, we'll compute
block_sample=>false,
method_opt=>'FOR ALL COLUMNS',
degree=>null, -- No parallelism used in this example
granularity=>'ALL',
cascade=>true,
options=>'GATHER STALE');
end;
/

-- Verify that the table is no longer listed in USER_TAB_MODIFICATIONS
select * from user_tab_modifications;

no rows selected.

-- Examine some of new statistics collected
select table_name, num_rows, blocks, avg_row_len   
     from user_tables where table_name='EMP';


TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
EMP                                    28          1          37