out of the Utlestat Report
This document is a collection of information I have pieced together from many different sources. I have reviewed all of it and I believe that the information in it can be extremely useful to everyone who wants to use utlestat/utlbstat utility and report.
The library cache
The first section of the report contains statistics generated from the V$LIBRARYCACHE view and deals with the performance of the library cache. The library cache contains the Shared SQL and PL/SQL areas, which are represented by the BODY, SQL AREA, TABLE/PROCEDURE, and TRIGGER. They contain all of the SQL and PL/SQL statements that are cached in memory. The other names--CLUSTER, INDEX, OBJECT, and PIPE--are areas that Oracle uses. The first section of the report is shown in Figure A. The column definitions are shown beneath the report.
Figure A: The first section of the report deals with the performance of the Library Cache.
This section of the report contains all the systemwide statistics. We'll try to cover as many of these statistics as possible. However, some statistics shown in Table C won't appear on your report, and you may have statistics we don't list. This variance is partly the result of having different parameters set in the init.ora. Adding the consistent gets and db block gets yields the number of logical reads. You can use the following equation to calculate the Data Cache Hit Ratio:
Hit Ratio = (Logical Reads - Physical Reads) / Logical Reads
Table C: These are the report's systemwide statistics.
This hit ratio should be above 90 percent; if it's less than 90 percent, you should increase the size of the data cache by increasing the db_block_buffers parameter in the init.ora.
Systemwide wait events
You can view the events shown in Table D by querying the V$SYSTEM_EVENT view. Alternatively, you can query the V$SESSION_EVENT view to see events as they occur.
Table D: These are the report's systemwide events.
Some of the events you'll normally find in this section are the smon timer, pmon timer, client message, and rdbms ipc message. A few that you would hope not to find are free buffer waits, buffer busy waits, and enqueue. The free buffer waits event indicates either that the DBWR(s) could not keep up or that you need to adjust how often checkpoints occur.
The buffer busy waits event indicates contention for a buffer in the SGA. You can obtain more information about this event by querying the V$SESSION_WAIT view. This view contains the P1 and P2 columns. When the event is a buffer busy waits, the P1 column will indicate the file ID of a segment and the P2 column will indicate the block ID causing the contention. You can use these two IDs to query the DBA_EXTENTS view and obtain the name of the segment causing the contention. If the segment is a table or index, you may want to increase the initrans parameter to allow more transactions per data block. Please note, however, that this also increases the amount of storage space needed for the table.
Another way to use the buffer busy waits event is to calculate the ratio of buffer busy waits to the number of logical reads. If the busy ratio is greater than 5 percent, you should examine the V$WAITSTAT view to see what types of blocks are causing the contention. The formula for the ratio is
((buffer busy waits / logical reads) * 100) = busy rate
An enqueue event in this section of the report can be an indication of a couple of things. There may be too many DDL locks or DML locks, or there could be an excessive use of sequences.
Three of the dynamic performance views ("V$" views) can provide details of Oracle wait events:
Although V$SESSION_EVENT allows you to "drill-down" on individual process waits, and V$SESSION_WAIT allows you to examine the detailed location of each wait, it is the V$SYSTEM_EVENT table which I find most useful when diagnosing the overall tune of an Oracle instance. V$SYSTEM_EVENT has the following structure:
On most operating systems, you'll need to set the server parameter TIMED_STATISTICS=TRUE in order to record the elapsed timings for event waits.
A quick tour of selected events
Oracle defines a large number of events, many of which are rarely encounted or trivial in nature. What follows is a review of the events that I have most frequently encountered when tuning Oracle instances.
"db file" waits
Events starting with the phrase "db file" (db file parallel write, db file scattered read, db file sequential read, db file single write) all occur when an ORACLE session issues an I/O request against an ORACLE datafile.
Database file writes are performed only by the database writer, and so "db file" write waits are never experienced by user sessions. User sessions do, however, read data from database files directly and will almost always wait for datafile read events.
Unless your entire database is entirely cached in the SGA, waiting for database file I/O is inevitable and the presence of "db file" waits does not indicate that anything is amiss with the performance of the database. In most healthy databases, "db file" waits account for about 80-90% of all non-idle wait times.
The average wait times for db file events is an indication of the speed and efficiency of your disk IO subsystem. High speed disks should allow for average waits of between 1-2 hundredths of a second. Higher values may indicate a need to review your disk configuration.
log file sync/write waits
Just as ORACLE sessions must inevitably wait for db file I/O, they must also wait for log file I/O. Such waits will occur whenever a COMMIT statement causes a write to the redo log. The session issuing the COMMIT will wait on the log file sync event. When the LogWriter process issues the I/O request, it will wait on the log file parallel write event.
Both these wait events are inevitable and often account for between 10-20% of total non-idle wait times in a healthy database.
The average wait time for a log file parallel write is an important measure. It indicates how quickly the log writer process can flush the redo buffer and is a good indicator of the efficiency of the redo log device. Values under 1 (hundredths of a second) are good, and values of up to 5 hundredths of a second are not unusual. Values above this range may indicate contention for the redo log device.
log file space/switch
These events occur when a redo log entry cannot be made because there is no free space in the redo log buffer or when a redo log cannot be written to because it is in the process of being switched. The incidence of these events should be negligible in a well tuned database. Prior to Oracle 7.3 these the two conditions are combined in the event "log file space/switch". From 7.3 onwards, they are defined by the following events:
These waits suggest that either your redo logs are on slow devices, your log_buffer setting is too low, or you have too few or too small redo logs. If getting "log file switch (archiving needed)", consider alternating your redo logs across multiple devices to reduce contention between the LogWriter and Archiver processes.
Buffer busy waits
This event occurs when a session cannot access a needed block in the SGA because it is in use by another session. The two most common causes are insufficient free lists for a table or too few rollback segments.
You can distinguish between the two causes by considering the V$WAITSTAT table. If the predominant waits are for "data block" or "free list", then it is likely that you need to create multiple FREELISTS (using the FREELIST clause of the CREATE TABLE statement) for tables which are subject to very heavy concurrent inserts. If the leading category is for either "undo header" or "undo block" then you may need to create additional rollback segments.
Free buffer waits/write complete waits
Both these events occur when a session tries to modify or insert a block in the SGA but is unable to.
"Write complete waits" occur when a session tries to modify a block which is currently being written to disk by the database writer process. This will of course happen occasionally - especially during checkpoints - but if it is contributing significantly to overall waits then it may indicate inefficiencies in the database writer.
Free buffer waits occur when a session wishes to read a data block from a database file on disk into the buffer cache. If there are no unmodified ("clean") blocks in the buffer cache then the session will have to wait for the Database Writer process to write modified ("dirty") blocks to disk in order for free buffers to be made available. Normally, the database writer is constantly writing dirty buffers to disk and so this event should rarely occur.
If either of these events comprise a significant proportion of total waits, then you may need to improve the performance of the Database Writer process (DBWR). Some of the ways of doing this are:
Enqueue waits occur when a session waits to obtain a lock. In most cases, this will occur because of a lock on a table or row that the session wishes to lock or modify. In some circumstances, the lock involved may be an ORACLE internal lock (for instance, the Space Transaction enqueue). If the database is well tuned and the application design sound, then enqueue waits should be negligible. Common causes of excessive enqueue waits are:
latch free waits
Latches are ORACLE internal locking mechanisms that prevent multiple sessions from simultaneously updating the same item within ORACLE shared memory (SGA). If a session needs to acquire a latch which is held by another session then a latch free wait may occur.
The presence of latch free waits of any significant magnitude may indicate a bottleneck within the SGA. A detailed discussion on latch tuning would take more space than we have available here. However, the following guidelines may might a useful starting point:
The Oracle server will often record that it is waiting for a SQL*NET operation to complete. These waits all begin with "SQL*NET". Unfortunately, it is difficult to isolate the waits that are the result of network overhead from those which signify that the client process is busy performing other tasks. In particular, the event "SQL*Net message from client" will be observed when the server process is idle and is waiting for further instructions from the client process. Therefore you should usually ignore this event. If other SQL*NET waits contribute significantly to overall waits you should investigate your network overhead but remember that the waits can be occuring because of bottlenecks in the client program or - in the case of "db link" waits - in a remote server.
Events which are safe to ignore
There are some events that occur when a session is idle or waiting for instructions and which are not significant from a performance point of view. Some of the events which are usually safe to ignore (and which are disregarded by the wait_stA.sql script) are:
The events I've described above have been those that I've encountered most frequently when assessing the performance of Oracle instances. However, there are 136 events defined in Oracle 8.0.3, and not all of them can be described here.
Prior to Oracle 8, Oralce events were virtually undocumented. However, in Oracle 8, you can find brief descriptions of each of the events in Appendix A of the Oracle 8 Server Reference.
Dirty buffer write queue
You can generate the dirty buffer write queue statistic by dividing the summed dirty queue length statistic by the write requests statistic. If the value obtained is higher than the value of the _db_block_write_batch parameter, then you should increase the value of the _db_block_ write_batch parameter. You won't be able to find this parameter in the V$PARAMETER view, nor can you view it using the SQLDBA show parameter or the SVRMGR initialization screen. You'll need to query the X$KSPPI table and select the KSPPINM and KSPPIVL columns. Increasing this parameter increases the number of database blocks that can be written to simultaneously.
A possible reason for this statistic being high is having one or two extremely active data files. When you have more modified blocks than can be written at once, you require multiple I/Os, which causes blocks to wait in the queue before being written. Increasing the number of blocks that can be written simultaneously decreases the number of blocks that must wait in the queue. You can also decrease this statistic by increasing the db_block_ simultaneous_writes parameter in the init.ora.
Data file I/O
This section provides a very granular look at how the I/O is distributed across the data files. If one of the data files gets a majority of the reads and writes, you may be able to improve performance by creating multiple data files on separate disks or by stripping the data file across multiple disks. Table E, lists the events in this section.
Table E: These are the events in the data file I/O.
This part of the report sums the I/O by tablespace. You can set two parameters in the init.ora to improve the read and write time: db_file_multiblock_read_count and db_file_simultaneous_writes.
The db_file_multiblock_read_count parameter controls the number of blocks that can be read in one I/O during a full table scan. Increasing this parameter will reduce the number of I/Os needed to scan a table, thus improving the performance of the full table scan. The db_file_simultaneous_ writes parameter controls the number of data blocks in a data file that can be written to simultaneously. Increasing this parameter helps reduce the number of I/Os required to write the data blocks back to the disk after modification. Table F lists the events in this section.
Table F: These are the events in the tablespace I/O section of the report.
If the number of physical block reads is significantly higher than the number of physical reads, it indicates that you may need to review the indexes on these tables or that full table scans are being performed on the tables within the tablespace. In general, if the number of block reads is equal to the number of reads, then the tables in the tablespace were probably being accessed by a rowid, requiring the database to read only one data block.
Latches willing to wait
Processes that are trying to acquire latches are willing to sleep if the latch is unavailable. By querying the V$LATCH view, you can see how many processes had to sleep and the number of times they had to sleep. You need to remember one thing about processes that are sleeping: These processes may also be holding other latches that won't be released until the process is finished with them. This situation will cause even more processes to sleep while waiting for those latches. So, you can see how important it is to reduce contention as much as possible. Table G lists the events in this section.
Table G: The events in this section show the latches that are willing to wait.
You should examine any of the latches that have a hit ratio below .99. Some of the more common latches on this list are the redo allocation, redo copy, library cache, and cache buffers lru.
You can reduce contention for the redo allocation latch by decreasing the size of the Log_Small_Entry_Max_Size parameter. This parameter controls the maximum size of a redo entry that you can copy to the redo log using the redo allocation latch.
The Log_Simultaneous_Copies parameter determines the number of redo copy latches. Increasing this parameter helps reduce contention for these latches.
Contention for the library cache latch usually occurs when space is needed in the library cache. While space is being freed up in order to load a SQL or PL/SQL statement, the latch is being held exclusively, and other users must wait. You can help reduce contention for this latch by increasing the size of the shared pool or by pinning large SQL and PL/SQL statements in memory using the DBMS_Shared_ Pool_Keep procedure. You can reduce contention for the cache buffers lru latch by increasing the _db_block_write_batch parameter.
A couple of init.ora parameters can be set to help decrease the number of latches that sleep while trying to obtain a latch. You can set the spin_count and _latch_spin_count parameters to increase the amount of time a process will spend in the CPU trying to acquire the latch before it sleeps.
Please note: You shouldn't increase these two parameters if your application is already bound by the CPU. In this case, you may want to decrease these parameters to enable processes to sleep faster, which would help free up some of the CPU for other processes.
Latches not willing to wait
The latches in this section of the report don't wait for the latch to become available; they immediately time out and retry to obtain the latch. You can generate these statistics using the immediate_gets and the immediate_misses columns of the V$LATCH view. See the previous section for some hints on reducing contention for these latches. Table H lists the events in this section.
Table H: The events in this section show the latches that are unwilling to wait.
You can generate the statistics in this section of the report by using the V$ROLLSTAT view. Table I lists the events in this section. The optimal size of the rollbacks should be a value that will help keep the number of shrinks down but won't waste disk space.
Table I: The events in this section pertain to the rollback segment.
The ratio of Trans_Tbl_Waits to Trans_ Tbl_Gets should be less then 5 percent. If it's above 5 percent, you should increase the number of rollback segments. ((Trans_Tbl_Waits/ Trans_Tbl_Gets) * 100).
Nondefault database parameters
These parameters in the init.ora are set to a value other than the default. You can generate the list by querying the V$PARAMETER view where the ISDEFAULT column is equal to FALSE. You can use this list as a reference for tuning the database, because these parameters provide a record of how the database performed with certain values.
The dictionary cache
This section of the report contains all of the data dictionary information. This data pertains to all of the objects in the database. Oracle accesses this information for every SQL that gets parsed and again when the statement is executed. The activity in this area can be very heavy. Maintaining a good hit ratio is very important to prevent recursive calls to the database to verify privileges. You can eval-uate the efficiency of the dictionary cache by querying the V$ROWCACHE view. Table J shows the header for this section in the utlbstat/utlestat report. You have a couple of ways to calculate the hit ratio for the dictionary cache. First, you can calculate the hit ratio for the individual cache areas. Second, you can calculate the hit ratio for the entire dictionary cache. You can use the queries in Figure B to calculate the hit ratio both ways.
Table J: These are the events in the dictionary cache section of the report.
Figure B: This query retrieves the details tuning statistics from the V$SYSSTAT view.
Select Parameter, Count, Usage, Fixed, Modifications, Flushes, Gets, GetMisses, ((1 - (GetMisses / (Gets + GetMisses))) * 100) HitRat, Decode(Round(((1 - (GetMisses / (Gets + GetMisses)))), 1), 1, ' ', .9, ' ', '*') Flag_1 From V$RowCache Where Gets + GetMisses != 0 / Select Sum(Decode(Name, 'physical reads', Value, 0)) Phy_Gets, Sum(Decode(Name, 'db block gets', Value, 0)) Blk_Gets, Sum(Decode(Name, 'consistent gets', Value, 0)) Con_Gets, ((1 - (Sum(Decode(Name, 'physical reads', Value, 0)) / (Sum(Decode(Name, 'db block gets', Value, 0)) + Sum(Decode(Name, 'consistent gets', Value, 0))))) * 100) Hit_Rate From V$SySStat;