ORACLE DBA: Latches

This is a copy of Metalink's Reference Note 22908.1, for those that do not have access to it.

1. What is a latch?

Latches are low level serialization mechanisms used to protect shared data structures in the SGA. The implementation of latches is operating system dependent, particularly in regard to whether a process will wait for a latch and for how long.
A latch is a type of a lock that can be very quickly acquired and freed. Latches are typically used to prevent more than one process from executing the same piece of code at a given time. Associated with each latch is a cleanup procedure that will be called if a process dies while holding the latch. Latches have an associated level that is used to prevent deadlocks. Once a process acquires a latch at a certain level it cannot subsequently acquire a latch at a level that is equal to or less than that level (unless it acquires it nowait).

2.Latches vs. Enqueues

Enqueues are another type of locking mechanism used in Oracle.
An enqueue is a more sophisticated mechanism which permits several concurrent processes to have varying degree of sharing of "known" resources. Any object which can be concurrently used, can be protected with enqueues. A good example is of locks on tables. We allow varying levels of sharing on tables e.g. two processes can lock a table in share mode or in share update mode etc. One difference is that the enqueue is obtained using an OS specific locking mechanism. An enqueue allows the user to store a value in the lock, i.e the mode in which we are requesting it. The OS lock manager keeps track of the resources locked. If a process cannot be granted the lock because it is incompatible with the mode requested and the lock is requested with wait, the OS puts the requesting process on a wait queue which is serviced in FIFO.

Another difference between latches and enqueues is that in latches there is no ordered queue of waiters like in enqueues. Latch waiters may either use timers to wakeup and retry or spin (only in multiprocessors). Since all waiters are concurrently retrying (depending on the scheduler), anyone might get the latch and conceivably the first one to try might be the last one to get.

3. When do we need to obtain a latch?

A process acquires a latch when working with a structure in the SGA (System Global Area). It continues to hold the latch for the period of time it works with the structure. The latch is dropped when the process is finished with the structure. Each latch protects a different set of data, identified by the name of the latch.
Oracle uses atomic instructions like "test and set" for operating on latches. Processes waiting to execute a part of code for which a latch has already been obtained by some other process will wait until the latch is released. Examples are redo allocation latches, copy latches, archive control latch etc. The basic idea is to block concurrent access to shared data structures. Since the instructions to set and free latches are atomic, the OS guarantees that only one process gets it. Since it is only one instruction, it is quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a holder dies abnormally while holding it. This cleaning is done using the services of PMON.

4. Latches request modes?

Latches request can be made in two modes: "willing-to-wait" or "no wait". Normally, latches will be requested in "willing-to-wait" mode. A request in "willing-to-wait" mode
will loop, wait, and request again until the latch is obtained. In "no wait" mode the process request the latch. If one is not available, instead of waiting, another one is requested. Only when all fail does the server process have to wait.
Examples of "willing-to-wait" latches are: shared pool and library cache latches
A example of "no wait" latches is the redo copy latch.

5. What causes latch contention?

If a required latch is busy, the process requesting it spins, tries again and if still not available, spins again. The loop is repeated up to a maximum number of times determined by the initialization parameter _SPIN_COUNT. If after this entire loop, the latch is still not available, the process must yield the CPU and go to sleep. Initially is sleeps for one centisecond. This time is doubled in every subsequent sleep.
This causes a slowdown to occur and results in additional CPU usage, until a latch is available. The CPU usage is a consequence of the "spinning" of the process. "Spinning" means that the process continues to look for the availability of the latch after certain intervals of time, during which it sleeps.

6. How to identify contention for internal latches?

Relevant data dictionary views to query:

V$LATCH
V$LATCHHOLDER
V$LATCHNAME
Each row in the V$LATCH table contains statistics for a different type of latch. The columns of the table reflect activity for different types of latch requests. The distinction between these types of requests is whether the requesting process continues to request a latch if it is unavailable:

willing-to-wait If the latch requested with a willing-to-wait
request is not available, the requesting process
waits a short time and requests the latch again.
The process continues waiting and requesting until
the latch is available.

no wait If the latch requested with an immediate request is
not available, the requesting process does not
wait, but continues processing.

V$LATCHNAME key information:
---------------------------------------
GETS Number of successful willing-to-wait requests for
a latch.

MISSES Number of times an initial willing-to-wait request
was unsuccessful.

SLEEPS Number of times a process waited a requested a latch
after an initial wiling-to-wait request.

IMMEDIATE_GETS Number of successful immediate requests for each latch.

IMMEDIATE_MISSES Number of unsuccessful immediate requests for each latch.

Calculating latch hit ratio
----------------------------

To get the Hit ratio for latches apply the following formula:

"willing-to-wait" Hit Ratio=(GETS-MISSES)/GETS
"no wait" Hit Ratio=(IMMEDIATE_GETS-IMMEDIATE_MISSES)/IMMEDIATE_GETS

This number should be close to 1. If not, tune according to the latch name

7. Useful SQL scripts to get latch information

/*
** Display System-wide latch statistics.
*/
column name format A32 truncate heading "LATCH NAME"
column pid heading "HOLDER PID"
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+)
and a.latch# = c.latch#
order by a.latch#;
/*
** Given a latch address, find out the latch name.
*/
column name format a64 heading 'Name'
select a.name from v$latchname a, v$latch b
where b.addr = '&addr'
and b.latch#=a.latch#;

/*
** Display latch statistics by latch name.
*/
column name format a32 heading 'LATCH NAME'
column pid heading 'HOLDER PID'
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;

8. List of all the latches

Oracle versions might differ in the latch# assigned to the existing latches.
The following query will help you to identify all latches and the number assigned.
column name format a40 heading 'LATCH NAME'
select latch#, name from v$latchname;

9. List of latches that are of most concern to a DBA

BUFFER CACHE LATCHES: There are two main latches which protect data blocks in the buffer cache. Contention for these two latches is usually seen when a database has high I/O rates. We can reduce contention for these latches and tune them by adjusting certain init.ora parameters.

Cache buffers chains latch:

This latch is acquired whenever a block in the buffer cache is accessed (pinned).

Reducing contention for the cache buffer chains latch will usually require reducing logical I/O rates by tuning and minimizing the I/O requirements of the SQL involved. High I/O rates could be a sign of a hot block (meaning a block highly accessed).

See Note 163424.1 How To Identify a Hot Block Within The Database to correctly identify this issue.
Cache buffers LRU chain latch:

The cache buffer lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.

Its possible to reduce contention for the cache buffer lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into the buffer cache. Two parameters dictate the size of the buffer cache, DB_BLOCK_SIZE and DB_BLOCK_BUFFERS. In actuality, only the DB_BLOCK_BUFFERS can be changed without recreating the database. Caution, when tuning the buffer pool, avoid the use of additional buffers that contribute little or nothing to the cache hit ratio. A common mistake is to continue increasing the value of DB_BLOCK_BUFFERS. Such increases have no effect if you are doing full table scans or other operations that do not use the buffer cache. Multiple buffer pools can help reduce contention on this latch.You can create additional cache buffer lru chain latches by adjusting the configuration parameter DB_BLOCK_LRU_LATCHES. You may be able to reduce the load on the cache buffer chain latches by increasing the configuration parameter _DB_BLOCK_HASH_BUCKETS

REDOLOG BUFFER LATCHES: There are two Redo buffer latches, the redo allocation latch and the redo copy latch. The redo allocation latch must be acquired in order to allocate space within the buffer. If the redo log entry to be made is greater than the configuration parameter LOG_SMALL_ENTRY_MAX_SIZE, the session which acquires the redo allocation latch may copy the entry into the redo buffer immediately while holding the allocation latch. If the log entry is greater than LOG_SMALL_ENTRY_MAX_SIZE, then the session will release the redo allocation latch and will acquire the redo copy latch in order to copy the entry. There is only one redo allocation latch, but there may be up to LOG_SIMULTANEOUS_COPIES allocation latches.

Redo allocation latch:

This latch controls the allocation of space for redo entries in the redo log buffer. There is one redo allocation latch per instance.

Contention for this latch in Oracle7 can be reduced by decreasing the value of LOG_SMALL_ENTRY_MAX_SIZE on multi-cpu systems to force the use of the
redo copy latch. In Oracle8i this parameter is obsolete, so you need to consider to increase the size of the LOG_BUFFER or reduce the load of the log buffer using NOLOGGING features when possible.

Redo copy latch:

This latch is used to write redo records into the redolog buffer. This latch is waited for on both single and multi-cpu systems.

On multi-cpu systems, contention can be reduced by increasing the value of LOG_SIMULTANEOUS_COPIES (Hidden in Oracle8i) and/or increasing LOG_ENTRY_PREBUILD_THRESHOLD (undocumented in Oracle7).

LIBRARY CACHE

Library cache latch:

The library cache latches protect the cached SQL statements and objects definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache. During a parse, Oracle searches the library cache for a matching statement. If one is not found, then Oracle will parse the SQL statement, obtain the library cache latch and insert the new SQL.

The first resource to reduce contention on this latch is to ensure that the application is reusing as much as possible SQL statement representation. Use bind variables whenever possible in the application. Misses on this latch may also be a sign that the application is parsing SQL at a high rate and may be suffering from too much parse CPU overhead.If the application is already tuned the SHARED_POOL_SIZE can be increased. Be aware that if the application is not using the library cache appropriately, the contention might be worse with a larger structure to be handled.

The _KGL_LATCH_COUNT parameter controls the number of library cache latches. The default value should be adequate, but if contention for the library cache latch cant be resolved, it may be advisable to increase this value. The default value for _KGL_LATCH_COUNT is the next prime number after CPU_COUNT. This value cannot exceed 66 (See: ).

Library cache pin latch:

The library cache pin latch must be acquired when a statement in the library cache is reexecuted. Misses on this latch occur when there is very high rates SQL execution.

There is little that can be done to reduce the load on the library cache pin latch, although using private rather than public synonyms or direct object references such as OWNER.TABLE may help.

SHARED POOL RELATED LATCHES

Shared pool latch:

While the library cache latch protects operations withing the library cache, the shared pool latchis used to protect critical operations when allocating and freeing memory in the shared pool.
If an application makes use of literal (unshared) SQL then this can severely limit scalability and throughput. The cost of parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released. Before Oracle9, there use to be just one such latch to the entire database to protects the allocation of memory in the library cache. In Oracle9 multiple childs were introduced to relieve contention on this resource.

Ways to reduce the shared pool latch are, avoid hard parses when possible, parse once, execute many. Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared_pool and use of MTS (shared server option) also greatly influences the shared pool latch. Note 62143.1 explains how to identify and correct problems with the shared pool, and shared pool latch.

Row cache objects latch:

This latch comes into play when user processes are attempting to access the cached data dictionary values.

It is not common to have contention in this latch and the only way to reduce contention for this latch is by increasing the size of the shared pool (SHARED_POOL_SIZE).

10. Tuning _SPIN_COUNT (_LATCH_SPIN_COUNT in Oracle7)

SPIN_COUNT controls how many times the process will re-try to obtain the latch before backing off and going to sleep. This basically means the process is in a tight CPU loop continually trying to get the latch for SPIN_COUNT attempts. On a single CPU system if an Oracle process tries to acquire a latch but it is held by someone else the process will release the CPU and go to sleep for a short period before trying again. However, on a multi processor system (SMP) it is possible that the process holding the latch is running on one of the other CPUs and so will potentially release the latch in the next few instructions (latches are usually held for only very short periods of time).

Performance can be adjusted by changing the value of SPIN_COUNT. If a high value is used, the latch will be attained sooner than if you use a low value. However, you may use more CPU time spinning to get the latch if you use a high value for SPIN_COUNT. You can decrease this probability of session sleeps by increasing the value of the configuration parameters _LATCH_SPIN_COUNT or SPIN_COUNT. This parameter controls the number of attempts the session will make to obtain the latch before sleeping. Spinning on the latch consumes CPU, so if you increase this parameter, you may see an increase in your systems overall CPU utilization. If your computer is near 100% CPU and your application is throughput rather than response time driven, you could consider decreasing SPIN_COUNT in order to conserve CPU. Adjusting SPIN_COUNT is trial and error. In general, only increase SPIN_COUNT if there are enough free CPU resources available on the system, and decrease it only if there is no spare CPU capacity.

To summarize latch sleeps and spin count, if you encounter latch contention and have spare CPU capacity, consider increasing the value of SPIN_COUNT. If CPU resources are at full capacity, consider decreasing the value of SPIN_COUNT.

ORACLE DBA

Tuesday, February 24, 2009

Latches

No comments:

Post a Comment