Wednesday, December 9, 2009

Online table move - reorganisation using DBMS_REDEFINITION


Let's say you have little or no downtime available to move segments to a new tablespace.
Or, perhaps, you have very big segments in a dictionary-managed tablespace with very small extents.
In one such tablespace, I have segments of a few GB each, with thousands of KB-sized extents.
If you execute a DDL statement such as ALTER TABLE MOVE, TRUNCATE or DROP on those segments,
you will notice that your session takes A LOT of time to complete, with "row cache lock" and "DFS lock handle" waits.
If you trace the session, you will discover a lot of activity against the fet$ and uet$ tables.
Since the tablespace is dictionary managed, those thousands of extents must be deleted from uet$ [the used-extents dictionary table]
and inserted into fet$ [the free-extents dictionary table] before they can be released.
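A quick way to see how many extents a segment actually holds (a sketch; [USER] and [ORIG TABLE] are the same placeholders used below):

```sql
-- Thousands of rows here mean thousands of uet$/fet$ entries
-- to maintain on every DROP/TRUNCATE/MOVE of the segment.
SELECT segment_name,
       COUNT (*)              AS extents,
       SUM (bytes) / 1024 / 1024 AS size_mb
  FROM dba_extents
 WHERE owner = '[USER]'
   AND segment_name = '[ORIG TABLE]'
 GROUP BY segment_name;
```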


A quick way out of these situations is the DBMS_REDEFINITION package.
Let's say you want to move a table [ORIG TABLE], under schema [USER] from one tablespace [TBS1] to another [TBS2].


Do not forget to give the necessary quota to [USER] in [TBS2].


First of all, check if [ORIG TABLE] can be used by the redefinition process, by executing:

EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('[USER]','[ORIG TABLE]');

If you get an error, it is probably because [ORIG TABLE] has no primary key.
In that case you may use the ROWID method, so the following execution should return no error:

EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('[USER]','[ORIG TABLE]',2);

Now, create an interim table [INT TABLE]: an empty, exact copy of [ORIG TABLE].
You may use a "Create Table As Select" [CTAS] statement with a "where 1=2" clause, but be aware that column default values will not be transferred that way.
The best way is to get the creation script of [ORIG TABLE] and change it accordingly to create [INT TABLE].
You can get it from a tool like TOAD, or by executing the following command:

SELECT DBMS_METADATA.GET_DDL('TABLE','[ORIG TABLE]','[USER]') FROM DUAL;

You should not transfer the "big size, small extents" problem to [INT TABLE].
A rule of thumb could be to use 1/10 of the current table size for the INITIAL extent and 1/100 for the NEXT extent:

SELECT SEGMENT_NAME, BYTES/1024 SIZE_K,CEIL(BYTES/1024/10) INITIAL_K, CEIL(BYTES/1024/100) NEXT_K
FROM DBA_SEGMENTS
WHERE SEGMENT_NAME = '[ORIG TABLE]'
AND OWNER = '[USER]';
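Putting it together, the interim table's DDL might look like the following sketch. The column list and storage figures are illustrative only (here, roughly 1/10 and 1/100 of a hypothetical 2 GB table); use your real columns from the DBMS_METADATA output, including any defaults:

```sql
CREATE TABLE [USER].[INT TABLE]
(
   ORDER_ID     NUMBER,
   ORDER_DATE   DATE,
   STATUS       VARCHAR2 (30) DEFAULT 'NEW'   -- defaults must be carried over
)
TABLESPACE [TBS2]
STORAGE (INITIAL 200M NEXT 20M);
```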

At this point, you have [ORIG TABLE] in tablespace [TBS1] and an empty [INT TABLE] in tablespace [TBS2].


Now start the redefinition process.
If your table is huge, enable parallelism first:

ALTER SESSION FORCE PARALLEL DML PARALLEL [#];
ALTER SESSION FORCE PARALLEL QUERY PARALLEL [#];


Then for PK method:

EXEC DBMS_REDEFINITION.START_REDEF_TABLE(uname=>'[USER]',orig_table=>'[ORIG TABLE]',int_table=>'[INT TABLE]');

Or for ROWID method:

EXEC DBMS_REDEFINITION.START_REDEF_TABLE(uname=>'[USER]',orig_table=>'[ORIG TABLE]',int_table=>'[INT TABLE]',options_flag=>2);
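If anything goes wrong from this point on, the process can be cancelled and its internal bookkeeping cleaned up with ABORT_REDEF_TABLE (same placeholder arguments as above):

```sql
EXEC DBMS_REDEFINITION.ABORT_REDEF_TABLE('[USER]','[ORIG TABLE]','[INT TABLE]');
```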


If your database is 9i, you must create the appropriate triggers, constraints, privileges and indexes manually on [INT TABLE].
If your database is 10g or higher, you may use the COPY_TABLE_DEPENDENTS procedure to transfer constraints, triggers and privileges to [INT TABLE]:

DECLARE
   ERR   PLS_INTEGER;
BEGIN
   DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS (UNAME           => '[USER]',
                                            ORIG_TABLE      => '[ORIG TABLE]',
                                            INT_TABLE       => '[INT TABLE]',
                                            COPY_INDEXES    => 0,
                                            NUM_ERRORS      => ERR,
                                            IGNORE_ERRORS   => TRUE);
END;
/

It is better to use creation scripts for the indexes rather than this procedure.
Set ignore_errors to TRUE, because the NOT NULL constraints will already have been transferred, and setting it to FALSE will result in an error.
Just after the execution, query the DBA_REDEFINITION_ERRORS view for errors not related to NOT NULL constraints and investigate them.
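A minimal check query (a sketch; the view also carries the owner and object type if you need them):

```sql
SELECT object_name, base_table_name, ddl_txt
  FROM dba_redefinition_errors
 WHERE base_table_name = '[ORIG TABLE]';
```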


Now, you may finish the redefinition process:

EXEC DBMS_REDEFINITION.FINISH_REDEF_TABLE(uname=>'[USER]',orig_table=>'[ORIG TABLE]',int_table=>'[INT TABLE]');

Querying the DBA_REDEFINITION_OBJECTS view should now return no rows.


Now you should have [ORIG TABLE] in tablespace [TBS2] and [INT TABLE] in tablespace [TBS1], which you can safely DROP.


Note that DML statements [INSERT, UPDATE, DELETE] are permitted during the redefinition process.
Committed changes are written to [ORIG TABLE] only, until you execute FINISH_REDEF_TABLE and both tables are synchronized.
The goal, which is actually achieved, is that committed changes are always written to [ORIG TABLE], during and after the redefinition process.
BUT any uncommitted transactions will result in a "wait for table lock" wait for the session running the FINISH_REDEF_TABLE procedure.
Only when those transactions are committed will FINISH_REDEF_TABLE, and with it the whole redefinition process, complete successfully.
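To keep that final window short on a busy table, the interim table can be synchronized periodically before the finish step, so that FINISH_REDEF_TABLE has fewer changes left to apply:

```sql
EXEC DBMS_REDEFINITION.SYNC_INTERIM_TABLE(uname=>'[USER]',orig_table=>'[ORIG TABLE]',int_table=>'[INT TABLE]');
```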

Thursday, November 12, 2009

Create db link and global_name parameter


When you create a database link with a name, for example "TEST", you might notice that Oracle appends an extra ".WORLD" or something similar.


This extra addition comes from the GLOBAL_NAME of the database.
To see what is the GLOBAL_NAME of your database, execute: 

SELECT * FROM GLOBAL_NAME;

To change the global_name of your database, execute: 

ALTER DATABASE RENAME GLOBAL_NAME TO NEW_NAME;

Then, if you create a new database link named XXX, Oracle will append the extra "NEW_NAME", so the link will be XXX.NEW_NAME.


Be careful: there is also a parameter called "global_names":

SELECT * FROM V$PARAMETER WHERE NAME = 'global_names';

It usually has a value of FALSE.


Tip: When GLOBAL_NAMES=TRUE then the db link name MUST be identical to the global_name of the db to which it connects.
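To illustrate the tip, a sketch (REMOTEDB.WORLD, the credentials and the TNS alias are all hypothetical; the link name must match the remote database's GLOBAL_NAME):

```sql
ALTER SESSION SET global_names = TRUE;

CREATE DATABASE LINK REMOTEDB.WORLD
   CONNECT TO scott IDENTIFIED BY tiger
   USING 'remotedb_tns';
```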

Tuesday, June 2, 2009

Oracle Quoting error: ORA-06508 on submit order



When a user submits a new order, they sometimes get an error:

Error in Line 1.0: ORA-06508:PL/SQL:could not find program unit being called in Package OM_TAX_UTIL Procedure Tax_Line

This error is persistent, and logging in again does not allow the user to bypass it.
This framework uses Java Apache processes to connect to the DB, not the default connection method, where a form opens a session and closes it when it is done.
These Apache processes stay connected to the DB until an Apache restart/shutdown occurs.

We enable tracing for ORA-06508:
ALTER SYSTEM SET EVENTS '6508 trace name errorstack level 3';

The trace files created by the "submit orders" attempts contain the following group of ORA errors:
ORA-04061: existing state of package body "APPS.ARP_PROCESS_TAX" has been invalidated

ORA-04065: not executed, altered or dropped package body "APPS.ARP_PROCESS_TAX"

ORA-06508: PL/SQL: could not find program unit being called

In OM_TAX_UTIL.TAX_LINE there are calls to ARP_PROCESS_TAX's procedures.
The ARP_PROCESS_TAX package body gets invalidated by an AR purge process, which executes DDL operations on a few AR interface tables.
This process recompiles every object it invalidates before it finishes.
So, the ARP_PROCESS_TAX spec and body have VALID status when its procedures are called by TAX_LINE.
The key to this problem is the fact that the Apache processes that handle the Quoting requests always stay connected to the database.
For some reason (probably Bug 2747350) the new package state is not picked up.

The following demonstration simulates the issue.
We create a table:
CREATE TABLE SYSTEM.TEST_TABLE
TABLESPACE TOOLS
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
AS
   SELECT * FROM DBA_OBJECTS;

A view on this table:
CREATE OR REPLACE FORCE VIEW SYSTEM.TEST_VIEW
(
   OWNER,
   OBJECT_NAME,
   SUBOBJECT_NAME,
   OBJECT_ID,
   DATA_OBJECT_ID,
   OBJECT_TYPE,
   CREATED,
   LAST_DDL_TIME,
   TIMESTAMP,
   STATUS,
   TEMPORARY,
   GENERATED,
   SECONDARY
)
AS
   SELECT "OWNER",
          "OBJECT_NAME",
          "SUBOBJECT_NAME",
          "OBJECT_ID",
          "DATA_OBJECT_ID",
          "OBJECT_TYPE",
          "CREATED",
          "LAST_DDL_TIME",
          "TIMESTAMP",
          "STATUS",
          "TEMPORARY",
          "GENERATED",
          "SECONDARY"
     FROM TEST_TABLE;

A package which uses this view, and is equivalent to ARP_PROCESS_TAX:
CREATE OR REPLACE PACKAGE TEST_PKG
IS
   X   CONSTANT NUMBER := 1;
   GVAR         VARCHAR2 (50);

   FUNCTION GETINFO
      RETURN VARCHAR2;
END;
/

CREATE OR REPLACE PACKAGE BODY TEST_PKG
IS
   FUNCTION GETINFO
      RETURN VARCHAR2
   IS
   BEGIN
      SELECT OBJECT_NAME
        INTO GVAR
        FROM TEST_VIEW
       WHERE OBJECT_ID = 100;
      RETURN GVAR;
   END;
END;
/

A package which calls test_pkg.getInfo and has exception handling of ORA-6508 error.
This is equivalent to the OM_TAX_UTIL package:
CREATE OR REPLACE PACKAGE TEST_PKG_RUN
IS
   X   CONSTANT NUMBER := 1;

   PROCEDURE RUNPROC;
END;
/

CREATE OR REPLACE PACKAGE BODY TEST_PKG_RUN
IS
   PROCEDURE RUNPROC
   IS
      PACKAGE_EXCEPTION   EXCEPTION;
      PRAGMA EXCEPTION_INIT (PACKAGE_EXCEPTION, -6508);
      D                   VARCHAR2 (50);
   BEGIN
      D := TEST_PKG.GETINFO ();
      DBMS_OUTPUT.PUT_LINE (D);
      DBMS_LOCK.SLEEP (10);           --> to allow me time to recreate package
   EXCEPTION
      WHEN PACKAGE_EXCEPTION
      THEN
         DBMS_OUTPUT.PUT_LINE ('Called failed with ' || SQLCODE);
   END;
END;
/

We open one session [1] and run test_pkg_run.runproc:
SQL> conn system@sme_gnvdev
Enter password: *******
Connected.

SQL> set serveroutput on
SQL> exec test_pkg_run.runproc;
I_IDL_UB11
PL/SQL procedure successfully completed.

We open a second session [2] and modify test_view, which invalidates test_pkg's body:
SQL> conn system@sme_gnvdev
Enter password: *******
Connected.

SQL> CREATE OR REPLACE FORCE VIEW SYSTEM.TEST_VIEW
2 (
3 OWNER,
4 OBJECT_NAME,
5 SUBOBJECT_NAME,
6 OBJECT_ID,
7 DATA_OBJECT_ID,
8 OBJECT_TYPE,
9 CREATED,
10 LAST_DDL_TIME,
11 --TIMESTAMP,
12 STATUS,
13 TEMPORARY,
14 GENERATED,
15 SECONDARY
16 )
17 AS
18 SELECT "OWNER",
19 "OBJECT_NAME",
20 "SUBOBJECT_NAME",
21 "OBJECT_ID",
22 "DATA_OBJECT_ID",
23 "OBJECT_TYPE",
24 "CREATED",
25 "LAST_DDL_TIME",
26 -- "TIMESTAMP",
27 "STATUS",
28 "TEMPORARY",
29 "GENERATED",
30 "SECONDARY"
31 FROM test_table;
View created.

SQL> select object_type,status
2 from dba_objects where object_name='TEST_PKG';

OBJECT_TYPE        STATUS
------------------ --------
PACKAGE            VALID
PACKAGE BODY       INVALID

In [2] we compile test_pkg's body and validate it:
SQL> alter package test_pkg compile body;
Package body altered.

SQL> select object_type,status
2 from dba_objects where object_name='TEST_PKG';

OBJECT_TYPE        STATUS
------------------ --------
PACKAGE            VALID
PACKAGE BODY       VALID

In [1], any execution of test_pkg_run.runproc now results in an ORA-6508 error:
SQL> set serveroutput on
SQL> exec test_pkg_run.runproc;
Called failed with -6508
PL/SQL procedure successfully completed.

SQL> set serveroutput on
SQL> exec test_pkg_run.runproc;
Called failed with -6508
PL/SQL procedure successfully completed.

SQL> set serveroutput on
SQL> exec test_pkg_run.runproc;
Called failed with -6508
PL/SQL procedure successfully completed.


Oracle has filed this issue under Bug 8613161: PRAGMA EXCEPTION_INIT MASKS ORA-4068 AND PACKAGE IS NOT RE-INSTANTIATED PROBLEM:

 1. Clear description of the problem encountered: 


A PL/SQL package declares a user defined exception using PRAGMA EXCEPTION_INIT for ORA-6508. When this error occurs a second user defined  exception is raised which appears to mask the underlying ORA-4068 error which  accompanies the ORA-6508 error. This has the effect of preventing the Package from being re-instantiated in the session even though the underlying  cause of the ORA-6508 error (an invalid dependent) is resolved.  


    procedure runproc is 
      package_exception exception; 
      rzy_except exception; 
      PRAGMA EXCEPTION_INIT (package_exception, -6508); 
      PRAGMA EXCEPTION_INIT (rzy_except, -20001); 
      d varchar2(50); 
    begin 
      d:=test_pkg.getInfo(); 
      dbms_output.PUT_LINE(d); 
      dbms_lock.sleep(10); --> to allow me time to recreate package 
    exception 
      when package_exception then 
      dbms_output.put_line('Called failed with '||sqlcode); 
      --raise; 
      raise rzy_except; 
    end; 


If the raised user defined exception (raise rzy_except;) is replaced with the raise statement, the ORA-4068 error is shown on the error stack and the package is re-instantiated in the session. 


I have raised this bug as a P2 because the APPS customer has coded a great deal of customisation code before encountering this error and it is not feasible for them to make the necessary code changes.  


=========================     
DIAGNOSTIC ANALYSIS: 
=========================    
WORKAROUND: 
Use the raise statement or reconnect to the db. 
=========================    
RELATED BUGS: 
Bug 229349 ORA-4068 LEADS TO INSERT ALWAYS FAILING IF TRIGGER USES RAISE_APPLICATION_ERROR. 
=========================    
REPRODUCIBILITY: 
 1. State if the problem is reproducible; indicate where and predictability Reproduces every time 
 2. List the versions in which the problem has reproduced   On Solaris Oracle Version 10.2.0.4, 11.1.0.7 
 3. List any versions in which the problem has not reproduced . 
=========================     
tc.sql - creates table, view and package. 
tc1.sql - runs the package 
tc2.sql - drops and recreates the view and checks the status of the package in user_objects. 
Two sessions A and B  
1. Run tc.sql to set up the table, view and package. 
2. Run tc1.sql to run the package and straight after in session B run tc2.sql 
Session A 
SQL> @tc.sql 
SQL> @tc1.sql 
Session B 
SQL> @tc2.sql 


Results: 
ERROR at line 1: 
ORA-20001: 
ORA-06512: at "SCOTT.TEST_PKG_RUN", line 17 
ORA-04061: existing state of package body "SCOTT.TEST_PKG" has been invalidated 
ORA-04065: not executed, altered or dropped package body "SCOTT.TEST_PKG" 
ORA-06508: PL/SQL: could not find program unit being called: "SCOTT.TEST_PKG" 
ORA-06512: at line 1 


This error occurs every time even though the package is now valid, which can be shown in session B. 
When 'raise rzy_except;' is commented out and replaced by 'raise;', the package reports the following error: 


ERROR at line 1: 
ORA-04068: existing state of packages has been discarded 
ORA-04061: existing state of package body "SCOTT.TEST_PKG" has been  invalidated
ORA-04065: not executed, altered or dropped package body "SCOTT.TEST_PKG" 
ORA-06508: PL/SQL: could not find program unit being called: "SCOTT.TEST_PKG" 
ORA-06512: at "SCOTT.TEST_PKG_RUN", line 16 
ORA-06512: at line 1 


The following call to the package works as expected, as the ORA-4068 error triggers a re-instantiation of the package. 
This is not a bug.  The only way to force it to clear the package state, recompile the package and load the new instantiation is to allow the ORA-4068 to be raised back to the client.  By trapping it you are signalling to Oracle that you do not want this to happen yet.   


Apart from changing their code the only other option is to use event 10945  that reverts behaviour to 8i so that it will not raise the error at all but carry on using the old copy of the package.  This also means that the package  does not get recompiled on the next call to it in that session.


I would not generally recommend customers do this as, if for instance the application uses connection/session pooling, it could be that some sessions run for a long time using an out of date copy of a package.  


The event can be set at session level via: 
alter session set events = '10945 trace name context forever, level 1'; 


but they'd probably need it set system wide.   


Note, this event only works as long as the old instantiation exists.  If a new session calls the same currently invalid package and therefore recompiles it automatically, or an alter compile is issued, then the event will have no effect on the existing session and the errors will be raised again.   


Using the testcase, with the event set in tc1.sql and the alter compile commented out of tc2.sql the package remains invalid and no errors occur.   


Run the alter compile though and the errors will appear. 

Tuesday, April 14, 2009

SYS.LINK$ Shows all Database Link Passwords unencrypted

From Metalink Note 202987.1:

Problem Description:
Database link passwords are stored as plain text.
A database link is a mechanism used to provide a method of transparently accessing one server from another.
When creating a database link, a user name and password of the account on the remote server can be specified. Creating the database link without credentials works only if the user exists on both databases and has the same password.
Once this is done, all queries using the link have the privilege of the indicated account on the remote server. By omitting an account and password when creating a database link, the account and password of the user connecting through the link is used. Indicating the username and password of an account to use for all connections through a link can lead to passwords being exposed.
Database link passwords until recently (version 10gR1) were stored unencrypted in the database. Users with SELECT privilege on the SYS.LINK$ table could view the passwords in plain text. Setting up links to authenticate as the current user prevents unencrypted passwords from being exposed, prevents linked servers from being compromised, and provides increased accountability. Oracle accounts were found with permission to view the table SYS.LINK$. Access to view the table SYS.LINK$ should be restricted because database link passwords are stored unencrypted in this table.

Possible Symptoms:
If you have the SELECT ANY TABLE privilege on a database, you can see in the SYS.LINK$ table the passwords of users belonging to remote or local database(s) and, using those passwords, connect to these remote or local databases at will.
We rely on sys to protect link$. If customers don't trust a DBA, there are many things the DBA can do that make any encryption attempt useless.

Important change in the SELECT ANY DICTIONARY system privilege:
In Oracle release 10gR1, the access to SYS.LINK$ was removed from the SELECT ANY DICTIONARY system privilege (hence the ORA-1031 error), while this still doesn't solve the general problem: tools such as Oracle Enterprise Manager that depend on SELECT ANY DICTIONARY to be available can be deployed without access to SYS.LINK$.

Workarounds:
There are no workarounds to protect against this potential vulnerability but it is possible to use this:

Drop the database link and create a link without specifying an account and passwords.
To drop a database link, execute the command:
SQL> drop database link [link name];

To re-create a link without hard coding the password, execute the command:
SQL> create database link [link name] using '[connection string]';

To revoke permissions from the account or role, execute the following command:
SQL> revoke select on SYS.LINK$ from [account or role] ;
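Before revoking, it is worth checking who was granted access to the table in the first place (a sketch):

```sql
SELECT grantee, privilege
  FROM dba_tab_privs
 WHERE owner = 'SYS'
   AND table_name = 'LINK$';
```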

Patches:
Currently there is no patched installer available to deal with this problem.
One of the workarounds listed above must be used.
This is no longer the case in version 10g Release 2 (10.2.0.x): the LINK$ table now contains a new column, PASSWORDX, which holds the encrypted database link password.
Details of the encryption scheme will not be disclosed, for obvious reasons.

Friday, April 3, 2009

How to connect as a user for a short time, without knowing the password.

Let's say you want to temporarily connect to your database as a user whose password you don't know and, perhaps, don't want to find out.
Furthermore, when you've finished, the user should be able to connect using the same password.
Each user's password hash is stored in the SYS.USER$ table, in the PASSWORD column, and in 11g databases this is the only place you can find it, since it was removed from the DBA_USERS view [Oracle note 735651.1].
In earlier versions, you may query DBA_USERS to get it:
SQL> SELECT PASSWORD FROM DBA_USERS WHERE USERNAME='AARGY';
PASSWORD
----------------
5161D15AA3324F0A
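In 11g, where the hash is gone from DBA_USERS, you may read it directly from SYS.USER$ instead (a sketch; in 11g the SPARE4 column holds the case-sensitive password verifier, so save that one too if case-sensitive logins are enabled):

```sql
SELECT password, spare4
  FROM sys.user$
 WHERE name = 'AARGY';
```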

Now, change user's password, so you may connect to the database using that account:
SQL> ALTER USER AARGY IDENTIFIED BY TEST_123;
User altered.

SQL> CONN AARGY/TEST_123
Connected.

You've finished and it's time to restore the password, so:
SQL> ALTER USER AARGY IDENTIFIED BY VALUES '5161D15AA3324F0A';
User altered.

SQL> CONNECT AARGY/TEST_123
ERROR:
ORA-01017: invalid username/password; logon denied

Thursday, April 2, 2009

Viewing Concurrent Requests' output on external viewer stopped working

In e-BS, you have the option to view a Report's output in an external viewer, depending on the output format.
I got a call saying that, suddenly, users could not view a few Excel Reports. An IE window was spawned, but instead of displaying the output, it closed after a few moments.
Nothing had changed in the Profile Values and Viewer Options on the server side. The funny thing was that it was still working for some of my colleagues. The only difference was that they were on Windows XP SP2, while the users facing the problem had SP3 installed. So, obviously, the SP level was the cause.
Since I did not find anything relevant in Metalink, I realized it was one of those cases where you have to face Microsoft's and Oracle's existential quests, which make our lives so much easier and simpler.
So, why should someone install a newer SP, according to Microsoft? To make your OS "safer", of course, among other things. After realizing this tremendous fact, I decided to check IE's Security Settings. Since we were facing a download problem, let's look at the Downloads section. "Automatic prompting for file downloads" is disabled?
That looked suspicious. In SP2 its default value is enabled.
So, I enabled the option, and problem solved...

Tuesday, March 3, 2009

Execution Plan changes with no apparent reason

From Metalink (Bug 4567767):
It is possible for a query's execution plan to change without any modification to statistics or the optimizer environment.
Usually this is interpreted as the plan changing "out of the blue".

The reason for the change is that density is reevaluated as 1/NDV,
instead of being taken from the statistic stored in the data dictionary, when the table
is reloaded into the row cache for whatever reason, such as a shared pool flush.

It is not easy to catch in the act, but it can be seen in a 10053 trace file, when the query is hard-parsed before and after the table is reloaded into the row cache.

Before:
  Column:    ISOCODE  Col#: 7      Table: PL_X_NP   Alias:  X
    NDV: 1344      NULLS: 0         DENS: 1.5152e-02    <------ From Dict.
    NO HISTOGRAM: #BKT: 1 #VAL: 2

After:
  Column:    ISOCODE  Col#: 7      Table: PL_X_NP   Alias:  X  
    NDV: 1344      NULLS: 0         DENS: 7.4405e-04    <------ 1/1344
    NO HISTOGRAM: #BKT: 1 #VAL: 2

This bug affects all 9i and 10g versions, except 10.2.0.4, on any platform.
Oracle's recommended workaround, if an upgrade is not possible, is:
ALTER SYSTEM SET EVENT='10139 trace name context forever' SCOPE=SPFILE;

and bounce.

Thursday, February 26, 2009

Grid Control: Create user-defined metrics at OS level


Using a scripting language of your choice, create a script that contains the logic to check for the condition being monitored.
For example, scripts that check for disk space or memory usage.
All scripts to be run as User-Defined Metrics should be placed in a directory to which the Management Agent has full access privileges.
The scripts themselves must have the requisite permissions set so that they can be executed by the Management Agent.
The script runtime environment must also be configured:
if your script requires an interpreter, such as a Perl interpreter, it must be installed on that host as well.

Let's say that we want to create a metric that alerts us when something is wrong with a critical OS process running on our server.
This could be the e-BS Internal Manager.
First of all, we need to create the script "/appl/oragrid/checkInternal.sh":

#!/bin/sh
# Count running Internal Manager (FNDCPMBR) processes
PS=`ps -ef | grep FNDCPMBR | grep -v grep | wc -l`

echo em_result=$PS
if test $PS -eq 1
then
   echo em_message='Internal Manager is OK.'
elif test $PS -gt 1
then
   echo em_message='Stuck Internal Manager process found.'
else
   echo em_message='Internal Manager is down.'
fi

Enterprise Manager needs two parameters in the script's output in order to process it: em_result and em_message.
em_result is the value that will be compared against the warning and critical thresholds you set in the metric's creation page (see image) to raise the appropriate alert.
The default message for this alert is: "The value is [em_result]".
For our example, "=1" means we have one Internal Manager process, which is the expected behavior.
">1" means that more than one process is running, and "<1" that no process is running.
Neither of these conditions is acceptable, and an alert should be raised for them.
If we want to receive a clearer message than the default, we set the em_message parameter.
So, if "em_result=1" then "Internal Manager is OK", if "em_result>1" then "Stuck Internal Manager process found", and if "em_result<1" then "Internal Manager is down".

The next step is to create the alert from the Enterprise Manager.
Login and go to the Targets tab.
In the Hosts tab, choose the host where you created your script and, in the Related Links section, click the User-defined Metrics link.
Press the Create button and you will be transferred to the page shown in the attached image.
In the Command Line field, enter the interpreter's full path and the full path name of your script; in our case: "/usr/bin/sh /appl/oragrid/checkInternal.sh".
Choose the appropriate Comparison Operator (for our example: "!=") and the Warning and Critical thresholds.
For our example, we set "1" in the Critical field.
Scheduling is obvious: you define how frequently your metric will be evaluated by the Enterprise Manager.
Now, whenever something is wrong with the Internal Manager process on our server, an alert will be raised and displayed, with the message of our choice, on the main host page under the Alerts section.
To be notified via e-mail for these alerts, just add the User Defined Numeric Metric to the notification rule you already have for your host, or create a new one.
Be sure you are subscribed to this notification rule.

library cache lock

Most of the info discussed here is from Metalink Note 444560.1.
The library cache lock event is also associated with library cache concurrency.
A session must acquire a library cache lock on an object handle to prevent other sessions from accessing it at the same time,
or to maintain a dependency for a long time, or to locate an object in the library cache.
Both locks and pins are provided to access objects in the library cache.

Locks manage concurrency between processes, whereas pins manage cache coherence. In order to access an object, a process must first lock the object handle, and then pin the object heap itself.
Requests for both locks and pins will wait until granted. This is a possible source of contention, because there is no NOWAIT request mode.

By acquiring a lock on the object handle, a process can prevent other processes from accessing the object, or even finding out what type it is.
It can even maintain a dependency on an object without preventing other processes from accessing the object.
Acquiring a lock is also the only way to locate an object in the cache: a process locates and locks an object in a single operation.

If the process wants to actually examine or modify the object, then it must acquire a pin on the object itself (after acquiring a lock on the handle).
Pinning the object causes information about the object and its heaps to be loaded into memory if they were not already there.
This information is guaranteed to remain in memory at least until the pin is released. Locks and pins are externalized in X$KGLLK and X$KGLPN, respectively.

Oracle needs a library cache lock and a library cache pin for the compilation/parsing of a package/procedure/function/view.
This ensures that no one is using the object during compilation/parsing, as we are making changes to its definition
and need to drop and recreate the object with the new definition.

When a SQL statement is hard parsed by a session, the session has to acquire a library cache lock so that the other sessions can be prevented from accessing or modifying the same object.
If this event accumulates a lot of time, then it may indicate that the shared pool is too small or is being flushed regularly.
Otherwise, it indicates that database object definitions are being changed regularly.
Apart from hard parsing, if the session wants to change the definition of the object specified in the SQL, or make any other modification,
then it has to acquire a library cache lock along with the library cache pin.
The pin is needed because the dictionary information must be loaded into memory in order to modify/change the object.

You may use X$KGLLK to find the session which holds the lock:
SELECT SID,
       USERNAME,
       TERMINAL,
       PROGRAM
  FROM V$SESSION
 WHERE SADDR IN
          (SELECT KGLLKSES
             FROM X$KGLLK LOCK_A
            WHERE KGLLKREQ = 0
                  AND EXISTS
                         (SELECT LOCK_B.KGLLKHDL
                            FROM X$KGLLK LOCK_B
                           WHERE     KGLLKSES = 'C0000006D1AFE550' --saddr of blocked session from v$session
                                 AND LOCK_A.KGLLKHDL = LOCK_B.KGLLKHDL
                                 AND KGLLKREQ > 0));

library cache pin

The library cache pin wait event is associated with library cache concurrency.
It occurs when the session tries to pin an object in the library cache to modify or examine it.
The session must acquire a pin to make sure that the object is not updated by other sessions at the same time.
Oracle posts this event when sessions are compiling or parsing PL/SQL procedures and views.

What actions to take to reduce these waits depend heavily on what blocking scenario is occurring.
A common problem scenario is the use of DYNAMIC SQL from within a PL/SQL procedure where the PL/SQL code is recompiled and the DYNAMIC SQL calls something that depends on the calling procedure.
If there is general widespread waiting, the shared pool may need tuning.
If there is a blocking scenario, the following SQL can be used to show the sessions that are holding and/or requesting pins on the object that are given in P1 in the wait:
SELECT S.SID, KGLPNMOD "Mode", KGLPNREQ "Req"
  FROM X$KGLPN P, V$SESSION S
 WHERE P.KGLPNUSE = S.SADDR AND KGLPNHDL = '&P1RAW';

Kill the session with Req=0, Mode=3 or 2.

You can also find out the object's owner and name:
SELECT KGLNAOWN AS OWNER, KGLNAOBJ AS OBJECT
  FROM SYS.X$KGLOB
 WHERE KGLHDADR = '&P1RAW';

You can get P1RAW from v$session_wait view.
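For example, a sketch of pulling P1RAW for the waiting sessions:

```sql
SELECT sid, event, p1raw
  FROM v$session_wait
 WHERE event IN ('library cache pin', 'library cache lock');
```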

Tuesday, February 24, 2009

Indexes not being used due to different data types and data casting problems

We have a table, XDP_ORDER_HEADERS, which has a column named EXTERNAL_ORDER_NUMBER of type VARCHAR2, but, in essence, we store numeric values in it.
SELECT EXTERNAL_ORDER_NUMBER FROM XDP_ORDER_HEADERS;
  
EXTERNAL_ORDER_NUMBER
1000291
1000292
1000293
1000294
1000295
1000296
1000297
1000298
1000299
100030
1000300

We create an index named XDP_ORDER_HEADERS_IDX on that column.
Since we know that this column stores numbers, the following query:
SELECT *
  FROM XDP_ORDER_HEADERS
 WHERE EXTERNAL_ORDER_NUMBER = 1000296;

should use our index and run faster, right?
Wrong!
The execution plan shows that the CBO uses a Full Table Scan instead:

Plan
SELECT STATEMENT CHOOSE
  1  TABLE ACCESS FULL XDP.XDP_ORDER_HEADERS

When a VARCHAR2 column is compared to a number, Oracle implicitly converts the column, not the literal, effectively rewriting the predicate as TO_NUMBER(EXTERNAL_ORDER_NUMBER) = 1000296, and a function applied to the column disables the plain index. Since this column is of VARCHAR2 type, we should compare it with VARCHAR2 values if we want the index to be used.
So, the query that will make use of our index is the following:
SELECT *
  FROM XDP_ORDER_HEADERS
 WHERE EXTERNAL_ORDER_NUMBER = '1000296';
 
SELECT STATEMENT CHOOSE
  TABLE ACCESS BY INDEX ROWID XDP.XDP_ORDER_HEADERS
    INDEX RANGE SCAN NON-UNIQUE XDP.XDP_ORDER_HEADERS_IDX
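If you cannot change the application's query, a function-based index is a possible workaround. This is only a sketch: the index name is hypothetical, it assumes every stored value really is numeric (a single non-numeric row would fail with ORA-01722), and on older releases it may additionally require QUERY_REWRITE_ENABLED:

```sql
-- Hypothetical function-based index matching the implicit conversion
CREATE INDEX XDP_ORDER_HEADERS_FBI
    ON XDP_ORDER_HEADERS (TO_NUMBER(EXTERNAL_ORDER_NUMBER));

-- The original numeric predicate is internally rewritten to
-- TO_NUMBER(EXTERNAL_ORDER_NUMBER) = 1000296, which matches the
-- index expression, so an index range scan becomes possible:
SELECT *
  FROM XDP_ORDER_HEADERS
 WHERE EXTERNAL_ORDER_NUMBER = 1000296;
```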

Grid Control: Create user-defined metrics at database level


User-Defined Metrics allow you to extend the reach of Enterprise Manager’s monitoring to conditions specific to particular environments via custom scripts or SQL queries and function calls. Once defined, User-Defined Metrics will be monitored, aggregated in the repository and trigger alerts, like regular metrics.

Log in to Enterprise Manager and go to the Databases tab. Click the database of your choice and, in the Related Links section, click the User-Defined Metrics link.
Press the Create button and you will be transferred to the page shown in the image, where you can create your custom metric.

I will not explain every detail you see in this page, since a lot of them are obvious.
You should understand the meaning of thresholds. Your query should return either a single value or a two-column set of key/value pairs. In the first case, you set the Comparison Operator and the Warning and Critical values according to what you want. In the second case, the Warning Thresholds by Key and Critical Thresholds by Key fields let you define, for individual keys, specific values that trigger an alert. An example:

SELECT ename, sal FROM emp
Threshold settings for this example are shown.
Use the format key:value. Keys are case-sensitive.
■ Warning:500
■ Critical:300
■ Comparison Operator: <
■ Warning threshold by key: SMITH:250;JONES:400;CLARK:900
  The warning threshold is set to 250 for SMITH, 400 for JONES, and 900 for CLARK.
■ Critical threshold by key: SMITH:100;JONES:200;CLARK:500
  The critical threshold is set to 100 for SMITH, 200 for JONES, and 500 for CLARK.

All other keys will use the threshold values specified in the Warning and Critical fields. When these conditions are met, a warning or critical alert will be raised on the main database page under the Alerts section.
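For a Number metric type with single-value output, any query returning one numeric value works. A hypothetical example (thresholds are illustrative) that alerts when objects go invalid:

```sql
-- Single-value numeric metric: count of invalid objects.
-- With Comparison Operator ">", e.g. Warning 0 and Critical 10,
-- any invalid object raises at least a warning alert.
SELECT COUNT(*)
  FROM DBA_OBJECTS
 WHERE STATUS = 'INVALID';
```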

Now, an example for a String metric type.
Suppose you want to raise an alert when a job in your database becomes broken (stops running). An appropriate query could be:

select broken from dba_jobs where what = 'KEEP_SIZE_AUD_LOG';

This will return either 'Y' or 'N' (a string value), so I choose the String metric type and Single Value SQL Query Output. For Comparison Operator we choose MATCH and we enter Y in the Critical field. A warning alert has no meaning in our case, so that field is left blank.
Now, when this particular job stops running, a Critical alert will be raised on the main database page under the Alerts section.

If you do not want to get as an alert name something vague, such as "[Metric Name]=Y", you may enter something else in the Alert Message field, like "Job KEEP_SIZE_AUD_LOG is broken." For the emp salary example above you could use "Underpaid Employee: %Key% has salary of %value%", so e.g. you will get a warning alert if Smith's [key] salary drops below 250 [value].

Scheduling is obvious. You define how frequently your metric will be monitored by the Enterprise Manager.

To be notified via e-mail for these types of alerts, just add the User Defined Numeric Metric and/or User-Defined String Metric in the notification rule you already have for your database or create a new one. Be sure to be subscribed to this notification rule.

latch free

The latch free wait occurs when a process waits to acquire a latch that is currently held by another process.
Like enqueues, latches are used by Oracle to protect data structures.
Only one process at a time can modify or inspect the data structure after acquiring the latch.
Other processes needing access to the data structure must wait until they acquire the latch. Unlike enqueues, processes requesting a latch do not wait in a queue.
If the request to acquire a latch fails, the process simply retries for a short time and requests the latch again.
This short retry loop is called "spinning". If the latch is not acquired after one or more spin iterations, the process sleeps for a short time and tries to acquire the latch again,
sleeping for successively longer periods until the latch is obtained.

You can determine which latch is being waited on, using this query:
SELECT NAME,
       ADDR,
       'Child ' || CHILD#,
       GETS,
       MISSES,
       SLEEPS
  FROM GV$LATCH_CHILDREN
 WHERE ADDR = '&P1RAW'
UNION
SELECT NAME,
       ADDR,
       NULL,
       GETS,
       MISSES,
       SLEEPS
  FROM GV$LATCH
 WHERE ADDR = '&P1RAW';

The types are described here.
Use the script posted in "Identify waiting sessions".

1. Using the SQL hash value from the CURRENT_EVENT cursor, you can capture the SQL text from the GV$SQLTEXT view as shown next and store it in the appropriate history table.
If you are also interested in the SQL statistics, you can get the information from V$SQLAREA using the same SQL hash value from the cursor:
SELECT INST_ID,
         HASH_VALUE,
         ADDRESS,
         PIECE,
         SQL_TEXT
    FROM GV$SQLTEXT
   WHERE HASH_VALUE = 4190307765                              --sql_hash_value
     AND INST_ID = 1                                          --RAC instance
ORDER BY INST_ID, PIECE;

2. Find the hot block:
SELECT TS#,
         FILE#,
         DBARFIL,
         DBABLK,
         OBJ,
         CLASS,
         STATE,
         TCH
    FROM X$BH
   WHERE HLADDR = 'C0000006FE846AD8' --P1RAW or ADDR from gv$latch_children/gv$latch
ORDER BY TCH DESC;

Blocks with a TCH (touch count) value in double figures or more are possible hot-block candidates.
SELECT DISTINCT A.OBJECT_NAME, A.SUBOBJECT_NAME
  FROM DBA_OBJECTS A, SYS.X$BH B
 WHERE     (A.OBJECT_ID = B.OBJ OR A.DATA_OBJECT_ID = B.OBJ)
       AND B.FILE# = 604                        --DBARFIL
       AND B.DBABLK = 2003;                     --DBABLK

Latches

This is a copy of Metalink's Reference Note 22908.1, for those that do not have access to it.

1. What is a latch?

Latches are low level serialization mechanisms used to protect shared data structures in the SGA. The implementation of latches is operating system dependent, particularly in regard to whether a process will wait for a latch and for how long.

A latch is a type of a lock that can be very quickly acquired and freed. Latches are typically used to prevent more than one process from executing the same piece of code at a given time. Associated with each latch is a cleanup procedure that will be called if a process dies while holding the latch. Latches have an associated level that is used to prevent deadlocks. Once a process acquires a latch at a certain level it cannot subsequently acquire a latch at a level that is equal to or less than that level (unless it acquires it nowait).

2. Latches vs. Enqueues

Enqueues are another type of locking mechanism used in Oracle.
An enqueue is a more sophisticated mechanism which permits several concurrent processes to have varying degrees of sharing of "known" resources. Any object which can be concurrently used can be protected with enqueues. A good example is locks on tables: we allow varying levels of sharing, e.g. two processes can lock a table in share mode or in share update mode. One difference is that the enqueue is obtained using an OS-specific locking mechanism. An enqueue allows the user to store a value in the lock, i.e. the mode in which it is requested. The OS lock manager keeps track of the resources locked. If a process cannot be granted the lock because it is incompatible with the mode requested, and the lock is requested with wait, the OS puts the requesting process on a wait queue which is serviced in FIFO order.

Another difference between latches and enqueues is that with latches there is no ordered queue of waiters as there is with enqueues. Latch waiters may either use timers to wake up and retry, or spin (only on multiprocessors). Since all waiters are concurrently retrying (depending on the scheduler), anyone might get the latch, and conceivably the first one to try might be the last one to get it.

3. When do we need to obtain a latch?

A process acquires a latch when working with a structure in the SGA (System Global Area). It continues to hold the latch for the period of time it works with the structure. The latch is dropped when the process is finished with the structure. Each latch protects a different set of data, identified by the name of the latch.

Oracle uses atomic instructions like "test and set" for operating on latches. Processes waiting to execute a part of code for which a latch has already been obtained by some other process will wait until the latch is released. Examples are redo allocation latches, copy latches, archive control latch etc. The basic idea is to block concurrent access to shared data structures. Since the instructions to set and free latches are atomic, the OS guarantees that only one process gets it. Since it is only one instruction, it is quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a holder dies abnormally while holding it. This cleaning is done using the services of PMON.

4. Latch request modes

Latch requests can be made in two modes: "willing-to-wait" or "no wait". Normally, latches are requested in "willing-to-wait" mode: the request
will loop, wait, and request again until the latch is obtained. In "no wait" mode the process requests the latch and, if it is not available, instead of waiting another latch is requested. Only when all fail does the server process have to wait.

Examples of "willing-to-wait" latches are the shared pool and library cache latches.
An example of a "no wait" latch is the redo copy latch.

5. What causes latch contention?
If a required latch is busy, the process requesting it spins, tries again and, if it is still not available, spins again. The loop is repeated up to a maximum number of times determined by the initialization parameter _SPIN_COUNT. If after this entire loop the latch is still not available, the process must yield the CPU and go to sleep. Initially it sleeps for one centisecond. This time is doubled in every subsequent sleep.

This causes a slowdown to occur and results in additional CPU usage, until a latch is available. The CPU usage is a consequence of the "spinning" of the process. "Spinning" means that the process continues to look for the availability of the latch after certain intervals of time, during which it sleeps.


6. How to identify contention for internal latches?

Relevant data dictionary views to query:

V$LATCH
V$LATCHHOLDER
V$LATCHNAME

Each row in the V$LATCH table contains statistics for a different type of latch. The columns of the table reflect activity for different types of latch requests. The distinction between these types of requests is whether the requesting process continues to request a latch if it is unavailable:

willing-to-wait If the latch requested with a willing-to-wait
request is not available, the requesting process
waits a short time and requests the latch again.
The process continues waiting and requesting until
the latch is available.

no wait If the latch requested with an immediate request is
not available, the requesting process does not
wait, but continues processing.

V$LATCH key information:
---------------------------------------

GETS Number of successful willing-to-wait requests for
a latch.

MISSES Number of times an initial willing-to-wait request
was unsuccessful.

SLEEPS Number of times a process slept while waiting for
a latch after an initial willing-to-wait request
was unsuccessful.

IMMEDIATE_GETS Number of successful immediate requests for each latch.

IMMEDIATE_MISSES Number of unsuccessful immediate requests for each latch.

Calculating latch hit ratio
----------------------------

To get the Hit ratio for latches apply the following formula:

"willing-to-wait" Hit Ratio=(GETS-MISSES)/GETS
"no wait" Hit Ratio=(IMMEDIATE_GETS-IMMEDIATE_MISSES)/IMMEDIATE_GETS

These ratios should be close to 1. If not, tune according to the latch name.
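The two formulas can be sketched as a single query against V$LATCH (NULLIF guards against division by zero for latches never requested in a given mode):

```sql
-- Hit ratios per latch; values well below 1 indicate contention.
SELECT NAME,
       ROUND((GETS - MISSES) / NULLIF(GETS, 0), 4)
           AS willing_to_wait_hit_ratio,
       ROUND((IMMEDIATE_GETS - IMMEDIATE_MISSES)
             / NULLIF(IMMEDIATE_GETS, 0), 4)
           AS no_wait_hit_ratio
  FROM V$LATCH
 ORDER BY NAME;
```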

7. Useful SQL scripts to get latch information

/*
** Display System-wide latch statistics.
*/
column name format A32 truncate heading "LATCH NAME"
column pid heading "HOLDER PID"
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+)
and a.latch# = c.latch#
order by a.latch#;

/*
** Given a latch address, find out the latch name.
*/
column name format a64 heading 'Name'
select a.name from v$latchname a, v$latch b
where b.addr = '&addr'
and b.latch#=a.latch#;

/*
** Display latch statistics by latch name.
*/
column name format a32 heading 'LATCH NAME'
column pid heading 'HOLDER PID'
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;

8. List of all the latches

Oracle versions might differ in the latch# assigned to the existing latches.
The following query will help you to identify all latches and the number assigned.

column name format a40 heading 'LATCH NAME'
select latch#, name from v$latchname;

9. List of latches that are of most concern to a DBA

  • BUFFER CACHE LATCHES: There are two main latches which protect data blocks in the buffer cache. Contention for these two latches is usually seen when a database has high I/O rates. We can reduce contention for these latches and tune them by adjusting certain init.ora parameters.

Cache buffers chains latch:

This latch is acquired whenever a block in the buffer cache is accessed (pinned).

Reducing contention for the cache buffer chains latch will usually require reducing logical I/O rates by tuning and minimizing the I/O requirements of the SQL involved. High I/O rates could be a sign of a hot block (meaning a block highly accessed).

See Note 163424.1 How To Identify a Hot Block Within The Database to correctly identify this issue.

Cache buffers LRU chain latch:

The cache buffer lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.

It's possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into it. Two parameters dictate the size of the buffer cache, DB_BLOCK_SIZE and DB_BLOCK_BUFFERS; in actuality, only DB_BLOCK_BUFFERS can be changed without recreating the database. Caution: when tuning the buffer pool, avoid adding buffers that contribute little or nothing to the cache hit ratio. A common mistake is to keep increasing the value of DB_BLOCK_BUFFERS; such increases have no effect if you are doing full table scans or other operations that do not use the buffer cache. Multiple buffer pools can help reduce contention on this latch. You can create additional cache buffers lru chain latches by adjusting the configuration parameter DB_BLOCK_LRU_LATCHES, and you may be able to reduce the load on the cache buffers chains latches by increasing the configuration parameter _DB_BLOCK_HASH_BUCKETS.
  • REDOLOG BUFFER LATCHES: There are two redo buffer latches, the redo allocation latch and the redo copy latch. The redo allocation latch must be acquired in order to allocate space within the buffer. If the redo log entry to be made is not greater than the configuration parameter LOG_SMALL_ENTRY_MAX_SIZE, the session which acquires the redo allocation latch may copy the entry into the redo buffer immediately while holding the allocation latch. If the log entry is greater than LOG_SMALL_ENTRY_MAX_SIZE, then the session will release the redo allocation latch and will acquire the redo copy latch in order to copy the entry. There is only one redo allocation latch, but there may be up to LOG_SIMULTANEOUS_COPIES copy latches.

Redo allocation latch:

This latch controls the allocation of space for redo entries in the redo log buffer. There is one redo allocation latch per instance.

Contention for this latch in Oracle7 can be reduced by decreasing the value of LOG_SMALL_ENTRY_MAX_SIZE on multi-CPU systems to force the use of the
redo copy latch. In Oracle8i this parameter is obsolete, so you need to consider increasing the size of the LOG_BUFFER or reducing the load on the log buffer using NOLOGGING features when possible.

Redo copy latch:

This latch is used to write redo records into the redolog buffer. This latch is waited for on both single and multi-cpu systems.

On multi-cpu systems, contention can be reduced by increasing the value of LOG_SIMULTANEOUS_COPIES (Hidden in Oracle8i) and/or increasing LOG_ENTRY_PREBUILD_THRESHOLD (undocumented in Oracle7).
  • LIBRARY CACHE

Library cache latch:

The library cache latches protect the cached SQL statements and objects definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache. During a parse, Oracle searches the library cache for a matching statement. If one is not found, then Oracle will parse the SQL statement, obtain the library cache latch and insert the new SQL.

The first step to reduce contention on this latch is to ensure that the application reuses SQL statement representations as much as possible. Use bind variables whenever possible in the application. Misses on this latch may also be a sign that the application is parsing SQL at a high rate and may be suffering from too much parse CPU overhead. If the application is already tuned, SHARED_POOL_SIZE can be increased. Be aware that if the application is not using the library cache appropriately, contention might be worse with a larger structure to be handled.

The _KGL_LATCH_COUNT parameter controls the number of library cache latches. The default value should be adequate, but if contention for the library cache latch can't be resolved, it may be advisable to increase this value. The default value for _KGL_LATCH_COUNT is the next prime number after CPU_COUNT. This value cannot exceed 66.

Library cache pin latch:
The library cache pin latch must be acquired when a statement in the library cache is reexecuted. Misses on this latch occur when there are very high rates of SQL execution.
There is little that can be done to reduce the load on the library cache pin latch, although using private rather than public synonyms, or direct object references such as OWNER.TABLE, may help.
  • SHARED POOL RELATED LATCHES
Shared pool latch:

While the library cache latch protects operations within the library cache, the shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool.
If an application makes use of literal (unshared) SQL, this can severely limit scalability and throughput. The cost of parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released. Before Oracle9, there used to be just one such latch for the entire database to protect the allocation of memory in the library cache. In Oracle9 multiple child latches were introduced to relieve contention on this resource.

Ways to reduce contention on the shared pool latch are: avoid hard parses when possible; parse once, execute many. Eliminating literal SQL is also useful. The size of the shared pool and the use of MTS (shared server option) also greatly influence the shared pool latch. Note 62143.1 explains how to identify and correct problems with the shared pool and the shared pool latch.

Row cache objects latch:

This latch comes into play when user processes are attempting to access the cached data dictionary values.

It is not common to have contention in this latch and the only way to reduce contention for this latch is by increasing the size of the shared pool (SHARED_POOL_SIZE).

10. Tuning _SPIN_COUNT (_LATCH_SPIN_COUNT in Oracle7)

SPIN_COUNT controls how many times the process will re-try to obtain the latch before backing off and going to sleep. This basically means the process is in a tight CPU loop continually trying to get the latch for SPIN_COUNT attempts. On a single CPU system if an Oracle process tries to acquire a latch but it is held by someone else the process will release the CPU and go to sleep for a short period before trying again. However, on a multi processor system (SMP) it is possible that the process holding the latch is running on one of the other CPUs and so will potentially release the latch in the next few instructions (latches are usually held for only very short periods of time).

Performance can be adjusted by changing the value of SPIN_COUNT. If a high value is used, the latch will be attained sooner than with a low value, but you may use more CPU time spinning to get it. You can decrease the probability of session sleeps by increasing the value of the configuration parameters _LATCH_SPIN_COUNT or SPIN_COUNT, which control the number of attempts the session will make to obtain the latch before sleeping. Spinning on the latch consumes CPU, so if you increase this parameter, you may see an increase in your system's overall CPU utilization. If your computer is near 100% CPU and your application is throughput-driven rather than response-time-driven, you could consider decreasing SPIN_COUNT in order to conserve CPU. Adjusting SPIN_COUNT is trial and error: in general, only increase it if there are enough free CPU resources available on the system, and decrease it only if there is no spare CPU capacity.

To summarize latch sleeps and spin count, if you encounter latch contention and have spare CPU capacity, consider increasing the value of SPIN_COUNT. If CPU resources are at full capacity, consider decreasing the value of SPIN_COUNT.
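Before touching SPIN_COUNT either way, it helps to see which latches are actually sleeping; a simple sketch against V$LATCH:

```sql
-- Latches with the most sleeps; a high sleeps-per-miss ratio
-- suggests processes frequently back off to sleep instead of
-- acquiring the latch while spinning.
SELECT NAME, GETS, MISSES, SLEEPS,
       ROUND(SLEEPS / NULLIF(MISSES, 0), 2) AS sleeps_per_miss
  FROM V$LATCH
 WHERE MISSES > 0
 ORDER BY SLEEPS DESC;
```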