Thursday, February 26, 2009

Grid Control: Create user-defined metrics at OS level


Using a scripting language of your choice, create a script that contains the logic to check for the condition being monitored, for example disk space or memory usage.
All scripts to be run with User-Defined Metrics should be placed in a directory to which the Management Agent has full access privileges.
Scripts themselves must have the requisite permissions set so that they can be executed by the Management Agent.
The script runtime environment must also be configured: if your script requires an interpreter, such as Perl, it must be installed on that host as well.
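The permission requirements above can be checked before the metric is registered. The following is a minimal sketch, not part of Grid Control: check_udm_script is a hypothetical helper name, and it assumes you run it as the OS user that owns the Management Agent.

```shell
#!/bin/sh
# Hypothetical helper: verify that a User-Defined Metric script exists and
# is readable and executable by the current OS user.
# Assumption: you run this as the user that owns the Management Agent.
check_udm_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -r "$script" ] || [ ! -x "$script" ]; then
        echo "not readable/executable: $script"
        return 1
    fi
    echo "ok: $script"
}
```

For the script used later in this post, the call would be: check_udm_script /appl/oragrid/checkInternal.sh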

Let's say that we want to create a metric that alerts us if something is wrong with a critical OS process running on our server.
This could be the e-BS Internal Manager.
First of all, we need to create the script "/appl/oragrid/checkInternal.sh":

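#!/bin/sh
# Count the running Internal Manager (FNDCPMBR) processes, excluding the grep itself.
PS=`ps -ef | grep FNDCPMBR | grep -v grep | wc -l`

# em_result is the value that Enterprise Manager compares against the thresholds.
# $PS is left unquoted so that any padding added by wc -l is dropped.
echo em_result=$PS
if test $PS -eq 1
then
    echo em_message='Internal Manager is OK.'
elif test $PS -gt 1
then
    echo em_message='Stuck Internal Manager process found.'
else
    echo em_message='Internal Manager is down.'
fi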

Enterprise Manager reads two tags from the script's output: em_result and em_message.
em_result is the value that is compared against the warning and critical thresholds you set on the metric's creation page (see image) in order to raise the appropriate alert.
The default message for this alert is: "The value is [em_result]".
For our example, "=1" means we have one Internal Manager process, which is the expected behavior.
">1" means that more than one process is running and "<1" that no process is running.
Neither of these conditions is acceptable, and an alert should be raised for both.
If we want a clearer message than the default for our alert, we also set the optional em_message tag.
So, if "em_result=1" then "Internal Manager is OK", if "em_result>1" then "Stuck Internal Manager process found" and if "em_result<1" then "Internal Manager is down".
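Before registering the metric, it can help to confirm that the script's output has the shape described above. The sketch below is illustrative only: udm_result is a hypothetical helper, and the rule that exactly one em_result line must be present is my reading of the expected format, so treat it as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read a script's output on stdin and print the
# em_result value. It fails unless exactly one em_result line is present
# (assumption: that is what Enterprise Manager expects).
udm_result() {
    awk '
        /^em_result=/ { n++; value = substr($0, index($0, "=") + 1) }
        END { if (n != 1) exit 1; print value }
    '
}
```

Used against our script, this would be: /usr/bin/sh /appl/oragrid/checkInternal.sh | udm_result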

The next step is to create the alert from the Enterprise Manager.
Log in and go to the Targets tab.
In the Hosts tab, choose the host where you created your script and, in the Related Links section, click the User-defined Metrics link.
Press the Create button and you will be taken to the page shown in the attached image.
In the Command Line field, enter the full path of the interpreter followed by the full path of your script; in our case: "/usr/bin/sh /appl/oragrid/checkInternal.sh".
Choose the appropriate Comparison Operator (for our example this is: "!=") and the Warning and Critical Thresholds.
For our example we set "1" in the Critical field.
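To see why "!=" with a critical threshold of "1" covers both failure cases, the comparison can be mimicked in a few lines of shell. This is only an illustration of the logic, not how Enterprise Manager evaluates thresholds internally; udm_severity and the CRITICAL/CLEAR labels are made up for the example.

```shell
#!/bin/sh
# Illustration only: mimic the "!=" comparison against the critical threshold.
# udm_severity is a hypothetical name, not an Enterprise Manager command.
udm_severity() {
    value=$1
    critical=$2
    if [ "$value" -ne "$critical" ]; then
        echo "CRITICAL"
    else
        echo "CLEAR"
    fi
}

# Process counts of 0 (down), 1 (expected) and 2 (extra process).
for count in 0 1 2; do
    echo "em_result=$count -> $(udm_severity "$count" 1)"
done
```

Only a count of exactly 1 stays clear; both 0 and 2 differ from the threshold and are flagged.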
Scheduling is straightforward: you define how frequently Enterprise Manager evaluates your metric.
Now, whenever something is wrong with the Internal Manager process on our server, an alert will be raised and displayed, with the message of our choice, on the main host page under the Alerts section.
To be notified via e-mail about these alerts, just add the User Defined Numeric Metric to the notification rule you already have for your host, or create a new one.
Make sure you are subscribed to this notification rule.
