Wednesday, June 29, 2011

OS Watcher (OSW) and Lite Onboard Monitor (LTOM)

The following four new white papers have just been released by Oracle's Center of Expertise:

10g Upgrade Companion

Determining CPU Resource Usage for Linux and Unix

Measuring Memory Usage for Linux and Unix

Best Practices for Load Testing

I checked the second and third white papers, both written by Roger Snyde of Oracle Support's Center of Expertise. These white papers describe a tool called OSW (OS Watcher). Oracle Support's Center of Expertise developed OSWatcher, a script-based tool for Unix and Linux systems that runs and archives the output of a number of operating system monitoring utilities, such as vmstat, top, iostat, mpstat and ps.

OSWatcher is available from Metalink as note 301137.1. It is a shell script tool that runs on Unix and Linux servers. It operates as a background process and runs the native operating system utilities at a user-settable interval (30 seconds by default), and retains an archive of the output for a user-settable period (48 hours by default). The retention period can be increased to keep more information when evaluating performance, and to capture baseline information during important cycle-end periods.
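
As a rough back-of-the-envelope sketch (my own helper, not part of OSWatcher), the two settings translate into a number of retained snapshots per utility as shown below. The function name osw_snapshot_count is hypothetical, and it simply assumes one snapshot per interval for the whole retention period.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with OSWatcher.
# Estimates how many snapshots of one utility are kept in the archive.
#   $1 = snapshot interval in seconds (defaults to 30)
#   $2 = retention period in hours    (defaults to 48)
osw_snapshot_count() {
    interval=${1:-30}
    hours=${2:-48}
    echo $(( hours * 3600 / interval ))
}

osw_snapshot_count          # prints 5760 (48 hours at 30-second intervals)
osw_snapshot_count 60 10    # prints 600  (10 hours at 60-second intervals)
```

Doubling the retention period doubles the number of snapshots kept, so it is worth checking free disk space before raising it.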

Oracle recommends customers download and install OSWatcher on all production and test servers that need to be monitored.

While going through 301137.1, I found a mention of another tool called LTOM (the embedded Lite Onboard Monitor): To collect database metrics in addition to OS metrics, consider running LTOM. The Lite Onboard Monitor (LTOM) is a Java program designed as a real-time diagnostic platform for deployment to a customer site. LTOM differs from other support tools in that it is proactive rather than reactive. LTOM provides real-time automatic problem detection and data collection. LTOM runs on the customer's UNIX server, is tightly integrated with the host operating system and provides an integrated solution for detecting and collecting trace files for system performance issues. The ability to detect problems and collect data in real time will hopefully reduce the amount of time it takes to solve problems and reduce customer downtime.

Both OSW and LTOM now provide a graphing utility to graph the data collected. This greatly reduces the need to manually inspect all the output files.


OSWatcher:
I used OSWatcher to monitor CPU, memory and network while investigating a problem on some servers. I think it's easy to set up, but it has to be downloaded from Metalink.

OS Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues. OSW operates as a set of background processes on the server and gathers OS data on a regular basis, invoking such Unix utilities as vmstat, netstat and iostat.
More detail: Metalink note 301137.1

After downloading it from Metalink, it's time to set it up:

$ ls osw212.tar
osw212.tar

$ tar xvf osw212.tar
./
./osw/
./osw/Exampleprivate.net
./osw/OSWatcher.sh
./osw/OSWatcherFM.sh
./osw/profile/
./osw/oswnet.sh
./osw/oswsub.sh
./osw/startOSW.sh
./osw/stopOSW.sh
./osw/tarupfiles.sh
./osw/topaix.sh
./osw/README
./osw/OSWgREADME
./osw/src/
./osw/src/OSW_profile.htm
./osw/src/coe_logo.gif
./osw/src/oswg_input.txt
./osw/src/missing_graphic.gif
./osw/src/tombody.gif
./osw/src/watch.gif
./osw/gif/
./osw/oswlnxtop.sh
./osw/private.net
./osw/oswlnxio.sh
./osw/oswg.jar
./osw/tmp/

$ cd osw


Just extract the tar file, then read the README file to get an idea of the utility commands:

startOSW.sh script:
It takes 2 arguments, which control how often data is collected and how many hours' worth of data to archive. If they are omitted, the defaults are used.
An optional 3rd argument lets the user specify a zip utility to compress the files after they have been created:

ARG1 = snapshot interval in seconds (default 30 seconds).
ARG2 = the number of hours of archive data to store (default 48 hours)
ARG3 (optional) = the name of the zip utility to run if the user wants to compress the files automatically after creation.

Example:

./startOSW.sh
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48
.
.

./startOSW.sh 60 10 gzip
Info...Zip option IS specified.
Info...OSW will use gzip to compress files.
.
.
Starting OSWatcher V2.1.2 on Tue Jul 21 11:16:40 ICT 2009
With SnapshotInterval = 60
With ArchiveInterval = 10
.
.
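
To avoid typos in the arguments, the call could be wrapped in a small helper. This is only a sketch under my own assumptions: osw_start_cmd is a hypothetical name, it only checks that the first two arguments are whole numbers, and it prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of OSWatcher.
# Builds the startOSW.sh command line from:
#   $1 = snapshot interval in seconds (defaults to 30)
#   $2 = hours of archive data to keep (defaults to 48)
#   $3 = optional zip utility name, e.g. gzip
osw_start_cmd() {
    interval=${1:-30}
    hours=${2:-48}
    zip_util=$3
    # Reject anything that is not made of digits only.
    case "$interval$hours" in
        *[!0-9]*)
            echo "interval and hours must be whole numbers" >&2
            return 1
            ;;
    esac
    # Append the zip utility only when one was given.
    echo "./startOSW.sh $interval $hours${zip_util:+ $zip_util}"
}

osw_start_cmd 60 10 gzip    # prints ./startOSW.sh 60 10 gzip
osw_start_cmd               # prints ./startOSW.sh 30 48
```

To actually start the collection, the printed line can be run from the osw directory.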


stopOSW.sh script:

Example:

./stopOSW.sh

Or run "OSWatcher.sh" directly to test it:


$ ./OSWatcher.sh

Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48

Testing for discovery of OS Utilities...

VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.

Discovery completed.

Starting OSWatcher V2.1.2 on Tue Jul 21 10:29:55 ICT 2009
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation

Starting Data Collection...

osw heartbeat:Tue Jul 21 10:29:55 ICT 2009
.
.

CTRL+C


Time to try it out (TEST). Starting:

$ ./startOSW.sh
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48

Testing for discovery of OS Utilities...

VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.

Discovery completed.

Starting OSWatcher V2.1.2 on Tue Jul 21 10:34:24 ICT 2009
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation

Starting Data Collection...

osw heartbeat:Tue Jul 21 10:34:24 ICT 2009
osw heartbeat:Tue Jul 21 10:34:55 ICT 2009
osw heartbeat:Tue Jul 21 10:35:25 ICT 2009
.
.
.
Let it monitor for a while... and when you want to stop:

$ ./stopOSW.sh
Terminated


What do I see?

Archives are stored under the osw/archive/ path.

$ find ./archive/ -type f
./archive/oswiostat/oratest01_iostat_09.07.21.1000.dat
./archive/oswslabinfo/oratest01_slabinfo_09.07.21.1000.dat
./archive/oswprvtnet/oratest01_prvtnet_09.07.21.1000.dat
./archive/oswps/oratest01_ps_09.07.21.1000.dat
./archive/oswtop/oratest01_top_09.07.21.1000.dat
./archive/oswvmstat/oratest01_vmstat_09.07.21.1000.dat
./archive/oswmeminfo/oratest01_meminfo_09.07.21.1000.dat
./archive/oswnetstat/oratest01_netstat_09.07.21.1000.dat
./archive/oswmpstat/oratest01_mpstat_09.07.21.1000.dat
.
.
.
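
Judging from the listing above, the archive file names appear to follow the pattern host_utility_YY.MM.DD.HH00.dat. Assuming that pattern holds, a small helper (osw_parse_name is a hypothetical name of mine) can split a file name into its parts, which is handy when picking files for a given hour:

```shell
#!/bin/sh
# Hypothetical helper, not part of OSWatcher.
# Splits an archive file name of the assumed form
#   <host>_<utility>_<YY.MM.DD.HH00>.dat
# into "host utility timestamp".
osw_parse_name() {
    base=$(basename "$1" .dat)   # drop the directory and the .dat suffix
    stamp=${base##*_}            # text after the last underscore
    rest=${base%_*}              # everything before the last underscore
    util=${rest##*_}             # utility name, e.g. iostat
    host=${rest%_*}              # host name (may itself contain underscores)
    echo "$host $util $stamp"
}

osw_parse_name ./archive/oswiostat/oratest01_iostat_09.07.21.1000.dat
# prints: oratest01 iostat 09.07.21.1000
```

Splitting from the right means a host name that contains underscores still comes out in one piece.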


The archive files contain the raw stats, and they can also be used to generate graphs:
Use OSWg (more detail: Metalink note 461053.1) to generate the graphs. It requires Java 1.4.2 or higher, and it needs X Windows.

Read the OSWgREADME file for help with generating graphs.
Then test with some archives:

$ $ORACLE_HOME/jdk/bin/java -version
java version "1.4.2_14"

$ $ORACLE_HOME/jdk/bin/java -jar oswg.jar -i archive/

Starting OSWg V2.1.2
OSWatcher Graph Written by Oracle Center of Expertise
Copyright (c) 2008 by Oracle Corporation

Parsing Data. Please Wait...

Parsing file oratest01_iostat_09.07.21.1000.dat ...
Parsing file oratest01_iostat_09.07.21.1100.dat ...
.
.
.

Parsing Completed.

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Specify Different Time Scale
Enter D to Return to Default Time Scale
Enter R to Remove Currently Displayed Graphs
Enter P to Generate A Profile
Enter Q to Quit Program

Please Select an Option:5

The Following Devices and Average Service Times Are Ready to Display:

Device Name Average Service Times in Milliseconds

sda 2.0477464788732385
sdb 1.192676056338029

Specify A Case Sensitive Device Name to View (Q to exit): sda
