Wednesday, June 29, 2011

OS Watcher (OSW) and Lite Onboard Monitor (LTOM)

The following four new white papers have just been released by Oracle's Center of Expertise:

10g Upgrade Companion

Determining CPU Resource Usage for Linux and Unix

Measuring Memory Usage for Linux and Unix

Best Practices for Load Testing
I checked the second and third white papers, both of which are written by Roger Snyde from Oracle Support's Center of Expertise. These white papers describe a tool called OSW (OS Watcher). Oracle Support’s Center of Expertise has developed OSWatcher, a script-based tool for Unix and Linux systems that runs and archives output from a number of operating system monitoring utilities, such as vmstat, top, iostat, mpstat and ps.

OSWatcher is available from Metalink as note 301137.1. It is a shell script tool and will run on Unix and Linux servers. It operates as a background process and runs the native operating system utilities at user-settable intervals (30 seconds by default), and retains an archive of the output for a user-settable period (48 hours by default). The retention period may be increased in order to retain more information when evaluating performance, and to capture baseline information during important cycle-end periods.
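
To get a feel for what these two settings mean together, here is a rough sketch that estimates how many samples each collector keeps for a given interval and retention period. The function name `snapshots_retained` is made up for this illustration and is not part of OSWatcher; it only does the arithmetic and assumes one sample per interval.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSWatcher): estimate how many samples
# one collector retains, assuming exactly one sample per interval.
snapshots_retained() {
    interval=$1   # snapshot interval in seconds
    hours=$2      # hours of archive data to keep
    if [ "$interval" -le 0 ]; then
        echo "interval must be positive" >&2
        return 1
    fi
    echo $(( hours * 3600 / interval ))
}

snapshots_retained 30 48   # defaults: prints 5760
snapshots_retained 60 10   # a lighter setting: prints 600
```

Doubling the interval halves the number of samples, so a longer retention period can be paid for with a coarser interval if disk space is tight.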

Oracle recommends customers download and install OSWatcher on all production and test servers that need to be monitored.

While going through 301137.1, I found a mention of another tool called LTOM (the embedded Lite Onboard Monitor): to collect database metrics in addition to OS metrics, consider running LTOM. The Lite Onboard Monitor (LTOM) is a Java program designed as a real-time diagnostic platform for deployment to a customer site. LTOM differs from other support tools, as it is proactive rather than reactive. LTOM provides real-time automatic problem detection and data collection. LTOM runs on the customer's UNIX server, is tightly integrated with the host operating system and provides an integrated solution for detecting and collecting trace files for system performance issues. The ability to detect problems and collect data in real-time will hopefully reduce the amount of time it takes to solve problems and reduce customer downtime.

Both OSW and LTOM now provide a graphing utility to graph the data collected. This greatly reduces the need to manually inspect all the output files.


OSWatcher:
I used OSWatcher to monitor CPU, memory and network while investigating a problem on servers. I think it's easy to set up, but I have to download it from Metalink.

OS Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues. We can download it from Metalink. OSW operates as a set of background processes on the server and gathers OS data on a regular basis, invoking such Unix utilities as vmstat, netstat and iostat.
More detail: Metalink note 301137.1

After downloading it from Metalink, it's time to set it up:

$ ls osw212.tar
osw212.tar

$ tar xvf osw212.tar
./
./osw/
./osw/Exampleprivate.net
./osw/OSWatcher.sh
./osw/OSWatcherFM.sh
./osw/profile/
./osw/oswnet.sh
./osw/oswsub.sh
./osw/startOSW.sh
./osw/stopOSW.sh
./osw/tarupfiles.sh
./osw/topaix.sh
./osw/README
./osw/OSWgREADME
./osw/src/
./osw/src/OSW_profile.htm
./osw/src/coe_logo.gif
./osw/src/oswg_input.txt
./osw/src/missing_graphic.gif
./osw/src/tombody.gif
./osw/src/watch.gif
./osw/gif/
./osw/oswlnxtop.sh
./osw/private.net
./osw/oswlnxio.sh
./osw/oswg.jar
./osw/tmp/

$ cd osw


Just extract the tar file, then read the README file to get an idea of the utility commands:

startOSW.sh script:
takes 2 arguments which control how often data is collected and the number of hours' worth of data to archive. Both fall back to their defaults when omitted.
An optional 3rd argument allows the user to specify a zip utility name to compress the files after they have been created:

ARG1 = snapshot interval in seconds (default 30 seconds).
ARG2 = the number of hours of archive data to store (default 48 hours)
ARG3 (optional) = the name of the zip utility to run if the user wants to compress the files automatically after creation.

Example:

./startOSW.sh
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48
.
.

./startOSW.sh 60 10 gzip
Info...Zip option IS specified.
Info...OSW will use gzip to compress files.
.
.
Starting OSWatcher V2.1.2 on Tue Jul 21 11:16:40 ICT 2009
With SnapshotInterval = 60
With ArchiveInterval = 10
.
.


stopOSW.sh script:

Example:

./stopOSW.sh

Or run "OSWatcher.sh" directly to test:


$ ./OSWatcher.sh

Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48

Testing for discovery of OS Utilities...

VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.

Discovery completed.

Starting OSWatcher V2.1.2 on Tue Jul 21 10:29:55 ICT 2009
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation

Starting Data Collection...

osw heartbeat:Tue Jul 21 10:29:55 ICT 2009
.
.

CTRL+C


Time to try it out (TEST). Starting:

$ ./startOSW.sh
Info...You did not enter a value for snapshotInterval.
Info...Using default value = 30
Info...You did not enter a value for archiveInterval.
Info...Using default value = 48

Testing for discovery of OS Utilities...

VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
NETSTAT found on your system.
TOP found on your system.

Discovery completed.

Starting OSWatcher V2.1.2 on Tue Jul 21 10:34:24 ICT 2009
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation

Starting Data Collection...

osw heartbeat:Tue Jul 21 10:34:24 ICT 2009
osw heartbeat:Tue Jul 21 10:34:55 ICT 2009
osw heartbeat:Tue Jul 21 10:35:25 ICT 2009
.
.
.
Let it monitor for a while... and when you want to stop:

$ ./stopOSW.sh
Terminated


What do I see?

Archives are stored under the osw/archive/ path.

$ find ./archive/ -type f
./archive/oswiostat/oratest01_iostat_09.07.21.1000.dat
./archive/oswslabinfo/oratest01_slabinfo_09.07.21.1000.dat
./archive/oswprvtnet/oratest01_prvtnet_09.07.21.1000.dat
./archive/oswps/oratest01_ps_09.07.21.1000.dat
./archive/oswtop/oratest01_top_09.07.21.1000.dat
./archive/oswvmstat/oratest01_vmstat_09.07.21.1000.dat
./archive/oswmeminfo/oratest01_meminfo_09.07.21.1000.dat
./archive/oswnetstat/oratest01_netstat_09.07.21.1000.dat
./archive/oswmpstat/oratest01_mpstat_09.07.21.1000.dat
.
.
.
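
The archive file names in this listing look like host_collector_YY.MM.DD.HH00.dat. As a small sketch, the helper below splits a name into those three parts using plain shell parameter expansion. The name `parse_osw_name` is made up here, and the layout is inferred only from the listing above; a host name that itself contains an underscore would still work, but a collector name with an underscore would not.

```shell
#!/bin/sh
# Hypothetical helper: split an OSWatcher archive file name into
# host, collector and timestamp. The naming layout is an assumption
# inferred from the find output, not taken from the OSW README.
parse_osw_name() {
    name=$(basename "$1" .dat)   # oratest01_iostat_09.07.21.1000
    stamp=${name##*_}            # text after the last underscore
    rest=${name%_*}              # everything before the last underscore
    collector=${rest##*_}        # e.g. iostat
    host=${rest%_*}              # e.g. oratest01
    echo "$host $collector $stamp"
}

# prints: oratest01 iostat 09.07.21.1000
parse_osw_name ./archive/oswiostat/oratest01_iostat_09.07.21.1000.dat
```

This can be handy for picking out the files of one collector or one hour before handing them to support.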


The archive files contain the raw stats, and they can also be used to draw graphs:
use OSWg (more detail: Metalink note 461053.1) to generate graphs. It requires Java 1.4.2 or higher and needs X Windows.

Read the OSWgREADME file for help generating graphs,
and test with some archives:

$ $ORACLE_HOME/jdk/bin/java -version
java version "1.4.2_14"

$ $ORACLE_HOME/jdk/bin/java -jar oswg.jar -i archive/

Starting OSWg V2.1.2
OSWatcher Graph Written by Oracle Center of Expertise
Copyright (c) 2008 by Oracle Corporation

Parsing Data. Please Wait...

Parsing file oratest01_iostat_09.07.21.1000.dat ...
Parsing file oratest01_iostat_09.07.21.1100.dat ...
.
.
.

Parsing Completed.

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Specify Different Time Scale
Enter D to Return to Default Time Scale
Enter R to Remove Currently Displayed Graphs
Enter P to Generate A Profile
Enter Q to Quit Program

Please Select an Option:5

The Following Devices and Average Service Times Are Ready to Display:

Device Name Average Service Times in Milliseconds

sda 2.0477464788732385
sdb 1.192676056338029

Specify A Case Sensitive Device Name to View (Q to exit): sda

Tuesday, June 28, 2011

Custom Logrotate in Solaris 10

Here I explain how to configure logadm to rotate any system-wide log files according to given criteria.
1. Add the corresponding entries in /etc/logadm.conf in the format below.
root@server1 # tail -3 /etc/logadm.conf
/var/adm/wtmpx -A 1m -o adm -g adm -m 664 -p 1d -t '$file.old.%Y%m%d_%H%M' -z 1
/var/adm/wtmpx -A 1m -g adm -m 664 -o adm -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
/var/adm/utmpx -A 1m -g adm -m 664 -o adm -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
/var/adm/loginlog -A 1m -g sys -m 700 -o root -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
Explanation for each switch:
-A -> Delete any versions that have not been modified for the amount of time specified by age. Specify age as a number followed by an h (hours), d (days), w (weeks), m (months), or y (years).
-o -> The owner of the newly created empty file
-g -> The group of the newly created empty file
-m -> The mode of the new empty file (as in chmod xxx)
-p -> Rotate a log file after the specified time period (a number followed by h, d, w, m, or y)
-t -> Specify the template to use when renaming log files (here, wtmpx.old.20101225_0757) (see man logadm for more info)
-z -> Compress old log files as they are created, leaving the given number of most recent rotated files uncompressed (0 compresses them all). To limit how many rotated copies are kept, use -C instead.
-P -> Used by logadm to record the last time the log was rotated in /etc/logadm.conf (no need to set this manually)
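
To preview what a -t template will turn into before touching /etc/logadm.conf, the sketch below imitates the expansion with date(1). It is only an approximation: the function name `expand_template` is made up, it handles just the $file keyword plus the strftime-style % fields, and it ignores the other keywords that logadm supports (see man logadm).

```shell
#!/bin/sh
# Hypothetical preview of a logadm -t template. Only $file and the
# % date fields are handled; this is not how logadm itself is implemented.
expand_template() {
    file=$1       # log file path, e.g. /var/adm/wtmpx
    template=$2   # template as written in logadm.conf
    # Replace the literal keyword $file with the log file path,
    # then let date(1) fill in the % fields with the current time.
    expanded=$(printf '%s' "$template" | sed "s|[\$]file|$file|g")
    date "+$expanded"
}

# prints something like /var/adm/wtmpx.old.20101225_1250
expand_template /var/adm/wtmpx '$file.old.%Y%m%d_%H%M'
```

Keep the template in single quotes on the command line, as in logadm.conf, so the shell does not try to expand $file itself.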
2. Once the above entries are in place, run logadm -v to perform a rotation now. logadm reads the /etc/logadm.conf file and, for every entry found in that file, checks the corresponding log file to see if it should be rotated.
root@server1 # logadm -v
# loading /etc/logadm.conf
# processing logname: /var/log/syslog
# using default rotate rules: -s1b -p1w
# using default template: $file.$n
# processing logname: /var/adm/messages
# using default rotate rules: -s1b -p1w
# using default template: $file.$n
# processing logname: /var/cron/log
# using default expire rule: -C10
# processing logname: /var/lp/logs/lpsched
# using default rotate rules: -s1b -p1w
# processing logname: /var/fm/fmd/errlog
# using default expire rule: -C10
# using default template: $file.$n
# processing logname: /var/fm/fmd/fltlog
# using default template: $file.$n
# processing logname: smf_logs
# using default template: $file.$n
# processing logname: /var/adm/pacct
# using default template: $file.$n
# processing logname: /var/log/pool/poold
# using default expire rule: -C10
# using default template: $file.$n
# processing logname: /var/svc/log/system-webconsole:console.log
# using default rotate rules: -s1b -p1w
# using default expire rule: -C10
# using default template: $file.$n
# processing logname: /var/opt/SUNWsasm/log/sasm.log
# using default template: $file.$n
# processing logname: /var/adm/wtmpx
mkdir -p /var/adm # verify directory exists
mv -f /var/adm/wtmpx /var/adm/wtmpx.old.20101225_1250 # rotate log file
touch /var/adm/wtmpx
chown adm:adm /var/adm/wtmpx
chmod 664 /var/adm/wtmpx
# recording rotation date Sat Dec 25 12:50:51 2010 for /var/adm/wtmpx
# processing logname: /var/adm/utmpx
mkdir -p /var/adm # verify directory exists
mv -f /var/adm/utmpx /var/adm/utmpx.old.20101225_1250 # rotate log file
touch /var/adm/utmpx
chown adm:adm /var/adm/utmpx
chmod 664 /var/adm/utmpx
# recording rotation date Sat Dec 25 12:50:51 2010 for /var/adm/utmpx
# processing logname: /var/adm/loginlog
mkdir -p /var/adm # verify directory exists
mv -f /var/adm/loginlog /var/adm/loginlog.old.20101225_1250 # rotate log file
touch /var/adm/loginlog
chown root:sys /var/adm/loginlog
chmod 700 /var/adm/loginlog
# recording rotation date Sat Dec 25 12:50:51 2010 for /var/adm/loginlog
# writing changes to /etc/logadm.conf
As the last line of the output above shows, once logadm has run successfully it updates /etc/logadm.conf, using the -P switch to record when each log was last rotated.
root@server1 # tail -3 /etc/logadm.conf
/var/adm/wtmpx -A 1m -P 'Sat Dec 25 12:50:51 2010' -g adm -m 664 -o adm -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
/var/adm/utmpx -A 1m -P 'Sat Dec 25 12:50:51 2010' -g adm -m 664 -o adm -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
/var/adm/loginlog -A 1m -P 'Sat Dec 25 12:50:51 2010' -g sys -m 700 -o root -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
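
Since the -P value sits in the middle of each entry, a short sed filter makes it easier to see at a glance when each log was last rotated. This is a sketch only: the function name `last_rotated` is made up, and it assumes the single-quoted -P format shown above. Entries that have never been rotated (no -P yet) are skipped.

```shell
#!/bin/sh
# Hypothetical filter: read logadm.conf-style lines on stdin and print
# "<logname>: <last rotation time>" for every entry carrying a -P value.
last_rotated() {
    sed -n "s/^\([^ ]*\) .*-P '\([^']*\)'.*/\1: \2/p"
}

# Sample input copied from the entries above; on a real system one
# could run: last_rotated < /etc/logadm.conf
last_rotated <<'EOF'
/var/adm/wtmpx -A 1m -P 'Sat Dec 25 12:50:51 2010' -g adm -m 664 -o adm -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
/var/adm/loginlog -A 1m -P 'Sat Dec 25 12:50:51 2010' -g sys -m 700 -o root -p 1w -t '$file.old.%Y%m%d_%H%M' -z 5
EOF
# prints:
# /var/adm/wtmpx: Sat Dec 25 12:50:51 2010
# /var/adm/loginlog: Sat Dec 25 12:50:51 2010
```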
List of new files created in /var/adm:
root@server1 # ls -ltr /var/adm/*.old*
-rwx------ 1 root sys 0 Dec 25 11:00 /var/adm/loginlog.old.20101225_1250
-rw-r--r-- 1 root bin 3720 Dec 25 15:49 /var/adm/utmpx.old.20101225_1250
-rw-rw-r-- 1 adm adm 8595060 Dec 25 15:51 /var/adm/wtmpx.old.20101225_1250