OVERVIEW "Senior Solaris System Administrator", with a wide range of Unix platform. A background of 24x7 mission-critical environments, full change control process, systems monitoring, and performance analysis. A knack for cleaning up impossible messes, and making things Work Right. A history of mentoring junior admins to reach senior level.
Friday, April 10, 2009
Sun Cluster 3.1 Daemons
clexecd: This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.
cl_ccrad: This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.
cl_eventd: The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.
cl_eventlogd: The cluster event log daemon logs cluster events into a binary log file. At the time of writing, there is no published interface to this log. It is automatically restarted if it is stopped.
failfastd: This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.
rgmd: The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.fed: This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.pmfd: This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
pnmd: The public network management daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.
scdpmd: The disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped.
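A quick way to confirm these framework daemons are actually present on a cluster node is a simple ps sweep; a minimal sketch (the pattern list is just the daemons above, trim to taste):
# ps -ef | egrep 'clexecd|cl_ccrad|cl_eventd|cl_eventlogd|failfastd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd' | grep -v egrep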
Obtaining network interface information with dladm
==================================================================
# dladm show-dev
bge0 link: up speed: 100 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: unknown speed: 0 Mbps duplex: unknown
==================================================================
Fixing XDMCP logins on Solaris
# ps -ef | grep dtlogin
root 4919 4838 0 12:37:58 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 0
root 4838 1 0 12:36:50 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 0
The XDMCP requests are usually accepted on UDP port 177, so setting the UDP port to 0 would surely remove dtlogin's ability to accept the requests. I'm not sure why the Solaris developers decided to do that, but I'm guessing it was done to improve the security of the installation out of the box. Knowing this fact, it is easy to fix this pesky problem. All I had to do was change the udpPort property value for the cde-login service in the SMF repository:
# svccfg
svc:> select cde-login
svc:/application/graphical-login/cde-login> listprop *arg*
dtlogin/args  astring  " -udpPort 0"
So here we go, all we need to do is to set the dtlogin/args property to " -udpPort 177" and we should be in business as usual:
svc:/application/graphical-login/cde-login> setprop dtlogin/args=astring:" -udpPort 177"
svc:/application/graphical-login/cde-login> quit
#
Now we can just restart the cde-login service and XDMCP login should work (Note: if you're doing this in an X Window session, the session will be restarted):
# svcadm restart cde-login
As a confirmation, let's see if the dtlogin process is running with the correct arguments:
# ps -ef | grep dtlogin
root 4919 4838 0 12:37:58 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 177
root 4838 1 0 12:36:50 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 177
Sure enough, we've got the dtlogin process listening on the correct port. We should now be in business as usual:
# /usr/openwin/bin/Xnest :1 -query nevada
gives me a nice login as it should have in the very beginning
Create a Jumpstart Server in Solaris
Create a standalone server that can be used as both a boot and install server
Creating a Jumpstart Server should not be as hard as it is documented on many other websites. Actually, nothing on a Sun hardware platform should be as difficult as many people (Sun included) make it. After digesting most of the Sun documentation regarding creating the install, boot, and file servers (Solaris 9 Installation Guide), I will explain, step by step, how to easily create a new jumpstart server that is both a boot and image server. This HOWTO assumes that you have a basic technical background (Sun Engineer level or better lol) and have certain other files already prepared. I may in the future include how to create most of the other files you will need (as I'm sure I'll have to do it myself at some point)…
But only if you be good.
Preface:
I usually do not allow the cd/dvd drive to be automounted (primarily due to the environment I work in). Because of this, all of the instructions below will show the commands needed to mount and unmount the cd/dvd drive. My cd/dvd drive will be c0t0d0s0 throughout this HOWTO.
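If vold is running on your build box and you want to control the mounts by hand as shown below, you can stop volume management first. This is just a sketch and is only needed if your environment actually automounts removable media:
#> /etc/init.d/volmgt stop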
Step 1: Install a base Solaris 9 server
You may do this any number of ways. If you already have a jumpstart server and a base Solaris 9 profile you can jumpstart a new box, or if you do not you can just perform a base install using the media of your choice.
Step 2: Create some preliminary directories
There are several directories that you may want to go ahead and create. These will all be used later and will hold a variety of different items:
#> mkdir -p -m 755 /jumpstart/os/200509_Solaris_9
#> mkdir /jumpstart/profiles
#> mkdir /jumpstart/flasharchives
Step 3: Create an image of the OS media
This is really a misnomer (read: Sun's terminology is misleading). You are not really making an image of the media; you are running a script that copies the contents of the media to the location of your choice. Why they call this an image is unknown to me. This "image" is what gets used when you perform a "Custom Jumpstart Installation": it is where the packages and other items are found when it is time to install them, and it is also where the boot images come from a little later when we use this server as a boot server as well.
We will now copy the contents of the media (in this case DVD) to the server. Since this server will also be our boot server we also want to copy over the needed boot images. To do so type the following:
#> mount -F hsfs /dev/dsk/c0t0d0s0 /mnt
#> cd /mnt/Solaris_9/Tools
#> ./setup_install_server -b /jumpstart/os/200509_Solaris_9/
At this point, you will see the script checking for adequate disk space and then finally copying the data over. Notice that the target directory is /jumpstart/os/200509_Solaris_9/. This is because the setup_install_server script requires that the target directory be completely void of all files, both visible and hidden. So to accommodate this, I chose to create a folder that includes the media’s release month and year. This can come in handy if you decide to add newer revisions of the media.
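Since the script refuses to run against a non-empty target, a quick sanity check before you point setup_install_server at a new target directory doesn't hurt (just a sketch; the listing should come back empty):
#> ls -A /jumpstart/os/200509_Solaris_9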
Step 4: Copy over existing rules, profiles, sysidcfgs, and flars
Currently, the jumpstart server itself is practically ready to go, unless you were following the installation guide, in which case you would have gone from Chapter 23 to Chapter 27 and back to… well, you get the idea (Sun doesn’t though, so let’s laugh at them, hahahaha). What we need to do now, however, is copy our existing configurations over to the new server. This includes, but is not limited to:
/jumpstart/sysidcfg/*
/jumpstart/profiles/*
/jumpstart/rules
/jumpstart/
/mnt/Solaris_9/Tools/add_install_client
If this is the first jumpstart server you are building, you will unfortunately have to wait until I write a continuation HOWTO which contains instructions to create these necessary files.
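If you do already have a working jumpstart server, something along these lines will pull the pieces over. This is just a sketch: oldjumpsrv is a placeholder hostname, and it assumes scp (or rcp, if that is your thing) works between the two boxes:
#> scp -rp oldjumpsrv:/jumpstart/sysidcfg /jumpstart/
#> scp -rp oldjumpsrv:/jumpstart/profiles /jumpstart/
#> scp -p oldjumpsrv:/jumpstart/rules /jumpstart/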
We should also at this time clean up our configurations. Edit /jumpstart/rules and remove all of the entries except for the “any” entry and any others that you will surely be using. Also delete, from /jumpstart/profiles, all of the profiles that don’t have a corresponding rules entry. Feel free to regenerate the rules.ok file:
#> cd /jumpstart
#> ./check
Yeah, that’s right, you made a mistake in the rules file didn’t you? Fix it now before you forget…
Step 5 (Optional): Modify the nomatch.beg start script
Located in /jumpstart you should see a script named nomatch.beg. This script is run any time a machine does not match any other rule in the /jumpstart/rules.ok file. What I like to do is have it spit out a reasonable error message stating, “This server did not have a matching rule listed in /jumpstart/rules.ok. Did you run /jumpstart/check after modifying it?”, or something like that. It really beats the alternative (the installation just failing; thanks, Sun, for another glorious use of documentation, comments, and error messages).
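A minimal nomatch.beg along those lines might look like the following. This is only a sketch; adjust the wording (and whether you want the install to carry on afterwards) to taste:
#!/bin/sh
# nomatch.beg -- run when a client does not match a specific rule in rules.ok
echo "This server did not have a matching rule listed in /jumpstart/rules.ok."
echo "Did you run /jumpstart/check after modifying it?"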
Step 6: Share the jumpstart directory
Simply put, we need to share the /jumpstart directory to allow files to be copied:
#> share -F nfs -o ro,anon=0 /jumpstart
#> shareall
Step 7: Add a client to the configuration, check running services, and test
We have now reached the point where we can add a client to the jumpstart server and test the configuration. Add a client to the server by adding an entry in each of the following (example entries are sketched just after this list):
/etc/ethers
/etc/hosts
/jumpstart/profiles
/jumpstart/rules
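For example, the entries for a hypothetical client named testbox might look like this (the MAC address and IP are made up; use your client's real values):
In /etc/ethers:
8:0:20:ab:cd:ef testbox
In /etc/hosts:
192.168.1.50 testbox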
Now check the rules file like a good little person:
#> cd /jumpstart
#> ./check
Next be sure to add the client to allow tftpboot to work:
#> ./add_install_client -s :/jumpstart/os/200509_Solaris_9 -c :/jumpstart -p :/jumpstart sun4u
NOTE: If, when you add the client, you receive odd errors, make sure that rpc is running and re-add the client:
#> /etc/init.d/nfs.server stop && /etc/init.d/rpc start && /etc/init.d/nfs.server start
#> cd /jumpstart
#> ./add_install_client -s :/jumpstart/os/200509_Solaris_9 -c :/jumpstart -p :/jumpstart sun4u
Now let’s check and see if the two necessary processes are running to get the jump off the ground:
verify that in.rarpd is running (ps -ef | grep -i in.rarpd)
verify that in.tftpd is available (cat /etc/inetd.conf | grep -i tftp)
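If that tftp line is commented out (it ships that way by default), uncomment it and have inetd re-read its configuration; add_install_client normally handles this for you, but it is worth knowing. The stock Solaris entry looks roughly like this:
tftp    dgram   udp6    wait    root    /usr/sbin/in.tftpd      in.tftpd -s /tftpboot
#> pkill -HUP inetd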
And finally, test the server by jumping the test box: boot it, send a break, and at the ok prompt type:
ok> boot net:speed=100,duplex=full -v - install
Or if you are fortunate enough to have good gigabit hardware:
ok> boot net:speed=1000,duplex=full -v - install
The client should soon start to build itself according to how it is set up (flash archive, new build, etc.).
HOWTO:: Add and Configure LUNs in Solaris 9 and Veritas Cluster
Of course not… Read on…
This HOWTO assumes that you have two servers per database that are part of a Veritas Cluster, that the databases are running on node 1, and that you are using Emulex HBAs.
ON BOTH NODES
Step 1. Make a copy of the output when you run format
Step 2. Edit /kernel/drv/sd.conf and add an entry for the LUNs you are adding
name="sd" parent="lpfc" target=17 lun=42 hba="lpfc0";
Step 3. Run the following commands so that Solaris can see the new LUNs as disks.
#> update_drv -f sd
#> devfsadm
Step 4. Run format and select the new disks. They should show up as c3t17dxx. Select each disk and label it when asked. Do this for all added LUNs.
Step 5. Run vxdiskadm. Select 1. Add or initialize one or more disks. You can type list now to view all of the disks that Veritas Cluster currently sees; the uninitialized ones will be labeled as such. Type in the c3t17dxx designation to select it and accept all of the defaults except for the following:
When it asks for the disk group, enter the name of the disk group that contains the volume whose size you wish to increase. When it asks if you would like to encapsulate the disk, say no. You will then be prompted to initialize; say yes.
Step 6. Make a note of the disk designations that Veritas Cluster provides (ex: mydisk12)
Step 7. Run vxassist to verify that the disk initialization really worked as expected. This lets you see the maximum amount of space by which the volume can be expanded:
#> vxassist -g mydiskgroup maxgrow myvolume mydisk##
It will show you the maximum size that the volume can be expanded in clusters and MB.
Step 8. It is now a good idea to get a copy of the current sizes of the disks. You will need this later. So do a df -th and record the sizes of the volumes you want to expand.
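An easy way to keep that copy around for the comparison in step 10 (the file name is just a suggestion):
#> df -th > /var/tmp/df.before
After the resize in step 9, dump a second copy (say /var/tmp/df.after) and diff the two files.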
Step 9. Grow the filesystems you need by issuing the following command:
#> /etc/vx/bin/vxresize -F vxfs -g mydiskgroup myvolume +42g mydisk##
Step 10. Compare the output of df -th to the copy you saved earlier (or scroll up for all you non-hackers)
That’s it. That’s all there is to it… now get off my back.
Thursday, April 9, 2009
Configuring NEW LUNs
spdma501:# format < /dev/null
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b2fca,0
1. c1t1d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b39cf,0
Specify disk (enter its number):
spdma501:# cfgadm -o show_FCP_dev -al
Ap_Id                       Type        Receptacle   Occupant      Condition
c1                          fc-private  connected    configured    unknown
c1::2100000c506b2fca,0      disk        connected    configured    unknown
c1::2100000c506b39cf,0      disk        connected    configured    unknown
c3                          fc-fabric   connected    unconfigured  unknown
c3::50060482ccaae5a3,61     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,62     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,63     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,64     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,65     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,66     disk        connected    unconfigured  unknown
c3::50060482ccaae5a3,67     disk        connected    unconfigured  unknown
spdma501:# cfgadm -c configure c3
spdma501:# cfgadm -c configure c5
spdma501:# format < /dev/null
Note: IF YOU DON'T SEE THE NEW LUNS IN FORMAT, RUN devfsadm !!!!
# /usr/sbin/devfsadm
Label the new disks !!!!
# cd /tmp
# cat format.cmd
label
quit
# for disk in `format < /dev/null 2> /dev/null | grep "^c" | cut -d: -f1`
do
format -s -f /tmp/format.cmd $disk
echo "labeled $disk ....."
done
To verify whether an HBA is connected to a fabric or not, check the state of each HBA port (this is the output of luxadm -e port):
Found path to 4 HBA ports
/devices/pci@1e,600000/SUNW,qlc@3/fp@0,0:devctl CONNECTED
/devices/pci@1e,600000/SUNW,qlc@3,1/fp@0,0:devctl NOT CONNECTED
/devices/pci@1e,600000/SUNW,qlc@4/fp@0,0:devctl CONNECTED
/devices/pci@1e,600000/SUNW,qlc@4,1/fp@0,0:devctl NOT CONNECTED
Note: Your SAN administrator will ask for the WWNs for zoning. Here are some steps I use to get that information:
# prtconf -vp | grep wwn
port-wwn: 210000e0.8b1d8d7d
node-wwn: 200000e0.8b1d8d7d
port-wwn: 210100e0.8b3d8d7d
node-wwn: 200000e0.8b3d8d7d
port-wwn: 210000e0.8b1eaeb0
node-wwn: 200000e0.8b1eaeb0
port-wwn: 210100e0.8b3eaeb0
node-wwn: 200000e0.8b3eaeb0
Note: you may use fcinfo, if installed
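If fcinfo is there, something like this pulls the port WWNs directly (just a sketch):
# fcinfo hba-port | grep -i 'port wwn'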
# modinfo | grep qlc
76 7ba9e000 cdff8 282 1 qlc (SunFC Qlogic FCA v20060630-2.16)
root@PSBLD008 # prtdiag | grep qlc
/IO04/C3V2 PCI 157 B 66 66 1,0 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO04/C3V2 PCI 157 B 66 66 1,1 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO05/C3V2 PCI 189 B 66 66 1,0 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO05/C3V2 PCI 189 B 66 66 1,1 ok SUNW,qlc-pci1077,2312.1077.10a.2+
root@PSBLD008 # luxadm qlgc
Found Path to 4 FC100/P, ISP2200, ISP23xx Devices
Opening Device: /devices/pci@9d,700000/SUNW,qlc@1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04
Opening Device: /devices/pci@9d,700000/SUNW,qlc@1,1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04
Opening Device: /devices/pci@bd,700000/SUNW,qlc@1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04
Opening Device: /devices/pci@bd,700000/SUNW,qlc@1,1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04
Complete