Wednesday, August 19, 2009

Removing Sun[TM] Cluster 3.x node and cluster software packages

http://sunsolve.sun.com/search/document.do?assetkey=1-61-230779-1
Document Audience: SPECTRUM
Document ID: 230779
Old Document ID: (formerly 50093)
Title: Removing Sun[TM] Cluster 3.x node and cluster software packages
Copyright Notice: Copyright © 2009 Sun Microsystems, Inc. All Rights Reserved
Update Date: Thu Dec 18 00:00:00 MST 2008

Solution Type Technical Instruction


Description
There are many instances where a cluster node needs to be redeployed and its cluster software removed so that its resources can be reallocated; this document addresses that need.
The example configuration is a 3-node scalable topology running Solaris[TM] 9 and Sun[TM] Cluster 3.1 update 2.
The nodes are referred to as node1, node2 and node3.
There are 4 resource groups configured:
logical-rg (SUNW.LogicalHostname)
dg1-rg(SUNW.HAStoragePlus)
shareaddr-rg(SUNW.SharedAddress)
apache-rg(SUNW.apache)
Since Sun Cluster 3.0 Update 3, cluster packages can be removed using scinstall -r.
The procedure below removes node2 and uses scinstall -r in the final step.
Notes:
If you plan to completely remove cluster software from all cluster nodes, please refer to Infodoc: < Solution: 217563 > for a more succinct procedure that does not involve removing one node at a time.
This procedure assumes that at least one quorum device is configured for the cluster, which is true in most cases. If not, at least one quorum device needs to be configured before the first of the 3 nodes can be removed. Please refer to document < Solution: 203650 > for further details.

Steps to Follow
1. Migrate resource groups and device groups off node2 to the other nodes.

# scswitch -S -h node2
2. Delete node2 instances from all resource groups.
* Start with scalable resource groups, followed by failover resource groups
* Gather configuration information by running the following commands
# scrgadm -pv | grep "Res Group Nodelist"
# scconf -pv | grep "Node ID"
# scrgadm -pvv | grep "NetIfList.*value"
* Scalable Resource Group(s)
- Set maximum and desired primaries to appropriate number
# scrgadm -c -g apache-rg -y maximum_primaries="2" \
-y desired_primaries="2"
- Set remaining nodenames to scalable resource group
# scrgadm -c -g apache-rg -h node1,node3
- Update the node list of the failover resource group with the shared address
# scrgadm -c -g shareaddr-rg -h node1,node3
* Failover Resource Group(s)
- Set remaining nodenames to failover resource group
# scrgadm -c -g logical-rg -h node1,node3
# scrgadm -c -g dg1-rg -h node1,node3
- Check for IPMP groups affected
# scrgadm -pvv -g logical-rg | grep -i netiflist
# scrgadm -pvv -g shareaddr-rg | grep -i netiflist
- Update IPMP groups affected
# scrgadm -c -j logicalhost \
-x netiflist=sc_ipmp0@1,sc_ipmp0@3
# scrgadm -c -j shared-address \
-x netiflist=sc_ipmp0@1,sc_ipmp0@3
* Verify changes to resource groups
# scrgadm -pvv -g apache-rg | grep -i nodelist
# scrgadm -pvv -g apache-rg | grep -i netiflist
# scrgadm -pvv -g shareaddr-rg | grep -i nodelist
# scrgadm -pvv -g shareaddr-rg | grep -i netiflist
# scrgadm -pvv -g logical-rg | grep -i nodelist
# scrgadm -pvv -g logical-rg | grep -i netiflist
3. Delete node instances from all disk device groups
* Solaris Volume Manager
- Check for diskgroups affected
# scconf -pv | grep -i "Device group" | grep node2
# scstat -D
- Remove node from diskset nodelist
# metaset -s setname -d -h nodelist (use -f if needed)
* VERITAS Volume Manager
- Check for diskgroups affected
# scconf -pv | grep -i "Device group" | grep node2
# scstat -D
- Remove node from diskgroup nodelist
# scconf -r -D name=dg1,nodelist=node2
* Raw Disk Device Group
- Remember to change desired secondaries to 1
- On any active remaining node(s), identify the device groups connected
# scconf -pvv | grep node2 | grep "Device group node list"
- Determine raw device
# scconf -pvv | grep Disk
- Disable the localonly property of each Local_Disk
# scconf -c -D name=<devicegroup>,localonly=false
- Verify disabled localonly property
# scconf -pvv | grep "Disk"
- Remove node from raw device
# scconf -r -D name=rawdisk-device-group,nodelist=node2
Steps 4-6 are not applicable to 2-node clusters.
4. Remove all fully connected quorum devices.
- Check quorum disk information
# scconf -pv | grep Quorum
- Remove quorum disk
# scconf -r -q globaldev=d<N>
5. Remove all fully connected storage devices from node2. Use any method that blocks access from node2 to shared storage:
- vxdiskadm to suppress access from VxVM
- cfgadm -c unconfigure
- LUN masking/mapping methods if applicable
- physical cable removal if allowed
6. Add back the quorum devices
# scconf -a -q globaldev=d<N>,node=node1,node=node3
7. Place the node being removed into maintenance state.
* Shutdown node2
# shutdown -g0 -y -i0
* On remaining node
# scconf -c -q node=node2,maintstate
* Verify quorum status
# scstat -q
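The quorum-status check can be scripted; the sketch below parses a captured sample of scstat -q output. The exact field layout varies by release, so treat the field positions here as an assumption to verify against your system.

```shell
# Illustrative only: parse a captured sample of `scstat -q` output.
# The field layout is an assumption; verify against your release.
scstat_q_sample='  Node votes:       node1          1        1       Online
  Node votes:       node2          0        1       Offline
  Node votes:       node3          1        1       Online'

# A node in maintenance state should report 0 present votes (field 4).
node2_votes=$(printf '%s\n' "$scstat_q_sample" | awk '$3 == "node2" {print $4}')
echo "node2 present votes: $node2_votes"
```
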
8. Remove all logical transport connections from the node being removed
* Check for interconnect configuration
# scstat -W
# scconf -pv | grep cable
# scconf -pv | grep adapter
* Remove cables configuration
# scconf -r -m endpoint=node2:qfe0
# scconf -r -m endpoint=node2:qfe1
* Remove adapter configuration
# scconf -r -A name=qfe0,node=node2
# scconf -r -A name=qfe1,node=node2
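To review the removal commands before running them, a small generator like this sketch can help. The node and adapter names are taken from the example above, and the script only prints the commands, it does not execute them.

```shell
# Dry-run sketch: print (not run) the cable and adapter removal
# commands for review. Node and adapter names come from the example.
node=node2
for adapter in qfe0 qfe1; do
  echo "scconf -r -m endpoint=${node}:${adapter}"   # cable first
  echo "scconf -r -A name=${adapter},node=${node}"  # then adapter
done > /tmp/remove_transport.sh
cat /tmp/remove_transport.sh
```
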
9. For 2-node clusters only, remove the quorum disk.
* If not already done, shut down the node to be uninstalled.
# shutdown -y -g 0
* On the remaining node, put the node to be removed in maintenance mode
# scconf -c -q node=node2,maintstate
* Place cluster in installmode
# scconf -c -q installmode
* Remove quorum disk
# scconf -r -q globaldev=d<N>
* Verify quorum status
# scstat -q
10. Remove the node from the cluster software configuration.
* # scconf -r -h node=node2
* # scstat -n
11. Remove the cluster software
* If not already done, shut down the node to be uninstalled.
# shutdown -g0 -y -i0
* Reboot the node into non-cluster mode.
ok> boot -x
* Remove all globally mounted file systems except /global/.devices from /etc/vfstab
* Uninstall Sun Cluster software from the node
# scinstall -r
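For the vfstab cleanup step, a filter like the following sketch can list the globally mounted entries that need editing while excluding /global/.devices. The sample vfstab lines are illustrative, not from a real system.

```shell
# Sketch: list vfstab entries mounted with the "global" option, except
# /global/.devices, so you know which lines to edit.
cat > /tmp/vfstab.sample <<'EOF'
/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
/dev/md/oracle/dsk/d100 /dev/md/oracle/rdsk/d100 /global/oracle ufs 2 yes global,logging
/dev/did/dsk/d2s5 /dev/did/rdsk/d2s5 /global/.devices/node@2 ufs 2 no global
EOF
# Field 3 is the mount point, field 7 the mount options.
to_edit=$(awk '$7 ~ /global/ && $3 !~ /^\/global\/\.devices/' /tmp/vfstab.sample)
echo "$to_edit"
```
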
If it is desirable to remove the last node of the cluster, a complete removal of all resource and device groups will be required. Please follow the procedure below:
1. Offline all resource groups (RGs):
# scswitch -F -g <resource-group>[,...]
2. Disable all configured resources:
# scswitch -n -j <resource>[,...]
3. Remove all resources from the resource groups:
# scrgadm -r -j <resource>
4. Remove the now empty resource groups:
# scrgadm -r -g <resource-group>
5. Remove global mounts and "/node@nodeid" mount options from the /etc/vfstab file.
6. Remove all device groups:
# scstat -D (to get a list of device groups)
# scswitch -F -D device-group-name (to offline the device group)
# scconf -r -D name=device-group-name (to remove/unregister the device group)
NOTE: If there are any "rmt" devices, they must be removed with the command:
# /usr/cluster/dtk/bin/dcs_config -c remove -s rmt/1
This assumes that you have the package SUNWscdtk. If you do not, you will need to install it in order to remove the rmt/XX entries, or "scinstall -r" will fail.
The SUNWscdtk package is the diagnostics toolkit for cluster and is not available on the Cluster CD; you need to get it from the following URL:
http://suncluster.eng/service/tools.html
Uninstall the Sun Cluster 3.X software:
* If not already done, shut down the node.
# shutdown -g0 -y -i0
* Reboot the node into non-cluster mode.
ok> boot -x
* Finally, remove the Sun Cluster 3.x software using:
# scinstall -r

Product
Sun Cluster Geographic Edition 3.1 8/05
Solaris Cluster 3.2
Sun Cluster 3.1
Sun Cluster 3.1 Data Services Agents
Sun Cluster Agents 3.1 9/04
Sun Cluster Agents 3.1 4/04
Sun Cluster Agents 3.1 10/03
Sun Cluster Agents 3.1 05/03
Sun Cluster 3.1 9/04
Sun Cluster 3.1 8/05
Sun Cluster 3.1 7/05
Sun Cluster 3.1 4/04
Sun Cluster 3.1 10/03 for SunPlex Systems
Sun Cluster 3.0
Sun Cluster 3.0 7/01
Sun Cluster 3.0 5/02
Sun Cluster 3.0 12/01

Keywords
remove, removal, Cluster, node, scinstall, 3.x, ccr, resources


Tuesday, August 18, 2009

How to remove disks/LUNs from Solaris

1. Identify the file systems.
2. Get the disks that belong to the file system.
3. Check them in the metaset/metadevice and make sure no one else is using them (no other soft partition).
4. Clear the metadevice from the metaset
5. Remove the disks from the metaset
6. Remove the metadb replicas on the disks that you want to remove
7. Ask the storage team to remove the disks
8. Configure the controllers after you confirm that the disks have been removed on all nodes
9. Run devfsadm -Cv on all nodes
10. Run scgdevs on ONE node (in case you are using Sun Cluster)
11. Run scdidadm -C on ONE node (in case you are using Sun Cluster)
12. Check that all nodes see the same number of LUNs (in case you are using Sun Cluster)

For more information check http://docs.sun.com/app/docs/doc/817-1673/6mhcv6m38?a=view
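The post-removal cleanup steps above can be sketched as a dry-run plan. Node names are hypothetical, and the script only prints the commands it would run (devfsadm on every node; scgdevs and scdidadm -C on one node only).

```shell
# Dry-run sketch: print the per-node cleanup commands.
# Node names are hypothetical; nothing is executed remotely.
nodes="node1 node2 node3"
first=$(set -- $nodes; echo "$1")
plan=$(
  for n in $nodes; do
    echo "ssh $n devfsadm -Cv"
  done
  echo "ssh $first scgdevs"
  echo "ssh $first scdidadm -C"
)
echo "$plan"
```
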

Removing and unregistering a diskset from Sun Cluster

Today I realized that the procedure "How to Remove and Unregister a Device Group (Solaris Volume Manager)" lacks a specific example.
Let's assume the following diskset configuration on a Sun Cluster with two nodes named cluster01 and cluster02:
# cat /etc/lvm/md.tab
test_ds/d1 -m test_ds/d10
test_ds/d10 1 1 /dev/did/rdsk/d4s0

# metaset -s test_ds -a -h cluster01 cluster02
# metaset -s test_ds -a /dev/did/rdsk/d4
# metaset -s test_ds -a -m cluster01 cluster02
# metainit test_ds/d10
test_ds/d10: Concat/Stripe is setup
# metainit test_ds/d1
test_ds/d1: Mirror is setup
# cldg show
=== Device Groups ===

Device Group Name:                test_ds
  Type:                           SVM
  failback:                       false
  Node List:                      cluster01, cluster02
  preferenced:                    true
  numsecondaries:                 1
  diskset name:                   test_ds

Now assume you want to remove and unregister this diskset again. Generally speaking, before doing this you want to make sure that:
no file system is mounted on any node from this diskset
no entry on any node for this diskset is active in /etc/vfstab
no SUNW.HAStoragePlus resource is using this diskset or a file system from this diskset
Find out on which node the diskset is primary/online:
# cldg status
=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name    Primary      Secondary    Status
-----------------    -------      ---------    ------
test_ds              cluster01    cluster02    Online
Perform all of the following on the node where the diskset is primary/online (here: cluster01):
Remove all metadevices on that diskset:
# metaclear -s test_ds -a
test_ds/d1: Mirror is cleared
test_ds/d10: Concat/Stripe is cleared

Remove all devices from that diskset (you need the -f option for the last one):
# metaset -s test_ds -d -f /dev/did/rdsk/d4

On a two node cluster, if mediators are configured, remove them:
# metaset -s test_ds -d -m cluster01 cluster02
For all nodes (leaving the node where the diskset is primary until last), perform:
# metaset -s test_ds -d -h cluster02
# metaset -s test_ds -d -h cluster01
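Since the primary node must be removed from the diskset last, a small generator can emit the metaset commands in the right order. This is a sketch using the example's names; it only prints the commands.

```shell
# Sketch: emit the `metaset -d -h` commands with the primary node last.
# Names are from the example above; commands are printed, not executed.
set_name=test_ds
primary=cluster01
nodes="cluster01 cluster02"
order=""
for n in $nodes; do
  [ "$n" = "$primary" ] || order="$order $n"   # non-primary nodes first
done
order="$order $primary"                        # primary last
for n in $order; do
  echo "metaset -s $set_name -d -h $n"
done > /tmp/metaset_removal.txt
cat /tmp/metaset_removal.txt
```
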
In /var/adm/messages you see the following after the last command:
Jun 2 02:21:33 cluster01 Cluster.Framework: [ID 801593 daemon.notice] stdout: no longer primary for test_ds
And you can confirm that the diskset is now removed and unregistered:
# cldg list
#

Friday, April 10, 2009

Sun Cluster 3.1 (Resources)

Adding a failover network resource : scrgadm -a -L -g <group> -l <logical-host>
Adding a shared network resource : scrgadm -a -S -g <group> -l <shared-address>

adding a failover apache application and attaching the network resource
scrgadm -a -j apache_res -g <group> \
  -t SUNW.apache -y Network_resources_used=<network-resource> \
  -y Scalable=False -y Port_list=80/tcp \
  -x Bin_dir=/usr/apache/bin

adding a shared apache application and attaching the network resource
scrgadm -a -j apache_res -g <group> \
  -t SUNW.apache -y Network_resources_used=<network-resource> \
  -y Scalable=True -y Port_list=80/tcp \
  -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource
scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
  -x FileSystemMountPoints=/oracle/data01 \
  -x AffinityOn=true

Removing
scrgadm -r -j res-ip

Note: must disable the resource first

changing properties
scrgadm -c -j <resource> -y <property>=<value>
List : scstat -g
Detailed List : scrgadm -pv -j res-ip
scrgadm -pvv -j res-ip
Disable resource monitor : scrgadm -n -M -j res-ip
Enable resource monitor : scrgadm -e -M -j res-ip
Disabling : scswitch -n -j res-ip
Enabling : scswitch -e -j res-ip

Clearing a failed resource
scswitch -c -h <node> -j <resource> -f STOP_FAILED

Find the network of a resource
# scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group

offline the group
# scswitch -F -g rgroup-1

remove the resource
# scrgadm -r -j res-ip

remove the resource group
# scrgadm -r -g rgroup-1
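The offline/remove sequence above can be collected into a reviewable script. This is a sketch; the group and resource names come from the example, and nothing is executed against a cluster.

```shell
# Sketch: collect the offline -> disable -> remove sequence into a
# reviewable script. Commands are printed, not executed.
rg=rgroup-1
res=res-ip
{
  echo "scswitch -F -g $rg"    # offline the group
  echo "scswitch -n -j $res"   # disable the resource first
  echo "scrgadm -r -j $res"    # remove the resource
  echo "scrgadm -r -g $rg"     # remove the now-empty group
} > /tmp/remove_rg.sh
cat /tmp/remove_rg.sh
```
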

Resource Types:

Adding : scrgadm -a -t <resource-type>, e.g. SUNW.HAStoragePlus
Deleting : scrgadm -r -t <resource-type>
Listing : scrgadm -pv | grep 'Res Type name'

Sun Cluster (File locations & Important configuration)

man pages : /usr/cluster/man
log files : /var/cluster/logs and /var/adm/messages
sccheck logs : /var/cluster/sccheck/report.
CCR files : /etc/cluster/ccr
Cluster infrastructure file : /etc/cluster/ccr/infrastructure

SCSI Reservations:


Display reservation keys
scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2


determine the device owner
scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

Cluster information:

Quorum info : scstat -q
Cluster components : scstat -pv
Resource/Resource group status : scstat -g
IP Network Multipathing : scstat -i
Status of all nodes : scstat -n
Disk device groups : scstat -D
Transport info : scstat -W
Detailed resource/resource group : scrgadm -pv
Cluster configuration info : scconf -p
Installation info (prints packages and version) : scinstall -pv

Cluster Configuration:


Integrity check : sccheck
Configure the cluster (add nodes, add data services, etc.) : scinstall
Cluster configuration utility (quorum, data services, resource groups, etc.) : scsetup
Add a node : scconf -a -T node=<host>
Remove a node : scconf -r -T node=<host>
Prevent new nodes from entering : scconf -a -T node=.
Put a node into maintenance state : scconf -c -q node=<node>,maintstate

Note: use the scstat -q command to verify that the node is in maintenance state; the vote count should be zero for that node.
Get a node out of maintenance state : scconf -c -q node=<node>,reset

Note: use the scstat -q command to verify that the node is out of maintenance state; the vote count should be one for that node.

Admin Quorum Device
Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup GUI interface to add/remove quorum devices or use the commands below.



Adding a device to the quorum : scconf -a -q globaldev=d11
Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.



Removing a device from the quorum : scconf -r -q globaldev=d11

Remove the last quorum device :
Evacuate all nodes
Put the cluster into maintenance mode:
# scconf -c -q installmode
Remove the quorum device:
# scconf -r -q globaldev=d11
Check the quorum devices:
# scstat -q


Resetting quorum info
scconf -c -q reset

Note: this will bring all offline quorum devices online

Bring a quorum device into maintenance mode

obtain the device number

# scdidadm -L
# scconf -c -q globaldev=<device>,maintstate

Bring a quorum device out of maintenance mode
# scconf -c -q globaldev=<device>,reset



Resource Groups
Adding : scrgadm -a -g <group> -h <node>,<node>
Removing : scrgadm -r -g <group>
changing properties : scrgadm -c -g <group> -y <property>=<value>
Listing : scstat -g
Detailed List : scrgadm -pv -g <group>
Display mode type (failover or scalable) : scrgadm -pv -g <group> | grep 'Res Group mode'
Offlining : scswitch -F -g <group>
Onlining : scswitch -Z -g <group>
Unmanaging : scswitch -u -g <group>


Note: (all resources in group must be disabled)

Managing " scswitch –o –g
Switching : scswitch –z –g –h







Sun Cluster 3.1 Daemons

Daemons

clexecd: This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad: This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.


cl_eventd: The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd: The cluster event log daemon logs cluster events into a binary log file. At the time of writing, there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd: This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

rgmd: The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed: This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd: This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.

pnmd: The public network management daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd: Disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped.

Obtaining network interface information with dladm

Remember the good old days when you had to use ndd to know whether the network interfaces on your machine negotiated bandwidth and duplex settings correctly? To make matters worse, some interfaces had slightly different ndd getters for that information, which could be fairly frustrating. It's been a long time coming, but with Solaris 10 you don't have to do that any more. A new-fangled dladm utility abstracts away the details of the underlying network interface driver and can show the details of available network interfaces in a simple but quite useful format; all you have to do is invoke dladm with the "show-dev" subcommand (as on one of my systems):
==================================================================
# dladm show-dev
bge0 link: up speed: 100 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: unknown speed: 0 Mbps duplex: unknown

==================================================================
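Output in this format is easy to post-process. For instance, this sketch flags links that did not negotiate full duplex; it parses the sample output above and assumes the field layout shown there.

```shell
# Sketch: flag links that are not at full duplex, parsing the sample
# `dladm show-dev` output shown above (field layout assumed from it).
dladm_sample='bge0 link: up speed: 100 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: unknown speed: 0 Mbps duplex: unknown'

# The last field is the duplex setting; print interfaces where it
# is anything other than "full".
suspect=$(printf '%s\n' "$dladm_sample" | awk '$NF != "full" {print $1}')
echo "not at full duplex: $suspect"
```
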

Fixing XDMCP logins on Solaris

I use XDMCP logins via Xnest quite frequently with X Window to get a graphical console on another machine, mostly to do an install that requires a graphical environment (Oracle quickly comes to mind) or to launch an application from the point of view of a different machine. With the Solaris Nevada (Solaris Express, SXDE/SXCE) builds on x86 I noticed that XDMCP logins no longer work: when using Xnest to query the other host I end up with an empty window -- in other words, the machine is not accepting XDMCP requests. Scratching my head, I decided to poke around to see if the dtlogin process responsible for accepting XDMCP requests runs with any special arguments. Sure enough, for some reason the Nevada builds pass a "-udpPort 0" argument to the dtlogin process:

# ps -ef | grep dtlogin
root 4919 4838 0 12:37:58 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 0
root 4838 1 0 12:36:50 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 0

XDMCP requests are usually accepted on UDP port 177, so a UDP port of 0 would surely remove dtlogin's ability to accept them. I'm not sure why the Solaris developers decided to do that, but I'm guessing it was done to improve the out-of-the-box security of the installation. Knowing this, the pesky problem is easy to fix. All I had to do was change the udpPort property value for the cde-login service in the SMF repository:

# svccfg
svc:> select cde-login
svc:/application/graphical-login/cde-login> listprop *arg*
dtlogin/args astring " -udpPort 0"

So here we go: all we need to do is set the dtlogin/args property to " -udpPort 177" and we should be in business as usual:

svc:/application/graphical-login/cde-login> setprop
dtlogin/args=astring:" -udpPort 177"
svc:/application/graphical-login/cde-login> quit
#

Now we can just restart the cde-login service and XDMCP login should work (Note: if you're doing this in an X Window session, the session will be restarted):
# svcadm restart cde-login

As a confirmation, let's see if the dtlogin process is running with the correct arguments:
# ps -ef | grep dtlogin
root 4919 4838 0 12:37:58 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 177
root 4838 1 0 12:36:50 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPort 177

Sure enough, we've got the dtlogin process listening on the correct port. We should now be in business as usual:
# /usr/openwin/bin/Xnest :1 -query nevada

gives me a nice login as it should have in the very beginning
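A scripted check of the dtlogin arguments can also confirm the port setting. This sketch extracts the -udpPort value from a captured ps line; on a live system you would pipe `ps -ef` output through the same sed expression.

```shell
# Sketch: extract the -udpPort value from a dtlogin ps(1) line.
# The sample line is copied from the output above.
ps_sample='root  4919  4838  0 12:37:58 ?  0:00 /usr/dt/bin/dtlogin -daemon -udpPort 177'
port=$(printf '%s\n' "$ps_sample" | sed -n 's/.*-udpPort \([0-9][0-9]*\).*/\1/p')
echo "dtlogin XDMCP port: $port"
```
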

Create a Jumpstart Server in Solaris

The goal:::
Create a standalone server that can be used as both a boot and install server
Creating a Jumpstart server should not be as hard as many other websites make it look. After digesting most of the Sun documentation on creating the install, boot, and file servers (Solaris 9 Installation Guide), I will now explain step by step how to easily create a new jumpstart server that is both a boot and image server. This HOWTO assumes that you have a basic technical background and have certain other files already prepared. I may in the future include how to create most of the other files you will need (as I'm sure I'll have to do it myself at some point)…
But only if you're good.

Preface:
I usually do not allow the cd/dvd drive to be automounted (primarily due to the environment I work in). Because of this, all of the instructions below will show the commands needed to mount and unmount the cd/dvd drive. My cd/dvd drive will be c0t0d0s0 throughout this HOWTO.

Step 1: Install a base Solaris 9 server
You may do this any number of ways. If you already have a jumpstart server and a base Solaris 9 profile you can jumpstart a new box, or if you do not you can just perform a base install using the media of your choice.
Step 2: Create some preliminary directories
There are several directories that you may want to go ahead and create. These will all be used later and will hold a variety of different items:
#> mkdir -p -m 755 /jumpstart/os/200509_Solaris_9
#> mkdir /jumpstart/profiles
#> mkdir /jumpstart/flasharchives

Step 3: Create an image of the OS media
This is really a misnomer: you are not actually making an image of the media, you are running a script that copies the contents of the media to a location of your choice. Why it is called an image is unknown to me. This "image" is used when you perform a "Custom Jumpstart Installation"; it is where the packages and other items are found when it is time to install them. It is also where the boot images come from a little later, when we use this server as a boot server as well.
We will now copy the contents of the media (in this case DVD) to the server. Since this server will also be our boot server we also want to copy over the needed boot images. To do so type the following:
#> mount -F hsfs /dev/dsk/c0t0d0s0 /mnt
#> cd /mnt/Solaris_9/Tools
#> ./setup_install_server -b /jumpstart/os/200509_Solaris_9/

At this point, you will see the script checking for adequate disk space and then finally copying the data over. Notice that the target directory is /jumpstart/os/200509_Solaris_9/. This is because the setup_install_server script requires that the target directory be completely void of all files, both visible and hidden. To accommodate this, I chose to create a folder that includes the media's release month and year. This can come in handy if you decide to add newer revisions of the media.

Step 4: Copy over existing rules, profiles, sysidcfgs, and flars
Currently, the jumpstart server itself is practically ready to go, unless you were following the installation guide, in which case you would have gone from Chapter 23 to Chapter 27 and back to… well, you get the idea. What we need to do now is copy our existing configurations over to the new server. This includes, but is not limited to:
/jumpstart/sysidcfg/*
/jumpstart/profiles/*
/jumpstart/rules
/jumpstart/
/mnt/Solaris_9/Tools/add_install_client

If this is the first jumpstart server you are building, you will unfortunately have to wait until I write a continuation HOWTO which contains instructions to create these necessary files.
We should also clean up our configurations at this time. Edit /jumpstart/rules and remove all of the entries except for the "any" entry and any others that you will surely be using. Also delete all of the profiles that don't have a corresponding rules entry from /jumpstart/profiles. Then regenerate the rules.ok file:
#> cd /jumpstart
#> ./check
Yeah, that’s right, you made a mistake in the rules file didn’t you? Fix it now before you forget…

Step 5 (Optional): Modify the nomatch.beg start script
Located in /jumpstart you should see a script named nomatch.beg. This script is run any time a machine does not match a rule in the /jumpstart/rules.ok file. What I like to do is have it print a reasonable error message stating, "This server did not have a matching rule listed in /jumpstart/rules.ok. Did you run /jumpstart/check after modifying it?", or something like that. It really beats the alternative of the installation just failing silently.

Step 6: Share the jumpstart directory
Simply put, we need to share the /jumpstart directory to allow files to be copied:
#> share -F nfs -o ro,anon=0 /jumpstart
#> shareall

Step 7: Add a client to the configuration, check running services, and test
We have now reached the point where we can add a client to the jumpstart server and test the configuration. Add a client to the server by adding an entry in:
/etc/ethers
/etc/hosts
/jumpstart/profiles
/jumpstart/rules
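As an illustration, client entries might look like this. The MAC address, IP address, and hostname are made up, and the sketch writes to sample files rather than the real /etc/ethers and /etc/hosts.

```shell
# Illustrative entries for a hypothetical client "testbox"; the MAC
# and IP addresses are invented. Written to sample files only.
cat > /tmp/ethers.sample <<'EOF'
8:0:20:ab:cd:ef testbox
EOF
cat > /tmp/hosts.sample <<'EOF'
192.168.1.50 testbox
EOF
cat /tmp/ethers.sample /tmp/hosts.sample
```
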

Now check the rules file like a good little person:
#> cd /jumpstart
#> ./check
Next be sure to add the client to allow tftpboot to work:

#> ./add_install_client -s <server>:/jumpstart/os/200509_Solaris_9 -c <server>:/jumpstart -p <server>:/jumpstart <client> sun4u
NOTE: If when you add the client you receive odd errors, make sure that rpc is running and re-add the client
#> /etc/init.d/nfs.server stop && /etc/init.d/rpc start && /etc/init.d/nfs.server start
#> cd /jumpstart
#> ./add_install_client -s <server>:/jumpstart/os/200509_Solaris_9 -c <server>:/jumpstart -p <server>:/jumpstart <client> sun4u

Now let’s check and see if the 2 necessary process are running to get the jump off the ground:
verify that in.rarpd is running (ps -ef | grep -i in.rarpd)
verify that in.tftpd is available (cat /etc/inetd.conf | grep -i tftp)
And finally, test the server by booting the test box, sending a break, and typing:
ok> boot net:speed=100,duplex=full -v - install

Or if you are fortunate enough to have good gigabit hardware:

ok> boot net:speed=1000,duplex=full -v - install
The client should soon start to build itself according to how it is set up (flasharchive, new build, etc.)

HOWTO:: Add and Configure LUNs in Solaris 9 and Veritas Cluster

Do you have a Veritas Cluster? Do you know how to configure a LUN and increase the size of a volume that is managed by Veritas Cluster?
Of course not… Read on…

This HOWTO assumes you have two servers for each database that are part of a Veritas Cluster, that the databases are running on node 1, and that you are using Emulex HBAs.
ON BOTH NODES

Step 1. Make a copy of the output when you run format

Step 2. Edit /kernel/drv/sd.conf and add an entry for the LUNs you are adding
name="sd" parent="lpfc" target=17 lun=42 hba="lpfc0";

Step 3. Run the following commands so that Solaris can see the new LUNs as disks.
#> update_drv -f sd
#> devfsadm

Step 4. Run format and select the new disks. They should show up as c3t17dxx. Select each disk and label it when asked. Do this for all added LUNs.

Step 5. Run vxdiskadm. Select 1. Add or initialize one or more disks. You can type list now to view all of the disks that Veritas Cluster currently sees. The uninitialized ones will be labeled as such. Type in the c3t17dxx designation to select it and accept all of the default except for the following:
When it asks for the disk group, enter the name of the disk group that contains the volume whose size you wish to increase. When it asks if you would like to encapsulate the disk, say no. You will then be prompted to initialize; say yes.

Step 6. Make a note of the disk designations that Veritas Cluster provides (ex: mydisk12)

Step 7. Run vxassist to check whether the disk initialization really worked as expected. This will show the maximum amount of space by which the volume can be expanded:
#> vxassist -g mydiskgroup maxgrow myvolume mydisk##
It will show you the maximum size that the volume can be expanded in clusters and MB.

Step 8. It is now a good idea to get a copy of the current sizes of the disks. You will need this later. So do a df -th and record the sizes of the volumes you are wanting to expand.

Step 9. Grow the filesystems you need by issuing the following command:
/etc/vx/bin/vxresize -F vxfs -g mydiskgroup myvolume +42g mydisk##
Step 10. Compare the output of df -th to the copy you saved earlier (or scroll up for all you non-hackers)
That’s it. That’s all there is to it… now get off my back.

Thursday, April 9, 2009

Configuring NEW LUNs

spdma501:# format < /dev/null
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b2fca,0
1. c1t1d0
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b39cf,0
Specify disk (enter its number):

spdma501:# cfgadm -o show_FCP_dev -al

Ap_Id                      Type        Receptacle   Occupant       Condition
c1                         fc-private  connected    configured     unknown
c1::2100000c506b2fca,0     disk        connected    configured     unknown
c1::2100000c506b39cf,0     disk        connected    configured     unknown
c3                         fc-fabric   connected    unconfigured   unknown
c3::50060482ccaae5a3,61    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,62    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,63    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,64    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,65    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,66    disk        connected    unconfigured   unknown
c3::50060482ccaae5a3,67    disk        connected    unconfigured   unknown
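Unconfigured fabric controllers can be picked out of this output automatically. The sketch below parses a trimmed sample and prints the cfgadm commands to run; nothing is executed.

```shell
# Sketch: pick controllers with unconfigured devices out of captured
# `cfgadm -o show_FCP_dev -al` output and print the configure commands.
cfgadm_sample='c1::2100000c506b2fca,0 disk connected configured unknown
c3::50060482ccaae5a3,61 disk connected unconfigured unknown
c3::50060482ccaae5a3,62 disk connected unconfigured unknown'

# Field 4 is the occupant state; the controller is the part of the
# Ap_Id before the "::".
ctrls=$(printf '%s\n' "$cfgadm_sample" |
  awk '$4 == "unconfigured" {split($1, a, ":"); print a[1]}' | sort -u)
cmds=$(for c in $ctrls; do echo "cfgadm -c configure $c"; done)
echo "$cmds"
```
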

spdma501:# cfgadm -c configure c3

spdma501:# cfgadm -c configure c5

spdma501:# format < /dev/null

Note: if you don't see the new LUNs in format, run devfsadm!

# /usr/sbin/devfsadm

Label the new disks:

# cd /tmp
# cat format.cmd
label
quit
# for disk in `format < /dev/null 2> /dev/null | grep "^c" | cut -d: -f1`
do
format -s -f /tmp/format.cmd $disk
echo "labeled $disk ....."
done

To verify whether an HBA is connected to a fabric or not

#/usr/sbin/luxadm -e port
Found path to 4 HBA ports
/devices/pci@1e,600000/SUNW,qlc@3/fp@0,0:devctl CONNECTED
/devices/pci@1e,600000/SUNW,qlc@3,1/fp@0,0:devctl NOT CONNECTED
/devices/pci@1e,600000/SUNW,qlc@4/fp@0,0:devctl CONNECTED
/devices/pci@1e,600000/SUNW,qlc@4,1/fp@0,0:devctl NOT CONNECTED

Note: Your SAN administrator will ask for the WWNs for Zoning. Here are some steps I use to get that information

# prtconf -vp | grep wwn
port-wwn: 210000e0.8b1d8d7d
node-wwn: 200000e0.8b1d8d7d
port-wwn: 210100e0.8b3d8d7d
node-wwn: 200000e0.8b3d8d7d
port-wwn: 210000e0.8b1eaeb0
node-wwn: 200000e0.8b1eaeb0
port-wwn: 210100e0.8b3eaeb0
node-wwn: 200000e0.8b3eaeb0
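SAN administrators usually want WWNs in colon-separated form; this small sketch converts prtconf's dotted notation:

```shell
# Sketch: convert prtconf's dotted WWN form (e.g. 210000e0.8b1d8d7d)
# into the colon-separated form SAN administrators usually expect.
wwn_colon() {
  printf '%s\n' "$1" | tr -d '.' | sed 's/../&:/g; s/:$//'
}
wwn_colon 210000e0.8b1d8d7d
```
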

Note: you may use fcinfo, if installed

# modinfo | grep qlc
76 7ba9e000 cdff8 282 1 qlc (SunFC Qlogic FCA v20060630-2.16)

root@PSBLD008 # prtdiag | grep qlc

/IO04/C3V2 PCI 157 B 66 66 1,0 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO04/C3V2 PCI 157 B 66 66 1,1 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO05/C3V2 PCI 189 B 66 66 1,0 ok SUNW,qlc-pci1077,2312.1077.10a.2+
/IO05/C3V2 PCI 189 B 66 66 1,1 ok SUNW,qlc-pci1077,2312.1077.10a.2+

root@PSBLD008 # luxadm qlgc

Found Path to 4 FC100/P, ISP2200, ISP23xx Devices

Opening Device: /devices/pci@9d,700000/SUNW,qlc@1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04

Opening Device: /devices/pci@9d,700000/SUNW,qlc@1,1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04

Opening Device: /devices/pci@bd,700000/SUNW,qlc@1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04

Opening Device: /devices/pci@bd,700000/SUNW,qlc@1,1/fp@0,0:devctl
Detected FCode Version: ISP2312 Host Adapter Driver: 1.14.09 03/08/04
Complete