Wednesday, August 19, 2009

Removing Sun[TM] Cluster 3.x node and cluster software packages

http://sunsolve.sun.com/search/document.do?assetkey=1-61-230779-1
Document Audience: SPECTRUM
Document ID: 230779
Old Document ID: (formerly 50093)
Title: Removing Sun[TM] Cluster 3.x node and cluster software packages
Copyright Notice: Copyright © 2009 Sun Microsystems, Inc. All Rights Reserved
Update Date: Thu Dec 18 00:00:00 MST 2008

Solution Type Technical Instruction

Solution 230779 : Removing Sun[TM] Cluster 3.x node and cluster software packages


Related Categories


Home>Product>Software>Enterprise Computing

Description
There are many instances where a cluster node needs to be redeployed and its cluster software removed so that its resources can be reallocated. This document addresses that need.
The example used here is a 3-node scalable topology configuration running Solaris[TM] 9 and Sun[TM] Cluster 3.1 Update 2.
The nodes are referred to as node1, node2 and node3.
There are 4 resource groups configured:
logical-rg (SUNW.LogicalHostname)
dg1-rg (SUNW.HAStoragePlus)
shareaddr-rg (SUNW.SharedAddress)
apache-rg (SUNW.apache)
Since Sun Cluster 3.0 Update 3, the cluster packages can be removed using scinstall -r.
The procedure below removes node2 and uses scinstall -r in the final step.
Notes:
If you plan to completely remove cluster software from all cluster nodes, please refer to Infodoc: < Solution: 217563 > for a more succinct procedure that does not involve removing one node at a time.
This procedure assumes that at least one quorum device is configured for the cluster, which is true in most cases. If no quorum device is configured, at least one must be added before the first of the three nodes can be removed. Please refer to document < Solution: 203650 > for further details.
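Before starting, it can be useful to capture the current cluster state so that each step can be verified afterwards. A minimal sketch using the status commands referenced throughout this procedure (output omitted):
# scstat -n     (node status)
# scstat -g     (resource group status)
# scstat -D     (device group status)
# scstat -q     (quorum status)
# scstat -W     (cluster transport status)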

Steps to Follow
1. Migrate resource groups and device groups off node2 to the other nodes.

# scswitch -S -h node2
2. Delete node2 instances from all resource groups.
* Start with scalable resource groups, followed by failover resource groups
* Gather configuration information by running the following commands
# scrgadm -pv | grep "Res Group Nodelist"
# scconf -pv | grep "Node ID"
# scrgadm -pvv | grep "NetIfList.*value"
* Scalable Resource Group(s)
- Set maximum and desired primaries to appropriate number
# scrgadm -c -g apache-rg -y maximum_primaries="2" \
-y desired_primaries="2"
- Set the remaining node names for the scalable resource group
# scrgadm -c -g apache-rg -h node1,node3
- Remove node2 from the node list of the failover resource group holding the shared address
# scrgadm -c -g shareaddr-rg -h node1,node3
* Failover Resource Group(s)
- Set the remaining node names for the failover resource groups
# scrgadm -c -g logical-rg -h node1,node3
# scrgadm -c -g dg1-rg -h node1,node3
- Check for IPMP groups affected
# scrgadm -pvv -g logical-rg | grep -i netiflist
# scrgadm -pvv -g shareaddr-rg | grep -i netiflist
- Update IPMP groups affected
# scrgadm -c -j logicalhost \
-x netiflist=sc_ipmp0@1,sc_ipmp0@3
# scrgadm -c -j shared-address \
-x netiflist=sc_ipmp0@1,sc_ipmp0@3
* Verify changes to resource groups
# scrgadm -pvv -g apache-rg | grep -i nodelist
# scrgadm -pvv -g apache-rg | grep -i netiflist
# scrgadm -pvv -g shareaddr-rg | grep -i nodelist
# scrgadm -pvv -g shareaddr-rg | grep -i netiflist
# scrgadm -pvv -g logical-rg | grep -i nodelist
# scrgadm -pvv -g logical-rg | grep -i netiflist
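As a convenience, the same verification can be run over all four resource groups of this example with a small shell loop (Bourne shell syntax):
# for rg in apache-rg shareaddr-rg logical-rg dg1-rg
> do
>   scrgadm -pvv -g $rg | egrep -i 'nodelist|netiflist'
> done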
3. Delete node instances from all disk device groups
* Solaris Volume Manager
- Check for diskgroups affected
# scconf -pv | grep -i "Device group" | grep node2
# scstat -D
- Remove node from diskset nodelist
# metaset -s setname -d -h nodelist (use -f if needed; see the worked example after this step)
* VERITAS Volume Manager
- Check for diskgroups affected
# scconf -pv | grep -i "Device group" | grep node2 # scstat -D
- Remove node from diskgroup nodelist
# scconf -r -D name=dg1,nodelist=node2
* Raw Disk Device Group
- Remember to change desired secondaries to 1
- On any active remaining node(s), identify the device groups connected
# scconf -pvv | grep node2 | grep "Device group node list"
- Determine raw device
# scconf -pvv | grep Disk
- Disable the localonly property of each Local_Disk
# scconf -c -D name=rawdisk-device-group,localonly=false
- Verify disabled localonly property
# scconf -pvv | grep "Disk"
- Remove node from raw device
# scconf -r -D name=rawdisk-device-group,nodelist=node2
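For instance, if node2 were also a host in a Solaris Volume Manager diskset named testset (a hypothetical name used only for illustration), the removal would look like:
# scstat -D                        (confirm node2 appears in the node list for testset)
# metaset -s testset -d -h node2   (remove node2 from the diskset; add -f if node2 is the last host)
# scstat -D                        (verify node2 is no longer listed)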
Steps 4-6 are not applicable to 2-node clusters.
4. Remove all fully connected quorum devices.
- Check quorum disk information
# scconf -pv | grep Quorum
- Remove quorum disk
# scconf -r -q globaldev=d
5. Remove all fully connected storage devices from node2. Use any method that blocks access from node2 to the shared storage (a command sketch follows this list):
- vxdiskadm to suppress access from VxVM
- cfgadm -c unconfigure
- LUN masking/mapping methods if applicable
- physical cable removal if allowed
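For example, a sketch of the cfgadm method with hypothetical attachment point names (c2 and the disk shown are assumptions; run cfgadm -al on node2 to list the real attachment points):
# cfgadm -al                               (list attachment points on node2)
# cfgadm -c unconfigure c2::dsk/c2t3d0     (unconfigure one shared disk path)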
6. Add back the quorum devices
# scconf -a -q globaldev=d,node=node1,node=node3
7. Place the node being removed into maintenance state.
* Shutdown node2
# shutdown -g0 -y -i0
* On remaining node
# scconf -c -q node=node2,maintstate
* Verify quorum status
# scstat -q
8. Remove all logical transport connections from the node being removed.
* Check for interconnect configuration
# scstat -W
# scconf -pv | grep cable
# scconf -pv | grep adapter
* Remove the cable configuration
# scconf -r -m endpoint=node2:qfe0
# scconf -r -m endpoint=node2:qfe1
* Remove adapter configuration
# scconf -r -A name=qfe0,node=node2
# scconf -r -A name=qfe1,node=node2
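To confirm the removal, repeat the checks from the beginning of this step; the node2 endpoints should no longer appear:
# scstat -W
# scconf -pv | grep cable
# scconf -pv | grep adapter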
9. For 2-node clusters only, remove the quorum disk.
* If not already done, shut down the node to be uninstalled.
# shutdown -y -g 0
* On the remaining node, put the node to be removed into maintenance mode
# scconf -c -q node=node2,maintstate
* Place cluster in installmode
# scconf -c -q installmode
* Remove quorum disk
# scconf -r -q globaldev=d
* Verify quorum status
# scstat -q
10. Remove the node from the cluster software configuration.
* Remove the node
# scconf -r -h node=node2
* Verify the remaining cluster nodes
# scstat -n
11. Remove the cluster software.
* If not already done, shut down the node to be uninstalled.
# shutdown -g0 -y -i0
* Reboot the node into non-cluster mode.
ok> boot -x
* Remove all globally mounted file systems except /global/.devices from /etc/vfstab (example entries follow this step)
* Uninstall Sun Cluster software from the node
# scinstall -r
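For the vfstab step above, hypothetical /etc/vfstab entries (the device names are illustrative, not taken from this configuration):
Entry to remove (a globally mounted file system):
/dev/md/webds/dsk/d100  /dev/md/webds/rdsk/d100  /global/web  ufs  2  yes  global,logging
Entry to keep (the node's /global/.devices mount):
/dev/did/dsk/d2s5  /dev/did/rdsk/d2s5  /global/.devices/node@2  ufs  2  no  global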
If it is desirable to remove the last node of the cluster, a complete removal of all resource and device groups will be required. Please follow the procedure below:
1. Offline all resource groups (RGs):
# scswitch -F -g resource-group[,...]
2. Disable all configured resources:
# scswitch -n -j resource[,...]
3. Remove all resources from the resource groups:
# scrgadm -r -j resource
4. Remove the now empty resource groups:
# scrgadm -r -g resource-group
5. Remove the global mounts and the "/node@nodeid" mount entries from the /etc/vfstab file.
6. Remove all device groups:
# scstat -D (to get a list of device groups)
# scswitch -F -D device-group-name (to offline the device group)
# scconf -r -D name=device-group-name (to remove/unregister the device group)
NOTE: If there are any "rmt" devices, they must be removed with the command:
# /usr/cluster/dtk/bin/dcs_config -c remove -s rmt/1
This assumes that you have the package "SUNWscdtk". If you do not, you will need to install it in order to remove the rmt/XX entries, or the "scinstall -r" will fail.
The SUNWscdtk package is the diagnostics toolkit for Sun Cluster and is not available on the Cluster CD; it can be obtained from the following URL:
http://suncluster.eng/service/tools.html
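As a worked example of steps 1 to 6 for the configuration described in this document (logicalhost and shared-address are the resource names used earlier; apache-res and dg1-hasp-res are assumed names, so substitute the resources reported by scstat -g):
# scswitch -F -g apache-rg,shareaddr-rg,logical-rg,dg1-rg
# scswitch -n -j apache-res,shared-address,logicalhost,dg1-hasp-res
# scrgadm -r -j apache-res
# scrgadm -r -j shared-address
# scrgadm -r -j logicalhost
# scrgadm -r -j dg1-hasp-res
# scrgadm -r -g apache-rg
# scrgadm -r -g shareaddr-rg
# scrgadm -r -g logical-rg
# scrgadm -r -g dg1-rg
# scstat -D
# scswitch -F -D dg1
# scconf -r -D name=dg1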
Uninstall the Sun Cluster 3.X software:
* If not already done, shut down the node.
# shutdown -g0 -y -i0
* Reboot the node into non-cluster mode.
ok> boot -x
* Finally, remove the Sun Cluster 3.x software:
# scinstall -r

Product
Sun Cluster Geographic Edition 3.1 8/05
Solaris Cluster 3.2
Sun Cluster 3.1
Sun Cluster 3.1 Data Services Agents
Sun Cluster Agents 3.1 9/04
Sun Cluster Agents 3.1 4/04
Sun Cluster Agents 3.1 10/03
Sun Cluster Agents 3.1 05/03
Sun Cluster 3.1 9/04
Sun Cluster 3.1 8/05
Sun Cluster 3.1 7/05
Sun Cluster 3.1 4/04
Sun Cluster 3.1 10/03 for SunPlex Systems
Sun Cluster 3.0
Sun Cluster 3.0 7/01
Sun Cluster 3.0 5/02
Sun Cluster 3.0 12/01

Keywords
remove, removal, Cluster, node, scinstall, 3.x, ccr, resources


Tuesday, August 18, 2009

How to remove disks/LUNs from Solaris

1. Identify the file systems.
2. Get the disks that belong to the file systems.
3. Check them in the metaset/metadevice and make sure nothing else is using them (no other soft partitions).
4. Clear the metadevice(s) from the metaset.
5. Remove the disks from the metaset.
6. Remove the metadb replicas for the disks that you want to remove.
7. Ask data storage to remove (unmap) the disks.
8. Configure the controllers after you confirm that the disks have been removed on all nodes.
9. Run devfsadm -Cv on all nodes.
10. Run scgdevs on ONE node (in case you are using Sun Cluster).
11. Run scdidadm -C on ONE node (in case you are using Sun Cluster).
12. Check that all nodes have the same number of LUNs (in case you are using Sun Cluster).
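A minimal command sketch of those steps; the names (diskset datads, metadevice d100, DID device d15, disk c3t5d0, controller c3) are placeholders for illustration only:
# metastat -s datads -p                     (list the metadevices configured in the set)
# metaclear -s datads d100                  (clear the metadevice built on the disk)
# metaset -s datads -d /dev/did/rdsk/d15    (remove the DID device from the set)
# metadb -i                                 (check whether the disk holds local state database replicas)
# metadb -d c3t5d0s7                        (delete replicas on that disk, only if any were listed)
# cfgadm -c unconfigure c3::dsk/c3t5d0      (unconfigure the path once storage has unmapped the LUN)
# devfsadm -Cv                              (clean up stale device links; run on every node)
# scgdevs                                   (Sun Cluster only: update the global device namespace, one node)
# scdidadm -C                               (Sun Cluster only: remove DID entries for detached devices, one node)
# scdidadm -L                               (Sun Cluster only: verify the device list is consistent across nodes)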

For more information check http://docs.sun.com/app/docs/doc/817-1673/6mhcv6m38?a=view

Removing and unregistering a diskset from Sun Cluster

Today I realized that the procedure "How to Remove and Unregister a Device Group (Solaris Volume Manager)" lacks a specific example.
Let's assume the following diskset configuration on a Sun Cluster with two nodes named cluster01 and cluster02:
# cat /etc/lvm/md.tab
test_ds/d1 -m test_ds/d10
test_ds/d10 1 1 /dev/did/rdsk/d4s0

# metaset -s test_ds -a -h cluster01 cluster02
# metaset -s test_ds -a /dev/did/rdsk/d4
# metaset -s test_ds -a -m cluster01 cluster02
# metainit test_ds/d10
test_ds/d10: Concat/Stripe is setup
# metainit test_ds/d1
test_ds/d1: Mirror is setup
# cldg show
=== Device Groups ===

Device Group Name:    test_ds
  Type:               SVM
  failback:           false
  Node List:          cluster01, cluster02
  preferenced:        true
  numsecondaries:     1
  diskset name:       test_ds
# cldg status

And now assume you want to remove and unregister this diskset again. Generally speaking, before performing this you want to make sure that:
no file system is mounted on any node from this diskset
no entry on any node for this diskset is active in /etc/vfstab
no SUNW.HAStoragePlus resource is using this diskset or a file system from this diskset
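A quick way to check those three points (a sketch; test_ds is the diskset from this example and the grep patterns are just a convenience, so repeat the checks on every node):
# df -k | grep test_ds               (no mounted file systems from the diskset)
# grep test_ds /etc/vfstab           (no active vfstab entries for the diskset)
# clrs show -v | grep -i test_ds     (no resource, e.g. SUNW.HAStoragePlus, referencing the diskset)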
Find out on which node the diskset is primary/online:
# cldg status
=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name     Primary      Secondary    Status
-----------------     -------      ---------    ------
test_ds               cluster01    cluster02    Online
Perform all of the following on the node where the diskset is primary/online (here: cluster01):
Remove all metadevices on that diskset:
# metaclear -s test_ds -a
test_ds/d1: Mirror is cleared
test_ds/d10: Concat/Stripe is cleared

Remove all devices from that diskset (you need the -f option for the last one):
# metaset -s test_ds -d -f /dev/did/rdsk/d4

On a two node cluster, if mediators are configured, remove them:
# metaset -s test_ds -d -m cluster01 cluster02
For all nodes (removing the node where the diskset is primary last), perform:
# metaset -s test_ds -d -h cluster02
# metaset -s test_ds -d -h cluster01
In /var/adm/messages you see the following after the last command:
Jun 2 02:21:33 cluster01 Cluster.Framework: [ID 801593 daemon.notice] stdout: no longer primary for test_ds
And you can confirm that the diskset is now removed and unregistered:
# cldg list
#