Tuesday, December 21, 2010

CLOSE_WAIT Connections – Tuning Solaris

This article describes how to tune TCP parameters on Solaris to improve the performance of a web server. I will show, at a high level, how a TCP connection is initiated and terminated, and then focus on how to tune the relevant TCP parameters on Solaris.

I have experienced problems where a large number of CLOSE_WAIT connections on Solaris affected the application, causing delays in response time and refused new connections.

I will start this article by explaining how a connection is initiated (the TCP three-way handshake sequence).

Let’s use the scenario of two servers (A and B) where the server A is going to initiate the connection.

1-) The first segment (SYN) is sent by node A to node B. This is a request to the server to synchronize sequence numbers.

Node A —— SYN —–> Node B

2-) Node B sends an acknowledgement (ACK) of Node A's request. At the same time, Node B sends its own request (SYN) to Node A to synchronize its sequence numbers.

Node A <—– SYN/ACK —– Node B

3-) Node A then sends an acknowledgement to Node B.

Node A —— ACK ——-> Node B

At this time the connection should be established.

Let’s show how the connections are terminated (here is the CLOSE_WAIT issue):

In the termination process, it is important to remember that the application process on each side of the connection must close its half of the connection independently. The terminating process consists of the following steps.

Let’s suppose Node A closes its half of the connection first.

1-) Node A transmits a FIN packet to Node B.

(Established) (Established)
Node A —- FIN —-> Node B
(FIN_WAIT1)

2-) Node B transmits an ACK packet to Node A:

Node A <—- ACK —- Node B
(FIN_WAIT_2) (CLOSE_WAIT)

Here is the CLOSE_WAIT issue: the application on Node B must invoke close() to close the connection on its end. If the application does not invoke close(), the connection stays stuck in CLOSE_WAIT for the time specified in the TCP stack. If the server has heavy traffic and a lot of connections in CLOSE_WAIT status, it can cause issues such as:

- Refused new connection requests.
- Slow response times.
- High processing resource utilization.

Now, I will describe some tips that helped me solve problems on web servers. They basically consist of changing some TCP parameters on Solaris to reduce the time a connection stays in CLOSE_WAIT, releasing this kind of connection more quickly.

- tcp_time_wait_interval parameter:

Description: Notifies TCP/IP how long to keep the connection control blocks after the connection closes. After the applications complete the TCP/IP connection, the control blocks are kept for the specified time. When high connection rates occur, a large backlog of TCP/IP connections accumulates and can slow server performance. The server can stall during certain peak periods. If the server stalls, the netstat command shows that many of the sockets opened to the HTTP server are in the CLOSE_WAIT or FIN_WAIT_2 state. Visible delays can occur for up to four minutes, during which time the server does not send any responses, but CPU utilization stays high, with all of the activity in system processes.

1-) Verify the current value:
ndd -get /dev/tcp tcp_time_wait_interval
2-) Set the new value:
ndd -set /dev/tcp tcp_time_wait_interval 60000
(Default value is 240000 milliseconds = 4 minutes. Recommended is 60000 milliseconds.)

- tcp_fin_wait_2_flush_interval

Specifies how long a connection is allowed to remain in the FIN_WAIT_2 state.

1-) Verify the current value:
ndd -get /dev/tcp tcp_fin_wait_2_flush_interval
2-) Set the new value:
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
(Default value is 675000 milliseconds. Recommended is 67500 milliseconds.)

- tcp_keepalive_interval

A keepalive packet ensures that a connection stays in an active and established state.

1-) Verify the current value:
ndd -get /dev/tcp tcp_keepalive_interval
2-) Set the new value:
ndd -set /dev/tcp tcp_keepalive_interval 300000
(Default value is 7200000 milliseconds; the example above sets 300000 milliseconds.)

- Connection backlog (tcp_conn_req_max_q)

If the backlog is too small, a high number of incoming connections results in failures.

1-) Verify the current value:
ndd -get /dev/tcp tcp_conn_req_max_q
2-) Set the new value:
ndd -set /dev/tcp tcp_conn_req_max_q 8000
(Default value is 128. Recommended is 8000.)

These configuration changes help to improve system performance and, better than that, help to reduce major impacts. I have experienced situations where the application was not responding due to a lot of connections in CLOSE_WAIT status. In my case, we identified a bug in the application and used this tuning as a workaround. It is very useful and can help you when you are experiencing problems due to many connections in this state.

Reference: This article was inspired by "Tuning Solaris systems" in the IBM WebSphere Application Server Information Center.

Additional Info:

Local Server closes first:
ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED.

Remote Server closes first:
ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED.

Local and Remote Server close at the same time:
ESTABLISHED -> FIN_WAIT_1 -> CLOSING -> TIME_WAIT -> CLOSED.
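When diagnosing these state transitions in practice, a quick way to see how many connections sit in each state is to count the last column of netstat output with awk. This is a minimal sketch: the canned sample lines below stand in for real `netstat -an` output, whose exact column layout varies between Solaris releases, so treat the field positions as an assumption.

```shell
#!/bin/sh
# Count TCP connections per state. On a live box you would feed it real data:
#   netstat -an | count_states
count_states() {
  awk '$NF ~ /ESTABLISHED|CLOSE_WAIT|TIME_WAIT|FIN_WAIT/ {
         count[$NF]++                   # state is the last column
       }
       END { for (s in count) printf "%7d %s\n", count[s], s }'
}

# Canned netstat-style sample lines for demonstration:
printf '%s\n' \
  '10.0.0.1.80  10.0.0.2.1234 49640 0 49640 0 ESTABLISHED' \
  '10.0.0.1.80  10.0.0.3.1235 49640 0 49640 0 CLOSE_WAIT'  \
  '10.0.0.1.80  10.0.0.4.1236 49640 0 49640 0 CLOSE_WAIT'  \
  | count_states > /tmp/state_counts.txt
cat /tmp/state_counts.txt
```

A steadily growing CLOSE_WAIT count here is the symptom described above: the application is not calling close() on its end.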

Monday, December 20, 2010

Backup commands – usage and examples

Backup commands – ufsdump, tar, cpio
Unix backup and restore can be done using the commands ufsdump, tar, and cpio. Though these commands may be sufficient for small setups, for an enterprise backup you have to go for a dedicated backup and restore solution such as Symantec NetBackup, EMC NetWorker, or Amanda.
Any backup solution using these commands depends on the type of backup you are taking and on the capability of the commands to fulfill the requirement. The following paragraphs will give you an idea of the commands, their syntax, and examples.

Features of ufsdump , tar , cpio

ufsdump
1. Used for complete file system backups.
2. Copies everything from regular files in a file system to special character and block device files.
3. Can work on mounted or unmounted file systems.

tar:
1. Used for single or multiple file backups.
2. Can’t back up special character & block device files (0-byte files).
3. Works only on mounted file systems.

cpio:
1. Used for single or multiple file backups.
2. Can back up special character & block device files.
3. Works only on mounted file systems.
4. Needs a list of the files to be backed up.
5. Preserves hard links and time stamps of the files.

Identifying the tape device in Solaris

dmesg | grep st

Checking the status of the tape drive

mt -f /dev/rmt/0 status

Backup restore and disk copy with ufsdump :

Backup file system using ufsdump
ufsdump 0cvf /dev/rmt/0 /dev/rdsk/c0t0d0s0
or
ufsdump 0cvf /dev/rmt/0 /usr

To restore a dump with ufsrestore

ufsrestore rvf /dev/rmt/0
ufsrestore in interactive mode allows selection of individual files and directories using the add, ls, cd, pwd and extract commands:
ufsrestore if /dev/rmt/0

Making a copy of a disk slice using ufsdump


ufsdump 0f - /dev/rdsk/c0t0d0s7 | (cd /mnt/backup; ufsrestore xf -)

Backup restore and disk copy with tar :


Backing up all files in a directory, including subdirectories, to a tape device (/dev/rmt/0):

tar cvf /dev/rmt/0 *

Viewing a tar backup on a tape

tar tvf /dev/rmt/0

Extracting tar backup from the tape

tar xvf /dev/rmt/0
(Restoration will go to the present directory or to the original backup path, depending on whether relative or absolute path names were used for the backup.)
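The same tar flags work against an ordinary file instead of a tape device, which is handy for testing. Here is a minimal sketch using a hypothetical /tmp/tardemo directory; the archive is created with relative path names, so the extract lands in the present directory as noted above.

```shell
#!/bin/sh
# Same c (create), t (list), x (extract) flags as with /dev/rmt/0,
# but targeting an archive file.
mkdir -p /tmp/tardemo/src /tmp/tardemo/restore
echo "hello" > /tmp/tardemo/src/file1

# Create the archive with relative path names (cd into the directory first):
( cd /tmp/tardemo/src && tar cvf /tmp/tardemo/backup.tar . )

# View the archive contents:
tar tvf /tmp/tardemo/backup.tar

# Extract; relative paths restore into the present directory:
( cd /tmp/tardemo/restore && tar xvf /tmp/tardemo/backup.tar )
```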

Backup restore and disk copy with cpio :

Back up all the files in current directory to tape .

find . -depth -print | cpio -ovcB > /dev/rmt/0
cpio expects a list of files, and the find command provides that list; cpio then writes the archive to standard output, and the > sign redirects it to the tape. The destination can be a regular file as well.

Viewing cpio files on a tape

cpio -ivtB < /dev/rmt/0

Restoring a cpio backup

cpio -ivcB < /dev/rmt/0

Compress/uncompress files :

You may have to compress files before or after the backup; this can be done with the following commands.
Compressing a file

compress -v file_name
gzip filename

To uncompress a file

uncompress file_name.Z
or
gunzip filename
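As a quick illustration, here is a gzip/gunzip round trip on a throwaway file. The /tmp/demo.txt path is just an example, and the -f flags simply overwrite any stale files from a previous run.

```shell
#!/bin/sh
# Round-trip a file through gzip/gunzip, as you might before/after a backup.
echo "backup payload" > /tmp/demo.txt
gzip -f /tmp/demo.txt           # produces /tmp/demo.txt.gz, removes original
ls -l /tmp/demo.txt.gz
gunzip -f /tmp/demo.txt.gz      # restores /tmp/demo.txt
cat /tmp/demo.txt
```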

What is a sticky bit

In Unix the sticky bit is a permission bit that protects the files within a directory. If the directory has the sticky bit set, a file can be deleted only by the owner of the file, the owner of the directory, or the superuser. This prevents a user from deleting other users’ files from public directories. A t or T in the access permissions column of a directory listing indicates that the sticky bit has been set, as shown here:

drwxrwxrwt 5 root sys 458 Oct 21 17:04 /public

The sticky bit can be set with the chmod command. You need to assign the octal value 1 as the first number in a series of four octal values.

# chmod 1777 public
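To see the effect, here is a small sketch that creates a world-writable directory (the name is arbitrary), sets the sticky bit with the octal 1xxx mode, and shows the trailing t in the listing:

```shell
#!/bin/sh
# Create a world-writable directory and set the sticky bit (octal 1777).
mkdir -p /tmp/public_demo
chmod 1777 /tmp/public_demo
# The trailing "t" in the mode string shows the sticky bit is set:
ls -ld /tmp/public_demo
```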

Solaris Volume Manager (SVM) – Creating Disk Mirrors

One great thing about Solaris (x86 and Sparc) is that some really cool disk management software is built right in, and it’s called SVM, or Solaris Volume Manager. In previous versions of Solaris it was called Solstice Disksuite, or just Disksuite for short, and it’s still referred to by that name sometimes by people who have been doing this for a long time and therefore worked with that first. The point is that they are the same thing, except SVM is the new version of the tool. Today, we are going to look at what we need to create a mirror out of two disks. Actually, we’ll be creating a mirror between two slices (partitions) of two disks. You can, for example, create a mirror between the root file system slices if you want. Or, if you follow old school rules and break out /var, /usr, etc., you can mirror those as well. You can even mirror your swap slices if you don’t mind the performance hit and need that extra uptime assurance, but we’ll talk about swap in another article. For now, let’s talk about SVM and mirrors.
For the purposes of this article, I am going to assume I have a server with two SCSI hard drives, this is the same process for IDE drives, but the drive device names will be different. The device names I am going to use are /dev/dsk/c0t0d0 and /dev/dsk/c0t1d0, notice that they are the same except for the target (t) number changes, indicating the next disk on the bus. For the slices to use, let’s mirror the root file system on slice 0 and swap on slice 1, sound good? Good.
In order to use SVM, we have to setup what are called “meta databases”. These small databases hold all of the information pertaining to the mirrors that we create, and without them, the machine won’t start. It’s important to note here that it’s not just that the server won’t start without them, the server won’t start (i.e. It goes into single user mode) if you have SVM setup and it can’t find 50% or more of these meta databases. This means that you need to put SVM on your main two drives, or even distribute copies on all local drives if you want, but don’t, for any reason, put any meta databases on removable, external or SAN drives! If you do, and you ever try to start your machine with those drives gone, it won’t start! So keep it on the local drives to make your life easier later.
The disk mirroring is done after the Solaris OS (operating system) has been installed, and therefore we can be sure that the main drive is partitioned correctly since we had to do that as part of the install. However, we need to partition the second disk the same way, the disk label (partition structure) needs to be the same on both disks in the mirror.
We need to pick what partition will hold the meta databases, we already know where / and swap are going to go, and don’t forget that slice 2 is the whole disk or backup partition, so we don’t want to use that for anything. I normally put the meta databases on slice 7. I create a partition of 256MB, which is more than you need, you can use probably 10 if you want, I just like to have some room to grow in the future. It’s important to make sure you get all the slices setup before you do the install! Now that we have determined where all the slices are going to be and what they will hold (slice 0 is / or root, slice 1 is swap, and slice 7 holds the meta information), let’s copy the partition table from disk 0 to disk 1. Luckily, you can accomplish this in one easy step, like this:

#prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

Do you understand what we are doing here? We are using the prtvtoc (print vtoc, or disklabel) command to print the current partition structure, and piping it into the fmthard (format hard) command to essentially push the partition table from one drive to the other. Be sure you get the drive names absolutely correct, or you WILL destroy data! This will NOT ask you if you are sure, and there is NO WAY to undo this if you get it backwards, or wrong! Ok, the two disks now have matching labels, awesome! Next we need to create the meta databases, which will live on slice 7.

The command will look like this:
#metadb -a -c 3 -f c0t0d0s7 c0t1d0s7
See what we are doing here? We are issuing the metadb command, the -a says to add databases, the -c 3 says to add three copies (in case one gets corrupted), and the -f option is used to create the initial state database. It is also used to force the deletion of replicas below the minimum of one. (The -a and -f options should be used together only when no state databases exist). Lastly on the line we have the disks we want to setup the databases on. Note that we didn’t have to give the absolute of full device path (no /dev/dsk), and we added an s7 to indicate slice 7. Sweet, isn’t it?! Now we have our meta databases setup, so next we need to initialize the root slice on the primary disk. Don’t worry, even though we say initialize, it isn’t destructive. Basically, we tell the SVM software to create a meta device using that root partition, which will then be paired up with another meta device that represents the root partition of the other disk to make the mirror. The only thing here that you have to think about, is what you want to call the meta device. It will be a “d” with a number, and you will have a meta device for each partition, that will be mirrored to create another meta device that is the mirror. Got that? I normally name them all close to each other, something along the lines of d11 for the root slice of disk 1, d12 for the root slice of disk 2, and then d10 for the mirror itself that is made up of disks 1 and 2. That make sense? You can name it anything you want, and some folk use complicated naming schemes that involve disk ids and parts of the serial number, but I really don’t see the point in all that. The commands to initialize the root slices for both disks are as follows:

#metainit -f d11 1 1 c0t0d0s0
#metainit -f d12 1 1 c0t1d0s0
See how easy that is? We run the metainit command, using the -f again since we already have an operating system in place, we specify d11 and d12 respectively, and we want 1 physical device in the meta device (the 1 1 tells metainit to create a one to one concatenation of the disk). Again, like before, we specify the target disk, and again with no absolute device name. Take a look though and notice that we did change from s7 to s0, since we are trying to mirror slice 0 which is our root slice. Now that we have initialized the root slices of both disks, and created the two meta devices, we want to create the meta device that will be the mirror. This command will look like this:

#metainit d10 -m d11
Again, we use the metainit command, this time using -m to indicate we are creating a mirror called d10, and attaching d11. Whoah! Wait a minute pardner! Where’s d12 at you are asking? I know you are, admit it, you’re that good! I am glad you noticed. We actually will add that to the mirror (d10) later, after we do a couple other things and reboot the machine. This is a good spot to mention the metastat command. This command will show you the current status of all of your meta devices, like the mirror itself, and all of the disks in the mirror. It’s a good idea to run this once in awhile to make sure that you don’t have a failed disk that you don’t know about. For my systems, I have a script that runs from cron to check at regular intervals and email me when it sees a problem. Before we can reboot and attach d12, we have to issue the metaroot command that will setup d10 as our boot device (essentially it goes and changes the /etc/vfstab for you). Remember that this is only for a boot device. If you were mirroring two other drives (like in a server that has four disks) that you aren’t booting off of, you don’t metaroot those. The command looks like so:

#metaroot d10
How simple. That’s it! Well, that’s it for the root slice anyway. We’ll run through those same commands to mirror the swap devices, which I will put down for you here with some notes, but without all the explanation. We’ll be using numbers in the 20s for our devices: d20, d21 and d22. See if you can follow along:
(*Note: At this point, we already have the label and meta databases in place, so the prtvtoc and metadb steps aren’t needed.)

Initialize the swap slices:

#metainit d21 1 1 c0t0d0s1
#metainit d22 1 1 c0t1d0s1
(Notice we changed to slice 1 (s1) for swap.) Now, initialize the mirror:
#metainit d20 -m d21
==============================================================
And there you go, at least for the meta device part. One thing to remember though, whether you are doing swap, or a separate set of disks, if you don’t run that metaroot command (like if it’s not the boot disk), you have to change the /etc/vfstab yourself or it won’t work. Here is where we point out a device name difference for meta devices. Instead of /dev/dsk for your mirror, the meta device is now located at /dev/md/dsk/ and then the meta device name. So, our root mirror is /dev/md/dsk/d10 and our swap mirror is /dev/md/dsk/d20. Simple huh? So for your swap mirror, you would edit /etc/vfstab and change the swap device from whatever it is now, to your meta device, which is /dev/md/dsk/d20 in this example. The rest of the entry stays the same, it’s just a different device name. Lastly, in order to make all this magic work, you have to restart the machine. Once it comes back up, you can attach the second drives of the mirror with this command:

For the root mirror
#metattach d10 d12
For the swap mirror
#metattach d20 d22

Once this is done, you should be able to see the mirrors re-syncing when you run the metastat command. Just run metastat, and for each mirror meta device, you should see the re-syncing status for a while. Once the sync is done, it should change to OK.

Example metastat output for d10 after the attachment:

d10: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d12
State: Resyncing
Resync in progress: 0 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 279860352 blocks (133 GB)

d11: Submirror of d10
State: Okay
Size: 279860352 blocks (133 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s0 0 No Okay Yes

d12: Submirror of d10
State: Resyncing
Size: 279860352 blocks (133 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t1d0s0 0 No Okay Yes

There you have it, the output from the metastat command shows the meta device that is the mirror, d10, and the meta devices that make up the mirror. In addition, it shows the status of the mirror and devices which is real handy. For example, in the script that I use to monitor my disks, I use the following command to tell me if any meta devices have any status other than Okay. Check it out:

#metastat | grep State | egrep -v Okay

If I get any information back from that command, I just have the script email it to me so I know what is going on. Cool, huh?
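A cron-driven check along those lines can be sketched as below. Since metastat exists only on Solaris, the sketch runs the same grep filter against canned metastat-style output; on a real system you would pipe `metastat` in directly, and the mail step is left as a comment.

```shell
#!/bin/sh
# Flag any meta device whose State is not Okay. On Solaris:
#   metastat | check_mirrors
check_mirrors() {
  grep 'State:' | grep -v 'Okay'
}

# Canned metastat-style sample output for demonstration:
printf '%s\n' \
  '    State: Okay' \
  '    State: Resyncing' > /tmp/metastat_sample.txt

check_mirrors < /tmp/metastat_sample.txt > /tmp/mirror_problems.txt
# Non-empty output means something needs attention; a cron job could
# mail it, e.g.:  [ -s /tmp/mirror_problems.txt ] && mailx -s alert admin
cat /tmp/mirror_problems.txt
```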

We just had the long version, so here I am going to put the commands together, so you can simply see them all at once, and even use this as a reference. See what you think:

#prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

#metadb -a -c 3 -f c0t0d0s7 c0t1d0s7
#metainit -f d11 1 1 c0t0d0s0
#metainit -f d12 1 1 c0t1d0s0
#metainit d10 -m d11
#metaroot d10
#metainit d21 1 1 c0t0d0s1
#metainit d22 1 1 c0t1d0s1
#metainit d20 -m d21
>REBOOT<
#metattach d10 d12
#metattach d20 d22

There you have it! That’s how easy it is to create disk mirrors and protect your data with SVM. I hope you enjoyed this article and found it useful!

HOWTO: Mirrored root disk on Solaris

0. Partition the first disk
# format c0t0d0
Use format's partition tool ("p" to enter the partition menu, then "p" to print the table) to set up the slices. We assume the following slice setup afterwards:
# Tag Flag Cylinders Size Blocks
- ---------- ---- ------------- -------- --------------------
0 root wm 0 - 812 400.15MB (813/0/0) 819504
1 swap wu 813 - 1333 256.43MB (521/0/0) 525168
2 backup wm 0 - 17659 8.49GB (17660/0/0) 17801280
3 unassigned wm 1334 - 1354 10.34MB (21/0/0) 21168
4 var wm 1355 - 8522 3.45GB (7168/0/0) 7225344
5 usr wm 8523 - 14764 3.00GB (6242/0/0) 6291936
6 unassigned wm 14765 - 16845 1.00GB (2081/0/0) 2097648
7 home wm 16846 - 17659 400.15MB (813/0/0) 819504
1. Copy the partition table of the first disk to its future mirror disk
# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
2. Create at least two state database replicas on each disk
# metadb -a -f -c 2 c0t0d0s3 c0t1d0s3
Check the state of all replicas with metadb:
# metadb
Notes:
A state database replica contains configuration and state information about the meta devices. Make sure that at least 50% of the replicas are always active!

3. Create the root slice mirror and its first submirror
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d30 -m d10
Run metaroot to prepare /etc/vfstab and /etc/system (do this only for the root slice!):
# metaroot d30
4. Create the swap slice mirror and its first submirror
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d31 -m d11
5. Create the var slice mirror and its first submirror
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d34 -m d14
6. Create the usr slice mirror and its first submirror
# metainit -f d15 1 1 c0t0d0s5
# metainit -f d25 1 1 c0t1d0s5
# metainit d35 -m d15
7. Create the unassigned slice mirror and its first submirror
# metainit -f d16 1 1 c0t0d0s6
# metainit -f d26 1 1 c0t1d0s6
# metainit d36 -m d16
8. Create the home slice mirror and its first submirror
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d37 -m d17
9. Edit /etc/vfstab to mount all mirrors after boot, including mirrored swap

/etc/vfstab before changes:
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c0t0d0s1 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /usr ufs 1 no ro,logging
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no nosuid,logging
/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /home ufs 2 yes nosuid,logging
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -
/etc/vfstab after changes:
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d31 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/md/dsk/d35 /dev/md/rdsk/d35 /usr ufs 1 no ro,logging
/dev/md/dsk/d34 /dev/md/rdsk/d34 /var ufs 1 no nosuid,logging
/dev/md/dsk/d37 /dev/md/rdsk/d37 /home ufs 2 yes nosuid,logging
/dev/md/dsk/d36 /dev/md/rdsk/d36 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -
Notes:
The entry for the root device (/) has already been altered by the metaroot command we executed before.

10. Reboot the system
# lockfs -fa && init 6
11. Attach the second submirrors to all mirrors
# metattach d30 d20
# metattach d31 d21
# metattach d34 d24
# metattach d35 d25
# metattach d36 d26
# metattach d37 d27
Notes:
This will finally cause the data from the boot disk to be synchronized with the mirror drive.
You can use metastat to track the mirroring progress.

12. Change the crash dump device to the swap metadevice
# dumpadm -d `swap -l | tail -1 | awk '{print $1}'`
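The backquoted pipeline in step 12 just extracts the device path of the last swap entry. Here it is run against canned `swap -l` style output (column layout assumed from Solaris), so you can see what dumpadm receives:

```shell
#!/bin/sh
# Canned `swap -l` style output for demonstration:
printf '%s\n' \
  'swapfile             dev    swaplo blocks   free' \
  '/dev/md/dsk/d31     85,1        16 1048544 1048544' > /tmp/swap_sample.txt

# Last line, first column = the swap device path passed to dumpadm -d:
tail -1 /tmp/swap_sample.txt | awk '{print $1}' > /tmp/dumpdev.txt
cat /tmp/dumpdev.txt
```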
13. Make the mirror disk bootable
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
Notes:
This will install a boot block to the second disk.

14. Determine the physical device path of the mirror disk
# ls -l /dev/dsk/c0t1d0s0
... /dev/dsk/c0t1d0s0 -> ../../devices/pci@1f,4000/scsi@3/sd@1,0:a
15. Create a device alias for the mirror disk
# eeprom "nvramrc=devalias mirror /pci@1f,4000/scsi@3/disk@1,0"
# eeprom "use-nvramrc?=true"
Add the mirror device alias to the Open Boot parameter boot-device to prepare the case of a problem with the primary boot device.
# eeprom "boot-device=disk mirror cdrom net"
You can also configure the device alias and boot-device list from the Open Boot Prompt (OBP a.k.a. ok prompt):
ok nvalias mirror /pci@1f,4000/scsi@3/disk@1,0
ok use-nvramrc?=true
ok boot-device=disk mirror cdrom net
Notes:
From the OBP, you can use boot mirror to boot from the mirror disk.
On my test system, I had to replace sd@1,0:a with disk@1,0. Use devalias on the OBP prompt to determine the correct device path.

Monday, November 15, 2010

A quick guide to setting up imap on solaris

A quick guide to setting up imap on solaris


Installing packages

Get the following packages from www.sunfreeware.com:

openssl-0.9.8e-sol10-sparc-local
imap-2006e-sol10-sparc-local

and install both of them

/etc/services configuration

Ensure the following /etc/services entries are present
pop2 109/tcp pop pop-2 # Post Office Protocol - V2
pop3 110/tcp # Post Office Protocol - Version 3
imap 143/tcp imap2 # Internet Mail Access Protocol v2
imaps 993/tcp

inetd configuration

The inetd configuration on Solaris 10 is a pain to set up now that you can't just edit inetd.conf; however, you can use inetd.conf as an input to inetconv.

This is the easiest way !

Add in the following to inetd.conf

pop stream tcp nowait root /usr/local/sbin/ipop2d ipop2d
pop3 stream tcp nowait root /usr/local/sbin/ipop3d ipop3d
imap stream tcp nowait root /usr/local/sbin/imapd imapd
pop3s stream tcp nowait root /usr/local/sbin/ipop3d ipop3d
imaps stream tcp nowait root /usr/local/sbin/imapd imapd

Then run
#inetconv -f

to create the service entries. Then use inetadm to check they are ok.

root@host: inetadm | egrep "pop|imap"


enabled online svc:/network/pop3/tcp:default
enabled online svc:/network/imap/tcp:default
enabled online svc:/network/pop3s/tcp:default
enabled online svc:/network/imaps/tcp:default
enabled online svc:/network/pop/tcp:default

SSL configuration

Then you need to create an SSL certificate, as imapd will not accept plain text authentication.

If you don't, you will see the following type of errors in syslog when you try to connect with a plain text password.

Mar 29 09:56:58 myserver imapd[6959]: [ID 210418 auth.notice] Login disabled user=user1 auth=user1 host=myotherserver.example.com [10.11.12.13]



Use openssl to create certificate for imap.



cd /usr/local/ssl/certs

/usr/local/ssl/bin/openssl req -new -x509 -nodes -out imapd.pem \

-keyout imapd.pem -days 365

This should create an imapd.pem certificate file in the certs directory.
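If you want to script this, a non-interactive variant looks like the sketch below. The -subj fields and /tmp/certs paths are placeholders; the key and certificate are written to separate files and then concatenated into one imapd.pem.

```shell
#!/bin/sh
# Generate a self-signed certificate and key non-interactively.
# Subject fields below are placeholders; adjust for your host.
mkdir -p /tmp/certs
openssl req -new -x509 -nodes -days 365 \
    -subj "/C=US/O=Example/CN=myserver.example.com" \
    -out /tmp/certs/cert.pem -keyout /tmp/certs/key.pem

# Combine key and certificate into the single PEM file imapd expects:
cat /tmp/certs/key.pem /tmp/certs/cert.pem > /tmp/certs/imapd.pem

# Inspect the certificate subject:
openssl x509 -in /tmp/certs/cert.pem -noout -subject
```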

Client configuration

Then in the account options on your mail client (netscape, outlook etc) choose the option to authenticate using SSL.

Thursday, November 11, 2010

Recovering a System to a Different Machine Using ufsrestore and the Solaris 9 or 10 OS

Here is a procedure for recovering a failed server to another server on a different platform. This failed server is running the Solaris 9 or 10 Operating System for SPARC platforms and Solaris Volume Manager. The procedure could be modified to work for a system that runs the Solaris OS for x86 platforms.




Scenario: A couple of servers share one tape drive connecting to server Tapehost. The servers are backed up to tapes using ufsdump. One old server, Myhost, which runs Solaris Volume Manager, fails -- and you want to restore Myhost to a new server on a different platform.



Part A: Restore From Remote Tape to the New Machine

If there is more than one ufsdump image on a tape, you must write down which image is for which file system backup of Myhost right after the backup occurs.



Here, I assume that the root file system's full backup of Myhost is the third image on the tape.



1. Position the tape in Tapehost (10.1.1.47) for the root file system's full backup image of Myhost:



root@Tapehost# mt -f /dev/rmt/0n fsf 3

2. On Myhost (10.1.1.46), boot into single-user mode from a CD-ROM of the same OS version:



ok boot cdrom -s

3. Enable the network port, for example, bge0:



# ifconfig bge0 10.1.1.46 up

4. Using the format command, prepare partitions for file systems. The basic procedure is to format the disk, select the disk, create partitions, and label the disk.



5. Create a new root file system on a partition, for example, /dev/rdsk/c1t0d0s0, and mount it to /mnt:



# newfs /dev/rdsk/c1t0d0s0

# mount /dev/dsk/c1t0d0s0 /mnt

6. Restore the full backup of the root file system from tape:



# cd /mnt

# ufsrestore rf 10.1.1.47:/dev/rmt/0n

7. If you want to restore the incremental backup, re-position the remote tape and use the ufsrestore command again. After restoring, remove the restoresymtable file.



# rm restoresymtable

8. Install boot block in the root disk:



# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

9. Unmount the root file system:



# cd /

# umount /mnt

10. Repeat steps 1, 5, 6, 7, and 9 to restore other file systems.



11. Mount the root file system, /dev/dsk/c1t0d0s0 to /mnt, and edit /mnt/etc/vfstab so that each mount point mounts in the correct partition.



For example, change the following line from this:



/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -

To this:



/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 / ufs 1 no -
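If there are several such lines, the edit can be scripted with sed. Here is a sketch against a one-line sample vfstab, using the device names from the example above:

```shell
#!/bin/sh
# Rewrite the root entry in a copy of vfstab from the metadevice
# back to the physical slice, per the example above.
cat > /tmp/vfstab.sample <<'EOF'
/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -
EOF

sed -e 's|/dev/md/dsk/d0|/dev/dsk/c1t0d0s0|' \
    -e 's|/dev/md/rdsk/d0|/dev/rdsk/c1t0d0s0|' \
    /tmp/vfstab.sample > /tmp/vfstab.new
cat /tmp/vfstab.new
```

On the real system you would run the same substitutions against /mnt/etc/vfstab, after keeping a backup copy.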

Part B: Remove Solaris Volume Manager Information

Use the procedure below to remove the Solaris Volume Manager information.



Note: Another way to clear out Solaris Volume Manager is to reboot into single-user mode and use metaclear and metadb -d. But with the Solaris 10 OS, the mdmonitor service will complain when the system first reboots; the complaints will be gone after the Solaris Volume Manager information is cleared out.



1. If Myhost had a mirrored root file system, there is an entry similar to rootdev:/pseudo/md@0:0,0,blk in the /etc/system file. After performing the procedure in Part A, remove this entry from /mnt/etc/system. Do not just comment it out.



2. All of the Solaris Volume Manager information is stored in three files: /kernel/drv/md.conf, /etc/lvm/mddb.cf, and /etc/lvm/md.cf. So to clear out Solaris Volume Manager, overwrite these files with the files from a system without Solaris Volume Manager.



Note: If you intend to configure the meta devices the same way they were, configuration information is in the /etc/lvm/md.cf file. So take notes before this file is overwritten.



# cp /kernel/drv/md.conf /mnt/kernel/drv/md.conf

# cp /etc/lvm/mddb.cf /mnt/etc/lvm/mddb.cf

# cp /etc/lvm/md.cf /mnt/etc/lvm/md.cf



Part C: Reconfigure /devices, /dev, and /etc/path_to_inst

1. Because the new server has different hardware than the old server, the device trees will change too. Update the /etc/path_to_inst file to reflect this change.



# rm -r /mnt/dev/*

# rm -r /mnt/devices/*

# devfsadm -C -r /mnt -p /mnt/etc/path_to_inst

2. Reboot the system from the root disk:



# init 6

If it does not reboot, you can use setenv boot-device from OpenBoot PROM or eeprom boot-device from the OS to set up the root disk as boot disk.

Monday, September 20, 2010

Simple Guide on Installing 2 Nodes Oracle 10g RAC on Solaris 10 64bit

Introduction

Network Configuration (Hostname and IP address)
Create Oracle groups and Oracle user
Prepare disk for Oracle binaries (Local disk)
iSCSI Configuration
Prepare disk for OCR, Voting and ASM
Setting Kernel Parameters
Check and install required package
Installing Oracle Clusterware
Installing Oracle Database 10g Software
Create ASM instance and ASM diskgroup


Introduction

This article is intended for people who have a basic knowledge of Oracle RAC. It does not detail everything that must be understood in order to configure a RAC database; please refer to the Oracle documentation for explanations.

This article, however, focuses on putting together your own Oracle RAC 10g environment for development and testing, using Solaris servers and a low-cost shared disk solution: iSCSI provided by Openfiler (Openfiler installation and disk management is not covered in this article).

The two Oracle RAC nodes will be configured as follows:

Oracle Database Files
RAC Node Name Instance Name Database Name $ORACLE_BASE File System for DB Files
soladb1 sola1 sola /oracle ASM
soladb2 sola2 sola /oracle ASM

Oracle Clusterware Shared Files
File Type File Name iSCSI Volume Name Mount Point File System
Oracle Cluster Registry /dev/rdsk/c2t3d0s2 ocr RAW
CRS Voting Disk /dev/rdsk/c2t4d0s2 vot RAW

The Oracle Clusterware software will be installed to /oracle/product/10.2.0/crs_1 on both the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to shared volumes being managed by Automatic Storage Management (ASM).

1. Network Configuration (Hostname and IP address)

Perform the following network configuration on both Oracle RAC nodes in the cluster

Both of the Oracle RAC nodes should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data, along with data for the network storage server (Openfiler). Although it is possible to use the public network for the interconnect, this is not recommended as it may degrade database performance (reducing the bandwidth available for Cache Fusion and Cluster Manager traffic). For a production RAC implementation, the interconnect should be at least gigabit, should be used only by Oracle, and the network storage server should be on a separate gigabit network.

The following example is from soladb1:

i. Update entry of /etc/hosts

# cat /etc/hosts

127.0.0.1 localhost


# Public Network (e1000g0)
192.168.2.100 soladb1 loghost
192.168.2.101 soladb2

# Public Virtual IP (VIP) addresses
192.168.2.104 soladb1-vip
192.168.2.105 soladb2-vip

# Private Interconnect (e1000g1)
10.0.0.100 soladb1-priv
10.0.0.101 soladb2-priv

ii. Edit the server hostname by updating the /etc/nodename file:
# cat /etc/nodename
soladb1
iii. Update/add the /etc/hostname.<interface> file for each interface:
# cat hostname.e1000g0
soladb1

# cat hostname.e1000g1
soladb1-priv

Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from soladb1:

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.2.100 netmask ffffff00 broadcast 192.168.2.255
ether 0:50:56:99:45:20
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 10.0.0.100 netmask ff000000 broadcast 10.255.255.255
ether 0:50:56:99:4f:a1
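With the addresses above in place, a quick sanity check is to confirm that each node can reach the other over both networks (Solaris ping simply prints "<host> is alive" on success):

```shell
# Run from soladb1; repeat the equivalent checks from soladb2.
ping soladb2          # public network
ping soladb2-priv     # private interconnect
# Note: the -vip addresses will only answer once Clusterware has started the VIPs.
```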


Adjusting Network Settings
The UDP (User Datagram Protocol) settings affect cluster interconnect transmissions. If the buffers set by these parameters are too small, then incoming UDP datagrams can be dropped due to insufficient space, which requires send-side retransmission. This can result in poor cluster performance.

On Solaris, the UDP parameters are udp_recv_hiwat and udp_xmit_hiwat. The default value for these parameters on Solaris 10 is 57344 bytes. Oracle recommends that you set these parameters to at least 65536 bytes.

To see what these parameters are currently set to, enter the following commands:
# ndd /dev/udp udp_xmit_hiwat
# ndd /dev/udp udp_recv_hiwat

To set the values of these parameters to 65536 bytes in current memory, enter the following commands:
# ndd -set /dev/udp udp_xmit_hiwat 65536
# ndd -set /dev/udp udp_recv_hiwat 65536

We need to write a startup script, udp_rac, in /etc/init.d with the following contents so that these values are set when the system boots.

#!/sbin/sh
case "$1" in
'start')
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536
;;
'state')
ndd /dev/udp udp_xmit_hiwat
ndd /dev/udp udp_recv_hiwat
;;
*)
echo "Usage: $0 { start | state }"
exit 1
;;
esac

We now need to create a link to this script in the /etc/rc3.d directory.

# ln -s /etc/init.d/udp_rac /etc/rc3.d/S86udp_rac
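The script's own 'state' action gives a quick way to confirm the values, both now and after the next reboot; this assumes the script was saved as /etc/init.d/udp_rac and made executable:

```shell
chmod 744 /etc/init.d/udp_rac      # make the startup script executable
/etc/init.d/udp_rac state          # print both UDP parameters (expect 65536)
ls -l /etc/rc3.d/S86udp_rac        # confirm the rc3.d link is in place
```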


2. Create Oracle groups and Oracle user
Perform the following task on all Oracle RAC nodes in the cluster
We will create the oinstall and dba groups and the oracle user account, along with all appropriate directories.

# mkdir -p /oracle
# groupadd -g 501 oinstall
# groupadd -g 502 dba

# useradd -s /usr/bin/bash -u 500 -g 501 -G 502 -d /oracle -c "Oracle Software Owner" oracle
# chown -R oracle:dba /oracle
# passwd oracle
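A quick check that the account came out as intended (the exact output format varies slightly between Solaris releases):

```shell
# Show uid, primary group and supplementary groups for oracle:
id -a oracle
# Expected form: uid=500(oracle) gid=501(oinstall) groups=502(dba)
```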


Modify Oracle user environment variable
Perform the following task on all Oracle RAC nodes in the cluster


After creating the oracle user account on both nodes, ensure that the environment is set up correctly by using the following .bash_profile. (Note that .bash_profile does not exist by default on Solaris; you will have to create it.)

The following example is from soladb1:

# su - oracle
$ cat .bash_profile
PATH=/usr/sbin:/usr/bin
export ORACLE_SID=sola1
export ORACLE_BASE=/oracle
export ORACLE_HOME=/oracle/product/10.2.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.2.0/crs_1
export PATH=$PATH:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin






3. Prepare disk for Oracle binaries (Local disk)

Perform the following task on all Oracle RAC nodes in the cluster


1. Format the disk

# format
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@0,0/pci15ad,1976@10/sd@0,0
1. c1t1d0
/pci@0,0/pci15ad,1976@10/sd@1,0
Specify disk (enter its number): 1

format> fdisk
No fdisk table exists. The default partition for the disk is:
a 100% "SOLARIS System" partition
Type "y" to accept the default partition, otherwise type "n" to edit the
partition table.
Y

format> p
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit

partition> p (print - display the current table)
Current partition table (original):
Total disk cylinders available: 2607 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 2606 19.97GB (2607/0/0) 41881455
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 7.84MB (1/0/0) 16065
9 unassigned wm 0 0 (0/0/0) 0

partition> label
Ready to label disk, continue? Y

2. Create a Solaris file system
# newfs /dev/dsk/c1t1d0s2

3. Add entry to /etc/vfstab
# cat /etc/vfstab
/dev/dsk/c1t1d0s2 /dev/rdsk/c1t1d0s2 /oracle ufs - yes -

4. Mount the file system
# mkdir /oracle
# mount /oracle

5. Change the owner of /oracle
# chown -R oracle:oinstall /oracle
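Before moving on, confirm that the file system is mounted and that the ownership change took effect:

```shell
df -h /oracle      # should list /dev/dsk/c1t1d0s2 mounted on /oracle
ls -ld /oracle     # should show oracle:oinstall as owner and group
```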


4. iSCSI Configuration

Perform the following task on all Oracle RAC nodes in the cluster


In this article, we will be using the Static Config method. We first need to verify that the iSCSI software packages are installed on our servers before we can proceed further.

# pkginfo SUNWiscsiu SUNWiscsir
system SUNWiscsir Sun iSCSI Device Driver (root)
system SUNWiscsiu Sun iSCSI Management Utilities (usr)

After verifying that the iSCSI software packages are installed on the client machines (soladb1, soladb2) and that the iSCSI Target (Openfiler) is configured, run the following from the client machines to make the iSCSI LUNs available. Note that the Openfiler network storage server is accessed through the private network at the address 10.0.0.108.

Configure the iSCSI target device to be discovered statically by specifying its IQN, IP address, and port number:

# iscsiadm add static-config iqn.2006-01.com.openfiler:tsn.2fc90b6b9c73,10.0.0.108:3260

Listing Current Discovery Settings
# iscsiadm list discovery
Discovery:
Static: disabled
Send Targets: disabled
iSNS: disabled

The iSCSI connection is not initiated until the discovery method is enabled. This is enabled using the following command:

# iscsiadm modify discovery --static enable
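Before building the device links, it is worth verifying that a session to the target was actually established:

```shell
# List configured targets; -v adds session state and discovered LUNs.
iscsiadm list target -v
```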

Create the iSCSI device links for the local system. The following command can be used to do this:

# devfsadm -i iscsi

To verify that the iSCSI devices are available on the node, we will use the format command. The output of the format command should look like the following:

# format
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@0,0/pci15ad,1976@10/sd@0,0
1. c1t1d0
/pci@0,0/pci15ad,1976@10/sd@1,0
2. c2t3d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,0
3. c2t4d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,1
4. c2t5d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,2
5. c2t6d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,3
6. c2t7d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,4
Specify disk (enter its number):




5. Prepare disk for OCR, Voting and ASM

Perform the following task on one(1) of the Oracle RAC nodes in the cluster


Now, we need to create partitions on the iSCSI volumes. The main point is that when formatting the devices to be used for the OCR and the Voting Disk files, the disk slices to be used must skip the first cylinder (cylinder 0) to avoid overwriting the disk VTOC (Volume Table of Contents). The VTOC is a special area of the disk set aside for storing information about the disk's controller, geometry, and slices.

Oracle Shared Drive Configuration
File System Type iSCSI Target
(short) Name Size Device Name ASM Dg Name File Types
RAW ocr 300 MB /dev/rdsk/c2t3d0s2 Oracle Cluster Registry (OCR) File
RAW vot 300 MB /dev/rdsk/c2t4d0s2 Voting Disk
RAW asmspfile 30 MB /dev/rdsk/c2t7d0s2 ASM SPFILE
ASM asm1 14 GB /dev/rdsk/c2t5d0s2 DATA Oracle Database Files
ASM asm2 14 GB /dev/rdsk/c2t6d0s2 ARCH Oracle Database Files

Perform the operation below for all of the iSCSI disks, from the soladb1 node only, using the format command.


# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@0,0/pci15ad,1976@10/sd@0,0
1. c1t1d0
/pci@0,0/pci15ad,1976@10/sd@1,0
2. c2t3d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,0
3. c2t4d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,1
4. c2t5d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,2
5. c2t6d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,3
6. c2t7d0
/iscsi/disk@0000iqn.2006-01.com.openfiler%3Atsn.0db3c7c0efb1FFFF,4
Specify disk (enter its number): 2
selecting c2t3d0
[disk formatted]

FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit

format> partition
Please run fdisk first
format> fdisk
No fdisk table exists. The default partition for the disk is:

a 100% "SOLARIS system" partition

Type "y" to accept the default partition, otherwise type "n" to edit the partition table.
y
format> partition
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit

partition> print
Current partition table (unnamed):
Total disk cylinders available: 508 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 507 508.00MB (508/0/0) 1040384
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 1.00MB (1/0/0) 2048
9 unassigned wm 0 0 (0/0/0) 0

partition> 2
Part Tag Flag Cylinders Size Blocks
2 unassigned wm 0 - 507 508.00MB (508/0/0) 1040384

Enter partition id tag[backup]:

Enter partition permission flags[wm]:
Enter new starting cyl[0]: 5
Enter partition size[0b, 0c, 3e, 0.00mb, 0.00gb]: $
partition> label
Ready to label disk, continue? y

partition> quit
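After labeling each disk, prtvtoc can confirm that the data slice now starts past cylinder 0, leaving the VTOC untouched:

```shell
# Print the label of the first iSCSI disk; the slice prepared above
# should show a first sector greater than 0.
prtvtoc /dev/rdsk/c2t3d0s2
```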

Repeat this operation for all of the iSCSI disks.

Setting Device Permissions

The devices we will be using for the various components of this article (e.g. the OCR and the voting disk) must have the appropriate ownership and permissions set on them before we can proceed to the installation stage. We will set the permissions and ownership using the chown and chmod commands as follows (this must be done as the root user):

# chown root:oinstall /dev/rdsk/c2t3d0s2
# chmod 660 /dev/rdsk/c2t3d0s2
# chown oracle:oinstall /dev/rdsk/c2t4d0s2
# chmod 660 /dev/rdsk/c2t4d0s2
# chown oracle:oinstall /dev/rdsk/c2t7d0s2
# chown oracle:oinstall /dev/rdsk/c2t5d0s2
# chown oracle:oinstall /dev/rdsk/c2t6d0s2

These permissions will be persistent across reboots; no further configuration of the permissions needs to be performed.
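The resulting ownership can be double-checked on the underlying device nodes; the /dev/rdsk entries are symlinks, so -L follows them:

```shell
ls -lL /dev/rdsk/c2t3d0s2 /dev/rdsk/c2t4d0s2 /dev/rdsk/c2t5d0s2 \
       /dev/rdsk/c2t6d0s2 /dev/rdsk/c2t7d0s2
```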


6. Setting Kernel Parameters
In Solaris 10, there is a new way of setting kernel parameters. The old Solaris 8 and 9 method of editing the /etc/system file is deprecated. Solaris 10 sets kernel parameters using the resource control facility, and this method does not require the system to be rebooted for the changes to take effect.

Create a default project for the oracle user.
# projadd -U oracle -K "project.max-shm-memory=(priv,4096MB,deny)" user.oracle


Modify the max-shm-memory Parameter
# projmod -s -K "project.max-shm-memory=(priv,4096MB,deny)" user.oracle


Modify the max-sem-ids Parameter
# projmod -s -K "project.max-sem-ids=(priv,256,deny)" user.oracle

Check the Parameters as User oracle
$ prctl -i project user.oracle
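prctl can also query a single resource control, which is convenient for a scripted sanity check:

```shell
# Show only the shared-memory limit for the user.oracle project:
prctl -n project.max-shm-memory -i project user.oracle
```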

Configure RAC Nodes for Remote Access
Perform the following configuration procedures on both Oracle RAC nodes in the cluster.

Before you can install and use Oracle RAC, you must configure either secure shell (SSH) or remote shell (RSH) for the oracle user account on both of the Oracle RAC nodes in the cluster. The goal here is to set up user equivalence for the oracle user account. User equivalence enables the oracle user account to access all other nodes in the cluster without the need for a password. This can be configured using either SSH or RSH, where SSH is the preferred method.
Perform the operation below as the oracle user to set up RSH between all nodes.

# su - oracle
$ cd
$ vi .rhosts
+
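On Solaris 10 the r-services are managed by SMF and are disabled by default, so the .rhosts file alone is not enough. The FMRIs below are the standard Solaris 10 service names; enable them as root on both nodes, then test equivalence as oracle:

```shell
# As root, on both nodes:
inetadm -e svc:/network/shell:default    # in.rshd
inetadm -e svc:/network/login:rlogin     # in.rlogind

# As oracle, from each node, verify passwordless access in both directions:
rsh soladb2 date
```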

7. Check and install required package

Perform the following checks on all Oracle RAC nodes in the cluster


The following packages must be installed on each server before you can continue. To check whether any of these required packages are installed on your system, use the pkginfo -i command as follows:
# pkginfo -i SUNWarc SUNWbtool SUNWhea SUNWlibmr SUNWlibm SUNWsprot SUNWtoo SUNWi1of SUNWi1cs SUNWi15cs SUNWxwfnt SUNWxwplt SUNWmfrun SUNWxwplr SUNWxwdv SUNWbinutils SUNWgcc SUNWuiu8

If you need to install any of the above packages, use the pkgadd -d command, e.g.:
# pkgadd -d /cdrom/sol_10_1009_x86/Solaris_10/Product -s /var/spool/pkg SUNWi15cs
# pkgadd SUNWi15cs

8. Installing Oracle Clusterware
Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (soladb1). The Oracle Clusterware software will be installed to both of the Oracle RAC nodes in the cluster by the OUI.

Using xstart or any xterm client, login as Oracle user and start the installation.

$ ./runInstaller

Screen Name Response
Welcome Screen Click Next
Specify Inventory directory and credentials Accept the default values:
Inventory directory: /oracle/oraInventory
Operating System group name: oinstall
Specify Home Details Set the Name and Path for the ORACLE_HOME (actually the $ORA_CRS_HOME that I will be using in this article) as follows:
Name: OraCrs10g_home
Path: /oracle/product/10.2.0/crs_1
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle Clusterware software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click Next to continue.

Specify Cluster Configuration Cluster Name: crs

Public Node Name Private Node Name Virtual Node Name
soladb1 soladb1-priv soladb1-vip
soladb2 soladb2-priv soladb2-vip

Specify Network Interface Usage Interface Name Subnet Interface Type
e1000g0 192.168.2.0 Public
e1000g1 10.0.0.0 Private

Specify OCR Location Starting with Oracle Database 10g Release 2 (10.2) with RAC, Oracle Clusterware provides for the creation of a mirrored OCR file, enhancing cluster reliability. For the purpose of this example, I did not choose to mirror the OCR file by using the option of “External Redundancy”:

Specify OCR Location: /dev/rdsk/c2t3d0s2

Specify Voting Disk Location For the purpose of this example, I did not choose to mirror the voting disk by using the option of “External Redundancy”:

Voting Disk Location: /dev/rdsk/c2t4d0s2

Summary Click Install to start the installation!
Execute Configuration Scripts After the installation has completed, you will be prompted to run the orainstRoot.sh and root.sh scripts. Open a new console window on both Oracle RAC nodes in the cluster (starting with the node you are performing the install from) as the “root” user account.

Navigate to the /oracle/oraInventory directory and run orainstRoot.sh ON ALL NODES in the RAC cluster.


--------------------------------------------------------------------------------
Within the same new console window on both Oracle RAC nodes in the cluster, (starting with the node you are performing the install from), stay logged in as the “root” user account.

Navigate to the /oracle/product/10.2.0/crs_1 directory and locate the root.sh file for each node in the cluster – (starting with the node you are performing the install from). Run the root.sh file ON ALL NODES in the RAC cluster ONE AT A TIME.

You will receive several warnings while running the root.sh script on all nodes. These warnings can be safely ignored.

The root.sh script may take a while to run.

Go back to the OUI and acknowledge the “Execute Configuration scripts” dialog window after running the root.sh script on both nodes.

End of installation At the end of the installation, exit from the OUI.

After successfully installing Oracle 10g Clusterware (10.2.0.1), start the OUI again to patch the clusterware with the latest patch available (10.2.0.5). Refer back to the steps above for the patching activity.

Verify Oracle Clusterware Installation
After the installation of Oracle Clusterware, we can run through several tests to verify the install was successful. Run the following commands on both nodes in the RAC Cluster

$ /oracle/product/10.2.0/crs_1/bin/olsnodes
soladb1
soladb2


$ /oracle/product/10.2.0/crs_1/bin/crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....db1.gsd application ONLINE ONLINE soladb1
ora....db1.ons application ONLINE ONLINE soladb1
ora....db1.vip application ONLINE ONLINE soladb1
ora....db2.gsd application ONLINE ONLINE soladb2
ora....db2.ons application ONLINE ONLINE soladb2
ora....db2.vip application ONLINE ONLINE soladb2

9. Installing Oracle Database 10g Software
Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (soladb1). The Oracle Database software will be installed to both of the Oracle RAC nodes in the cluster by the OUI.

Using xstart or any xterm client, login as Oracle user and start the installation.

$ ./runInstaller

Screen Name Response
Welcome Screen Click Next
Select Installation Type Select the Enterprise Edition option.
Specify Home Details Set the Name and Path for the ORACLE_HOME as follows:
Name: OraDb10g_home1
Path: /oracle/product/10.2.0/db_1
Specify Hardware Cluster Installation Mode Select the Cluster Installation option then select all nodes available. Click Select All to select all servers: soladb1 and soladb2.

If the installation stops here and the status of any of the RAC nodes is “Node not reachable”, perform the following checks:

Ensure Oracle Clusterware is running on the node in question (crs_stat -t).
Ensure you are able to reach the node in question from the node you are performing the installation from.

Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle database software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox.

If you did not run the OUI with the -ignoreSysPrereqs option, the kernel parameters prerequisite check will fail. This is because the OUI looks at the /etc/system file to check the kernel parameters; as we discussed earlier, this file is not used by default in Solaris 10. This is documented in Metalink Note 363436.1.

Click Next to continue.

Select Database Configuration Select the option to “Install database software only.”

Remember that we will create the clustered database as a separate step using DBCA.

Summary Click on Install to start the installation!
Root Script Window – Run root.sh After the installation has completed, you will be prompted to run the root.sh script. It is important to keep in mind that the root.sh script will need to be run on all nodes in the RAC cluster one at a time starting with the node you are running the database installation from.

First, open a new console window, as the root user account, on the node you are installing the Oracle 10g database software from. For me, this was soladb1.

Navigate to the /oracle/product/10.2.0/db_1 directory and run root.sh.

After running the root.sh script on all nodes in the cluster, go back to the OUI and acknowledge the “Execute Configuration scripts” dialog window.

End of installation At the end of the installation, exit from the OUI.

After successfully installing Oracle Database 10g (10.2.0.1), start the OUI again to patch the database with the latest patch available (10.2.0.5). Refer back to the steps above for the patching activity.

Run the Network Configuration Assistant
To start NETCA, run the following:
$ netca

The following table walks you through the process of creating a new Oracle listener for our RAC environment.

Screen Name Response
Select the Type of Oracle
Net Services Configuration Select Cluster Configuration
Select the nodes to configure Select all of the nodes: soladb1 and soladb2.
Type of Configuration Select Listener configuration.
Listener Configuration – Next 6 Screens The following screens are now like any other normal listener configuration. You can simply accept the default parameters for the next six screens:
What do you want to do: Add
Listener name: LISTENER
Selected protocols: TCP
Port number: 1521
Configure another listener: No
Listener configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Select Naming Methods configuration.
Naming Methods Configuration The following screens are:
Selected Naming Methods: Local Naming
Naming Methods configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Click Finish to exit the NETCA.

The Oracle TNS listener process should now be running on all nodes in the RAC cluster.

$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.lsnr application ONLINE ONLINE soladb1
ora....db1.gsd application ONLINE ONLINE soladb1
ora....db1.ons application ONLINE ONLINE soladb1
ora....db1.vip application ONLINE ONLINE soladb1
ora....B2.lsnr application ONLINE ONLINE soladb2
ora....db2.gsd application ONLINE ONLINE soladb2
ora....db2.ons application ONLINE ONLINE soladb2
ora....db2.vip application ONLINE ONLINE soladb2

10. Create ASM instance and ASM diskgroup

To start the ASM instance creation process, run the following command as the oracle user on any node of the Oracle 10g RAC cluster.

$ dbca

Screen Name Response
Welcome Screen Select “Oracle Real Application Clusters database.”
Operations Select Configure Automatic Storage Management
Node Selection Click on the Select All button to select all servers: soladb1 and soladb2.
Create ASM Instance Supply the SYS password to use for the new ASM instance.

Also, starting with Oracle 10g Release 2, the ASM instance server parameter file (SPFILE) needs to be on a shared disk. You will need to modify the default entry for “Create server parameter file (SPFILE)” to reside on the RAW partition as follows: /dev/rdsk/c2t7d0s2. All other options can stay at their defaults.

You will then be prompted with a dialog box asking if you want to create and start the ASM instance. Select the OK button to acknowledge this dialog.

The OUI will now create and start the ASM instance on all nodes in the RAC cluster.

ASM Disk Groups To start, click the Create New button. This will bring up the “Create Disk Group” window with the partitions we created earlier. If you do not see any disks, click the Change Disk Discovery Path button and enter /dev/rdsk/*

For the first “Disk Group Name”, I used the string “DATA”. Select the first RAW partition (in my case /dev/rdsk/c2t5d0s2) in the “Select Member Disks” window. Keep the “Redundancy” setting at “External”.

After verifying all values in this window are correct, click the [OK] button. This will present the “ASM Disk Group Creation” dialog. When the ASM Disk Group Creation process is finished, you will be returned to the “ASM Disk Groups” windows.

Click the Create New button again. For the second “Disk Group Name”, I used the string “ARCH”. Select the last RAW partition (/dev/rdsk/c2t6d0s2) in the “Select Member Disks” window. Keep the “Redundancy” setting to “External”.

After verifying all values in this window are correct, click the [OK] button. This will present the “ASM Disk Group Creation” dialog.

When the ASM Disk Group Creation process is finished, you will be returned to the “ASM Disk Groups” window with two disk groups created and selected.

End of ASM Instance creation Click the Finish button to complete the ASM instance creation.

The Oracle ASM instance process should now be running on all nodes in the RAC cluster.

$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE soladb1
ora....B1.lsnr application ONLINE ONLINE soladb1
ora....db1.gsd application ONLINE ONLINE soladb1
ora....db1.ons application ONLINE ONLINE soladb1
ora....db1.vip application ONLINE ONLINE soladb1
ora....SM2.asm application ONLINE ONLINE soladb2
ora....B2.lsnr application ONLINE ONLINE soladb2
ora....db2.gsd application ONLINE ONLINE soladb2
ora....db2.ons application ONLINE ONLINE soladb2
ora....db2.vip application ONLINE ONLINE soladb2

The last step is to create Oracle 10g Database using dbca.

Wednesday, August 11, 2010

Migrating from UFS Root File System to a ZFS Root File System (Without Zones)

Okay, say I have system with 2 disks.
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1f,0/pci@1/scsi@8/sd@0,0
1. c1t1d0
/pci@1f,0/pci@1/scsi@8/sd@1,0


Disk 0 is formatted with UFS and is boot disk. I want to migrate UFS root file system to ZFS one (zpool will be on Disk 1).

A ZFS root environment can only be created on a pool consisting of disk slices (not whole disks).

So I partition disk 1 as shown below. This means the disk label must be SMI, not EFI.
partition> p
Current partition table (original):
Total disk cylinders available: 9770 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 root wm 1 - 9229 16.00GB (9229/0/0) 33556644
1 unassigned wu 0 0 (0/0/0) 0
2 backup wm 0 - 9769 16.94GB (9770/0/0) 35523720
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0


Then I create zpool named pool-0
# zpool create pool-0 c1t1d0s0

# zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool-0 106K 15.6G 18K /pool-0

# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
pool-0 15.9G 111K 15.9G 0% ONLINE -


The current boot environment (BE) is ufsBE (my name for it) and the new one, zfsBE, will be created (using -n).
Obviously, the zpool has to be created beforehand (-p specifies the ZFS pool that will hold the new BE).

# lucreate -c ufsBE -n zfsBE -p pool-0

Analyzing system configuration.
No name for current boot environment.
Current boot environment is named .
Creating initial configuration for primary boot environment .
The device is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name PBE Boot Device .
Comparing source boot environment file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Creating file systems on boot environment .
Creating file system for in zone on .
Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point .
Copying.
Creating shared file system mount points.
Creating compare databases for boot environment .
Creating compare database for file system .
Creating compare database for file system .
Creating compare database for file system .
Updating compare databases on boot environment .
Making boot environment bootable.
Creating boot_archive for /.alt.tmp.b-0fc.mnt
updating /.alt.tmp.b-0fc.mnt/platform/sun4u/boot_archive
15+0 records in
15+0 records out
Population of boot environment successful.
Creation of boot environment successful.


Now see status of BE-s.
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufsBE yes yes yes no -
zfsBE yes no no yes -


Check new ZFS file systems that have been created.
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
pool-0 4.06G 11.6G 92.5K /pool-0
pool-0/ROOT 1.56G 11.6G 18K /pool-0/ROOT
pool-0/ROOT/zfsBE 1.56G 11.6G 1.56G /
pool-0/dump 512M 12.1G 16K -
pool-0/swap 2.00G 13.6G 16K -


Let’s now activate newly created ZFS BE.
# luactivate zfsBE
A Live Upgrade Sync operation will be performed on startup of boot environment <zfsBE>.

/usr/sbin/luactivate: /etc/lu/DelayUpdate/: cannot create


Okay, this is a known issue. The fix follows.

For the tcsh shell, set up the environment variable:
# setenv BOOT_MENU_FILE menu.lst


Try again:
# luactivate zfsBE

**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).
2. Change the boot device back to the original boot environment by typing:

setenv boot-device /pci@1f,0/pci@1/scsi@8/disk@0,0:a

3. Boot to the original boot environment by typing:
boot
**********************************************************************
Modifying boot archive service
Activation of boot environment successful.


Reboot (but read the previous message to know which command to use)

# init 6

See console during boot ...
Sun Fire V120 (UltraSPARC-IIe 648MHz), No Keyboard
OpenBoot 4.0, 1024 MB memory installed, Serial #53828024.
Ethernet address 0:3:ba:35:59:b8, Host ID: 833559b8.
Executing last command: boot
Boot device: /pci@1f,0/pci@1/scsi@8/disk@1,0:a File and args:
SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: counterstrike2
SUNW,eri0 : 100 Mbps full duplex link up
Configuring devices.
/dev/rdsk/c1t0d0s4 is clean
/dev/rdsk/c1t0d0s5 is clean
Reading ZFS config: done.
Mounting ZFS filesystems: (3/3)
NOTICE: setting nrnode to max value of 57843


New status of BEs follows.
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufsBE yes no no yes -
zfsBE yes yes yes no -


Both UFS/ZFS file systems are visible. This is it!

# df -h -F zfs
Filesystem size used avail capacity Mounted on
pool-0/ROOT/zfsBE 16G 1.6G 12G 12% /
pool-0 16G 97K 12G 1% /pool-0
pool-0/ROOT 16G 18K 12G 1% /pool-0/ROOT
pool-0/.0 16G 50M 12G 1% /pool-0/.0
pool-0/backup 16G 18K 12G 1% /pool-0/backup

# df -h -F ufs
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t0d0s4 2.0G 130M 1.8G 7% /.0
/dev/dsk/c1t0d0s5 4.6G 1.6G 3.0G 35% /backup


Mounting DVD ISO image

lofiadm administers lofi, the loopback file driver.
lofi allows a file to be associated with a block device.
That file can then be accessed through the block device.
This is useful when the file contains an image of some filesystem (DVD image), because the block device can then be used with the normal system utilities for mounting, checking or repairing filesystems.


Example

Download sol-10-u6-ga1-sparc-dvd.iso to /tmp
lofiadm -a /tmp/sol-10-u6-ga1-sparc-dvd.iso /dev/lofi/1 (assign device to file)
mount -F hsfs -o ro /dev/lofi/1 /mnt (mount device to /mnt)
cd /mnt/Solaris_10/Tools/ (go to desired dir)
./setup_install_server /export/jumpstart5.10u6 (do what you need to do, say install jumpstart server)

Sunday, June 20, 2010

Mount iso file in Solaris

root@pracdb01 # lofiadm -a /u01/app/StorageTek_QFS_4[1].6.iso
/dev/lofi/1
root@pracdb01 #
root@pracdb01 #
root@pracdb01 #

root@pracdb01 # mount -F hsfs /dev/lofi/1 /mnt
root@pracdb01 #
root@pracdb01 # cd /mnt
root@pracdb01 #
root@pracdb01 #
root@pracdb01 # ls -l
total 20
drwxr-xr-x 6 root other 2048 Mar 20 2007 linux1
drwxr-xr-x 5 root other 2048 Mar 19 2007 linux2
drwxr-xr-x 5 root other 2048 Mar 19 2007 sparc
drwxr-xr-x 5 root other 2048 Mar 19 2007 worm
drwxr-xr-x 4 root other 2048 Mar 19 2007 x64

Brocade Switch Zone Configuration

Bkp_Bro1:admin> alicreate "PRACDB01_S1P21", "21:00:00:1b:32:1c:3c:0c"
Bkp_Bro1:admin>
Bkp_Bro1:admin> zonecreate "PRACDB01_s1_9990_1E", "PRACDB01_S1P21; SE9990_1E"

Bkp_Bro1:admin> cfgadd "PROD_TAPE_ZONE1", "PRACDB01_s1_9990_1E"
Bkp_Bro1:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only? (yes, y, no, n): [no] y
Updating flash ...

Bkp_Bro1:admin> cfgenable "PROD_TAPE_ZONE1"
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected.
Do you want to enable 'PROD_TAPE_ZONE1' configuration (yes, y, no, n): [no] y
zone config "PROD_TAPE_ZONE1" is in effect
Updating flash ...


2nd Port
============
Bkp_Bro1:admin> alicreate "PRACDB02_S1P22", "21:00:00:1b:32:83:7c:71"

Bkp_Bro1:admin>
Bkp_Bro1:admin> zonecreate "PRACDB02_s1_9990_1E", "PRACDB02_S1P22; SE9990_1E"

Bkp_Bro1:admin> cfgadd "PROD_TAPE_ZONE1", "PRACDB02_s1_9990_1E"
Bkp_Bro1:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only? (yes, y, no, n): [no] y
Updating flash ...

Bkp_Bro1:admin> cfgenable "PROD_TAPE_ZONE1"
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected.
Do you want to enable 'PROD_TAPE_ZONE1' configuration (yes, y, no, n): [no] y
zone config "PROD_TAPE_ZONE1" is in effect
Updating flash ...




Switch 134
=============


Bkp_Bro2:admin> alicreate "PRACDB01_S2P21", "21:00:00:1b:32:83:a0:77"
Bkp_Bro2:admin> zonecreate "PRACDB01_S2_9990_2E", "PRACDB01_S2P21; SE9990_2E"
Bkp_Bro2:admin> cfgadd "PROD_TAPE_ZONE2", "PRACDB01_S2_9990_2E"
Bkp_Bro2:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only? (yes, y, no, n): [no] y
Updating flash ...
Bkp_Bro2:admin> cfgenable "PROD_TAPE_ZONE2"
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected.
Do you want to enable 'PROD_TAPE_ZONE2' configuration (yes, y, no, n): [no] y
zone config "PROD_TAPE_ZONE2" is in effect
Updating flash ...
Bkp_Bro2:admin>


2nd port
========


Bkp_Bro2:admin> alicreate "PRACDB02_S2P22", "21:00:00:1b:32:84:76:72"
Bkp_Bro2:admin> zonecreate "PRACDB02_S2_9990_2E", "PRACDB02_S2P22; SE9990_2E"
Bkp_Bro2:admin> cfgadd "PROD_TAPE_ZONE2", "PRACDB02_S2_9990_2E"
Bkp_Bro2:admin> cfgsave
You are about to save the Defined zoning configuration. This
action will only save the changes on Defined configuration.
Any changes made on the Effective configuration will not
take effect until it is re-enabled.
Do you want to save Defined zoning configuration only? (yes, y, no, n): [no] y
Updating flash ...
Bkp_Bro2:admin> cfgenable "PROD_TAPE_ZONE2"
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected.
Do you want to enable 'PROD_TAPE_ZONE2' configuration (yes, y, no, n): [no] y
zone config "PROD_TAPE_ZONE2" is in effect
Updating flash ...

procedure to remove the scsi reservation

Please follow the procedure below to remove the SCSI reservation and then reconfigure the storage devices.

# /usr/cluster/lib/sc/scsi -c disfailfast -d /dev/did/rdsk/d5s2
# /usr/cluster/lib/sc/scsi -c release -d /dev/did/rdsk/d5s2
# /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/d5s2


# /usr/cluster/lib/sc/scsi -c disfailfast -d /dev/did/rdsk/d6s2
# /usr/cluster/lib/sc/scsi -c release -d /dev/did/rdsk/d6s2
# /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/d6s2



# /usr/cluster/lib/sc/scsi -c disfailfast -d /dev/did/rdsk/d1s2
# /usr/cluster/lib/sc/scsi -c release -d /dev/did/rdsk/d1s2
# /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/d1s2

Check the reservation keys on the storage devices; there should not be any keys:

# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2

# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d6s2

# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d1s2

Run the scgdevs command to reconfigure the storage devices:
# scgdevs
Check the reservation keys:

# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2
# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d6s2
# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d1s2
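The three per-device inkeys checks above can be wrapped in one small loop. A minimal sketch, assuming a POSIX shell; check_keys and the SCSI_CMD override are names introduced here for illustration (the default path is the Sun Cluster utility used in the procedure):

```shell
#!/bin/sh
# Report which DID devices still hold SCSI reservation keys.
# SCSI_CMD may be overridden (e.g. for testing); it defaults to the
# Sun Cluster utility used in the procedure above.
SCSI_CMD=${SCSI_CMD:-/usr/cluster/lib/sc/scsi}

check_keys() {
    for dev in "$@"; do
        out=`$SCSI_CMD -c inkeys -d "$dev" 2>/dev/null`
        # Key lines look like "0x4a6ec47800000001"; none means the device is clean
        if echo "$out" | grep '^0x' >/dev/null; then
            echo "$dev: keys still present"
        else
            echo "$dev: clean"
        fi
    done
}

check_keys /dev/did/rdsk/d5s2 /dev/did/rdsk/d6s2 /dev/did/rdsk/d1s2
```

Run it after the scrub step; every device should report "clean" before proceeding.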

===========
The other thing I found is that one of the paths to the storage is not accessible:

From node "pracdb01 "

cores@fs-cores-brm-sc3b $ more ../disks/*port*
/devices/pci@1,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@1,700000/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED
/devices/pci@3,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@3,700000/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED
cores@fs-cores-brm-sc3b $ cd ../etc/driver_

From Node "pracdb02 "

cores@fs-cores-brm-sc3b $ more *port*
/devices/pci@1,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@1,700000/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED
/devices/pci@3,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@3,700000/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED
cores@fs-cores-brm-sc3b $


Could you please ensure that the storage is accessible from the host via both paths?


root@pracdb01 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb01 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d6s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb01 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d1s2
Reservation keys(2):
0x4a6ec47800000001
0x4a6ec47800000002
root@pracdb01 # rsh pracdb02
Last login: Thu Sep 10 17:45:45 from pracdb01
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
Sourcing //.profile-EIS.....
root@pracdb02 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb02 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d6s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb02 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d1s2
Reservation keys(2):
0x4a6ec47800000001
0x4a6ec47800000002
root@pracdb02 # rsh pracdb03
Last login: Fri Sep 11 11:28:13 from pracdb01
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
Sourcing //.profile-EIS.....
root@pracdb03 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb03 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d6s2
Reservation keys(3):
0x4a6ec47800000001
0x4a6ec47800000002
0x4a6ec47800000003
root@pracdb03 # /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d1s2
Reservation keys(2):
0x4a6ec47800000001
0x4a6ec47800000002
root@pracdb03 # /usr/cluster/lib/sc/scsi -c disfailfast -d /dev/did/rdsk/d5s2
do_enfailfast returned 0
root@pracdb03 #

Friday, June 18, 2010

Cluster RG not switching to PSBLD008 - Action Plan

Tata Sky : Action Plan for 11212245
Contents:
1) Problem details :
2) Service impact :
3) Action plan.
Problem details:
Cluster RG not switching to PSBLD008.
Service Impact:
Siebel services outage for 1 hr.
Action Plan
Detach the root mirror in both servers.
PSBLD008:
# metadetach main mirror sub mirror
PSBLD007:
# metadetach main mirror sub mirror
On node PSBLD008:
# init 0
on node PSBLD007:
Shut down the Oracle server, listener & file system resource using Sun Cluster.
# metaset -s Siebel-DG -f -d -h PSBLD008
(this may take a minute or so to return)
on node PSBLD008:
ok> boot
(wait for the node to fully join the cluster)
on node PSBLD007:
# metaset -s Siebel-DG -a -h PSBLD008
on node PSBLD008:
# metaset (should now list the metaset Siebel-DG)
Test the switchover with:
# scswitch -z -g Siebel-RG -h PSBLD008

RCA for Siebel problem and Best practices recommendation

Tata Sky : RCA for Siebel problem and Best practices recommendation
Problem Summary: Users were unable to contact the Web server of the Siebel application
for approximately 10 minutes.
Problem details:
On 4th Jan 07 Webserver was unable to communicate to load balancer for
10 mins. No users were able to access the application. This problem
was resolved automatically. Following message were observed in
Webserver logs
[04/Jan/2007:16:00:39] failure (17568): HTTP3068: Error receiving
request from 10.1.19.44 (Connection refused)
Event Time Lines:
Event Date: 4th January 2007
· Event Time: 15:50 to 16:01
· Problem reported: 8th January 2007
· Domains services restored: Siebel
· Diagnosis/Analysis time:
· Reboot time:
· H/W replacement time:
Diagnosis summary :
1) Corresponding to the problem time, the only relevant message available is “web server was not
able to reach load balancer.”
2) Users also did not get beyond the load balancer, as per the onsite team.
Since no other data is available to pinpoint the cause of the issue to the load balancer,
network, or web server, the approach taken was to analyze the full setup and apply all best
practices to prevent re-occurrence.
Analysis:
Summary:
There were 4 cases logged related to Siebel setup problem.
10965815 Siebel application server restarted as user sessions were hung
10966494 Web server ping packet drop
10968351 Siebel db performance problems
10979783 Webserver unable to communicate with the application server
These are the highlights of the Analysis:
The web server error message indicates that the client opened the connection and closed it
from the client side before the webserver managed to read any data from that
connection. For the Web server, the Cisco load balancer (Logical IP) is the immediate client. It
is possible that the connections between the load balancer and the client were also
disconnected.
There is no packet drop in the network between web server and Cisco load balancer while
the testing was carried out after the problem was observed. However there is a message
coming from the load balancer. Please check with Cisco for the message.
Workaround if any:
Suggested Fix and recommendations:
1) Implement the best practices
2) Collect most of the logical data at problem time.
3) Enable debug options in application and network level.
4) Implement the NFS option planned for image file storage agreed in the phase 1b
architecture layout
Following are the best practice recommendations:
Web server:
1. From the given magnus.conf file of the webserver, KeepAliveTimeout is set to 1200
seconds (20 minutes); the default value is 30 seconds. In a multi-tier architecture, it is best to set
KeepAliveTimeout to zero.
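Assuming Sun ONE Web Server's magnus.conf directive syntax, the recommended change is a single line (a sketch of just that fragment, not a full magnus.conf):

```conf
# magnus.conf fragment: with a load balancer in front, disable the
# keep-alive timeout as recommended above
KeepAliveTimeout 0
```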
2. Please modify the following entry in /etc/system.
Remove this entry:
set segkmem_lpsize=0x400000
Add this entry:
set pcie:pcie_aer_ce_mask=0x1
3.Install the latest level EIS CD Patches.
4.Transition to e1000g
Convert from ipge to e1000g by installing patch 123334-02 and running
the script provided
5. Please refer to the following guide, which provides guidance on
performance and tuning.
http://docs.sun.com/app/docs/doc/817-6249
Actions when the web server is unresponsive:
Confirm the webserver is hung or unresponsive by accessing static pages or by telnet
to the system's HTTP port. If both time out, we can confirm that the
webserver is hung.
Identify the webserver child process as follows:-
1. ps -ef | grep webservd | grep
The highest-numbered pid is the child pid.
With this PID, we need to collect the following details:-
1. Open terminal1 and run prstat against the pid as given below.
# prstat -L -c -p <pid> -o prstat.hung
Run this command for 3 minutes, then terminate it with Ctrl+C.
2. Meanwhile, in another terminal (terminal2), issue the kill -3 command 3 times
at one-minute intervals.
This will create java thread dumps in the errors log file.
3. In the terminal 2, run pstack, pmap, pldd and pfiles against pid.
# pstack pid > pstack.hung
# pmap pid > pmap.hung
# pldd pid > pldd.hung
# pfiles pid > pfiles.hung
4. In the terminal2, run gcore for generating the core file.
# gcore pid
This will create the core file as core.pid in the present working directory.
5. Run the pkgcore script for collecting the binaries & libraries for root cause analysis.
#pkgcore.sh {case id} {core.pid} {pid}
This will create the packages such as caseid_corefiles.tar.gz & caseid_libraries.tar.gz
6. netstat -na > netstat.hung
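The collection steps above can be bundled into one helper so nothing is missed during an incident. A sketch only; collect_hang is a name introduced here, and it assumes the Solaris p-tools (pstack, pmap, pldd, pfiles) are in PATH:

```shell
#!/bin/sh
# Collect hang diagnostics for a webserver child process, following the
# procedure above: p-tool snapshots, three JVM thread dumps, socket state.
collect_hang() {
    pid=$1
    [ -n "$pid" ] || { echo "usage: collect_hang <pid>" >&2; return 1; }

    # Snapshot process state with the Solaris p-tools (step 3)
    pstack "$pid" > pstack.hung
    pmap   "$pid" > pmap.hung
    pldd   "$pid" > pldd.hung
    pfiles "$pid" > pfiles.hung

    # Three thread dumps, one minute apart (step 2); the dumps land in
    # the webserver errors log, not on stdout
    for i in 1 2 3; do
        kill -3 "$pid" && sleep 60
    done

    # Socket state at the time of the hang (step 6)
    netstat -na > netstat.hung
}

# Example invocation (the pid is hypothetical):
# collect_hang 12345
```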
Application Server PSBLA001
1. Please modify the following entries in /etc/system.
Remove the following entries:
set segkmem_lpsize=0x400000
set ip:ip_squeue_bind = 0
set ip:ip_squeue_fanout = 1
set ipge:ipge_tx_syncq=1
set ipge:ipge_bcopy_thresh = 512
set ipge:ipge_dvma_thresh = 1
set consistent_coloring=2
Add the following setting
set pcie:pcie_aer_ce_mask=0x1
2.Install the latest level EIS CD Patches.
3. Transition to e1000g
Convert from ipge to e1000g by installing patch 123334-02 and running
the script provided
Actions when the application server is hung:
1. Get the PID of the application process
2. truss -o truss.out -ealfd -vall -p "pid of the application"
3. pstack "pid of the application" ==> run it 3 times.
4. snoop -o snoop.out -d
5. ndd /dev/tcp tcp_listen_hash
6. savecore -L
7. guds output
8. prstat -mvL -n 10 1 600
9. iostat -xnz 1 600
10. mpstat 1 600
11. vmstat 1 600
12. lockstat -C -s 50 sleep 30
    lockstat -H -s 50 sleep 30
13. lockstat -kIW -s 50 -i 971 sleep 30
Siebel Database server:
1. Modify the following entries in /etc/system
Remove
set ce_reclaim_pending=1
exclude: lofs
add
set ce:ce_bcopy_thresh=97
set ce:ce_dvma_thresh=96
set ce:ce_ring_size=8192
set ce:ce_comp_ring_size=8192
set ce:ce_tx_ring_size=8192
set sq_max_size=100
2.Please note that Dumpdevice : /dev/dsk/c0t0d0s1 is a Submirror of
Swap-Metadevice /dev/md/dsk/d101
Change dumpdevice to Swapmirror : /dev/md/dsk/d101 with: "dumpadm -d
swap"
3. Install the latest level EIS CD Patches which includes cluster
patches.
Action Plan:
Team:
SSE: Vinod
SAM: Rajesh
Onsite Team: Prashant
Customer engineer/Sysadmin: Shams Khan

PKEND021 Secondary Disk Failure Action Plan

Hi Shams, CASE ID: 11269883

Thank you for the file.

root@PKEND021 # metadb
flags first blk block count
a m p lu 16 8192 /dev/dsk/c0t0d0s7
a p l 8208 8192 /dev/dsk/c0t0d0s7
a p l 16400 8192 /dev/dsk/c0t0d0s7
M p 16 unknown /dev/dsk/c1t0d0s7
M p 8208 unknown /dev/dsk/c1t0d0s7
M p 16400 unknown /dev/dsk/c1t0d0s7

From the output, we can see all the state replica on c1t0d0 are bad.

Use the metadb command to delete them. For example:

# metadb -d c1t0d0s7

Once that is deleted, the next step is to replace the disk.

Part II: Replacing failed boot device

1. Gracefully power-down the system with this command:

# init 5

2. Physically replace the failed boot device.



Part III: Repairing state replica database

1. If you use Solaris[TM] Volume Manager on Solaris[TM] 9 or later,
update the state database with the device ID for the new disk using
metadevadm -u c#t#d# .

# metadevadm -u c1t0d0s7

2. Once the new boot disk is repartitioned, add new working state replicas
back into the newly replaced disk drive. For example:

# metadb -a -c 3 c1t0d0s7

(The -c # option specifies how many replicas to put into the specified partition.)

Part IV: Resyncing the sub-mirrors

1. Run metastat to find all the metadevices that the failed boot device
belongs to. For example:

d0: Mirror
Submirror 0: d1
State: Needs maintenance
Submirror 1: d2
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 205200 blocks

d1: Submirror of d0
State: Needs maintenance
Size: 205200 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Okay


d2: Submirror of d0
State: Okay
Size: 205200 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t2d0s0 0 No Okay


2. Use the metareplace command to re-enable the sub-mirror. For example:

# metareplace -e d0 c0t0d0s0

(The resync operation may take about 15-20 minutes per gigabyte of
filesystem)

3. Repeat metareplace command to re-enable the other sub-mirrors located
on the same disk:

# metareplace -e d<mirror> c0t0d0s<slice in maintenance>

4. Reboot system to have it boot from the newly repaired boot device:

Before rebooting, wait for the resync : all metadevices must be in
'Okay' state, then :

# init 6
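The "wait for the resync" step can be automated; a minimal sketch, with resync_done and the METASTAT override introduced here for illustration (METASTAT defaults to the SVM utility):

```shell
#!/bin/sh
# Poll metastat until no metadevice reports "Needs maintenance" or
# "Resync"; only then is it safe to reboot with init 6.
METASTAT=${METASTAT:-metastat}

resync_done() {
    # Succeeds when no metadevice is resyncing or in maintenance
    $METASTAT 2>/dev/null | egrep 'Needs maintenance|Resync' >/dev/null && return 1
    return 0
}

until resync_done; do
    echo "resync still in progress; sleeping 60s"
    sleep 60
done
echo "all metadevices Okay; safe to reboot with init 6"
```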


I will proceed to order the 146 GB disk on D240 storage.

E20K adding Board from PKEND021 to PSBLD008

bash-2.05$ showplatform -p domains

Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - - Solaris Halted, in OBP
B - PKENA019 Running Solaris
C - PSBLD008 Running Solaris
D - - Powered Off
E - PEAID015 Running Solaris
F - - Powered Off
G - - Powered Off
H - - Powered Off
I - - Powered Off
J - - Powered Off
K - - Powered Off
L - - Powered Off
M - - Powered Off
N - - Powered Off
O - - Powered Off
P - - Powered Off
Q - - Powered Off
R - - Powered Off

bash-2.05$ showboards
Retrieving board information. Please wait.
.......
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 - Empty Slot Assigned - A
SB1 On V3CPU Active Passed A
SB2 On V3CPU Active Passed B
SB3 On V3CPU Active Passed C
SB4 On V3CPU Active Passed C
SB5 - Empty Slot Assigned - C
SB6 On V3CPU Active Passed C
SB7 - Empty Slot Available - Isolated
SB8 On V3CPU Active Passed E
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO2 On HPCI+ Active Passed B
IO3 On HPCI+ Active Passed B
IO4 On HPCI+ Active Passed C
IO5 On HPCI+ Active Passed C
IO6 Off HPCI+ Assigned Unknown D
IO7 Off HPCI+ Assigned Unknown D
IO8 On HPCI+ Active Passed E
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated

bash-2.05$ setkeyswitch -d A off
Current virtual key switch position is "ON".
Are you sure you want to change to the "OFF" position (yes/no)? yes
Domain is down.
Waiting on exclusive access to EXB(s): 3FFFF.
Component not present: SB0
Powering off: HPCI+ at IO0
Powering off: EXB at EX0
Powering off: V3CPU at SB1
Powering off: HPCI+ at IO1
Powering off: EXB at EX1
bash-2.05$



Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - - Powered Off
B - PKENA019 Running Solaris
C - PSBLD008 Running Solaris
D - - Powered Off
E - PEAID015 Running Solaris
F - - Powered Off
G - - Powered Off
H - - Powered Off
I - - Powered Off
J - - Powered Off
K - - Powered Off
L - - Powered Off
M - - Powered Off
N - - Powered Off
O - - Powered Off
P - - Powered Off
Q - - Powered Off
R - - Powered Off

bash-2.05$ addboard -d C SB1
assign SB1
.
assign SB1 done
poweron SB1
.............
poweron SB1 done
test SB1 ........... test SB1 done
connect SB1 ........ connect SB1 done
configure SB1
.....
configure SB1 done
.
notify online SUNW_cpu/cpu32
notify online SUNW_cpu/cpu36
notify online SUNW_cpu/cpu33
notify online SUNW_cpu/cpu37
notify online SUNW_cpu/cpu34
notify online SUNW_cpu/cpu38
notify online SUNW_cpu/cpu35
notify online SUNW_cpu/cpu39
..
notify add capacity (8 cpus)
notify add capacity (2097152 pages)
notify add capacity SB1 done


Domain A: boards SB0 and SB1 are active; SB1 is the main board.



root@PKEND021 # cfgadm -alv |grep permanent
SB1::memory connected configured ok base address 0x2000000000, 16777216 KBytes total, 1996640 KBytes permanent
root@PKEND021 #

From PKEND021 Host
# cfgadm -c unconfigure SB0
# cfgadm -c disconnect SB0

Login to SC 10.1.18.122

root@PBAKB034 # rsh 10.1.18.122
Password:
Last login: Mon Jun 9 12:29:46 from 10.1.18.85
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
Sourcing //.profile-EIS.....
root@T-Sky-20K-2-sc1 # su - sms-svc
T-Sky-20K-2-sc1:sms-svc:1> bash
bash-2.05$ showplatform -p domains

Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - PKEND021 Running Solaris
B - PKENA019 Running Solaris
C - PSBLD008 Running Solaris
D - PSAPA013 Running Solaris
E - PEAID015 Running Solaris


# deleteboard SB0

# addboard -d C SB0

Check the status of the board

# showboards -d C



Removing Board from PSBLD007

root@PSBLD007 #cfgadm -alv |grep -i perm
SB5::memory connected configured ok base address 0x1c000000000, 16777216 KBytes total, 3518488 KBytes permanent


root@PSBLD007 # cfgadm -c unconfigure SB4
root@PSBLD007 # cfgadm -c disconnect SB4


Login to SC 10.1.18.112
su - sms-svc
showplatform -p domains

deleteboard SB4


Activity done on 31st July 2008

root@PSBLD007 # cfgadm -al |grep SB4
SB4 V3CPU connected configured ok
SB4::cpu0 cpu connected configured ok
SB4::cpu1 cpu connected configured ok
SB4::cpu2 cpu connected configured ok
SB4::cpu3 cpu connected configured ok
SB4::memory memory connected configured ok
root@PSBLD007 # Jul 31 14:08:14 PSBLD007 login: ROOT LOGIN /dev/pts/1 FROM PBAKB034

root@PSBLD007 # cfgadm -c unconfigure SB4
Jul 31 14:11:16 PSBLD007 dr: OS unconfigure dr@0:SB4::cpu0
Jul 31 14:11:28 PSBLD007 dr: OS unconfigure dr@0:SB4::cpu1
Jul 31 14:11:49 PSBLD007 dr: OS unconfigure dr@0:SB4::cpu2
Jul 31 14:12:00 PSBLD007 dr: OS unconfigure dr@0:SB4::cpu3
Jul 31 14:12:21 PSBLD007 dr: OS unconfigure dr@0:SB4::memory
you have mail


root@PSBLD007 # cfgadm -al
Ap_Id Type Receptacle Occupant Condition
IO4 HPCI+ connected configured ok
IO4::pci0 io connected configured ok
IO4::pci1 io connected configured ok
IO4::pci2 io connected configured ok
IO4::pci3 io connected configured ok
IO5 HPCI+ connected configured ok
IO5::pci0 io connected configured ok
IO5::pci1 io connected configured ok
IO5::pci2 io connected configured ok
IO5::pci3 io connected configured ok
SB4 V3CPU connected unconfigured ok
SB4::cpu0 cpu connected unconfigured ok
SB4::cpu1 cpu connected unconfigured ok
SB4::cpu2 cpu connected unconfigured ok
SB4::cpu3 cpu connected unconfigured ok
SB4::memory memory connected unconfigured ok


root@PSBLD007 # cfgadm -c disconnect SB4


root@PSBLD007 # cfgadm -al | more
Ap_Id Type Receptacle Occupant Condition
IO4 HPCI+ connected configured ok
IO4::pci0 io connected configured ok
IO4::pci1 io connected configured ok
IO4::pci2 io connected configured ok
IO4::pci3 io connected configured ok
IO5 HPCI+ connected configured ok
IO5::pci0 io connected configured ok
IO5::pci1 io connected configured ok
IO5::pci2 io connected configured ok
IO5::pci3 io connected configured ok
SB4 V3CPU disconnected unconfigured unknown
SB5 V3CPU connected configured ok


bash-2.05$ deleteboard sb4
SB4 successfully unassigned.



Jul 31 14:11:16 PSBLD007 unix: [ID 177789 kern.info] kphysm_delete: mem = 50331648K (0xc00000000)
Jul 31 14:11:16 PSBLD007 unix: [ID 585997 kern.info] kphysm_delete: avail mem = 47505080320
Jul 31 14:11:16 PSBLD007 dr: [ID 427603 kern.notice] OS unconfigure dr@0:SB4::cpu0
Jul 31 14:11:28 PSBLD007 dr: [ID 427603 kern.notice] OS unconfigure dr@0:SB4::cpu1
Jul 31 14:11:49 PSBLD007 dr: [ID 427603 kern.notice] OS unconfigure dr@0:SB4::cpu2
Jul 31 14:12:00 PSBLD007 dr: [ID 427603 kern.notice] OS unconfigure dr@0:SB4::cpu3
Jul 31 14:12:21 PSBLD007 dr: [ID 427603 kern.notice] OS unconfigure dr@0:SB4::memory
Jul 31 14:14:32 PSBLD007 genunix: [ID 408114 kern.info] /memory-controller@80,400000 (mc-us30) offline
Jul 31 14:14:32 PSBLD007 genunix: [ID 408114 kern.info] /memory-controller@81,400000 (mc-us31) offline
Jul 31 14:14:32 PSBLD007 genunix: [ID 408114 kern.info] /memory-controller@82,400000 (mc-us32) offline
Jul 31 14:14:32 PSBLD007 genunix: [ID 408114 kern.info] /memory-controller@83,400000 (mc-us33) offline
Jul 31 14:16:59 PSBLD007 genunix: [ID 408114 kern.info] /address-extender-queue@9e,0 (axq0) offline

PSBLD008 to PKEND021 Board Movement SB0

root@PSBLD008 # cfgadm -alv |grep permanent
SB4::memory connected configured ok base address 0x20000000000, 16777216 KBytes total, 5124096 KBytes permanent
root@PSBLD008 # cfgadm -al | grep SB
SB0 V3CPU connected configured ok
SB0::cpu0 cpu connected configured ok
SB0::cpu1 cpu connected configured ok
SB0::cpu2 cpu connected configured ok
SB0::cpu3 cpu connected configured ok
SB0::memory memory connected configured ok
SB3 V3CPU connected configured ok
SB3::cpu0 cpu connected configured ok
SB3::cpu1 cpu connected configured ok
SB3::cpu2 cpu connected configured ok
SB3::cpu3 cpu connected configured ok
SB3::memory memory connected configured ok
SB4 V3CPU connected configured ok
SB4::cpu0 cpu connected configured ok
SB4::cpu1 cpu connected configured ok
SB4::cpu2 cpu connected configured ok
SB4::cpu3 cpu connected configured ok
SB4::memory memory connected configured ok
SB5 V3CPU connected configured ok
SB5::cpu0 cpu connected configured ok
SB5::cpu1 cpu connected configured ok
SB5::cpu2 cpu connected configured ok
SB5::cpu3 cpu connected configured ok
SB5::memory memory connected configured ok



#cfgadm -c unconfigure SB0---------4:12pm--4:17pm
#cfgadm -c disconnect SB0----------4:18pm--4:19pm

telnet 10.1.18.122
root@T-Sky-20K-2-sc1 # su - sms-svc
T-Sky-20K-2-sc1:sms-svc:1> bash
bash-2.05$ showplatform -p domains

Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - PKEND021 Running Solaris
B - PKENA019 Running Solaris
C - PSBLD008 Running Solaris
D - PSAPA013 Running Solaris
E - PEAID015 Running Solaris


# deleteboard SB0---------------4sec

# addboard -d A SB0-------------4:21pm--4:30pm

Check the status of the board

# showboards -d A

After adding the board:

1. Check the domain with showboards -d A

bash-2.05$ showboards -d A
Retrieving board information. Please wait.
......
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On V3CPU Active Passed A
SB1 On V3CPU Active Passed A
SB2 On V3CPU Active Passed A
SB7 - Empty Slot Available - Isolated
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated


2. Login to Domain A and check the permanent board

root@PKEND021 # cfgadm -alv |grep permanent
SB1::memory connected configured ok base address 0x2000000000, 16777216 KBytes total, 186763
2 KBytes permanent


3. Check the cfgadm output

root@PKEND021 # cfgadm -al | more
Ap_Id Type Receptacle Occupant Condition
IO0 HPCI+ connected configured ok
IO0::pci0 io connected configured ok
IO0::pci1 io connected configured ok
IO0::pci2 io connected configured ok
IO0::pci3 io connected configured ok
IO1 HPCI+ connected configured ok
IO1::pci0 io connected configured ok
IO1::pci1 io connected configured ok
IO1::pci2 io connected configured ok
IO1::pci3 io connected configured ok
SB0 V3CPU connected configured ok
SB0::cpu0 cpu connected configured ok
SB0::cpu1 cpu connected configured ok
SB0::cpu2 cpu connected configured ok
SB0::cpu3 cpu connected configured ok
SB0::memory memory connected configured ok
SB1 V3CPU connected configured ok
SB1::cpu0 cpu connected configured ok
SB1::cpu1 cpu connected configured ok
SB1::cpu2 cpu connected configured ok
SB1::cpu3 cpu connected configured ok
SB1::memory memory connected configured ok
SB2 V3CPU connected configured ok
SB2::cpu0 cpu connected configured ok
SB2::cpu1 cpu connected configured ok
SB2::cpu2 cpu connected configured ok
SB2::cpu3 cpu connected configured ok
SB2::memory memory connected configured ok
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t4d0 CD-ROM connected configured unknown
c0::es/ses0 processor connected configured unknown
c0::es/ses1 processor connected configured unknown
c0::rmt/0 tape connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t4d0 CD-ROM connected configured unknown
c1::dsk/c1t6d0 disk connected configured unknown
c1::es/ses2 processor connected configured unknown
c1::es/ses3 processor connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected unconfigured unknown
c4 fc-fabric connected configured unknown
c4::50060e80042d0a20 disk connected configured unknown
c5 fc connected unconfigured unknown
c6 fc-fabric connected configured unknown
c6::50060e80042d0a30 disk connected configured unknown
c7 fc connected unconfigured unknown
pci_pci0:e00b1slot1 unknown connected unconfigured unknown
pci_pci5:e01b1slot1 unknown connected unconfigured unknown
pcisch1:e00b1slot0 pci-pci/hp connected configured ok
pcisch2:e00b1slot3 mult/hp connected configured ok
pcisch3:e00b1slot2 pci-pci/hp connected configured ok
pcisch5:e01b1slot0 pci-pci/hp connected configured ok
pcisch6:e01b1slot3 mult/hp connected configured ok
pcisch7:e01b1slot2 pci-pci/hp connected configured ok
root@PKEND021 #


To remove board SB2, first unconfigure it on the domain:

# cfgadm -c unconfigure SB2

Check the status of the SB2 board:

root@PKEND021 # cfgadm -al |more
Ap_Id Type Receptacle Occupant Condition
IO0 HPCI+ connected configured ok
IO0::pci0 io connected configured ok
IO0::pci1 io connected configured ok
IO0::pci2 io connected configured ok
IO0::pci3 io connected configured ok
IO1 HPCI+ connected configured ok
IO1::pci0 io connected configured ok
IO1::pci1 io connected configured ok
IO1::pci2 io connected configured ok
IO1::pci3 io connected configured ok
SB0 V3CPU connected configured ok
SB0::cpu0 cpu connected configured ok
SB0::cpu1 cpu connected configured ok
SB0::cpu2 cpu connected configured ok
SB0::cpu3 cpu connected configured ok
SB0::memory memory connected configured ok
SB1 V3CPU connected configured ok
SB1::cpu0 cpu connected configured ok
SB1::cpu1 cpu connected configured ok
SB1::cpu2 cpu connected configured ok
SB1::cpu3 cpu connected configured ok
SB1::memory memory connected configured ok
SB2 V3CPU connected unconfigured ok
SB2::cpu0 cpu connected unconfigured ok
SB2::cpu1 cpu connected unconfigured ok
SB2::cpu2 cpu connected unconfigured ok
SB2::cpu3 cpu connected unconfigured ok
SB2::memory memory connected unconfigured ok
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t4d0 CD-ROM connected configured unknown
c0::es/ses0 processor connected configured unknown
c0::es/ses1 processor connected configured unknown
c0::rmt/0 tape connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t4d0 CD-ROM connected configured unknown
c1::dsk/c1t6d0 disk connected configured unknown
c1::es/ses2 processor connected configured unknown
c1::es/ses3 processor connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected unconfigured unknown
c4 fc-fabric connected configured unknown
c4::50060e80042d0a20 disk connected configured unknown
c5 fc connected unconfigured unknown
c6 fc-fabric connected configured unknown
c6::50060e80042d0a30 disk connected configured unknown
c7 fc connected unconfigured unknown
pci_pci0:e00b1slot1 unknown connected unconfigured unknown
pci_pci5:e01b1slot1 unknown connected unconfigured unknown
pcisch1:e00b1slot0 pci-pci/hp connected configured ok
pcisch2:e00b1slot3 mult/hp connected configured ok
pcisch3:e00b1slot2 pci-pci/hp connected configured ok
pcisch5:e01b1slot0 pci-pci/hp connected configured ok
pcisch6:e01b1slot3 mult/hp connected configured ok
pcisch7:e01b1slot2 pci-pci/hp connected configured ok
root@PKEND021 #

After unconfiguring the board, disconnect it from the host:

root@PKEND021 # cfgadm -c disconnect SB2

Check the status again:

root@PKEND021 # cfgadm -al | more
Ap_Id Type Receptacle Occupant Condition
IO0 HPCI+ connected configured ok
IO0::pci0 io connected configured ok
IO0::pci1 io connected configured ok
IO0::pci2 io connected configured ok
IO0::pci3 io connected configured ok
IO1 HPCI+ connected configured ok
IO1::pci0 io connected configured ok
IO1::pci1 io connected configured ok
IO1::pci2 io connected configured ok
IO1::pci3 io connected configured ok
SB0 V3CPU connected configured ok
SB0::cpu0 cpu connected configured ok
SB0::cpu1 cpu connected configured ok
SB0::cpu2 cpu connected configured ok
SB0::cpu3 cpu connected configured ok
SB0::memory memory connected configured ok
SB1 V3CPU connected configured ok
SB1::cpu0 cpu connected configured ok
SB1::cpu1 cpu connected configured ok
SB1::cpu2 cpu connected configured ok
SB1::cpu3 cpu connected configured ok
SB1::memory memory connected configured ok
SB2 V3CPU disconnected unconfigured unknown
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown
c0::dsk/c0t4d0 CD-ROM connected configured unknown
c0::es/ses0 processor connected configured unknown
c0::es/ses1 processor connected configured unknown
c0::rmt/0 tape connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t4d0 CD-ROM connected configured unknown
c1::dsk/c1t6d0 disk connected configured unknown
c1::es/ses2 processor connected configured unknown
c1::es/ses3 processor connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected unconfigured unknown
c4 fc-fabric connected configured unknown
c4::50060e80042d0a20 disk connected configured unknown
c5 fc connected unconfigured unknown
c6 fc-fabric connected configured unknown
c6::50060e80042d0a30 disk connected configured unknown
c7 fc connected unconfigured unknown
pci_pci0:e00b1slot1 unknown connected unconfigured unknown
pci_pci5:e01b1slot1 unknown connected unconfigured unknown
pcisch1:e00b1slot0 pci-pci/hp connected configured ok
pcisch2:e00b1slot3 mult/hp connected configured ok
pcisch3:e00b1slot2 pci-pci/hp connected configured ok
pcisch5:e01b1slot0 pci-pci/hp connected configured ok
pcisch6:e01b1slot3 mult/hp connected configured ok
pcisch7:e01b1slot2 pci-pci/hp connected configured ok



Then log in to the System Controller and run showboards -d A:

bash-2.05$ showboards -d A
Retrieving board information. Please wait.
......
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On V3CPU Active Passed A
SB1 On V3CPU Active Passed A
SB2 Off V3CPU Assigned Unknown A
SB7 - Empty Slot Available - Isolated
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated

Check the boards across all domains:

bash-2.05$ showboards
Retrieving board information. Please wait.
.........
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On V3CPU Active Passed A
SB1 On V3CPU Active Passed A
SB2 Off V3CPU Assigned Unknown A
SB3 On V3CPU Active Passed C
SB4 On V3CPU Active Passed C
SB5 On V3CPU Active Passed C
SB6 On V3CPU Active Passed D
SB7 - Empty Slot Available - Isolated
SB8 On V3CPU Active Passed E
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO2 Off HPCI+ Assigned Unknown B
IO3 Off HPCI+ Assigned Unknown B
IO4 On HPCI+ Active Passed C
IO5 On HPCI+ Active Passed C
IO6 On HPCI+ Active Passed D
IO7 On HPCI+ Active Passed D
IO8 On HPCI+ Active Passed E
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated

Now delete the board that is showing Unknown test status:

bash-2.05$ deleteboard SB2
SB2 successfully unassigned.
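
The removal sequence above (unconfigure and disconnect on the domain, deleteboard on the SC) can be summarized as a printed plan. This sketch only echoes the commands rather than executing them, since the real ones need live DR hardware; the `domain#` and `sc#` prefixes and the function name are illustrative:

```shell
# Hypothetical sketch: print the board-removal command sequence.
# "domain#" lines run on the Solaris domain, "sc#" lines on the SC.
plan_remove_board() {
  b="$1"
  echo "domain# cfgadm -c unconfigure $b"
  echo "domain# cfgadm -c disconnect $b"
  echo "sc# deleteboard $b"
}

plan_remove_board SB2
```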


Check the board status again:

bash-2.05$ showboards
Retrieving board information. Please wait.
.......
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On V3CPU Active Passed A
SB1 On V3CPU Active Passed A
SB2 Off V3CPU Available Unknown Isolated
SB3 On V3CPU Active Passed C
SB4 On V3CPU Active Passed C
SB5 On V3CPU Active Passed C
SB6 On V3CPU Active Passed D
SB7 - Empty Slot Available - Isolated
SB8 On V3CPU Active Passed E
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO2 Off HPCI+ Assigned Unknown B
IO3 Off HPCI+ Assigned Unknown B
IO4 On HPCI+ Active Passed C
IO5 On HPCI+ Active Passed C
IO6 On HPCI+ Active Passed D
IO7 On HPCI+ Active Passed D
IO8 On HPCI+ Active Passed E
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated


bash-2.05$ showplatform

Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - PKEND021 Running Solaris
B - - Powered Off
C - PSBLD008 Running Solaris
D - PSAPA013 Running Solaris
E - PEAID015 Running Solaris
F - - Powered Off
G - - Powered Off
H - - Powered Off
I - - Powered Off
J - - Powered Off
K - - Powered Off
L - - Powered Off
M - - Powered Off
N - - Powered Off
O - - Powered Off
P - - Powered Off
Q - - Powered Off
R - - Powered Off


bash-2.05$ addboard -d B SB2

Domain: B is not running. You can only "configure" a component into
a running domain. Would you like to "assign" the component(s) to
domain B instead (yes/no)? yes
SB2 assigned to domain: B
bash-2.05$
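
The prompt above reflects a rule worth remembering: addboard can only "configure" a board into a running domain; for a powered-off domain it can only "assign" it. A sketch that predicts the outcome from `showplatform -p domains` output (helper name and sample lines are illustrative):

```shell
# Hypothetical helper: from captured `showplatform -p domains` output,
# predict whether addboard will "configure" (domain running Solaris)
# or only "assign" (domain not running) a board.
board_action_for_domain() {
  awk -v d="$1" '$1 == d { if (/Running Solaris/) print "configure"; else print "assign" }'
}

sample='A - PKEND021 Running Solaris
B - - Powered Off'

printf '%s\n' "$sample" | board_action_for_domain B
```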

Now power on the domain, which powers on and tests the assigned boards:

bash-2.05$ setkeyswitch -d b on

Powering on: CSB at CS1
Already powered on: CSB at CS1
Powering on: CSB at CS0
Already powered on: CSB at CS0
Powering on: EXB at EX2
Already powered on: EXB at EX2
Powering on: HPCI+ at IO2
Powering on: V3CPU at SB2
Powering on: EXB at EX3
Already powered on: EXB at EX3
Powering on: HPCI+ at IO3

Significant contents of .postrc (platform)
/etc/opt/SUNWSMS/SMS1.5/config/platform/.postrc:
# ident "@(#)postrc 1.1 01/04/02 SMI"

Reading domain blacklist file /etc/opt/SUNWSMS/config/B/blacklist ...
# ident "@(#)blacklist 1.1 01/04/02 SMI"
Reading platform blacklist file /etc/opt/SUNWSMS/config/platform/blacklist ...
# ident "@(#)blacklist 1.1 01/04/02 SMI"
SEEPROM probe took 0 seconds.
Reading Component Health Status (CHS) information ...
stage lport_reset: Assert reset to IOC ports in -Q mode...
stage_lport_reset(): Not -Q mode; Skipping Stage lport_reset
stage bus_probe: Check in-use bus configurations...
stage asic_probe: ASIC probe and JTAG/CBus integrity test...
stage brd_rev_eval: Board Revision Evaluation and Compliance...
stage cpu_probe: CPU Module probe...
stage cdc_probe: CDC DIMM probe...
stage mem_probe: Memory dimm probe...
stage adapter_probe: I/O adapter probe...
stage cp_shorts: Centerplane Shorts...
stage lbist: Logic BIST...
stage ibist: Interconnect BIST...
stage field_ict: Field Interconnect Tests...
stage mbist1: Internal memory BIST...
stage mbist2: External memory BIST...
stage domain_sync: Domain sync test...
stage cbus_bbsram: Console Bus test of bootbus sram...
stage sc_interrupt: DARB to SC interrupt...
stage cdc_clear: CDC DIMM clear...
stage cpu_lpost: Test all L1 CPU boards...
Performing ASIC config with bus config a/d/r = 333...
Slot0 in domain: 00004
Slot1 in domain: 0000C
EXBs in use: 001FB
sgcpu.flash file: Version 5.19.6 Build 1.0 I/F 12 is newest supported
stage nmb_cpu_lpost: Non-Mem Board Proc tests...
Performing ASIC config with bus config a/d/r = 333...
Slot0 in domain: 00004
Slot1 in domain: 0000C
EXBs in use: 001FB
stage_cpu_lpost(): No NMB Boards in config. Skipping Stage nmb_cpu_lpost.
Acquiring licenses for all good processors...
stage wib_lpost: Wildcat interface board tests...
stage_wib_lpost(): No good Wcis; Skipping Stage wib_lpost
stage pci_lpost: Test all L1 I/O boards...
Performing ASIC config with bus config a/d/r = 333...
Slot0 in domain: 00004
Slot1 in domain: 0000C
EXBs in use: 001FB
pcilpost.elf Version 5.19.6 Build 1.0 I/F 12 is newest supported
NOTE: Mixed Minor numbers: 2
All LPOSTs in a domain should use the same version.
Table of version comparisons:
Fprom SB02/F0: 5.19.3 Build 1.0 I/F 12 vs pcilpost.elf: 5.19.6 Build 1.0 I/F 12
Fprom SB02/F1: 5.19.3 Build 1.0 I/F 12 vs pcilpost.elf: 5.19.6 Build 1.0 I/F 12
stage exp_lpost: Domain-level board and system tests...
explpost.elf Version 5.19.6 Build 1.0 I/F 12 is newest supported
NOTE: Mixed Minor numbers: 2
All LPOSTs in a domain should use the same version.
Table of version comparisons:
Fprom SB02/F0: 5.19.3 Build 1.0 I/F 12 vs explpost.elf: 5.19.6 Build 1.0 I/F 12
Fprom SB02/F1: 5.19.3 Build 1.0 I/F 12 vs explpost.elf: 5.19.6 Build 1.0 I/F 12
stage cpu_lpost_II: CPU L1 domain/system tests...
sgcpu.flash file: Version 5.19.6 Build 1.0 I/F 12 is newest supported
stage pci_lpost_Q: Init all L1 I/O boards under -Q...
stage cpu_lpost_II_Q: CPU L1 domain/system init under -Q...
stage final_config: Final configuration...
Creating CPU SRAM handoff structures...
Creating GDCD IOSRAM handoff structures in Slot IO2...
Writing domain information to PCD...

Key to resource status value codes:
?=Unknown p=Present c=Crunched _=Undefined m=Missing
i=Misconfig o=FailedOBP f=Failed b=Blacklisted r=Redlisted
x=NotInDomain u=G,unconfig P=Passed ==G,lockstep l=NoLicense
e=EmptyCasstt

CPU_Brds: PortCore
3 2 1 0 Mem P/B: 3/1 3/0 2/1 2/0 1/1 1/0 0/1 0/0
Slot Gen 10101010 /L: 10 10 10 10 10 10 10 10 CDC
SB02: P PPPPPPPP PP PP PP PP PP PP PP PP P

I/O_Brds: IOC P1/Bus/Adapt IOC P0/Bus/Adapt
Slot Gen Type P1 B1/10 B0/10 P0 B1/eb10 B0/10 (e=ENet, b=BBC)
IO02: P hsPCI+ P p _p p _p P p PP_e p _p
IO03: P hsPCI+ P p _p p _p P p PP_e p _p

Configured in 333 with 4 procs, 16.000 GBytes, 6 IO adapters.
Interconnect frequency is 149.978 MHz, Measured.
Golden sram is on Slot IO2.
POST (level=16, verbose=20) execution time 5:53


Now check the status of the boards:

bash-2.05$ showboards
Retrieving board information. Please wait.
...........
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On V3CPU Active Passed A
SB1 On V3CPU Active Passed A
SB2 On V3CPU Active Passed B
SB3 On V3CPU Active Passed C
SB4 On V3CPU Active Passed C
SB5 On V3CPU Active Passed C
SB6 On V3CPU Active Passed D
SB7 - Empty Slot Available - Isolated
SB8 On V3CPU Active Passed E
SB9 - Empty Slot Available - Isolated
SB10 - Empty Slot Available - Isolated
SB11 - Empty Slot Available - Isolated
SB12 - Empty Slot Available - Isolated
SB13 - Empty Slot Available - Isolated
SB14 - Empty Slot Available - Isolated
SB15 - Empty Slot Available - Isolated
SB16 - Empty Slot Available - Isolated
SB17 - Empty Slot Available - Isolated
IO0 On HPCI+ Active Passed A
IO1 On HPCI+ Active Passed A
IO2 On HPCI+ Active Passed B
IO3 On HPCI+ Active Passed B
IO4 On HPCI+ Active Passed C
IO5 On HPCI+ Active Passed C
IO6 On HPCI+ Active Passed D
IO7 On HPCI+ Active Passed D
IO8 On HPCI+ Active Passed E
IO9 - Empty Slot Available - Isolated
IO10 - Empty Slot Available - Isolated
IO11 - Empty Slot Available - Isolated
IO12 - Empty Slot Available - Isolated
IO13 - Empty Slot Available - Isolated
IO14 - Empty Slot Available - Isolated
IO15 - Empty Slot Available - Isolated
IO16 - Empty Slot Available - Isolated
IO17 - Empty Slot Available - Isolated



bash-2.05$ showplatform

PLATFORM:
=========
Platform Type: Sun Fire E20K

CSN:
====
Chassis Serial Number: 0609AK20BA

COD:
====
Chassis HostID: 5014936D87943
Proc RTUs installed: 0
PROC Headroom Quantity: 0
Proc RTUs reserved for domain A: 0
Proc RTUs reserved for domain B: 0
Proc RTUs reserved for domain C: 0
Proc RTUs reserved for domain D: 0
Proc RTUs reserved for domain E: 0
Proc RTUs reserved for domain F: 0
Proc RTUs reserved for domain G: 0
Proc RTUs reserved for domain H: 0
Proc RTUs reserved for domain I: 0
Proc RTUs reserved for domain J: 0
Proc RTUs reserved for domain K: 0
Proc RTUs reserved for domain L: 0
Proc RTUs reserved for domain M: 0
Proc RTUs reserved for domain N: 0
Proc RTUs reserved for domain O: 0
Proc RTUs reserved for domain P: 0
Proc RTUs reserved for domain Q: 0
Proc RTUs reserved for domain R: 0


Available Component List for Domains:
=====================================
Available Component List for domain A:
No System boards
No IO boards

Available Component List for domain B:
No System boards
No IO boards

Available Component List for domain C:
No System boards
No IO boards

Available Component List for domain D:
No System boards
No IO boards

Available Component List for domain E:
No System boards
No IO boards

Available Component List for domain F:
No System boards
No IO boards

Available Component List for domain G:
No System boards
No IO boards

Available Component List for domain H:
No System boards
No IO boards

Available Component List for domain I:
No System boards
No IO boards

Available Component List for domain J:
No System boards
No IO boards

Available Component List for domain K:
No System boards
No IO boards

Available Component List for domain L:
No System boards
No IO boards

Available Component List for domain M:
No System boards
No IO boards

Available Component List for domain N:
No System boards
No IO boards

Available Component List for domain O:
No System boards
No IO boards

Available Component List for domain P:
No System boards
No IO boards

Available Component List for domain Q:
No System boards
No IO boards

Available Component List for domain R:
No System boards
No IO boards


Domain Ethernet Addresses:
==========================
Domain ID Domain Tag Ethernet Address
A - 0:0:be:a9:fc:24
B - 0:0:be:a9:fc:25
C - 0:0:be:a9:fc:26
D - 0:0:be:a9:fc:27
E - 0:0:be:a9:fc:28
F - 0:0:be:a9:fc:29
G - 0:0:be:a9:fc:2a
H - 0:0:be:a9:fc:2b
I - 0:0:be:a9:fc:2c
J - 0:0:be:a9:fc:2d
K - 0:0:be:a9:fc:2e
L - 0:0:be:a9:fc:2f
M - 0:0:be:a9:fc:30
N - 0:0:be:a9:fc:31
O - 0:0:be:a9:fc:32
P - 0:0:be:a9:fc:33
Q - 0:0:be:a9:fc:34
R - 0:0:be:a9:fc:35


Domain configurations:
======================
Domain ID Domain Tag Solaris Nodename Domain Status
A - PKEND021 Running Solaris
B - - Running OBP
C - PSBLD008 Running Solaris
D - PSAPA013 Running Solaris
E - PEAID015 Running Solaris
F - - Powered Off
G - - Powered Off
H - - Powered Off
I - - Powered Off
J - - Powered Off
K - - Powered Off
L - - Powered Off
M - - Powered Off
N - - Powered Off
O - - Powered Off
P - - Powered Off
Q - - Powered Off
R - - Powered Off

bash-2.05$

Now go to the console of domain B. At the ok prompt, execute the boot command:

bash-2.05$ console -d b
Trying to connect...
Connected to Domain Server.
Your console is in exclusive mode now.

{40} ok boot
Boot device: /pci@5d,600000/pci@1/scsi@2/disk@0,0:a File and args:
SunOS Release 5.9 Version Generic_118558-21 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
NOTICE: Kernel Cage Splitting is ENABLED
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
WARNING: forceload of misc/md_sp failed
WARNING: ce4: fault detected external to device; service degraded
WARNING: ce4: xcvr addr:0x01 - link down
NOTICE: ce4: fault cleared external to device; service available
NOTICE: ce4: xcvr addr:0x01 - link up 1000 Mbps full duplex
configuring IPv4 interfaces: ce0 ce6 eri0.
Hostname: PKENA019
WARNING: ce10: fault detected external to device; service degraded
WARNING: ce10: xcvr addr:0x01 - link down
NOTICE: ce10: fault cleared external to device; service available
NOTICE: ce10: xcvr addr:0x01 - link up 1000 Mbps full duplex
WARNING: ce10: fault detected external to device; service degraded
WARNING: ce10: xcvr addr:0x01 - link down
NOTICE: ce10: fault cleared external to device; service available
NOTICE: ce10: xcvr addr:0x01 - link up 1000 Mbps full duplex
ID[luxadm.create_fabric_device.2316] configuration failed for line (/devices/pci@5d,700000/SUNW,qlc@1,1/fp@0,0:fc::100000e002233b2b) in file: /etc/cfg/fp/fabric_WWN_map. I/O error
Could not open /dev/rmt/2l to verify device id.
No such device or address
Could not open /dev/rmt/1l to verify device id.
No such device or address
Booting as part of a cluster
NOTICE: CMM: Node PKENA018 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node PKENA019 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 2 (/dev/did/rdsk/d9s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
NOTICE: clcomm: Adapter ce10 constructed
NOTICE: clcomm: Path PKENA019:ce10 - PKENA018:ce10 being constructed
NOTICE: clcomm: Adapter ce4 constructed
NOTICE: clcomm: Path PKENA019:ce4 - PKENA018:ce4 being constructed
NOTICE: CMM: Node PKENA019: attempting to join cluster.
NOTICE: clcomm: Path PKENA019:ce10 - PKENA018:ce10 being initiated
NOTICE: clcomm: Path PKENA019:ce4 - PKENA018:ce4 being initiated
NOTICE: CMM: Node PKENA018 (nodeid: 1, incarnation #: 1211187859) has become reachable.
NOTICE: clcomm: Path PKENA019:ce10 - PKENA018:ce10 online
NOTICE: clcomm: Path PKENA019:ce4 - PKENA018:ce4 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node PKENA018 (nodeid = 1) is up; new incarnation number = 1211187859.
NOTICE: CMM: Node PKENA019 (nodeid = 2) is up; new incarnation number = 1211352533.
NOTICE: CMM: Cluster members: PKENA018 PKENA019.
NOTICE: CMM: node reconfiguration #16 completed.
NOTICE: CMM: Node PKENA019: joined cluster.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
Could not open /dev/rmt/2l to verify device id.
No such device or address
Could not open /dev/rmt/1l to verify device id.
No such device or address
The system is coming up. Please wait.
checking ufs filesystems
/dev/rdsk/c0t0d0s3: is logging.
Starting DCE daemons in rc.dce
/opt/OV/dce/bin/dced -b
Finished DCE daemons in rc.dce
starting rpc services: rpcbind done.
Setting netmask of ce0 to 255.255.255.0
Setting netmask of ce0:1 to 255.255.255.0
Setting netmask of ce6 to 255.255.255.0
Setting netmask of eri0 to 255.255.255.0
Setting netmask of ce10 to 255.255.255.128
Setting netmask of ce4 to 255.255.255.128
Setting netmask of clprivnet0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway PKENA019
syslog service starting.
obtaining access to all attached disks
May 21 12:19:34 PKENA019 sckmd: PF_KEY error: type=DELETE, errno=3, diagnostic code=0
Starting Sun Java(TM) Web Console Version 2.2...
See /var/log/webconsole/console_debug_log for server logging information
starting NetWorker daemons:
nsrexecd
share_nfs: /s1/kenan/htm: No such file or directory
volume management starting.
Using /var/run
Storing undefined to /var/run/psn
The system is ready.

PKENA019 console login:
========================================================

Board Movement

Log in to 10.1.18.122 (SC) and do su - sms-user.
Run the showboards and showplatform commands to find the board used by PKENA019: showplatform maps the Solaris nodename to a domain, and showboards lists that domain's boards.
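
That hostname-to-board lookup can be sketched as a two-step pipeline over captured output (the sample lines and variable names are illustrative):

```shell
# Hypothetical lookup: map a Solaris nodename to its domain via captured
# `showplatform -p domains` output, then list that domain's boards from
# captured `showboards` output. Samples are illustrative.
platform='A - PKEND021 Running Solaris
B - PKENA019 Running Solaris'
boards='SB0 On V3CPU Active Passed A
SB2 On V3CPU Active Passed B
IO2 On HPCI+ Active Passed B'

node=PKENA019
domain=$(printf '%s\n' "$platform" | awk -v n="$node" '$3 == n { print $1 }')
printf '%s\n' "$boards" | awk -v d="$domain" '$NF == d { print $1 }'
```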

Switching the cluster

Log in to 10.1.18.37 (PKENA019) and switch the resource group to the other node:
scswitch -z -g Kenact-RG -h PKENA018
After the switchover, run
init 0 on PKENA019.



Moving the board.

Log in to 10.1.18.122 (SC), which controls the board used by PKENA019.
su - sms-user
sc:sms-user:> showplatform -p domains
sc:sms-user:> showboards
sc:sms-user:> setkeyswitch -d off
Current virtual key switch position is on. Change it to off?
sc:sms-user:> deleteboard -c unassign SB
SB unassigned.

Pull out the board and install it in the other E20K.

Log in to 10.1.18.112 (SC), which controls the domain of PKENA018.
su - sms-user

sc:sms-user:> showplatform -p domains
sc:sms-user:> addboard -d -c assign SB
sc:sms-user:> showboards -d

Log in to PKENA018 and verify that 16 CPUs are available.
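
On the domain itself, psrinfo prints one status line per CPU, so counting its lines gives the CPU total. A sketch over captured output (the helper name and sample lines are illustrative):

```shell
# Hypothetical verification: count on-line CPUs from captured `psrinfo`
# output (one status line per CPU). Sample lines are illustrative.
cpu_count() { awk '$2 == "on-line" { n++ } END { print n + 0 }'; }

sample='0       on-line  since 05/21/2008 12:19:34
1       on-line  since 05/21/2008 12:19:34'

printf '%s\n' "$sample" | cpu_count
```

On the live domain this would be `psrinfo | cpu_count`, expecting 16 once the second board is configured.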