Oracle RAC installation hitting the bug: "11gR2 Grid Infrastructure root.sh or rootupgrade.sh Hangs if Voting Disk is on OCFS2 [ID 1321757.1]" on RHEL5 Server

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Kernel  2.6.18-238.12.1.0
  • Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later   [Release: 11.2 and later ]

Issue

  • Oracle RAC installation hitting the bug: "11gR2 Grid Infrastructure root.sh or rootupgrade.sh Hangs if Voting Disk is on OCFS2 [ID 1321757.1]" on RHEL5 Server

Resolution

  • Workaround: Proceed with the installation through the Kernel 2.6.18-194.el5

Root Cause

Diagnostic Steps

  • If running root.sh
• root.sh output  
..  
Creating OCR Keys for user 'root', privgrp 'root'..  
Operation successful.  
== then stuck here  

• $GRID\_HOME/cfgtoollogs/crsconfig/rootcrs\_node.log  
..  
2011-03-16 17:23:28: Creating voting files  
2011-03-16 17:23:28: Adding voting files /ocfs2/storage/voting  
2011-03-16 17:23:28: Executing crsctl add css votedisk /ocfs2/storage/voting  
2011-03-16 17:23:28: Executing /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting  
2011-03-16 17:23:28: Executing cmd: /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting  

• "ps" output, strace and call stack  
ps -ef| grep votedisk  
root 22026 17304 0 13:18 pts/2 00:00:00 /ocw/grid/bin/crsctl.bin add css votedisk /ocfs2/storage/voting  

strace -ftt -p pid-of-crsctl-from-above-ps-command  
Process 22026 attached - interrupt to quit  
13:59:35.663332 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744  
13:59:35.663538 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744  
13:59:35.663638 io_getevents(47273557176320, 1, 128,  
unfinished ...  
Process 22026 detached  

pstack pid-of-crsctl-from-above-ps-command  
Thread 2 (Thread 0x41b70940 (LWP 22028)):  
#0 0x0000003d0640aee9 in pthread\_cond\_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  
#1 0x00002b09a1e73548 in sltspcwait () from /ocw/grid/lib/libclntsh.so.11.1  
#2 0x00002b099fcf69b8 in clsd_logThread () from /ocw/grid/lib/libhasgen11.so  
#3 0x0000003d0640673d in start_thread () from /lib64/libpthread.so.0  
#4 0x0000003d058d44bd in clone () from /lib64/libc.so.6  
Thread 1 (Thread 0x2b09a4493dc0 (LWP 22026)):  
#0 0x00002b09a42915b4 in ?? () from /usr/lib64/libaio.so.1  
#1 0x00002b09a116cf6c in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1  
#2 0x00002b09a116cd75 in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1  
#3 0x00002b09a284f37f in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1  
#4 0x00002b09a326b51e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so  
#5 0x00000000004cb848 in clsfDiscoverReap ()  
#6 0x00000000004ca62a in clsfDiscover ()  
#7 0x00000000004cad8a in clsfskidCreate ()  
#8 0x00000000004c8256 in clsscfg_vhcreate ()  
#9 0x00000000004c8d3e in clsscfg_vcreate ()  
#10 0x0000000000434e61 in crsctl_vformat ()  
#11 0x0000000000436257 in crsctl\_css\_votedisk ()  
#12 0x000000000053a1c5 in cls\_crsctl\_parser::AddParser::addCss() ()  
#13 0x0000000000535c23 in cls\_crsctl\_parser::AddParser::performOperation() ()  
#14 0x00000000004dd2df in cls\_crsctl\_parser::Command::performParsingAndOperation(int, unsigned char\*\*)()  
#15 0x00000000004d202b in crsctl_main(int, unsigned char\*\*) ()  
#16 0x00000000004306c8 in main ()
  • If running rootupgrade.sh:
• rootupgrade.sh output  
..  
ACFS-9323: Creating module dependencies - this may take some time.  
ACFS-9327: Verifying ADVM/ACFS devices.  
ACFS-9309: ADVM/ACFS installation correctness verified.  
== then stuck here  

• $GRID\_HOME/cfgtoollogs/crsconfig/rootcrs\_node.log  
..  
2011-05-12 13:54:38: Upgrading the existing voting disks!  
2011-05-12 13:54:38: Executing /ocw/grid/bin/cssvfupgd  
2011-05-12 13:54:38: Executing cmd: /ocw/grid/bin/cssvfupgd  

• "ps" output, strace and call stack  
ps -ef| grep cssvfupgd  
root 19530 14802 0 13:54 pts/0 00:00:00 /ocw/grid/bin/cssvfupgd.bin  

strace -ftt -p pid-of-cssvfupgd-from-above-ps-command  
Process 19530 attached - interrupt to quit  
13:59:35.663332 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744  
13:59:35.663538 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744  
13:59:35.663638 io_getevents(47273557176320, 1, 128,  
unfinished ...  
Process 19530 detached  

pstack pid-of-cssvfupgd-from-above-ps-command  
#0 0x00002afebb6c45b4 in ?? () from /usr/lib64/libaio.so.1  
#1 0x00002afeb8d756d4 in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1  
#2 0x00002afeb8d754dd in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1  
#3 0x00002afeba454eb7 in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1  
#4 0x00002afebae7351e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so  
#5 0x0000000000407bbc in clsfDiscoverReap ()  
#6 0x000000000040699e in clsfDiscover ()  
#7 0x0000000000406b66 in clsfpath2skid ()  
#8 0x00000000004029dc in cssvfupgd_GetGUID ()  
#9 0x00000000004032d3 in cssvfupgd_GetConfig ()  
#10 0x000000000040376d in cssvfupgd_main ()  
#11 0x00000000004042c6 in clsutlmain ()  
#12 0x0000000000401fc8 in main ()
  • For both root.sh and rootupgrade.sh
• $GRID\_HOME/log/node/client/crsctl\_root.log for root.sh or $GRID_HOME/log/node/client/cssvfupgdn.log for rootupgrade.sh  
..  
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Discovery with str:/ocfs2/storage/voting:  
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]UFS discovery with :/ocfs2/storage/voting:  
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Fetching UFS disk :/ocfs2/storage/voting:  
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]OSS discovery with :/ocfs2/storage/voting:  
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Handle 0x154724a0 from lib :UFS:: for disk :/ocfs2/storage/voting:  
2011-03-16 18:35:03.761: \[ SKGFD\]\[2754190768\]WARNING:io_getevents timed out 4294 sec  

• mount option  
mount | grep ocfs2  
/dev/sdc1 on /ocfs2 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

ls -l /ocfs2/storage/voting  
-rw-r--r-- 1 oracle oinstall 10240000 Jun 19 2009 /ocfs2/storage/voting  

[root@servername oraInventory]\# cd /d02/app/11.2.0/grid

[root@servername grid]\# ./root.sh  
Running Oracle 11g root script...  
  • The following environment variables are set as:
 ORACLE_OWNER= oragrid  
 ORACLE_HOME= /d02/app/11.2.0/grid  

Enter the full pathname of the local bin directory: [/usr/local/bin]:  
 Copying dbhome to /usr/local/bin ...  
 Copying oraenv to /usr/local/bin ...  
 Copying coraenv to /usr/local/bin ...  


Creating /etc/oratab file...  
Entries will be added to the /etc/oratab file as needed by  
Database Configuration Assistant when a database is created  
Finished running generic part of root script.  
Now product-specific root actions will be performed.  
Using configuration parameter file: /d02/app/11.2.0/grid/crs/install/crsconfig_params  

Creating trace directory  
LOCAL ADD MODE  
Creating OCR keys for user 'root', privgrp 'root'..  
Operation successful.  
OLR initialization - successful  

 root wallet  
 root wallet cert  
 root cert export  
 peer wallet  
 profile reader wallet  
 pa wallet  
 peer wallet keys  
 pa wallet keys  
 peer cert request  
 pa cert request  
 peer cert  
 pa cert  
 peer root cert TP  
 profile reader root cert TP  
 pa root cert TP  
 peer pa cert TP  
 pa peer cert TP  
 profile reader pa cert TP  
 profile reader peer cert TP  
 peer user cert  
 pa user cert  

Adding daemon to inittab  

ACFS-9200: Supported  
ACFS-9300: ADVM/ACFS distribution files found.  
ACFS-9307: Installing requested ADVM/ACFS software.  
ACFS-9308: Loading installed ADVM/ACFS drivers.  
ACFS-9321: Creating udev for ADVM/ACFS.  
ACFS-9323: Creating module dependencies - this may take some time.  

Modulefile is /etc/modprobe.conf  
Modulefile is /etc/modprobe.conf  
Modulefile is /etc/modprobe.conf  
Modulefile is /etc/modprobe.conf  
Modulefile is /etc/modprobe.conf  

ACFS-9327: Verifying ADVM/ACFS devices.  
ACFS-9309: ADVM/ACFS installation correctness verified.  

CRS-2672: Attempting to start 'ora.mdnsd' on 'servername'  
CRS-2676: Start of 'ora.mdnsd' on 'servername' succeeded  
CRS-2672: Attempting to start 'ora.gpnpd' on 'servername'  
CRS-2676: Start of 'ora.gpnpd' on 'servername' succeeded  
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'servername'  
CRS-2672: Attempting to start 'ora.gipcd' on 'servername'  
CRS-2676: Start of 'ora.cssdmonitor' on 'servername' succeeded  
CRS-2676: Start of 'ora.gipcd' on 'servername' succeeded  
CRS-2672: Attempting to start 'ora.cssd' on 'servername'  
CRS-2672: Attempting to start 'ora.diskmon' on 'servername'  
CRS-2676: Start of 'ora.diskmon' on 'servername' succeeded  
CRS-2676: Start of 'ora.cssd' on 'servername' succeeded  

clscfg: -install mode specified  

Successfully accumulated necessary OCR keys.  

Creating OCR keys for user 'root', privgrp 'root'..  

Operation successful.  

/etc/modprobe.conf
  • root.sh is hung at the above.
[root@servername crsconfig]\# pwd  
/d02/app/11.2.0/grid/cfgtoollogs/crsconfig  

[root@servername crsconfig]\# ls -l  
total 92  
-rwxrwxr-x 1 oragrid oinstall 86182 Sep 1 03:01 rootcrs_servername.log

[root@servername crsconfig]\# tail -10 rootcrs_servername.log  
2011-09-01 03:01:45: Command output:  


CRS-2672: Attempting to start 'ora.crsd' on 'servername'  
CRS-2676: Start of 'ora.crsd' on 'servername' succeeded  
End Command output  


2011-09-01 03:01:45: Start of resource "ora.crsd" Succeeded  
2011-09-01 03:01:45: Creating voting files  
2011-09-01 03:01:45: Adding voting files /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3  
2011-09-01 03:01:45: Executing crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3  
2011-09-01 03:01:45: Executing /d02/app/11.2.0/grid/bin/crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3  
2011-09-01 03:01:45: Executing cmd: /d02/app/11.2.0/grid/bin/crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3

[root@servername ~]\# ps -ef| grep votedisk  
root 3890 1937 0 09:56 pts/4 00:00:00 grep votedisk  
root 32502 25545 0 03:01 pts/1 00:00:00 /d02/app/11.2.0/grid/bin/crsctl.bin add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3

[root@servername ~]\# strace -ftt -p 32502  
09:57:11.890923 fstatfs(120486672, ptrace: umoven: Input/output error  
{...}) = 435168409  
09:57:11.891237 fstatfs(120486672, ptrace: umoven: Input/output error  
{...}) = 435168409  
09:57:11.891291 ipc_subcall(0x33073608, 0xffffffff, 0x80, 0x1 unfinished ...

[root@servername ~]\# pstack 32502  
Thread 2 (Thread 0x42845940 (LWP 32505)):  
#0 0x00000036e7a0aee9 in pthread\_cond\_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  
#1 0x00002b9823da11a8 in sltspcwait () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1  
#2 0x00002b9821c23ab0 in clsd_logThread () from /d02/app/11.2.0/grid/lib/libhasgen11.so  
#3 0x00000036e7a0673d in start_thread () from /lib64/libpthread.so.0  
#4 0x00000036e6ed44bd in clone () from /lib64/libc.so.6  


Thread 1 (Thread 0x2b98263bbdc0 (LWP 32502)):  


#0 0x00002b98261b95b4 in ?? () from /usr/lib64/libaio.so.1  
#1 0x00002b982309a6d4 in skgfrliopo () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1  
#2 0x00002b982309a4dd in skgfospo () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1  
#3 0x00002b9824779eb7 in skgfrwat () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1  
#4 0x00002b982519851e in kgfkWaitIO () from /d02/app/11.2.0/grid/lib/libasmclntsh11.so  
#5 0x00000000004cb798 in clsfDiscoverReap ()  
#6 0x00000000004ca57a in clsfDiscover ()  
#7 0x00000000004cacda in clsfskidCreate ()  
#8 0x00000000004c81a6 in clsscfg_vhcreate ()  
#9 0x00000000004c8c8e in clsscfg_vcreate ()  
#10 0x0000000000434db1 in crsctl_vformat ()  
#11 0x00000000004361a7 in crsctl\_css\_votedisk ()  
#12 0x000000000053a115 in cls\_crsctl\_parser::AddParser::addCss() ()  
#13 0x0000000000535b73 in cls\_crsctl\_parser::AddParser::performOperation() ()  
#14 0x00000000004dd22f in cls\_crsctl\_parser::Command::performParsingAndOperation(int, unsigned char\*\*) ()  
#15 0x00000000004d1f7b in crsctl_main(int, unsigned char\*\*) ()  
#16 0x0000000000430618 in main ()

[root@servername /]\# mount | grep ocfs2  
ocfs2\_dlmfs on /dlm type ocfs2\_dlmfs (rw)  
/dev/emcpowerd on /d03 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)  
/dev/emcpowerc on /oraarchive type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)  
/dev/emcpowerb on /orabackup type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

[root@servername /]\# ls -l /d03/oragrid/griddata/  

total 1376  

-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr1  
-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr2  
-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr3  
-rw-r\----- 1 root root 21004800 Sep 1 03:01 vdsk1

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments