Oracle RAC installation hitting the bug: "11gR2 Grid Infrastructure root.sh or rootupgrade.sh Hangs if Voting Disk is on OCFS2 [ID 1321757.1]" on RHEL5 Server
Environment
- Red Hat Enterprise Linux 5
- Kernel 2.6.18-238.12.1.0
- Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Issue
- Oracle RAC installation hitting the bug: "11gR2 Grid Infrastructure root.sh or rootupgrade.sh Hangs if Voting Disk is on OCFS2 [ID 1321757.1]" on RHEL5 Server
Resolution
- Workaround: Proceed with the installation through the Kernel 2.6.18-194.el5
Root Cause
- Bugzilla 727504, work in progress. https://bugzilla.redhat.com/show_bug.cgi?id=727504
Diagnostic Steps
- If running root.sh
• root.sh output
..
Creating OCR Keys for user 'root', privgrp 'root'..
Operation successful.
== then stuck here
• $GRID\_HOME/cfgtoollogs/crsconfig/rootcrs\_node.log
..
2011-03-16 17:23:28: Creating voting files
2011-03-16 17:23:28: Adding voting files /ocfs2/storage/voting
2011-03-16 17:23:28: Executing crsctl add css votedisk /ocfs2/storage/voting
2011-03-16 17:23:28: Executing /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting
2011-03-16 17:23:28: Executing cmd: /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting
• "ps" output, strace and call stack
ps -ef| grep votedisk
root 22026 17304 0 13:18 pts/2 00:00:00 /ocw/grid/bin/crsctl.bin add css votedisk /ocfs2/storage/voting
strace -ftt -p pid-of-crsctl-from-above-ps-command
Process 22026 attached - interrupt to quit
13:59:35.663332 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744
13:59:35.663538 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744
13:59:35.663638 io_getevents(47273557176320, 1, 128,
unfinished ...
Process 22026 detached
pstack pid-of-crsctl-from-above-ps-command
Thread 2 (Thread 0x41b70940 (LWP 22028)):
#0 0x0000003d0640aee9 in pthread\_cond\_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b09a1e73548 in sltspcwait () from /ocw/grid/lib/libclntsh.so.11.1
#2 0x00002b099fcf69b8 in clsd_logThread () from /ocw/grid/lib/libhasgen11.so
#3 0x0000003d0640673d in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d058d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b09a4493dc0 (LWP 22026)):
#0 0x00002b09a42915b4 in ?? () from /usr/lib64/libaio.so.1
#1 0x00002b09a116cf6c in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1
#2 0x00002b09a116cd75 in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1
#3 0x00002b09a284f37f in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1
#4 0x00002b09a326b51e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so
#5 0x00000000004cb848 in clsfDiscoverReap ()
#6 0x00000000004ca62a in clsfDiscover ()
#7 0x00000000004cad8a in clsfskidCreate ()
#8 0x00000000004c8256 in clsscfg_vhcreate ()
#9 0x00000000004c8d3e in clsscfg_vcreate ()
#10 0x0000000000434e61 in crsctl_vformat ()
#11 0x0000000000436257 in crsctl\_css\_votedisk ()
#12 0x000000000053a1c5 in cls\_crsctl\_parser::AddParser::addCss() ()
#13 0x0000000000535c23 in cls\_crsctl\_parser::AddParser::performOperation() ()
#14 0x00000000004dd2df in cls\_crsctl\_parser::Command::performParsingAndOperation(int, unsigned char\*\*)()
#15 0x00000000004d202b in crsctl_main(int, unsigned char\*\*) ()
#16 0x00000000004306c8 in main ()
- If running rootupgrade.sh:
• rootupgrade.sh output
..
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
== then stuck here
• $GRID\_HOME/cfgtoollogs/crsconfig/rootcrs\_node.log
..
2011-05-12 13:54:38: Upgrading the existing voting disks!
2011-05-12 13:54:38: Executing /ocw/grid/bin/cssvfupgd
2011-05-12 13:54:38: Executing cmd: /ocw/grid/bin/cssvfupgd
• "ps" output, strace and call stack
ps -ef| grep cssvfupgd
root 19530 14802 0 13:54 pts/0 00:00:00 /ocw/grid/bin/cssvfupgd.bin
strace -ftt -p pid-of-cssvfupgd-from-above-ps-command
Process 19530 attached - interrupt to quit
13:59:35.663332 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744
13:59:35.663538 times({tms\_utime=1, tms\_stime=0, tms\_cutime=0, tms\_cstime=0}) = 429525744
13:59:35.663638 io_getevents(47273557176320, 1, 128,
unfinished ...
Process 19530 detached
pstack pid-of-cssvfupgd-from-above-ps-command
#0 0x00002afebb6c45b4 in ?? () from /usr/lib64/libaio.so.1
#1 0x00002afeb8d756d4 in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1
#2 0x00002afeb8d754dd in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1
#3 0x00002afeba454eb7 in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1
#4 0x00002afebae7351e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so
#5 0x0000000000407bbc in clsfDiscoverReap ()
#6 0x000000000040699e in clsfDiscover ()
#7 0x0000000000406b66 in clsfpath2skid ()
#8 0x00000000004029dc in cssvfupgd_GetGUID ()
#9 0x00000000004032d3 in cssvfupgd_GetConfig ()
#10 0x000000000040376d in cssvfupgd_main ()
#11 0x00000000004042c6 in clsutlmain ()
#12 0x0000000000401fc8 in main ()
- For both root.sh and rootupgrade.sh
• $GRID\_HOME/log/node/client/crsctl\_root.log for root.sh or $GRID_HOME/log/node/client/cssvfupgdn.log for rootupgrade.sh
..
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Discovery with str:/ocfs2/storage/voting:
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]UFS discovery with :/ocfs2/storage/voting:
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Fetching UFS disk :/ocfs2/storage/voting:
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]OSS discovery with :/ocfs2/storage/voting:
2011-03-16 17:23:28.143: \[ SKGFD\]\[2754190768\]Handle 0x154724a0 from lib :UFS:: for disk :/ocfs2/storage/voting:
2011-03-16 18:35:03.761: \[ SKGFD\]\[2754190768\]WARNING:io_getevents timed out 4294 sec
• mount option
mount | grep ocfs2
/dev/sdc1 on /ocfs2 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
ls -l /ocfs2/storage/voting
-rw-r--r-- 1 oracle oinstall 10240000 Jun 19 2009 /ocfs2/storage/voting
[root@servername oraInventory]\# cd /d02/app/11.2.0/grid
[root@servername grid]\# ./root.sh
Running Oracle 11g root script...
- The following environment variables are set as:
ORACLE_OWNER= oragrid
ORACLE_HOME= /d02/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /d02/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
Modulefile is /etc/modprobe.conf
Modulefile is /etc/modprobe.conf
Modulefile is /etc/modprobe.conf
Modulefile is /etc/modprobe.conf
Modulefile is /etc/modprobe.conf
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-2672: Attempting to start 'ora.mdnsd' on 'servername'
CRS-2676: Start of 'ora.mdnsd' on 'servername' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'servername'
CRS-2676: Start of 'ora.gpnpd' on 'servername' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'servername'
CRS-2672: Attempting to start 'ora.gipcd' on 'servername'
CRS-2676: Start of 'ora.cssdmonitor' on 'servername' succeeded
CRS-2676: Start of 'ora.gipcd' on 'servername' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'servername'
CRS-2672: Attempting to start 'ora.diskmon' on 'servername'
CRS-2676: Start of 'ora.diskmon' on 'servername' succeeded
CRS-2676: Start of 'ora.cssd' on 'servername' succeeded
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
/etc/modprobe.conf
- root.sh is hung at the above.
[root@servername crsconfig]\# pwd
/d02/app/11.2.0/grid/cfgtoollogs/crsconfig
[root@servername crsconfig]\# ls -l
total 92
-rwxrwxr-x 1 oragrid oinstall 86182 Sep 1 03:01 rootcrs_servername.log
[root@servername crsconfig]\# tail -10 rootcrs_servername.log
2011-09-01 03:01:45: Command output:
CRS-2672: Attempting to start 'ora.crsd' on 'servername'
CRS-2676: Start of 'ora.crsd' on 'servername' succeeded
End Command output
2011-09-01 03:01:45: Start of resource "ora.crsd" Succeeded
2011-09-01 03:01:45: Creating voting files
2011-09-01 03:01:45: Adding voting files /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3
2011-09-01 03:01:45: Executing crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3
2011-09-01 03:01:45: Executing /d02/app/11.2.0/grid/bin/crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3
2011-09-01 03:01:45: Executing cmd: /d02/app/11.2.0/grid/bin/crsctl add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3
[root@servername ~]\# ps -ef| grep votedisk
root 3890 1937 0 09:56 pts/4 00:00:00 grep votedisk
root 32502 25545 0 03:01 pts/1 00:00:00 /d02/app/11.2.0/grid/bin/crsctl.bin add css votedisk /d03/oragrid/griddata/vdsk1 /d03/oragrid/griddata/vdsk2 /d03/oragrid/griddata/vdsk3
[root@servername ~]\# strace -ftt -p 32502
09:57:11.890923 fstatfs(120486672, ptrace: umoven: Input/output error
{...}) = 435168409
09:57:11.891237 fstatfs(120486672, ptrace: umoven: Input/output error
{...}) = 435168409
09:57:11.891291 ipc_subcall(0x33073608, 0xffffffff, 0x80, 0x1 unfinished ...
[root@servername ~]\# pstack 32502
Thread 2 (Thread 0x42845940 (LWP 32505)):
#0 0x00000036e7a0aee9 in pthread\_cond\_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b9823da11a8 in sltspcwait () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1
#2 0x00002b9821c23ab0 in clsd_logThread () from /d02/app/11.2.0/grid/lib/libhasgen11.so
#3 0x00000036e7a0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e6ed44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b98263bbdc0 (LWP 32502)):
#0 0x00002b98261b95b4 in ?? () from /usr/lib64/libaio.so.1
#1 0x00002b982309a6d4 in skgfrliopo () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1
#2 0x00002b982309a4dd in skgfospo () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1
#3 0x00002b9824779eb7 in skgfrwat () from /d02/app/11.2.0/grid/lib/libclntsh.so.11.1
#4 0x00002b982519851e in kgfkWaitIO () from /d02/app/11.2.0/grid/lib/libasmclntsh11.so
#5 0x00000000004cb798 in clsfDiscoverReap ()
#6 0x00000000004ca57a in clsfDiscover ()
#7 0x00000000004cacda in clsfskidCreate ()
#8 0x00000000004c81a6 in clsscfg_vhcreate ()
#9 0x00000000004c8c8e in clsscfg_vcreate ()
#10 0x0000000000434db1 in crsctl_vformat ()
#11 0x00000000004361a7 in crsctl\_css\_votedisk ()
#12 0x000000000053a115 in cls\_crsctl\_parser::AddParser::addCss() ()
#13 0x0000000000535b73 in cls\_crsctl\_parser::AddParser::performOperation() ()
#14 0x00000000004dd22f in cls\_crsctl\_parser::Command::performParsingAndOperation(int, unsigned char\*\*) ()
#15 0x00000000004d1f7b in crsctl_main(int, unsigned char\*\*) ()
#16 0x0000000000430618 in main ()
[root@servername /]\# mount | grep ocfs2
ocfs2\_dlmfs on /dlm type ocfs2\_dlmfs (rw)
/dev/emcpowerd on /d03 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
/dev/emcpowerc on /oraarchive type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
/dev/emcpowerb on /orabackup type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
[root@servername /]\# ls -l /d03/oragrid/griddata/
total 1376
-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr1
-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr2
-rw-r\----- 1 root oinstall 272756736 Sep 1 03:16 ocr3
-rw-r\----- 1 root root 21004800 Sep 1 03:01 vdsk1
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments