OS rebooted due to os stall.
Issue
- Our customer encountered an unexpected reboot problem.
- From our investigation, we found a kernl panic occurred at that time and the panic was caused due to timeout of a timer monitoring by clusterpro software. The timeout occurs if cluserpro-related process, which runs the timer monitoring, stalls.
- According to our vmcore analysis, several processes were waiting I/O. Perhaps, the clusterpro-related process became unable to work and reached timeout of the timer monitoring. System processes such as kjounald and pdflush were waiting I/O?
- Process lists which ST is 'UN'
PID PPID CPU TASK ST %MEM VSZ RSS COMM
645 83 1 ffff81023dcff7e0 UN 0.0 0 0 [kjournald]
3297 83 1 ffff8101b6c8c7a0 UN 0.0 0 0 [pdflush]
3426 83 1 ffff81008875e100 UN 0.0 0 0 [pdflush]
5083 1 0 ffff81023bbe3080 UN 0.0 5904 696 syslogd
6219 6218 1 ffff810232cbe0c0 UN 0.0 23824 1780 clpevent
16845 16840 0 ffff8100850987e0 UN 0.0 3844 616 sadc
29187 29183 1 ffff810110d3a0c0 UN 0.0 29696 4628 actlog_cpuload
^^
Environment
- Red Hat Enterprise Linux 5.5
- kernel-2.6.18-194.32.1.el5
- CFQ I/O scheduler and usage of the IOPRIO_CLASS_IDLE priority
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.