[RHEL5.7.z] Kernel panic in part_round_stats+19 initializing Veritas disk(s)

Environment

  • Red Hat Enterprise Linux 5.7.z
    • kernel 2.6.18-274.3.1.el5 (x86_64)
  • Veritas VCS node (of 4 node Veritas Cluster)
    • Memory size: 98867996 kB
    • Cpus: 16
  • Storage Stack
    • VxVM / VxDMP <---> (>= 1TB) SCSI <---> EMC Clariion (Active/Passive)
  • Veritas software versions
    • CDS (Cross-platform Data Sharing w/ Active/Passive storage array)
    • VRTSvxfen-5.1.132.000-SP1RP2_RHEL5.x86_64 Fri 14 Oct 2011 07:44:04 AM CDT
    • VRTSvxfs-5.1.132.000-SP1RP2_RHEL5.x86_64 Fri 14 Oct 2011 07:43:58 AM CDT
    • VRTSvxvm-5.1.132.000-SP1RP2_RHEL5.x86_64 Fri 14 Oct 2011 07:43:26 AM CDT

Issue

  • While executing VxVM commands such as 'vxdisksetup' or 'vxdg init', the system panics.
  • All four core dumps point to the kernel function "part_round_stats()" as the site of the crash, irrespective of the command that triggered it. The panic appears to occur when the partition statistics for the device are read. The commands that reached this routine were vxdg, vxisforeign and cmaperfd (HP performance agent for Linux).
  • During startup, the Linux kernel is unable to read the partition tables on the secondary path of the A/P-F array. When VxVM initializes its internal configuration, in some configurations with large CDS disks, it can dynamically create a single partition on the secondary paths. A subsequent system call to retrieve the disk statistics from the kernel can then result in a panic:

    [exception RIP: part_round_stats+19]  
    RIP: ffffffff8014803a  RSP: ffff8117f44098f8  RFLAGS: 00010046  
    ...
    #4 [ffff8117f4409910] drive_stat_acct at ffffffff801481cd  
    #5 [ffff8117f4409930] __make_request at ffffffff8000c270  
    #6 [ffff8117f44099b0] generic_make_request at ffffffff8001c452  
    #7 [ffff8117f4409a30] gendmpstrategy at ffffffff88983744  
    #8 [ffff8117f4409a70] generic_make_request at ffffffff8001c452  
    #9 [ffff8117f4409af0] submit_bio at ffffffff8003318a  
    #10 [ffff8117f4409b30] submit_bh at ffffffff8001abc1
    

Resolution

  • Symantec has released a patch to address this issue. The creation of the extra partition on the passive paths is skipped to prevent the panic from occurring.
  • VSFHA 5.1 SP1 RP2 P2 HF12
  • vm-rhel5_x86_64-5.1SP1RP2P2HF12-rpms.tar.gz
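
A minimal sketch of how one might check whether an installed VRTSvxvm package is at or above the patched level. It assumes that the fixed build reports the patch ID listed under Root Cause (5.1.132.212) as its RPM version, which this article does not confirm. The helper names are hypothetical; the package name used in the example is the one from the Environment section.

```python
import re

# Assumption: the fixed VRTSvxvm build carries the patch ID as its RPM version.
FIXED_VERSION = (5, 1, 132, 212)


def vxvm_version(package_name):
    """Extract the dotted version from a VRTSvxvm package name as a tuple of ints."""
    match = re.match(r"VRTSvxvm-(\d+(?:\.\d+)*)-", package_name)
    if match is None:
        raise ValueError("not a VRTSvxvm package name: %r" % package_name)
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(package_name):
    """Return True when the package version is at or above the assumed fixed level."""
    return vxvm_version(package_name) >= FIXED_VERSION


if __name__ == "__main__":
    # Package name as reported in the Environment section (output of 'rpm -q VRTSvxvm').
    installed = "VRTSvxvm-5.1.132.000-SP1RP2_RHEL5.x86_64"
    print(installed, "patched" if has_fix(installed) else "not patched")
```

The comparison is done on integer tuples rather than strings so that a component such as 212 sorts correctly against 000.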

Root Cause

  • Known issue with Veritas usage of the OS disk stack. Symantec is to provide a code change that allows Veritas to work with the OS disk stack cleanly.
  • Veritas PATCH ID:5.1.132.212 INCIDENT NO:2701657 TRACKING ID:2701654

Diagnostic Steps

  • Analyze the core dump; the analysis yields the following summary. When initializing disks via the Veritas vxdiskunsetup command, a kernel panic occurs. The device's partition map and the global partition hashtable appear to be out of sync: the partition was confirmed to be missing from the partstats_hash hashtable in get_partstats().
  • The panic observed looks very similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=493517. A patch was applied in kernel-2.6.18-194 that appeared to address the problem as it was then understood. However, from the bug comments it is unclear whether the problem was fully isolated, reproduced in the lab, and fully fixed. It is possible that the original problem was fixed and this is a new, but similar, problem.
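
The out-of-sync condition described in the first diagnostic step can be illustrated with a toy model. This is not the kernel code: only the names partstats_hash and get_partstats come from the analysis above, while the PartStats class, the other function names, and the device names sdx1 and sdy1 are hypothetical. The sketch assumes that the lookup yields nothing for an unregistered partition and that the caller uses the result without checking it.

```python
class PartStats:
    """Stand-in for a per-partition statistics record."""

    def __init__(self):
        self.ios = 0


def get_partstats(partstats_hash, part_name):
    """Look up the statistics record; returns None for an unregistered partition."""
    return partstats_hash.get(part_name)


def account_io_unguarded(partstats_hash, part_name):
    """Update statistics without checking the lookup result (assumed failure mode)."""
    stats = get_partstats(partstats_hash, part_name)
    stats.ios += 1  # fails here when the partition is missing from the hash
    return stats.ios


def account_io_guarded(partstats_hash, part_name):
    """Same update, but skip partitions that were never registered."""
    stats = get_partstats(partstats_hash, part_name)
    if stats is None:
        return None
    stats.ios += 1
    return stats.ios


def find_unregistered(partition_map, partstats_hash):
    """List partitions present in the device's map but absent from the hash."""
    return sorted(name for name in partition_map if name not in partstats_hash)


if __name__ == "__main__":
    # sdy1 models a partition created on a secondary path after startup.
    partstats_hash = {"sdx1": PartStats()}
    partition_map = ["sdx1", "sdy1"]
    print(find_unregistered(partition_map, partstats_hash))
```

In this model the unguarded path raises an AttributeError for sdy1, which is the Python analogue of the kernel operating on a statistics record that was never set up.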

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.