System panic with message: VXFEN CRITICAL V-11-1-20 Local cluster node ejected from cluster to prevent potential data corruption

Solution Verified

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • VxFS
  • Veritas Cluster Suite

Issue

  • The system panicked with the following messages in the logs:
        LLT INFO V-14-1-10509 link 1 (link1) node 1 expired
        LLT INFO V-14-1-10510 sent hbreq (NULL) on link 0 (link0) node 2. 0 more to go.
        LLT INFO V-14-1-10509 link 1 (link1) node 2 expired
        LLT INFO V-14-1-10032 link 0 (link0) node 1 inactive 15 sec (1155787)
        LLT INFO V-14-1-10032 link 0 (link0) node 0 inactive 15 sec (1155787)
        LLT INFO V-14-1-10032 link 0 (link0) node 2 inactive 15 sec (1155787)
        LLT INFO V-14-1-10509 link 0 (link0) node 0 expired
        LLT INFO V-14-1-10509 link 0 (link0) node 1 expired
        LLT INFO V-14-1-10509 link 0 (link0) node 2 expired
        GAB INFO V-15-1-20036 Port b gen   853304 membership ;  3
        GAB INFO V-15-1-20036 Port h gen   853308 membership ;  3
        GAB INFO V-15-1-20036 Port a gen   853303 membership ;  3
        Kernel panic - not syncing: VXFEN CRITICAL V-11-1-20 Local cluster node 
                ejected from cluster to prevent potential data corruption.
  • Alternatively, the following could be seen as well:
        VCS INFO V-16-1-10077 Received new cluster membership
        GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen   f92c7d membership
        GAB INFO V-15-1-20036 Port f[GAB_LEGACY_CLIENT (refcount 0)] gen   f92c6e membershi
        GAB INFO V-15-1-20036 Port d[GAB_LEGACY_CLIENT (refcount 0)] gen   f92c62 membershi
        GAB INFO V-15-1-20036 Port b[VxFen (refcount 2)] gen   f92c61 membership ;    5
        GAB INFO V-15-1-20036 Port v[GAB_LEGACY_CLIENT (refcount 0)] gen   f92c64 membershi
        GAB INFO V-15-1-20036 Port u[GAB_USER_CLIENT (refcount 0)] gen   f92c6a membership
        GAB INFO V-15-1-20036 Port y[GAB_LEGACY_CLIENT (refcount 0)] gen   f92c63 membershi
        GAB INFO V-15-1-20036 Port a[GAB_Control (refcount 1)] gen   f92c60 membership ;
        GAB INFO V-15-1-20036 Port w[GAB_USER_CLIENT (refcount 0)] gen   f92c66 membership
        VXFEN INFO V-11-1-80 RACER Node is: 5
        VXFEN INFO V-11-1-87 Initiating Race for Coordination Point

  • Or:

[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-80 RACER Node is: 1
[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-100 Current LBOLT: 33820740033
[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-87 Initiating VxFen Race
[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-111 VxFen Pre-Race Delay: 0
[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-113 Local Subcluster Size: 1, Peer Subcluster Size: 2
[Sun Feb 18 10:15:20 EST 2024] VXFEN INFO V-11-1-115 Smaller Subcluster Delay: 20
[Sun Feb 18 10:15:20 EST 2024] vxglm INFO V-42-106 GLM port m Recovery complete, gen f40f8c proto 90 mbr 2/0/0/0
[Sun Feb 18 10:15:20 EST 2024] vxglm INFO V-42-107 Times: skew 1 ms, remaster 1 ms, completion 0 ms
[Sun Feb 18 10:15:25 EST 2024] sd 1:0:0:29: Parameters changed
[Sun Feb 18 10:15:25 EST 2024] sd 1:0:0:30: Parameters changed
[Sun Feb 18 10:15:25 EST 2024] VXFEN CRITICAL V-11-1-89 RACER Node lost the VxFen race
[Sun Feb 18 10:15:25 EST 2024] VXFEN INFO V-11-1-112 VxFen Post-Race Delay: 0
[Sun Feb 18 10:15:25 EST 2024] VXFEN NOTICE V-11-1-92 Sending LOST_RACE
[Sun Feb 18 10:15:25 EST 2024] Kernel panic - not syncing: VXFEN CRITICAL V-11-1-20 Local cluster node
                                ejected from cluster to prevent potential data corruption.
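
When comparing a node's logs against the excerpts above, it can help to pull out just the LLT, GAB, VCS and VXFEN messages and their message IDs. The following is a minimal sketch, not a Veritas or Red Hat tool. The regular expression, the `ClusterMessage` type and the function names are assumptions made for illustration, and the pattern is only based on the line layout seen in the excerpts above.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the excerpts above:
#   [optional prefix] COMPONENT SEVERITY V-n-n-n free text
MSG_RE = re.compile(
    r"\b(?P<component>LLT|GAB|VCS|VXFEN)\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<msg_id>V-\d+-\d+-\d+)\s+"
    r"(?P<text>.*)$"
)


class ClusterMessage(NamedTuple):
    component: str
    severity: str
    msg_id: str
    text: str


def parse_line(line: str) -> Optional[ClusterMessage]:
    """Return the parsed cluster message, or None for unrelated lines."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    return ClusterMessage(
        m["component"], m["severity"], m["msg_id"], m["text"].strip()
    )


def critical_messages(lines: Iterable[str]) -> List[ClusterMessage]:
    """Keep only messages whose severity field is CRITICAL."""
    parsed = (parse_line(line) for line in lines)
    return [msg for msg in parsed if msg is not None and msg.severity == "CRITICAL"]
```

For example, passing the lines of a saved `dmesg` or `/var/log/messages` excerpt to `critical_messages()` would list the V-11-1-89 and V-11-1-20 entries if they are present.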

Resolution

V-11-1-20

Local cluster node ejected from cluster to prevent potential data corruption

With SCSI-3 based I/O fencing, after a split-brain condition the nodes in each of the sub-clusters race to grab a majority of the coordinator disks. If the local node finds that it is not registered with a disk, it assumes that it has been ejected from the cluster and issues a self-panic with this message.
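
The majority rule described above can be sketched as a small model. This is an illustration of the arithmetic only, not the vxfen implementation. The function names and the three-disk example are assumptions, and real fencing behaviour depends on the Veritas configuration.

```python
def majority_needed(num_coordinator_disks: int) -> int:
    """Smallest number of disks that is more than half of the total."""
    if num_coordinator_disks < 1:
        raise ValueError("at least one coordinator disk is required")
    return num_coordinator_disks // 2 + 1


def race_outcome(num_coordinator_disks: int, disks_won_by_local: int) -> str:
    """Model the local sub-cluster's result in the coordinator disk race.

    Returns "survive" if the local sub-cluster holds a majority of the
    coordinator disks, otherwise "panic" (the node treats itself as ejected).
    """
    if not 0 <= disks_won_by_local <= num_coordinator_disks:
        raise ValueError("disks won must be between 0 and the total")
    if disks_won_by_local >= majority_needed(num_coordinator_disks):
        return "survive"
    return "panic"
```

With a hypothetical set of three coordinator disks, a sub-cluster that wins two of them survives, while the sub-cluster left with one or none panics with V-11-1-20.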

See the Veritas Cluster Server User's Guide for information on resolving the split brain condition.
  • NOTE: The above reference points to information that is not authored by Red Hat. Red Hat support cannot verify its accuracy.

  • Open a case with Veritas for further diagnosis.

Diagnostic Steps

  • Collect a vmcore and check the kernel log for the panic message:
crash> log

Kernel panic - not syncing: VXFEN CRITICAL V-11-1-20 Local cluster node ejected from cluster to prevent potential data corruption.
  • System information, including load average and panic string:
crash> sys

      KERNEL: ./usr/lib/debug/lib/modules/2.6.18-128.7.1.el5/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Sat Mar 24 21:41:36 2012
      UPTIME: 7 days, 05:56:34
LOAD AVERAGE: 673.72, 815.22, 914.28
       TASKS: 12388
    NODENAME: <hostname>
     RELEASE: 2.6.18-128.7.1.el5
     VERSION: #1 SMP Wed Aug 19 04:00:49 EDT 2009
     MACHINE: x86_64  (2411 Mhz)
      MEMORY: 126.2 GB
       PANIC: "Kernel panic - not syncing: VXFEN CRITICAL V-11-1-20 Local cluster node " <---
         PID: 7219
     COMMAND: "vxfen"
        TASK: ffff81207e133860  [THREAD_INFO: ffff81207dd62000]
         CPU: 9
       STATE: TASK_RUNNING (PANIC)
  • Backtrace of the offending process:
crash> bt

PID: 7219   TASK: ffff81207e133860  CPU: 9   COMMAND: "vxfen"
 #0 [ffff81207dd63bf8] crash_kexec at ffffffff800aaa72
 #1 [ffff81207dd63cb8] panic at ffffffff8008ef07
 #2 [ffff81207dd63da8] vxfen_plat_panic at ffffffff88bef5de [vxfen]
 #3 [ffff81207dd63dc8] vxfen_grab_coord_disks at ffffffff88bf3e30 [vxfen]
 #4 [ffff81207dd63e08] vxfen_grab_coord_pt at ffffffff88bdd206 [vxfen]
 #5 [ffff81207dd63e28] vxfen_msg_node_left_ack at ffffffff88be32e9 [vxfen]
 #6 [ffff81207dd63e48] vxfen_process_client_msg at ffffffff88be3de5 [vxfen]
 #7 [ffff81207dd63e78] vxfen_vrfsm_cback at ffffffff88be428c [vxfen]
 #8 [ffff81207dd63eb8] vrfsm_step at ffffffff88bf749a [vxfen]
 #9 [ffff81207dd63ef8] vrfsm_recv_thread at ffffffff88bf8a90 [vxfen]
#10 [ffff81207dd63f28] vxplat_lx_thread_base at ffffffff88bf92d7 [vxfen]
#11 [ffff81207dd63f48] kernel_thread at ffffffff8005dfb1
  • Kernel ring buffer at the time of the event:
crash> log | tail -8
sd 6:0:1:118: reservation conflict
sd 6:0:1:118: reservation conflict
sd 6:0:1:118: reservation conflict
VXFEN CRITICAL V-11-1-89 RACER Node lost the race for the coordination points
VXFEN NOTICE V-11-1-92 Sending LOST_RACE
Kernel panic - not syncing: VXFEN CRITICAL V-11-1-20 Local cluster node
 ejected from cluster to prevent potential data corruption.
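
The ring buffer excerpt above shows SCSI reservation conflicts immediately before the lost race. A minimal sketch for counting such conflicts per device in a saved `log` output follows. The function name, the return shape and the regular expression are assumptions for illustration, based only on the line format shown above.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches kernel lines such as "sd 6:0:1:118: reservation conflict".
CONFLICT_RE = re.compile(r"\bsd (?P<dev>\d+:\d+:\d+:\d+): reservation conflict")


def conflicts_before_lost_race(
    lines: Iterable[str], lost_race_id: str = "V-11-1-89"
) -> Tuple[Counter, bool]:
    """Count reservation conflicts per SCSI device up to the lost-race message.

    Returns (counts, saw_lost_race). Counting stops at the first line that
    contains lost_race_id; if no such line exists, saw_lost_race is False.
    """
    counts: Counter = Counter()
    for line in lines:
        if lost_race_id in line:
            return counts, True
        m = CONFLICT_RE.search(line)
        if m is not None:
            counts[m["dev"]] += 1
    return counts, False
```

The device addresses it reports can then be matched against the coordinator disks when working with Veritas support.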

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
