RHEV behaviour in various failure scenarios - help me to understand


    Hello colleagues. I need your help to understand some RHEV behaviours.

     

    Here is my setup:

     

    rhevm - RHEV-M on an ESX virtual machine, RHEL 6.3

    node1 - physical HP Gen8 server with RHEL 6.3 (not installed from the hypervisor ISO)

    node2 - physical HP Gen8 server with RHEL 6.3 (not installed from the hypervisor ISO)

    storage is an FC domain, a 1 TB shared LUN connected to both servers

    only one network, the management network (name: rhevm), between all nodes

    power management configured and tested (ipmilan); a quick check is shown below
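
    To be sure the fencing part works, I tested the fence agent for each node from the command line with something like the following (a rough sketch, with placeholders instead of the real iLO address and credentials):

    fence_ipmilan -a <ilo-ip> -l <user> -p <password> -o status

    Both nodes report their power status correctly.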

     

    Test 1:

    initial setup

    node1, up, SPM

    node2, up, none

     

    kernel panic on node1:

    echo "c" > /proc/sysrq-trigger

     

    status is still:

    node1, up, SPM

    node2, up, none

     

    after 3 or 4 minutes the status changed to:

    node1, connecting, SPM

    node2, up, none

     

    node1 was rebooting. When RHEL loaded, node1 was fenced and the status changed to:

    node1, non responsive, none

    node2, up, contending, then SPM

     

    After node1 came up, status changed to:

    node1, up, none

    node2, up, SPM

     

    Test 2:

    initial setup

    node1, up, SPM

    node2, up, none

     

    reboot node1 with reboot command

     

    status changed immediately to:

    node1, connecting, SPM

    node2, up, none

     

    after node1 came back up from the reboot, it was fenced. The status changed to:

    node1, non responsive, none

    node2, up, contending, then SPM

     

    After node1 came up, status changed to:

    node1, up, none

    node2, up, SPM

     

    Test 3:

    initial setup

    node1, up, SPM

    node2, up, none

     

    Restart of node1 via the Power Management menu

     

    status changed immediately to:

    node1, non responsive, none

    node2, up, contending, then SPM

     

    After node1 came up, the status changed to:

    node1, up, none

    node2, up, SPM

     

    Questions:

    - Why did it take so long to detect the node1 failure in Test 1?

    - Why was node1 fenced after it came back up from the kernel panic / reboot in Tests 1 and 2?

    - How can I reduce the failure detection time in Test 1? (See my guess about the relevant settings below.)
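
    On the detection time, my working assumption (please correct me if these are not the right knobs) is that it is driven by the engine-side host monitoring timeouts, which I believe can be inspected and changed with engine-config on the RHEV-M machine, roughly like this (the exact key names and the engine service name may differ between RHEV versions):

    engine-config -g vdsTimeout
    engine-config -s vdsTimeout=60
    service ovirt-engine restart

    I have not changed anything yet; I would first like to understand whether lowering these values is safe.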

     

    Many thanks for your help.
