During snapshot/backup operation, RHEL 7.4 Guest VM in MS Hyper-V gets crashed with Out of Memory errors.

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 7.4 Guest in MS Hyper-V
  • Issue may appear during backup with "veeam" tool.
  • kernel-3.10.0-693.2.2.el7

Issue

  • During snapshot/backup operation, RHEL 7.4 Guest VM in MS Hyper-V gets crashed with Out of Memory errors:

    journal: Hyper-V VSS: VSS: op=FREEZE: succeeded
    journal: Hyper-V VSS: VSS: op=THAW: succeeded
    [...]
    sd 4:0:1:1: [sdd] Sector size 0 reported, assuming 512.
    sd 3:0:1:0: [storvsc] Sense Key : Unit Attention [current] 
    sd 3:0:1:0: [storvsc] Add. Sense: Changed operating definition
    sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    INFO: task kworker/u128:2:268 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u128:2  D ffff88003666b868     0   268      2 0x00000000
    Workqueue: events_unbound async_run_entry_fn
    ffff880402663cc0 0000000000000046 ffff880403bf0000 ffff880402663fd8
    ffff880402663fd8 ffff880402663fd8 ffff880403bf0000 ffff88003666b860
    ffff88003666b864 ffff880403bf0000 00000000ffffffff ffff88003666b868
    Call Trace:
    [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70
    [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0
    [<ffffffff810ce8ad>] ? check_preempt_wakeup+0x11d/0x250
    [<ffffffff816a774f>] mutex_lock+0x1f/0x2f
    [<ffffffff81471d2e>] scsi_scan_host_selected+0x7e/0x1c0     <---
    [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0
    [<ffffffff81471f01>] do_scsi_scan_host+0x91/0xa0
    [<ffffffff814720fc>] do_scan_async+0x1c/0x160
    [<ffffffff810b7cc9>] async_run_entry_fn+0x39/0x120
    [<ffffffff810a881a>] process_one_work+0x17a/0x440
    [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
    [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
    [<ffffffff810b098f>] kthread+0xcf/0xe0
    [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
    [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
    [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
    [...]
    

Resolution

Red Hat Enterprise Linux (RHEL) 7.5 or later

Upgrade to 3.10.0-862.el7.x86_64 from Errata RHSA-2018:1062 or later

Red Hat Enterprise Linux (RHEL) 7.4.z

Upgrade to kernel-3.10.0-693.21.1.el7 from Errata RHSA-2018:0395

Root Cause

  • There is a bug with hv_storvsc modules shipped with RHEL 7.4 kernel-3.10.0-693.2.2.el7 due to which driver can schedule large amounts of storvsc_remove_lun work items to the lun during snapshot operation. If there has been heavy IO to the MS virtual disk for which a snapshot is being taken, then there are chances that these IO requests may fail. A remove lun work item is issued per failed IO. If the outstanding Test Unit Ready SCSI commands have not been completed in a timely manner the scan_mutex for SCSI host is never released or released too late. Consequently the many remove lun work items are not completed as scsi_remove_device also tries to take host->scan_mutex. This long delays in getting host->scan_mutex lock results in worker threads stuck in dead-lock, and out of memory situation/crash.

  • Below commits will resolve issue

commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f
Author: Stephen Hemminger <stephen@networkplumber.org>
Date:   Tue Mar 7 09:15:53 2017 -0800

Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.

Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with virtual DVD ROM device. What happens is that the SCSI
scan process falls back to doing sequential probing by INQUIRY.  But the
storvsc driver has a previous workaround that masks/blocks all errors
reports from INQUIRY (or MODE_SENSE) commands.  This workaround causes
the scan to then populate a full set of bogus LUN's on the target and
then sends kernel spinning off into a death spiral doing block reads on
the non-existent LUNs.

By setting the correct blacklist flags, the target with the DVD device
is scanned with REPORTLUN and that works correctly.

Patch needs to go in current 4.11, it is safe but not necessary in older
kernels.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Cathy Avery <cavery@redhat.com>

Diagnostic Steps

  • Just before a crash with out of memory condition, following messages were logged which indicates that configuration parameters associated with virtual disks provisioned to above VM were changed. These changes in configuration of virtual disks seems to have caused during the snapshot operation:

    journal: Hyper-V VSS: VSS: op=FREEZE: succeeded
    journal: Hyper-V VSS: VSS: op=THAW: succeeded
    [...]
    sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    
  • Each of above error messages are accompanied by following Unit Attention sense key returned from target side (i.e. in this case virtual disk):

    sd 2:0:0:0: [storvsc] Sense Key : Unit Attention [current] 
    sd 2:0:0:0: [storvsc] Add. Sense: Changed operating definition
    sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    
  • Above Unit Attention sense key is typically returned from target side to infom the Linux systems that there has been some changes/reconfiguration occurred with disk devices.

  • Just after the above messages in logs, we see that scsi_prep_async_scan was called to rescan the paravirtualized MS hyperv SCSI adapters, but the worker threads were stuck in UN state waiting for completion of scsi host scan:

    sd 4:0:1:1: [sdd] Sector size 0 reported, assuming 512.
    sd 3:0:1:0: [storvsc] Sense Key : Unit Attention [current] 
    sd 3:0:1:0: [storvsc] Add. Sense: Changed operating definition
    sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
    INFO: task kworker/u128:2:268 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u128:2  D ffff88003666b868     0   268      2 0x00000000
    Workqueue: events_unbound async_run_entry_fn
    ffff880402663cc0 0000000000000046 ffff880403bf0000 ffff880402663fd8
    ffff880402663fd8 ffff880402663fd8 ffff880403bf0000 ffff88003666b860
    ffff88003666b864 ffff880403bf0000 00000000ffffffff ffff88003666b868
    Call Trace:
    [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70
    [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0
    [<ffffffff810ce8ad>] ? check_preempt_wakeup+0x11d/0x250
    [<ffffffff816a774f>] mutex_lock+0x1f/0x2f
    [<ffffffff81471d2e>] scsi_scan_host_selected+0x7e/0x1c0     <---
    [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0
    [<ffffffff81471f01>] do_scsi_scan_host+0x91/0xa0
    [<ffffffff814720fc>] do_scan_async+0x1c/0x160
    [<ffffffff810b7cc9>] async_run_entry_fn+0x39/0x120
    [<ffffffff810a881a>] process_one_work+0x17a/0x440
    [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
    [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
    [<ffffffff810b098f>] kthread+0xcf/0xe0
    [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
    [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
    [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
    [...]
    
  • The vmcore shows that there were large number of kworker threads stuck in UN (uninterruptible) state waiting for compltion of scan:

    crash> ps -m|grep UN
    [...]
    [0 00:56:30.116] [UN]  PID: 45133  TASK: ffff88003553cf10  CPU: 0   COMMAND: "kworker/0:14"
    [0 00:56:30.116] [UN]  PID: 45132  TASK: ffff880035539fa0  CPU: 0   COMMAND: "kworker/0:13"
    [0 00:56:30.116] [UN]  PID: 45131  TASK: ffff88003553eeb0  CPU: 0   COMMAND: "kworker/0:12"
    [0 00:56:30.116] [UN]  PID: 45130  TASK: ffff8802d174cf10  CPU: 0   COMMAND: "kworker/0:11"
    [0 00:56:30.116] [UN]  PID: 45128  TASK: ffff8802d1748000  CPU: 0   COMMAND: "kworker/0:9"
    [0 00:56:30.116] [UN]  PID: 45126  TASK: ffff8802d174af70  CPU: 0   COMMAND: "kworker/0:7"
    [0 00:56:30.116] [UN]  PID: 45127  TASK: ffff8802d174bf40  CPU: 0   COMMAND: "kworker/0:8"
    [0 00:56:30.116] [UN]  PID: 45124  TASK: ffff8803777a0000  CPU: 0   COMMAND: "kworker/0:5"
    [0 00:56:30.116] [UN]  PID: 45125  TASK: ffff8803777a4f10  CPU: 0   COMMAND: "kworker/0:6"
    [0 00:56:30.116] [UN]  PID: 44608  TASK: ffff8803777a0fd0  CPU: 0   COMMAND: "kworker/0:2"
    [0 00:56:30.116] [UN]  PID: 45123  TASK: ffff8803777a2f70  CPU: 0   COMMAND: "kworker/0:1"
    [0 00:56:30.117] [UN]  PID: 43588  TASK: ffff8802baa0eeb0  CPU: 0   COMMAND: "kworker/0:4"
    crash> 
    
    crash> ps -m|grep UN|wc -l
    286
    crash> 
    
  • Most of these tasks were stuck waiting for a mutex lock:

    crash> bt 43588
    PID: 43588  TASK: ffff8802baa0eeb0  CPU: 0   COMMAND: "kworker/0:4"
     #0 [ffff88030f2fbcf0] __schedule at ffffffff816a8f65
     #1 [ffff88030f2fbd58] schedule_preempt_disabled at ffffffff816aa409
     #2 [ffff88030f2fbd68] __mutex_lock_slowpath at ffffffff816a8337
     #3 [ffff88030f2fbdc8] mutex_lock at ffffffff816a774f
     #4 [ffff88030f2fbde0] scsi_remove_device at ffffffff81473fde
     #5 [ffff88030f2fbe00] storvsc_remove_lun at ffffffffc0172340 [hv_storvsc]
     #6 [ffff88030f2fbe20] process_one_work at ffffffff810a881a
     #7 [ffff88030f2fbe68] worker_thread at ffffffff810a94e6
     #8 [ffff88030f2fbec8] kthread at ffffffff810b098f
     #9 [ffff88030f2fbf50] ret_from_fork at ffffffff816b4f58
    crash> 
    
  • It was the following mutex lock for which above kworker threads were waiting:

    crash> p ((struct Scsi_Host *) 0xffff88003666b800)->scan_mutex
    $1 = {
      count = {
        counter = 0xffffff1b
      }, 
      wait_lock = {
        {
          rlock = {
        raw_lock = {
          val = {
            counter = 0x0
          }
        }
          }
        }
      }, 
      wait_list = {
        next = 0xffff88030f2fbd70, 
        prev = 0xffff8800a1223d70
      }, 
      owner = 0xffff880373f78000, 
      {
        osq = {
          tail = {
        counter = 0x0
          }
        }, 
        __UNIQUE_ID_rh_kabi_hide2 = {
          spin_mlock = 0x0
        }, 
        {<No data fields>}
      }
    }
    crash> 
    
  • This mutex lock was held open by following process:

    crash> bt 0xffff880373f78000
    PID: 44764  TASK: ffff880373f78000  CPU: 0   COMMAND: "kworker/0:3"
     #0 [ffff8802cfaffc18] __schedule at ffffffff816a8f65
     #1 [ffff8802cfaffc80] schedule at ffffffff816a94e9
     #2 [ffff8802cfaffc90] async_synchronize_cookie_domain at ffffffff810b7e85
     #3 [ffff8802cfaffce8] async_synchronize_full_domain at ffffffff810b7f68
     #4 [ffff8802cfaffcf8] sd_remove at ffffffffc0180342 [sd_mod]
     #5 [ffff8802cfaffd28] __device_release_driver at ffffffff8143f84f
     #6 [ffff8802cfaffd48] device_release_driver at ffffffff8143f8e3
     #7 [ffff8802cfaffd68] bus_remove_device at ffffffff8143efa8
     #8 [ffff8802cfaffd90] device_del at ffffffff8143b075
     #9 [ffff8802cfaffdc0] __scsi_remove_device at ffffffff81473fb3
    #10 [ffff8802cfaffde0] scsi_remove_device at ffffffff81473fe6
    #11 [ffff8802cfaffe00] storvsc_remove_lun at ffffffffc0172340 [hv_storvsc]
    #12 [ffff8802cfaffe20] process_one_work at ffffffff810a881a
    #13 [ffff8802cfaffe68] worker_thread at ffffffff810a94e6
    #14 [ffff8802cfaffec8] kthread at ffffffff810b098f
    #15 [ffff8802cfafff50] ret_from_fork at ffffffff816b4f58
    crash> 
    
  • blacklist flag in scsi_device is 0x20000
    sdev_bflags = 0x20000

    20 #define BLIST_MS_SKIP_PAGE_08 0x2000 /* do not send ms page 0x08 */

  • The backtrace of above process shows that hv_storvsc driver had called storvsc_remove_lun routine to remove SCSI device /dev/sdc connected through paravirtualized SCSI adapter scsi4. But this task was also stuck in UN state and many processes remained stuck waiting for the mutex lock:

    crash> ps -m|grep "PID: 44764"
    [0 00:56:30.110] [UN]  PID: 44764  TASK: ffff880373f78000  CPU: 0   COMMAND: "kworker/0:3"
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments