During snapshot/backup operation, RHEL 7.4 Guest VM in MS Hyper-V gets crashed with Out of Memory errors.
Environment
- Red Hat Enterprise Linux 7.4 Guest in MS Hyper-V
- Issue may appear during backup with "veeam" tool.
kernel-3.10.0-693.2.2.el7
Issue
-
During snapshot/backup operation, RHEL 7.4 Guest VM in MS Hyper-V gets crashed with Out of Memory errors:
journal: Hyper-V VSS: VSS: op=FREEZE: succeeded journal: Hyper-V VSS: VSS: op=THAW: succeeded [...] sd 4:0:1:1: [sdd] Sector size 0 reported, assuming 512. sd 3:0:1:0: [storvsc] Sense Key : Unit Attention [current] sd 3:0:1:0: [storvsc] Add. Sense: Changed operating definition sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa INFO: task kworker/u128:2:268 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u128:2 D ffff88003666b868 0 268 2 0x00000000 Workqueue: events_unbound async_run_entry_fn ffff880402663cc0 0000000000000046 ffff880403bf0000 ffff880402663fd8 ffff880402663fd8 ffff880402663fd8 ffff880403bf0000 ffff88003666b860 ffff88003666b864 ffff880403bf0000 00000000ffffffff ffff88003666b868 Call Trace: [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70 [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0 [<ffffffff810ce8ad>] ? check_preempt_wakeup+0x11d/0x250 [<ffffffff816a774f>] mutex_lock+0x1f/0x2f [<ffffffff81471d2e>] scsi_scan_host_selected+0x7e/0x1c0 <--- [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0 [<ffffffff81471f01>] do_scsi_scan_host+0x91/0xa0 [<ffffffff814720fc>] do_scan_async+0x1c/0x160 [<ffffffff810b7cc9>] async_run_entry_fn+0x39/0x120 [<ffffffff810a881a>] process_one_work+0x17a/0x440 [<ffffffff810a94e6>] worker_thread+0x126/0x3c0 [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffff810b098f>] kthread+0xcf/0xe0 [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [<ffffffff816b4f58>] ret_from_fork+0x58/0x90 [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [...]
Resolution
Red Hat Enterprise Linux (RHEL) 7.5 or later
Upgrade to 3.10.0-862.el7.x86_64
from Errata RHSA-2018:1062 or later
Red Hat Enterprise Linux (RHEL) 7.4.z
Upgrade to kernel-3.10.0-693.21.1.el7
from Errata RHSA-2018:0395
Root Cause
-
There is a bug with
hv_storvsc
modules shipped with RHEL 7.4kernel-3.10.0-693.2.2.el7
due to which driver can schedule large amounts ofstorvsc_remove_lun
work items to the lun during snapshot operation. If there has been heavy IO to the MS virtual disk for which a snapshot is being taken, then there are chances that these IO requests may fail. A remove lun work item is issued per failed IO. If the outstanding Test Unit Ready SCSI commands have not been completed in a timely manner thescan_mutex
for SCSI host is never released or released too late. Consequently the many remove lun work items are not completed asscsi_remove_device
also tries to takehost->scan_mutex
. This long delays in gettinghost->scan_mutex
lock results in worker threads stuck in dead-lock, and out of memory situation/crash. -
Below commits will resolve issue
commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f
Author: Stephen Hemminger <stephen@networkplumber.org>
Date: Tue Mar 7 09:15:53 2017 -0800
Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.
Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with virtual DVD ROM device. What happens is that the SCSI
scan process falls back to doing sequential probing by INQUIRY. But the
storvsc driver has a previous workaround that masks/blocks all errors
reports from INQUIRY (or MODE_SENSE) commands. This workaround causes
the scan to then populate a full set of bogus LUN's on the target and
then sends kernel spinning off into a death spiral doing block reads on
the non-existent LUNs.
By setting the correct blacklist flags, the target with the DVD device
is scanned with REPORTLUN and that works correctly.
Patch needs to go in current 4.11, it is safe but not necessary in older
kernels.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Diagnostic Steps
-
Just before a crash with out of memory condition, following messages were logged which indicates that configuration parameters associated with virtual disks provisioned to above VM were changed. These changes in configuration of virtual disks seems to have caused during the snapshot operation:
journal: Hyper-V VSS: VSS: op=FREEZE: succeeded journal: Hyper-V VSS: VSS: op=THAW: succeeded [...] sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
-
Each of above error messages are accompanied by following
Unit Attention
sense key returned from target side (i.e. in this case virtual disk):sd 2:0:0:0: [storvsc] Sense Key : Unit Attention [current] sd 2:0:0:0: [storvsc] Add. Sense: Changed operating definition sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
-
Above
Unit Attention
sense key is typically returned from target side to infom the Linux systems that there has been some changes/reconfiguration occurred with disk devices. -
Just after the above messages in logs, we see that
scsi_prep_async_scan
was called to rescan the paravirtualized MS hyperv SCSI adapters, but the worker threads were stuck in UN state waiting for completion of scsi host scan:sd 4:0:1:1: [sdd] Sector size 0 reported, assuming 512. sd 3:0:1:0: [storvsc] Sense Key : Unit Attention [current] sd 3:0:1:0: [storvsc] Add. Sense: Changed operating definition sd 3:0:1:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa INFO: task kworker/u128:2:268 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u128:2 D ffff88003666b868 0 268 2 0x00000000 Workqueue: events_unbound async_run_entry_fn ffff880402663cc0 0000000000000046 ffff880403bf0000 ffff880402663fd8 ffff880402663fd8 ffff880402663fd8 ffff880403bf0000 ffff88003666b860 ffff88003666b864 ffff880403bf0000 00000000ffffffff ffff88003666b868 Call Trace: [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70 [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0 [<ffffffff810ce8ad>] ? check_preempt_wakeup+0x11d/0x250 [<ffffffff816a774f>] mutex_lock+0x1f/0x2f [<ffffffff81471d2e>] scsi_scan_host_selected+0x7e/0x1c0 <--- [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0 [<ffffffff81471f01>] do_scsi_scan_host+0x91/0xa0 [<ffffffff814720fc>] do_scan_async+0x1c/0x160 [<ffffffff810b7cc9>] async_run_entry_fn+0x39/0x120 [<ffffffff810a881a>] process_one_work+0x17a/0x440 [<ffffffff810a94e6>] worker_thread+0x126/0x3c0 [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffff810b098f>] kthread+0xcf/0xe0 [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [<ffffffff816b4f58>] ret_from_fork+0x58/0x90 [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [...]
-
The vmcore shows that there were large number of
kworker
threads stuck inUN
(uninterruptible) state waiting for compltion of scan:crash> ps -m|grep UN [...] [0 00:56:30.116] [UN] PID: 45133 TASK: ffff88003553cf10 CPU: 0 COMMAND: "kworker/0:14" [0 00:56:30.116] [UN] PID: 45132 TASK: ffff880035539fa0 CPU: 0 COMMAND: "kworker/0:13" [0 00:56:30.116] [UN] PID: 45131 TASK: ffff88003553eeb0 CPU: 0 COMMAND: "kworker/0:12" [0 00:56:30.116] [UN] PID: 45130 TASK: ffff8802d174cf10 CPU: 0 COMMAND: "kworker/0:11" [0 00:56:30.116] [UN] PID: 45128 TASK: ffff8802d1748000 CPU: 0 COMMAND: "kworker/0:9" [0 00:56:30.116] [UN] PID: 45126 TASK: ffff8802d174af70 CPU: 0 COMMAND: "kworker/0:7" [0 00:56:30.116] [UN] PID: 45127 TASK: ffff8802d174bf40 CPU: 0 COMMAND: "kworker/0:8" [0 00:56:30.116] [UN] PID: 45124 TASK: ffff8803777a0000 CPU: 0 COMMAND: "kworker/0:5" [0 00:56:30.116] [UN] PID: 45125 TASK: ffff8803777a4f10 CPU: 0 COMMAND: "kworker/0:6" [0 00:56:30.116] [UN] PID: 44608 TASK: ffff8803777a0fd0 CPU: 0 COMMAND: "kworker/0:2" [0 00:56:30.116] [UN] PID: 45123 TASK: ffff8803777a2f70 CPU: 0 COMMAND: "kworker/0:1" [0 00:56:30.117] [UN] PID: 43588 TASK: ffff8802baa0eeb0 CPU: 0 COMMAND: "kworker/0:4" crash> crash> ps -m|grep UN|wc -l 286 crash>
-
Most of these tasks were stuck waiting for a
mutex
lock:crash> bt 43588 PID: 43588 TASK: ffff8802baa0eeb0 CPU: 0 COMMAND: "kworker/0:4" #0 [ffff88030f2fbcf0] __schedule at ffffffff816a8f65 #1 [ffff88030f2fbd58] schedule_preempt_disabled at ffffffff816aa409 #2 [ffff88030f2fbd68] __mutex_lock_slowpath at ffffffff816a8337 #3 [ffff88030f2fbdc8] mutex_lock at ffffffff816a774f #4 [ffff88030f2fbde0] scsi_remove_device at ffffffff81473fde #5 [ffff88030f2fbe00] storvsc_remove_lun at ffffffffc0172340 [hv_storvsc] #6 [ffff88030f2fbe20] process_one_work at ffffffff810a881a #7 [ffff88030f2fbe68] worker_thread at ffffffff810a94e6 #8 [ffff88030f2fbec8] kthread at ffffffff810b098f #9 [ffff88030f2fbf50] ret_from_fork at ffffffff816b4f58 crash>
-
It was the following
mutex
lock for which abovekworker
threads were waiting:crash> p ((struct Scsi_Host *) 0xffff88003666b800)->scan_mutex $1 = { count = { counter = 0xffffff1b }, wait_lock = { { rlock = { raw_lock = { val = { counter = 0x0 } } } } }, wait_list = { next = 0xffff88030f2fbd70, prev = 0xffff8800a1223d70 }, owner = 0xffff880373f78000, { osq = { tail = { counter = 0x0 } }, __UNIQUE_ID_rh_kabi_hide2 = { spin_mlock = 0x0 }, {<No data fields>} } } crash>
-
This
mutex
lock was held open by following process:crash> bt 0xffff880373f78000 PID: 44764 TASK: ffff880373f78000 CPU: 0 COMMAND: "kworker/0:3" #0 [ffff8802cfaffc18] __schedule at ffffffff816a8f65 #1 [ffff8802cfaffc80] schedule at ffffffff816a94e9 #2 [ffff8802cfaffc90] async_synchronize_cookie_domain at ffffffff810b7e85 #3 [ffff8802cfaffce8] async_synchronize_full_domain at ffffffff810b7f68 #4 [ffff8802cfaffcf8] sd_remove at ffffffffc0180342 [sd_mod] #5 [ffff8802cfaffd28] __device_release_driver at ffffffff8143f84f #6 [ffff8802cfaffd48] device_release_driver at ffffffff8143f8e3 #7 [ffff8802cfaffd68] bus_remove_device at ffffffff8143efa8 #8 [ffff8802cfaffd90] device_del at ffffffff8143b075 #9 [ffff8802cfaffdc0] __scsi_remove_device at ffffffff81473fb3 #10 [ffff8802cfaffde0] scsi_remove_device at ffffffff81473fe6 #11 [ffff8802cfaffe00] storvsc_remove_lun at ffffffffc0172340 [hv_storvsc] #12 [ffff8802cfaffe20] process_one_work at ffffffff810a881a #13 [ffff8802cfaffe68] worker_thread at ffffffff810a94e6 #14 [ffff8802cfaffec8] kthread at ffffffff810b098f #15 [ffff8802cfafff50] ret_from_fork at ffffffff816b4f58 crash>
-
blacklist flag in scsi_device is 0x20000
sdev_bflags = 0x2000020 #define BLIST_MS_SKIP_PAGE_08 0x2000 /* do not send ms page 0x08 */
-
The backtrace of above process shows that
hv_storvsc
driver had calledstorvsc_remove_lun
routine to remove SCSI device/dev/sdc
connected through paravirtualized SCSI adapter scsi4. But this task was also stuck in UN state and many processes remained stuck waiting for themutex
lock:crash> ps -m|grep "PID: 44764" [0 00:56:30.110] [UN] PID: 44764 TASK: ffff880373f78000 CPU: 0 COMMAND: "kworker/0:3"
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments