qemu-kvm process aborts with a core dump under heavy virtual disk I/O load
Environment
The problem has been reported to occur in the following Red Hat Enterprise Linux 7 based environments.
-
Red Hat Enterprise Linux OpenStack Platform with qemu-kvm-rhev-1.5.3-60.el7_0.10
-
Red Hat Enterprise Linux with qemu-kvm-1.5.3-60.el7_0.7
Issue
-
A qemu-kvm process aborts with a core dump.
-
The problem typically occurs under heavy virtual disk I/O load.
Resolution
This issue is tracked by Red Hat private Bugzilla 1142857. Please contact your Red Hat Support representative if more information is required.
Root Cause
A qemu-kvm process provides the execution context for a KVM virtual machine. One of qemu-kvm's various tasks is to emulate virtual disk I/O. The problem occurs while a disk I/O request is being completed. The completion of an I/O request involves two categories of threads in qemu-kvm that can execute concurrently on different CPUs.
-
Worker threads that perform disk I/O on behalf of the guest.
-
The main thread which notifies the guest when a worker thread has completed a disk I/O.
Due to a flaw in the worker_thread()
function of qemu-kvm, a race is possible where the main thread fetches an invalid completion status from the disk I/O request.
worker thread on CPU A | main thread on CPU B
--------------------------------------|-----------------------------------------
set the state of the request to DONE |
|
| find the request in the state DONE
| fetch completion status from the request
|
set completion status in the request |
As shown in the diagram above, it is possible that the main thread accesses the completion status in the request before the worker thread actually sets the status. Hence, the fetched status is invalid. Subsequently, a plausibility check in the bdrv_error_action()
function of qemu-kvm detects the invalid status and thus aborts the process.
Diagnostic Steps
- The stack trace of the qemu-kvm main thread looks similar to the following example.
(gdb) bt
#0 0x00007fa719ffb5c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fa719ffccd8 in __GI_abort () at abort.c:90
#2 0x00007fa719ff4536 in __assert_fail_base (fmt=0x7fa71a144008 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7fa71f649be9 "error >= 0", file=file@entry=0x7fa71f6497eb "block.c", line=line@entry=3322,
function=function@entry=0x7fa71f64a730 <__PRETTY_FUNCTION__.27538> "bdrv_error_action") at assert.c:92
#3 0x00007fa719ff45e2 in __GI___assert_fail (assertion=assertion@entry=0x7fa71f649be9 "error >= 0", file=file@entry=0x7fa71f6497eb "block.c",
line=line@entry=3322, function=function@entry=0x7fa71f64a730 <__PRETTY_FUNCTION__.27538> "bdrv_error_action") at assert.c:101
#4 0x00007fa71f402d57 in bdrv_error_action (bs=<optimized out>, action=action@entry=BDRV_ACTION_REPORT, is_read=is_read@entry=true,
error=error@entry=-1886218542) at block.c:3322
#5 0x00007fa71f551979 in virtio_blk_handle_rw_error (req=req@entry=0x7fa72399d930, error=-1886218542, is_read=true)
at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:69
#6 0x00007fa71f551a4f in virtio_blk_rw_complete (opaque=0x7fa72399d930, ret=<optimized out>)
at /usr/src/debug/qemu-1.5.3/hw/block/virtio-blk.c:81
#7 0x00007fa71f3fbbde in bdrv_co_em_bh (opaque=0x7fa7269f3830) at block.c:4469
#8 0x00007fa71f3ee797 in aio_bh_poll (ctx=ctx@entry=0x7fa7205ec400) at async.c:81
#9 0x00007fa71f3ee3e8 in aio_poll (ctx=ctx@entry=0x7fa7205ec400, blocking=blocking@entry=false) at aio-posix.c:185
#10 0x00007fa71f3ee6a0 in aio_ctx_dispatch (source=source@entry=0x7fa7205ec400, callback=<optimized out>, user_data=<optimized out>)
at async.c:194
#11 0x00007fa71e829ac6 in g_main_dispatch (context=0x7fa7205ece00) at gmain.c:3058
#12 g_main_context_dispatch (context=context@entry=0x7fa7205ece00) at gmain.c:3634
#13 0x00007fa71f4c8dda in glib_pollfds_poll () at main-loop.c:187
#14 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#15 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#16 0x00007fa71f3ea110 in main_loop () at vl.c:1988
#17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4359
- The
bdrv_error_action()
function (here: at stack frame #4) expectserror
to be a positive integer value. However, it is actually a huge negative integer value (here: -1886218542).
(gdb) up 4
#4 0x00007fa71f402d57 in bdrv_error_action (bs=<optimized out>, action=action@entry=BDRV_ACTION_REPORT, is_read=is_read@entry=true,
error=error@entry=-1886218542) at block.c:3322
3322 assert(error >= 0);
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments