md_raid5 process uses 100% CPU and I/O hangs if a drive is removed during initialization of a RAID5 array

Solution Unverified

Issue

  • In a RAID5 array, if a disk is removed during initialization while I/O is in progress on that mdraid device, the I/O gets stuck and the md_raid5 thread uses 100% CPU. The md state also shows resync=PENDING.

    After a disk was marked as failed, I/O hung and the following errors were observed on the console:

    kernel: md/raid:md0: Disk failure on vdd1, disabling device.
    kernel: md/raid:md0: Operation continuing on 3 devices.
    udevd[428]: worker [1793] unexpectedly returned with status 0x0100
    udevd[428]: worker [1793] failed while handling '/devices/virtual/block/md0'
    kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.
    kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
    kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kernel: md0_resync    D 0000000000000000     0  2226      2 0x00000080
    kernel: ffff88003af17b50 0000000000000046 0000000000000001 0000000000000082
    kernel: ffff88003af17ae0 ffffffff8105bd23 ffff88003af17b00 ffff88003e0066c0
    kernel: ffff8800372216d0 ffff88003e7f51b8 ffff88003aa725f8 ffff88003af17fd8
    kernel: Call Trace:
    kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
    kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
    kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
    kernel: [<ffffffffa02e6d45>] get_active_stripe+0x2d5/0x8c0 [raid456]
    kernel: [<ffffffff8140e800>] ? md_wakeup_thread+0x0/0x70
    kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
    kernel: [<ffffffffa02eb87c>] sync_request+0x38c/0x3a0 [raid456]
    kernel: [<ffffffff81414e47>] md_do_sync+0x6c7/0xd20
    kernel: [<ffffffff814158f5>] md_thread+0x115/0x150
    kernel: [<ffffffff814157e0>] ? md_thread+0x0/0x150
    kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
    kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
    kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
    kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
    kernel: INFO: task fio:2234 blocked for more than 120 seconds.
    kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
    kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kernel: fio           D 0000000000000000     0  2234   1842 0x00000080
    kernel: ffff88003af25668 0000000000000086 0000000000000001 0000000000000082
    kernel: ffff88003af255f8 ffffffff8105bd23 ffff88003af25618 ffff88003e0066c0
    kernel: ffff8800372216d0 ffff88003e7f51b8 ffff880037b54638 ffff88003af25fd8
    kernel: Call Trace:
    kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
    kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
    [...]
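    To see quickly which tasks are reported as hung in console output like the above, the hung-task warnings can be extracted with a short script. The following is only a minimal sketch: the function name `find_hung_tasks` is hypothetical, and the pattern is assumed from the message format shown in this log, not from any kernel specification.

```python
import re

# Pattern assumed from the hung-task warnings in the console output above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) for each hung-task warning in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("name"), int(m.group("pid")), int(m.group("secs"))))
    return hits

# Two warning lines copied from the console output above.
sample = """\
kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.
kernel: INFO: task fio:2234 blocked for more than 120 seconds.
"""
print(find_hung_tasks(sample))
# -> [('md0_resync', 2226, 120), ('fio', 2234, 120)]
```

    In this case both the resync thread (md0_resync) and the I/O generator (fio) are reported as blocked.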
    

    The top output shows the md0_raid5 thread using 100% CPU:

    top - 17:55:06 up  1:09,  3 users,  load average: 11.98, 8.53, 3.99
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2690 root      20   0     0    0    0 R 100.0  0.0   6:44.41 md0_raid5
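    The resync=PENDING state mentioned above can also be checked programmatically. The sketch below assumes the state is read from text in the format of /proc/mdstat. The function name `arrays_with_pending_resync` is hypothetical, and the sample text is an illustrative mock-up, not output captured from the affected system.

```python
import re

def arrays_with_pending_resync(mdstat_text):
    """Return names of md arrays whose status block contains 'resync=PENDING'."""
    pending = []
    current = None  # name of the array whose block is being read
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\S*)\s*:", line)
        if m:
            current = m.group(1)      # a new array block starts here
        elif not line.strip():
            current = None            # a blank line ends the block
        elif current and "resync=PENDING" in line and current not in pending:
            pending.append(current)
    return pending

# Illustrative sample only (assumed layout, hypothetical sizes and members).
sample_mdstat = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 vdd1[3](F) vdc1[2] vdb1[1] vda1[0]
      3142656 blocks level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      \tresync=PENDING

md1 : active raid1 vdf1[1] vde1[0]
      1048576 blocks [2/2] [UU]

unused devices: <none>
"""
print(arrays_with_pending_resync(sample_mdstat))
# -> ['md0']
```

    On a live system, the same function could be passed the contents of /proc/mdstat, for example via `open("/proc/mdstat").read()`.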
    

Environment

  • Red Hat Enterprise Linux 6.5
  • kernel-2.6.32-431.el6.x86_64
  • mdraid
