md_raid5 process using 100% CPU and IO gets hung if a drive is removed during initialization of raid5 array
Issue
- In a RAID5 array, if a disk is removed during initialization while I/O is in progress on the md device, the I/O gets stuck and the md_raid5 thread uses 100% of the CPU. The md state is also shown as resync=PENDING.
- After marking a disk as failed, I/O hung and the following errors were observed on the console:
kernel: md/raid:md0: Disk failure on vdd1, disabling device.
kernel: md/raid:md0: Operation continuing on 3 devices.
udevd[428]: worker [1793] unexpectedly returned with status 0x0100
udevd[428]: worker [1793] failed while handling '/devices/virtual/block/md0'
kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.
kernel: Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md0_resync D 0000000000000000 0 2226 2 0x00000080
kernel: ffff88003af17b50 0000000000000046 0000000000000001 0000000000000082
kernel: ffff88003af17ae0 ffffffff8105bd23 ffff88003af17b00 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff88003aa725f8 ffff88003af17fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
kernel: [<ffffffffa02e6d45>] get_active_stripe+0x2d5/0x8c0 [raid456]
kernel: [<ffffffff8140e800>] ? md_wakeup_thread+0x0/0x70
kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa02eb87c>] sync_request+0x38c/0x3a0 [raid456]
kernel: [<ffffffff81414e47>] md_do_sync+0x6c7/0xd20
kernel: [<ffffffff814158f5>] md_thread+0x115/0x150
kernel: [<ffffffff814157e0>] ? md_thread+0x0/0x150
kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
kernel: INFO: task fio:2234 blocked for more than 120 seconds.
kernel: Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fio D 0000000000000000 0 2234 1842 0x00000080
kernel: ffff88003af25668 0000000000000086 0000000000000001 0000000000000082
kernel: ffff88003af255f8 ffffffff8105bd23 ffff88003af25618 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff880037b54638 ffff88003af25fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
[...]
- The top output shows the md0_raid5 thread using 100% CPU:
top - 17:55:06 up 1:09, 3 users, load average: 11.98, 8.53, 3.99
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2690 root 20 0 0 0 0 R 100.0 0.0 6:44.41 md0_raid5
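The two symptoms described in this section, hung-task messages in the kernel log and an md state of resync=PENDING, can be checked for with a small script. The following is only a minimal sketch and not part of the original report: the function names `find_hung_tasks` and `arrays_with_pending_resync` are hypothetical, and it assumes the hung-task message format quoted above and that `resync=PENDING` appears on a line inside the array's stanza in `/proc/mdstat`.

```python
import re

# Matches the hung-task message format quoted in the console output above,
# e.g. "INFO: task md0_resync:2226 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) for each hung-task message in log_text."""
    return [
        (name, int(pid), int(secs))
        for name, pid, secs in HUNG_TASK_RE.findall(log_text)
    ]


def arrays_with_pending_resync(mdstat_text):
    """Return md device names whose mdstat stanza contains resync=PENDING.

    Assumption: each array stanza starts with a line such as "md0 : active ..."
    and the resync state is reported on a following line of that stanza.
    """
    pending = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            # A new array stanza begins; remember its device name.
            current = header.group(1)
        elif current and "resync=PENDING" in line:
            pending.append(current)
            current = None  # report each array at most once
    return pending
```

On an affected system, the first function could be fed the contents of the system log and the second the contents of `/proc/mdstat`; a non-empty result from both would match the symptoms reported here.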
Environment
- Red Hat Enterprise Linux 6.5
- kernel-2.6.32-431.el6.x86_64
- mdraid
