md_raid5 process uses 100% CPU and I/O hangs if a drive is removed during initialization of a RAID5 array
Issue
- In a RAID5 array, if a disk is removed during initialization while I/O is in progress on the md device, the I/O gets stuck and the md_raid5 thread uses 100% of the CPU. The md state is also shown as resync=PENDING. After a disk was marked as failed, I/O hung and the following errors were observed on the console:
kernel: md/raid:md0: Disk failure on vdd1, disabling device.
kernel: md/raid:md0: Operation continuing on 3 devices.
udevd[428]: worker [1793] unexpectedly returned with status 0x0100
udevd[428]: worker [1793] failed while handling '/devices/virtual/block/md0'
kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.
kernel: Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md0_resync D 0000000000000000 0 2226 2 0x00000080
kernel: ffff88003af17b50 0000000000000046 0000000000000001 0000000000000082
kernel: ffff88003af17ae0 ffffffff8105bd23 ffff88003af17b00 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff88003aa725f8 ffff88003af17fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
kernel: [<ffffffffa02e6d45>] get_active_stripe+0x2d5/0x8c0 [raid456]
kernel: [<ffffffff8140e800>] ? md_wakeup_thread+0x0/0x70
kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa02eb87c>] sync_request+0x38c/0x3a0 [raid456]
kernel: [<ffffffff81414e47>] md_do_sync+0x6c7/0xd20
kernel: [<ffffffff814158f5>] md_thread+0x115/0x150
kernel: [<ffffffff814157e0>] ? md_thread+0x0/0x150
kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
kernel: INFO: task fio:2234 blocked for more than 120 seconds.
kernel: Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fio D 0000000000000000 0 2234 1842 0x00000080
kernel: ffff88003af25668 0000000000000086 0000000000000001 0000000000000082
kernel: ffff88003af255f8 ffffffff8105bd23 ffff88003af25618 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff880037b54638 ffff88003af25fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
[...]
- The top output shows md_raid5 using 100% CPU:
top - 17:55:06 up 1:09, 3 users, load average: 11.98, 8.53, 3.99
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2690 root 20 0 0 0 0 R 100.0 0.0 6:44.41 md0_raid5
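The resync=PENDING state described above can be checked for programmatically. The following is a minimal sketch, assuming the array status is read as text in the usual /proc/mdstat layout. The function name arrays_with_pending_resync and the SAMPLE text are hypothetical and for illustration only; apart from md0 and vdd1, the device names and block counts in the sample are not taken from the affected system.

```python
import re


def arrays_with_pending_resync(mdstat_text):
    """Return the names of md arrays whose status block shows resync=PENDING."""
    pending = []
    current = None
    for line in mdstat_text.splitlines():
        # An array block starts with a line such as "md0 : active raid5 ..."
        match = re.match(r"^(md\S+)\s*:", line)
        if match:
            current = match.group(1)
        elif not line.strip():
            # A blank line ends the current array block
            current = None
        elif current and "resync=PENDING" in line:
            pending.append(current)
    return pending


# Hypothetical sample in /proc/mdstat layout (illustrative values only)
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 vdd1[3](F) vdc1[2] vdb1[1] vda1[0]
      3142656 blocks level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
        resync=PENDING

unused devices: <none>
"""

if __name__ == "__main__":
    print(arrays_with_pending_resync(SAMPLE))
```

On a live system the text could be read from /proc/mdstat instead of SAMPLE. This check only reports the state; it does not explain or resolve the hang.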
Environment
- Red Hat Enterprise Linux 6.5
- kernel-2.6.32-431.el6.x86_64
- mdraid