md_raid5 process using 100% CPU and IO gets hung if a drive is removed during initialization of raid5 array

Solution Unverified - Updated 2024-08-05T05:49:58+00:00

Issue

In a RAID5 array, if a disk is removed during initialization while IO is in progress on that mdraid device, the IO gets stuck and the md_raid5 thread uses 100% CPU. The md state is also shown as resync=PENDING.

After a disk was marked as failed, IO hung and the following errors were observed on the console:

kernel: md/raid:md0: Disk failure on vdd1, disabling device.
kernel: md/raid:md0: Operation continuing on 3 devices.
udevd[428]: worker [1793] unexpectedly returned with status 0x0100
udevd[428]: worker [1793] failed while handling '/devices/virtual/block/md0'
kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.
kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md0_resync    D 0000000000000000     0  2226      2 0x00000080
kernel: ffff88003af17b50 0000000000000046 0000000000000001 0000000000000082
kernel: ffff88003af17ae0 ffffffff8105bd23 ffff88003af17b00 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff88003aa725f8 ffff88003af17fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
kernel: [<ffffffffa02e6d45>] get_active_stripe+0x2d5/0x8c0 [raid456]
kernel: [<ffffffff8140e800>] ? md_wakeup_thread+0x0/0x70
kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa02eb87c>] sync_request+0x38c/0x3a0 [raid456]
kernel: [<ffffffff81414e47>] md_do_sync+0x6c7/0xd20
kernel: [<ffffffff814158f5>] md_thread+0x115/0x150
kernel: [<ffffffff814157e0>] ? md_thread+0x0/0x150
kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
kernel: INFO: task fio:2234 blocked for more than 120 seconds.
kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fio           D 0000000000000000     0  2234   1842 0x00000080
kernel: ffff88003af25668 0000000000000086 0000000000000001 0000000000000082
kernel: ffff88003af255f8 ffffffff8105bd23 ffff88003af25618 ffff88003e0066c0
kernel: ffff8800372216d0 ffff88003e7f51b8 ffff880037b54638 ffff88003af25fd8
kernel: Call Trace:
kernel: [<ffffffff8105bd23>] ? __wake_up+0x53/0x70
kernel: [<ffffffffa02e396b>] ? md_raid5_unplug_device+0x7b/0x120 [raid456]
[...]
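
When a console log like the one above is long, it can help to pull out only the hung-task warnings. The following is a minimal sketch, assuming the log is available as a list of text lines; the function name find_hung_tasks and the regular expression are illustrative and are not part of any md or kernel tooling. The sample lines are copied from the console output above.

```python
import re

# Matches the kernel hung-task warning format seen in the console output,
# e.g. "INFO: task md0_resync:2226 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_lines):
    """Return a (task_name, pid, seconds) tuple for each hung-task warning."""
    hits = []
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# Sample lines taken from the console output in this article.
sample = [
    "kernel: md/raid:md0: Disk failure on vdd1, disabling device.",
    "kernel: INFO: task md0_resync:2226 blocked for more than 120 seconds.",
    "kernel: INFO: task fio:2234 blocked for more than 120 seconds.",
]

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs} seconds")
```

In this log, both the resync thread (md0_resync) and the IO generator (fio) are reported as blocked.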

The top output shows the md_raid5 thread (md0_raid5) using 100% CPU:

top - 17:55:06 up  1:09,  3 users,  load average: 11.98, 8.53, 3.99
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2690 root      20   0     0    0    0 R 100.0  0.0   6:44.41 md0_raid5
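
The same check can be done on captured top output. This is a rough sketch, assuming the default 12-column top layout shown above; the function name busy_md_threads and the 90% threshold are arbitrary choices for illustration, not values from this article.

```python
def busy_md_threads(top_lines, threshold=90.0):
    """Return (pid, command, cpu) for md RAID threads at or above threshold %CPU."""
    busy = []
    for line in top_lines:
        fields = line.split()
        # Process rows have 12 columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the summary line and the column header
        command = fields[11]
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        # md RAID threads are named like md0_raid5 in the output above.
        if command.startswith("md") and "_raid" in command and cpu >= threshold:
            busy.append((int(fields[0]), command, cpu))
    return busy


# Sample lines taken from the top output in this article.
sample_top = [
    "top - 17:55:06 up  1:09,  3 users,  load average: 11.98, 8.53, 3.99",
    "PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND",
    "2690 root      20   0     0    0    0 R 100.0  0.0   6:44.41 md0_raid5",
]

for pid, command, cpu in busy_md_threads(sample_top):
    print(f"{command} (pid {pid}) is using {cpu}% CPU")
```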

Environment

Red Hat Enterprise Linux 6.5
kernel-2.6.32-431.el6.x86_64
mdraid
