'kworker' process hangs for more than 120 seconds while trying to remove a device from the lost target.
Issue
- While a firmware upgrade was being performed on the storage array, the storage controller ports were rebooted one at a time. Devices started losing paths, and during this process a kworker process hung for more than 120 seconds while trying to remove a device from the lost target:
Mar 1 00:49:35 hostname kernel: sd 2:0:5:255: alua: Detached
Mar 1 00:49:35 hostname kernel: sd 2:0:5:256: alua: Detached
Mar 1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:242 -- 1 2002.
Mar 1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:201 -- 1 2002.
Mar 1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:211 -- 1 2002.
Mar 1 00:49:41 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:228 -- 1 2002.
Mar 1 00:49:41 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:3:205 -- 1 2002.
Mar 1 00:49:41 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:3:206 -- 1 2002.
Mar 1 00:49:42 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:252 -- 1 2002.
Mar 1 00:49:42 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:200 -- 1 2002.
Mar 1 00:49:43 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:253 -- 1 2002.
Mar 1 00:49:43 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:212 -- 1 2002.
Mar 1 00:49:43 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:241 -- 1 2002.
Mar 1 00:49:44 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:218 -- 1 2002.
Mar 1 00:49:44 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:245 -- 1 2002.
Mar 1 00:49:44 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:250 -- 1 2002.
Mar 1 00:49:45 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:225 -- 1 2002.
Mar 1 00:49:45 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:251 -- 1 2002.
Mar 1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012fc: load table [0 67108864 multipath 0 0 2 1 service-time 0 4 1 129:432 1 129:720 1 67:688 1 66:912 1 service-time 0 3 1 65:800 1 128:912 1 131:560 1]
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:800.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:912.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:560.
Mar 1 00:49:45 hostname multipathd: sdaip [129:976]: path removed from map 3600507640081801ec0000000000012fc
Mar 1 00:49:45 hostname multipathd: sdair: remove path (uevent)
Mar 1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012fd: load table [0 67108864 multipath 0 0 2 2 service-time 0 3 1 65:832 1 128:960 1 131:608 1 service-time 0 4 1 129:464 1 129:736 1 67:720 1 66:944 1]
Mar 1 00:49:45 hostname multipathd: sdair [129:1008]: path removed from map 3600507640081801ec0000000000012fd
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:832.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:960.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:608.
Mar 1 00:49:45 hostname multipathd: sdait: remove path (uevent)
Mar 1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012d0: load table [0 2147483648 multipath 0 0 2 2 service-time 0 3 1 65:864 1 128:992 1 131:640 1 service-time 0 4 1 129:480 1 130:528 1 67:752 1 66:976 1]
Mar 1 00:49:45 hostname multipathd: sdait [130:784]: path removed from map 3600507640081801ec0000000000012d0
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:864.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:992.
Mar 1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:640.
Mar 1 00:49:45 hostname multipathd: sdaiv: remove path (uevent)
[....]
Mar 1 00:49:47 hostname multipathd: sdadv [66:784]: path removed from map 3600507640081801ec000000000001336
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:784.
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:896.
Mar 1 00:49:47 hostname multipathd: sdadx: remove path (uevent)
Mar 1 00:49:47 hostname multipathd: 3600507640081801ec000000000001390: load table [0 2147483648 multipath 0 0 2 1 service-time 0 4 1 8:704 1 8:944 1 130:720 1 129:880 1 service-time 0 2 1 128:816 1 135:928 1]
Mar 1 00:49:47 hostname multipathd: sdadx [66:816]: path removed from map 3600507640081801ec000000000001390
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:816.
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:928.
Mar 1 00:49:47 hostname multipathd: sdadz: remove path (uevent)
Mar 1 00:49:47 hostname multipathd: 3600507640081801ec000000000001391: load table [0 2147483648 multipath 0 0 2 1 service-time 0 4 1 8:736 1 8:976 1 130:752 1 129:912 1 service-time 0 2 1 128:848 1 135:960 1]
Mar 1 00:49:47 hostname multipathd: sdadz [66:848]: path removed from map 3600507640081801ec000000000001391
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:848.
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:960.
Mar 1 00:49:47 hostname multipathd: sdaeb: remove path (uevent)
Mar 1 00:49:47 hostname multipathd: 3600507640081801ec000000000001392: load table [0 2147483648 multipath 0 0 2 2 service-time 0 2 1 128:880 1 135:992 1 service-time 0 4 1 65:512 1 65:768 1 131:528 1 129:944 1]
Mar 1 00:49:47 hostname multipathd: sdaeb [66:880]: path removed from map 3600507640081801ec000000000001392
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:880.
Mar 1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:992.
Mar 1 00:50:01 hostname systemd: Created slice user-0.slice.
Mar 1 00:50:01 hostname systemd: Starting user-0.slice.
Mar 1 00:50:01 hostname systemd: Started Session 789 of user root.
Mar 1 00:50:01 hostname systemd: Starting Session 789 of user root.
Mar 1 00:50:01 hostname systemd: Removed slice user-0.slice.
Mar 1 00:50:01 hostname systemd: Stopping user-0.slice.
Mar 1 00:52:07 hostname kernel: INFO: task kworker/0:2:49490 blocked for more than 120 seconds.
Mar 1 00:52:07 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 00:52:07 hostname kernel: kworker/0:2 D ffff883ef36e1f60 0 49490 2 0x00000080
Mar 1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
Mar 1 00:52:07 hostname kernel: ffff883e70ed7d10 0000000000000046 ffff883f04293ec0 ffff883e70ed7fd8
Mar 1 00:52:07 hostname kernel: ffff883e70ed7fd8 ffff883e70ed7fd8 ffff883f04293ec0 ffff883f38f62060
Mar 1 00:52:07 hostname kernel: ffff883f38f62064 ffff883f04293ec0 00000000ffffffff ffff883f38f62068
Mar 1 00:52:07 hostname kernel: Call Trace:
Mar 1 00:52:07 hostname kernel: [<ffffffff8168c889>] schedule_preempt_disabled+0x29/0x70
Mar 1 00:52:07 hostname kernel: [<ffffffff8168a4e5>] __mutex_lock_slowpath+0xc5/0x1c0
Mar 1 00:52:07 hostname kernel: [<ffffffff81673d6f>] ? klist_next+0x7f/0xf0
Mar 1 00:52:07 hostname kernel: [<ffffffff8168994f>] mutex_lock+0x1f/0x2f
Mar 1 00:52:07 hostname kernel: [<ffffffff8146064e>] scsi_remove_device+0x1e/0x40
Mar 1 00:52:07 hostname kernel: [<ffffffff814607f0>] scsi_remove_target+0x160/0x210
Mar 1 00:52:07 hostname kernel: [<ffffffffa0123212>] fc_starget_delete+0x22/0x30 [scsi_transport_fc]
Mar 1 00:52:07 hostname kernel: [<ffffffff810a805b>] process_one_work+0x17b/0x470
Mar 1 00:52:07 hostname kernel: [<ffffffff810a8e96>] worker_thread+0x126/0x410
Mar 1 00:52:07 hostname kernel: [<ffffffff810a8d70>] ? rescuer_thread+0x460/0x460
Mar 1 00:52:07 hostname kernel: [<ffffffff810b064f>] kthread+0xcf/0xe0
Mar 1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
Mar 1 00:52:07 hostname kernel: [<ffffffff81696618>] ret_from_fork+0x58/0x90
Mar 1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
Mar 1 00:52:07 hostname kernel: INFO: task kworker/0:0:2687 blocked for more than 120 seconds.
Mar 1 00:52:07 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 1 00:52:07 hostname kernel: kworker/0:0 D ffff883f04293ec0 0 2687 2 0x00000080
Mar 1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
Mar 1 00:52:07 hostname kernel: ffff883f3e9afd10 0000000000000046 ffff8822f3e50fb0 ffff883f3e9affd8
Mar 1 00:52:07 hostname kernel: ffff883f3e9affd8 ffff883f3e9affd8 ffff8822f3e50fb0 ffff883f38f62060
Mar 1 00:52:07 hostname kernel: ffff883f38f62064 ffff8822f3e50fb0 00000000ffffffff ffff883f38f62068
Mar 1 00:52:07 hostname kernel: Call Trace:
Mar 1 00:52:07 hostname kernel: [<ffffffff8168c889>] schedule_preempt_disabled+0x29/0x70
Mar 1 00:52:07 hostname kernel: [<ffffffff8168a4e5>] __mutex_lock_slowpath+0xc5/0x1c0
Mar 1 00:52:07 hostname kernel: [<ffffffff81673d6f>] ? klist_next+0x7f/0xf0
Mar 1 00:52:07 hostname kernel: [<ffffffff8168994f>] mutex_lock+0x1f/0x2f
Mar 1 00:52:07 hostname kernel: [<ffffffff8146064e>] scsi_remove_device+0x1e/0x40
Mar 1 00:52:07 hostname kernel: [<ffffffff814607f0>] scsi_remove_target+0x160/0x210
Mar 1 00:52:07 hostname kernel: [<ffffffffa0123212>] fc_starget_delete+0x22/0x30 [scsi_transport_fc]
Mar 1 00:52:07 hostname kernel: [<ffffffff810a805b>] process_one_work+0x17b/0x470
Mar 1 00:52:07 hostname kernel: [<ffffffff810a8e96>] worker_thread+0x126/0x410
Mar 1 00:52:07 hostname kernel: [<ffffffff810a8d70>] ? rescuer_thread+0x460/0x460
Mar 1 00:52:07 hostname kernel: [<ffffffff810b064f>] kthread+0xcf/0xe0
Mar 1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
Mar 1 00:52:07 hostname kernel: [<ffffffff81696618>] ret_from_fork+0x58/0x90
Mar 1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
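- Each time the issue occurs, two kernel threads hang while trying to remove devices. In the traces above, both are kworker threads running the fc_starget_delete work item on workqueue fc_wq_1, blocked in mutex_lock() called from scsi_remove_device().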
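To check whether a given occurrence matches this pattern (two hung kworker threads, both running fc_starget_delete), the kernel log can be scanned for hung-task reports. The Python sketch below is a minimal, hypothetical helper: parse_hung_tasks and HungTask are illustrative names, not part of any Red Hat tool, and the regular expressions are assumptions derived only from the message lines quoted above, not from a documented log format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed format, based on the lines quoted in this article, e.g.
# "INFO: task kworker/0:2:49490 blocked for more than 120 seconds."
# The task name can itself contain ':' (kworker/0:2); the greedy \S+
# backtracks so that the PID is the last ':'-separated number.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Assumed format of the workqueue line printed in the same report, e.g.
# "Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]"
WQ_RE = re.compile(r"Workqueue: (?P<wq>\S+) (?P<func>\S+)")


@dataclass
class HungTask:
    comm: str                        # task name, e.g. "kworker/0:2"
    pid: int                         # task PID
    secs: int                        # hung-task threshold reported by the kernel
    workqueue: Optional[str] = None  # workqueue name, if a Workqueue line followed
    work_func: Optional[str] = None  # work function, e.g. "fc_starget_delete"


def parse_hung_tasks(lines: List[str]) -> List[HungTask]:
    """Collect hung-task reports and attach the first Workqueue line after each one."""
    tasks: List[HungTask] = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            tasks.append(HungTask(m["comm"], int(m["pid"]), int(m["secs"])))
            continue
        w = WQ_RE.search(line)
        # Only the first Workqueue line following a report is attributed to it.
        if w and tasks and tasks[-1].workqueue is None:
            tasks[-1].workqueue = w["wq"]
            tasks[-1].work_func = w["func"]
    return tasks


if __name__ == "__main__":
    sample = """\
Mar 1 00:52:07 hostname kernel: INFO: task kworker/0:2:49490 blocked for more than 120 seconds.
Mar 1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
Mar 1 00:52:07 hostname kernel: INFO: task kworker/0:0:2687 blocked for more than 120 seconds.
Mar 1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
"""
    for t in parse_hung_tasks(sample.splitlines()):
        print(t.comm, t.pid, t.secs, t.workqueue, t.work_func)
```

Run against the excerpt above, this reports both kworker/0:2 (PID 49490) and kworker/0:0 (PID 2687) as blocked in fc_starget_delete on fc_wq_1. In practice the input would be the lines of the system log file; the parser only matches text and does not interpret the call trace.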
Environment
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7