'kworker' process hangs for more than 120 seconds while trying to remove a device from the lost target.

Solution Verified - Updated -

Issue

  • During a firmware upgrade on the storage array, the storage controller ports were rebooted one at a time. As devices began losing paths, kworker processes hung for more than 120 seconds while trying to remove a device from the lost target.

    Mar  1 00:49:35 hostname kernel: sd 2:0:5:255: alua: Detached
    Mar  1 00:49:35 hostname kernel: sd 2:0:5:256: alua: Detached
    Mar  1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:242 --  1 2002.
    Mar  1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:201 --  1 2002.
    Mar  1 00:49:40 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:211 --  1 2002.
    Mar  1 00:49:41 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:228 --  1 2002.
    Mar  1 00:49:41 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:3:205 --  1 2002.
    Mar  1 00:49:41 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:3:206 --  1 2002.
    Mar  1 00:49:42 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:252 --  1 2002.
    Mar  1 00:49:42 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:200 --  1 2002.
    Mar  1 00:49:43 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:253 --  1 2002.
    Mar  1 00:49:43 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:212 --  1 2002.
    Mar  1 00:49:43 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:241 --  1 2002.
    Mar  1 00:49:44 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:218 --  1 2002.
    Mar  1 00:49:44 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:245 --  1 2002.
    Mar  1 00:49:44 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:250 --  1 2002.
    Mar  1 00:49:45 hostname kernel: qla2xxx [0000:05:00.0]-801c:1: Abort command issued nexus=1:5:225 --  1 2002.
    Mar  1 00:49:45 hostname kernel: qla2xxx [0000:05:00.1]-801c:2: Abort command issued nexus=2:6:251 --  1 2002.
    Mar  1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012fc: load table [0 67108864 multipath 0 0 2 1 service-time 0 4 1 129:432 1 129:720 1 67:688 1 66:912 1 service-time 0 3 1 65:800 1 128:912 1 131:560 1]
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:800.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:912.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:560.
    Mar  1 00:49:45 hostname multipathd: sdaip [129:976]: path removed from map 3600507640081801ec0000000000012fc
    Mar  1 00:49:45 hostname multipathd: sdair: remove path (uevent)
    Mar  1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012fd: load table [0 67108864 multipath 0 0 2 2 service-time 0 3 1 65:832 1 128:960 1 131:608 1 service-time 0 4 1 129:464 1 129:736 1 67:720 1 66:944 1]
    Mar  1 00:49:45 hostname multipathd: sdair [129:1008]: path removed from map 3600507640081801ec0000000000012fd
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:832.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:960.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:608.
    Mar  1 00:49:45 hostname multipathd: sdait: remove path (uevent)
    Mar  1 00:49:45 hostname multipathd: 3600507640081801ec0000000000012d0: load table [0 2147483648 multipath 0 0 2 2 service-time 0 3 1 65:864 1 128:992 1 131:640 1 service-time 0 4 1 129:480 1 130:528 1 67:752 1 66:976 1]
    Mar  1 00:49:45 hostname multipathd: sdait [130:784]: path removed from map 3600507640081801ec0000000000012d0
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 65:864.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 128:992.
    Mar  1 00:49:45 hostname kernel: device-mapper: multipath: Failing path 131:640.
    Mar  1 00:49:45 hostname multipathd: sdaiv: remove path (uevent)
    [.... ]
    
    
    Mar  1 00:49:47 hostname multipathd: sdadv [66:784]: path removed from map 3600507640081801ec000000000001336
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:784.
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:896.
    Mar  1 00:49:47 hostname multipathd: sdadx: remove path (uevent)
    Mar  1 00:49:47 hostname multipathd: 3600507640081801ec000000000001390: load table [0 2147483648 multipath 0 0 2 1 service-time 0 4 1 8:704 1 8:944 1 130:720 1 129:880 1 service-time 0 2 1 128:816 1 135:928 1]
    Mar  1 00:49:47 hostname multipathd: sdadx [66:816]: path removed from map 3600507640081801ec000000000001390
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:816.
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:928.
    Mar  1 00:49:47 hostname multipathd: sdadz: remove path (uevent)
    Mar  1 00:49:47 hostname multipathd: 3600507640081801ec000000000001391: load table [0 2147483648 multipath 0 0 2 1 service-time 0 4 1 8:736 1 8:976 1 130:752 1 129:912 1 service-time 0 2 1 128:848 1 135:960 1]
    Mar  1 00:49:47 hostname multipathd: sdadz [66:848]: path removed from map 3600507640081801ec000000000001391
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:848.
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:960.
    Mar  1 00:49:47 hostname multipathd: sdaeb: remove path (uevent)
    Mar  1 00:49:47 hostname multipathd: 3600507640081801ec000000000001392: load table [0 2147483648 multipath 0 0 2 2 service-time 0 2 1 128:880 1 135:992 1 service-time 0 4 1 65:512 1 65:768 1 131:528 1 129:944 1]
    Mar  1 00:49:47 hostname multipathd: sdaeb [66:880]: path removed from map 3600507640081801ec000000000001392
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 128:880.
    Mar  1 00:49:47 hostname kernel: device-mapper: multipath: Failing path 135:992.
    Mar  1 00:50:01 hostname systemd: Created slice user-0.slice.
    Mar  1 00:50:01 hostname systemd: Starting user-0.slice.
    Mar  1 00:50:01 hostname systemd: Started Session 789 of user root.
    Mar  1 00:50:01 hostname systemd: Starting Session 789 of user root.
    Mar  1 00:50:01 hostname systemd: Removed slice user-0.slice.
    Mar  1 00:50:01 hostname systemd: Stopping user-0.slice.
    Mar  1 00:52:07 hostname kernel: INFO: task kworker/0:2:49490 blocked for more than 120 seconds.
    Mar  1 00:52:07 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar  1 00:52:07 hostname kernel: kworker/0:2     D ffff883ef36e1f60     0 49490      2 0x00000080
    Mar  1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
    Mar  1 00:52:07 hostname kernel: ffff883e70ed7d10 0000000000000046 ffff883f04293ec0 ffff883e70ed7fd8
    Mar  1 00:52:07 hostname kernel: ffff883e70ed7fd8 ffff883e70ed7fd8 ffff883f04293ec0 ffff883f38f62060
    Mar  1 00:52:07 hostname kernel: ffff883f38f62064 ffff883f04293ec0 00000000ffffffff ffff883f38f62068
    Mar  1 00:52:07 hostname kernel: Call Trace:
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168c889>] schedule_preempt_disabled+0x29/0x70
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168a4e5>] __mutex_lock_slowpath+0xc5/0x1c0
    Mar  1 00:52:07 hostname kernel: [<ffffffff81673d6f>] ? klist_next+0x7f/0xf0
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168994f>] mutex_lock+0x1f/0x2f
    Mar  1 00:52:07 hostname kernel: [<ffffffff8146064e>] scsi_remove_device+0x1e/0x40
    Mar  1 00:52:07 hostname kernel: [<ffffffff814607f0>] scsi_remove_target+0x160/0x210
    Mar  1 00:52:07 hostname kernel: [<ffffffffa0123212>] fc_starget_delete+0x22/0x30 [scsi_transport_fc]
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a805b>] process_one_work+0x17b/0x470
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a8e96>] worker_thread+0x126/0x410
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a8d70>] ? rescuer_thread+0x460/0x460
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b064f>] kthread+0xcf/0xe0
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
    Mar  1 00:52:07 hostname kernel: [<ffffffff81696618>] ret_from_fork+0x58/0x90
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
    Mar  1 00:52:07 hostname kernel: INFO: task kworker/0:0:2687 blocked for more than 120 seconds.
    Mar  1 00:52:07 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Mar  1 00:52:07 hostname kernel: kworker/0:0     D ffff883f04293ec0     0  2687      2 0x00000080
    Mar  1 00:52:07 hostname kernel: Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]
    Mar  1 00:52:07 hostname kernel: ffff883f3e9afd10 0000000000000046 ffff8822f3e50fb0 ffff883f3e9affd8
    Mar  1 00:52:07 hostname kernel: ffff883f3e9affd8 ffff883f3e9affd8 ffff8822f3e50fb0 ffff883f38f62060
    Mar  1 00:52:07 hostname kernel: ffff883f38f62064 ffff8822f3e50fb0 00000000ffffffff ffff883f38f62068
    Mar  1 00:52:07 hostname kernel: Call Trace:
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168c889>] schedule_preempt_disabled+0x29/0x70
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168a4e5>] __mutex_lock_slowpath+0xc5/0x1c0
    Mar  1 00:52:07 hostname kernel: [<ffffffff81673d6f>] ? klist_next+0x7f/0xf0
    Mar  1 00:52:07 hostname kernel: [<ffffffff8168994f>] mutex_lock+0x1f/0x2f
    Mar  1 00:52:07 hostname kernel: [<ffffffff8146064e>] scsi_remove_device+0x1e/0x40
    Mar  1 00:52:07 hostname kernel: [<ffffffff814607f0>] scsi_remove_target+0x160/0x210
    Mar  1 00:52:07 hostname kernel: [<ffffffffa0123212>] fc_starget_delete+0x22/0x30 [scsi_transport_fc]
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a805b>] process_one_work+0x17b/0x470
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a8e96>] worker_thread+0x126/0x410
    Mar  1 00:52:07 hostname kernel: [<ffffffff810a8d70>] ? rescuer_thread+0x460/0x460
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b064f>] kthread+0xcf/0xe0
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
    Mar  1 00:52:07 hostname kernel: [<ffffffff81696618>] ret_from_fork+0x58/0x90
    Mar  1 00:52:07 hostname kernel: [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
    
  • Each time the issue occurs, two kernel threads hang while trying to remove devices.
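
When checking whether a host shows the same pattern, it can help to pull the hung-task reports out of the system log and compare the blocked call paths. The following is a minimal sketch, not a supported tool: the function name `find_hung_tasks` and the returned dictionary keys are hypothetical, and the regular expressions assume the message layout shown in the excerpt above.

```python
import re

# Header printed by the kernel hung-task detector, e.g.
# "INFO: task kworker/0:2:49490 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# Workqueue line, e.g. "Workqueue: fc_wq_1 fc_starget_delete [scsi_transport_fc]"
WQ_RE = re.compile(r"Workqueue: (?P<wq>\S+) (?P<func>\S+)")
# Call-trace frame, e.g. "[<ffffffff8146064e>] scsi_remove_device+0x1e/0x40".
# Frames marked with "?" are unreliable and are skipped below.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<sym>[^\s+]+)\+0x"
)


def find_hung_tasks(lines):
    """Return one dict per hung-task report found in an iterable of log lines.

    Hypothetical helper: the keys (comm, pid, seconds, workqueue,
    work_func, frames) are chosen for illustration only.
    """
    reports = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            # A new report starts; later lines are attached to it.
            reports.append({
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
                "seconds": int(m.group("secs")),
                "workqueue": None,
                "work_func": None,
                "frames": [],
            })
            continue
        if not reports:
            continue
        current = reports[-1]
        m = WQ_RE.search(line)
        if m and current["workqueue"] is None:
            current["workqueue"] = m.group("wq")
            current["work_func"] = m.group("func")
            continue
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            current["frames"].append(m.group("sym"))
    return reports
```

Passing the lines of a log file (for example, an open file object for the system log) to `find_hung_tasks` yields one entry per report. For the excerpt above, both entries would show the `fc_wq_1` workqueue running `fc_starget_delete`, with `scsi_remove_device` waiting in `mutex_lock`, which matches the two hung threads described in this issue.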

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
