INFO: task docker blocked for more than 120 seconds.

Solution In Progress - Updated -

Issue

  • The Docker daemon is stuck on one of the OpenShift nodes, so the OpenShift masters see the node as "not ready" and deployments are failing.

  • dmesg contains several messages about the hung docker task:

[ 4082.854242] INFO: task docker:111571 blocked for more than 120 seconds.
[ 4082.855441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4082.856124] docker          D 0000000000000000     0 111571      1 0x00000080
[ 4082.856127]  ffff881c01527ab0 0000000000000086 ffff881c332f5080 ffff881c01527fd8
[ 4082.856130]  ffff881c01527fd8 ffff881c01527fd8 ffff881c332f5080 ffff881c01527bf0
[ 4082.856132]  ffff881c01527bf8 7fffffffffffffff ffff881c332f5080 0000000000000000
[ 4082.856135] Call Trace:
[ 4082.856142]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 4082.856144]  [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[ 4082.856149]  [<ffffffff8108e4cd>] ? mod_timer+0x11d/0x240
[ 4082.856151]  [<ffffffff8163acd6>] wait_for_completion+0x116/0x170
[ 4082.856156]  [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
[ 4082.856159]  [<ffffffff810ab676>] __synchronize_srcu+0x106/0x1a0
[ 4082.856166]  [<ffffffff810ab190>] ? call_srcu+0x70/0x70
[ 4082.856171]  [<ffffffff81219ebf>] ? __sync_blockdev+0x1f/0x40
[ 4082.856173]  [<ffffffff810ab72d>] synchronize_srcu+0x1d/0x20
[ 4082.856191]  [<ffffffffa000318d>] __dm_suspend+0x5d/0x220 [dm_mod]
[ 4082.856197]  [<ffffffffa0004c9a>] dm_suspend+0xca/0xf0 [dm_mod]
[ 4082.856202]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 4082.856207]  [<ffffffffa000a174>] dev_suspend+0x194/0x250 [dm_mod]
[ 4082.856211]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 4082.856215]  [<ffffffffa000aa25>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 4082.856220]  [<ffffffffa000ace3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 4082.856224]  [<ffffffff811f1ef5>] do_vfs_ioctl+0x2e5/0x4c0
[ 4082.856227]  [<ffffffff8128bc6e>] ? file_has_perm+0xae/0xc0
[ 4082.856229]  [<ffffffff811f2171>] SyS_ioctl+0xa1/0xc0
[ 4082.856232]  [<ffffffff816408d9>] ? do_async_page_fault+0x29/0xe0
[ 4082.856235]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
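
The hung-task warnings above follow a fixed format, so they can be picked out of a long dmesg capture with a small filter. The following is a minimal sketch and not part of the original report: the function name hung_tasks is hypothetical, and the pattern assumes the exact message wording shown in the trace above.

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# one line per hung-task warning in the form "<task> <pid> <seconds>".
# Assumes the wording "INFO: task NAME:PID blocked for more than N seconds".
hung_tasks() {
  sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

It could be used as "dmesg | hung_tasks"; for the first warning line shown above it prints "docker 111571 120".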

Following this guide (https://access.redhat.com/solutions/31453), I tried to reproduce the problem by marking the node as unschedulable and then stopping the docker service with "systemctl stop docker". The prompt hung, but over a second SSH connection I was able to collect the files required by that guide, as well as the journal logs for the docker service:

dic 01 16:40:16 hostname.example.com systemd[1]: Stopping Docker Application Container Engine...
dic 01 16:40:16 hostname.example.com docker[38182]: time="2015-12-01T16:40:16.387328403+01:00" level=info msg="Processing signal 'terminated'"
dic 01 16:41:46 hostname.example.com systemd[1]: docker.service stop-final-sigterm timed out. Killing.
dic 01 16:43:16 hostname.example.com systemd[1]: docker.service still around after final SIGKILL. Entering failed mode.
dic 01 16:43:16 hostname.example.com systemd[1]: Stopped Docker Application Container Engine.
dic 01 16:43:16 hostname.example.com systemd[1]: Unit docker.service entered failed state.
dic 01 16:43:16 hostname.example.com systemd[1]: docker.service failed.
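
A process that survives the final SIGKILL, as in the journal above, is typically in uninterruptible sleep, which matches the D state shown for docker in the dmesg output. As a hedged sketch (the function name d_state_tasks is hypothetical and not from the original report), the following filter keeps the header row and any D-state rows from ps output whose second column is the process state:

```shell
#!/bin/sh
# d_state_tasks (hypothetical helper): read ps output on stdin, where the
# second column is the process state, and keep the header line plus every
# task whose state starts with "D" (uninterruptible sleep).
d_state_tasks() {
  awk 'NR == 1 || $2 ~ /^D/'
}
```

It could be used as "ps -eo pid,stat,comm,wchan:32 | d_state_tasks". Where the kernel exposes it, the wchan column names the kernel function the task is sleeping in.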

Environment

  • OpenShift 3.1
