INFO: task docker blocked for more than 120 seconds.

Solution In Progress - Updated -

Issue

  • The Docker daemon is stuck on one of the OpenShift nodes, so the OpenShift masters see the node as "not ready" and deployments fail.

  • dmesg contains hung-task messages showing the docker process blocked:

[ 4082.854242] INFO: task docker:111571 blocked for more than 120 seconds.
[ 4082.855441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4082.856124] docker          D 0000000000000000     0 111571      1 0x00000080
[ 4082.856127]  ffff881c01527ab0 0000000000000086 ffff881c332f5080 ffff881c01527fd8
[ 4082.856130]  ffff881c01527fd8 ffff881c01527fd8 ffff881c332f5080 ffff881c01527bf0
[ 4082.856132]  ffff881c01527bf8 7fffffffffffffff ffff881c332f5080 0000000000000000
[ 4082.856135] Call Trace:
[ 4082.856142]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 4082.856144]  [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[ 4082.856149]  [<ffffffff8108e4cd>] ? mod_timer+0x11d/0x240
[ 4082.856151]  [<ffffffff8163acd6>] wait_for_completion+0x116/0x170
[ 4082.856156]  [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
[ 4082.856159]  [<ffffffff810ab676>] __synchronize_srcu+0x106/0x1a0
[ 4082.856166]  [<ffffffff810ab190>] ? call_srcu+0x70/0x70
[ 4082.856171]  [<ffffffff81219ebf>] ? __sync_blockdev+0x1f/0x40
[ 4082.856173]  [<ffffffff810ab72d>] synchronize_srcu+0x1d/0x20
[ 4082.856191]  [<ffffffffa000318d>] __dm_suspend+0x5d/0x220 [dm_mod]
[ 4082.856197]  [<ffffffffa0004c9a>] dm_suspend+0xca/0xf0 [dm_mod]
[ 4082.856202]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 4082.856207]  [<ffffffffa000a174>] dev_suspend+0x194/0x250 [dm_mod]
[ 4082.856211]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 4082.856215]  [<ffffffffa000aa25>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 4082.856220]  [<ffffffffa000ace3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 4082.856224]  [<ffffffff811f1ef5>] do_vfs_ioctl+0x2e5/0x4c0
[ 4082.856227]  [<ffffffff8128bc6e>] ? file_has_perm+0xae/0xc0
[ 4082.856229]  [<ffffffff811f2171>] SyS_ioctl+0xa1/0xc0
[ 4082.856232]  [<ffffffff816408d9>] ? do_async_page_fault+0x29/0xe0
[ 4082.856235]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
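
When several of these reports accumulate in dmesg, it can help to pull out the blocked task and the functions it is waiting in. The following is a minimal sketch in Python, not an official tool: the function name parse_hung_tasks and the regular expressions are assumptions based only on the message layout shown above, and other kernel versions may format the trace differently.

```python
import re

# Matches the header line of a hung-task report, e.g.
# "INFO: task docker:111571 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Matches one call-trace frame, e.g.
# "[<ffffffffa000318d>] __dm_suspend+0x5d/0x220 [dm_mod]"
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<uncertain>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text (hypothetical helper)."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "seconds": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(
                {
                    "symbol": frame.group("sym"),
                    "module": frame.group("mod"),  # None for core kernel symbols
                    "reliable": frame.group("uncertain") is None,
                }
            )
    return reports


def modules_in_trace(report):
    """List the kernel modules that appear in the reliable frames of one report."""
    seen = []
    for frame in report["frames"]:
        if frame["reliable"] and frame["module"] and frame["module"] not in seen:
            seen.append(frame["module"])
    return seen
```

Applied to the trace above, such a parser would report the docker task as blocked in frames belonging to the dm_mod module (__dm_suspend, dm_suspend, dev_suspend), which points at a device-mapper suspend ioctl that never completed.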

Following this guide (https://access.redhat.com/solutions/31453), I tried to reproduce the issue by marking the node unschedulable and then stopping the docker service with "systemctl stop docker". The prompt hung, but from another SSH connection I was able to collect the files required by that guide. The journal logs for the docker service show the following:

dic 01 16:40:16 hostname.example.com systemd[1]: Stopping Docker Application Container Engine...
dic 01 16:40:16 hostname.example.com docker[38182]: time="2015-12-01T16:40:16.387328403+01:00" level=info msg="Processing signal 'terminated'"
dic 01 16:41:46 hostname.example.com systemd[1]: docker.service stop-final-sigterm timed out. Killing.
dic 01 16:43:16 hostname.example.com systemd[1]: docker.service still around after final SIGKILL. Entering failed mode.
dic 01 16:43:16 hostname.example.com systemd[1]: Stopped Docker Application Container Engine.
dic 01 16:43:16 hostname.example.com systemd[1]: Unit docker.service entered failed state.
dic 01 16:43:16 hostname.example.com systemd[1]: docker.service failed.
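
The timestamps in the journal show how long systemd waited at each stage of the stop. The following minimal Python sketch computes the gaps between consecutive entries. The function name event_gaps is hypothetical, and the sketch assumes that all entries fall on the same day and that the first HH:MM:SS token on each line is the journal timestamp.

```python
import re
from datetime import datetime

# First HH:MM:SS token on the line is taken as the journal timestamp (assumption).
TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def event_gaps(journal_lines):
    """Return the seconds elapsed between consecutive timestamped lines."""
    times = []
    for line in journal_lines:
        match = TIME_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    # Assumes the entries do not cross midnight.
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


lines = [
    "dic 01 16:40:16 hostname.example.com systemd[1]: Stopping Docker Application Container Engine...",
    "dic 01 16:41:46 hostname.example.com systemd[1]: docker.service stop-final-sigterm timed out. Killing.",
    "dic 01 16:43:16 hostname.example.com systemd[1]: docker.service still around after final SIGKILL. Entering failed mode.",
]
print(event_gaps(lines))  # [90, 90]
```

The two 90-second gaps are consistent with systemd's default stop timeout, assuming docker.service does not override TimeoutStopSec. The process surviving SIGKILL fits the uninterruptible "D" state reported in dmesg.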

Environment

  • OpenShift 3.1
