OCS/ODF: Linux shutdown / reboot hangs due to CephFS kernel client, "libceph: connect (1)192.168.xx.76:6801 error -101"
Issue
A node hangs during Linux shutdown / reboot while the CephFS kernel client repeatedly logs "libceph: connect (1)192.168.xx.76:6801 error -101". On Linux, error -101 is ENETUNREACH (network is unreachable).
This is a snippet of dmesg -T from the node stuck in Linux shutdown / reboot:
[Thu Apr 27 17:10:56 2023] libceph: osd9 (1)192.168.xx.51:6801 connect error
[Thu Apr 27 17:10:56 2023] libceph: connect (1)192.168.xx.62:6801 error -101
[Thu Apr 27 17:10:56 2023] libceph: osd0 (1)192.168.xx.62:6801 connect error
[Thu Apr 27 17:10:56 2023] libceph: connect (1)192.168.xx.72:6801 error -101
[Thu Apr 27 17:10:56 2023] libceph: osd16 (1)192.168.xx.72:6801 connect error
[Thu Apr 27 17:10:57 2023] libceph: connect (1)192.168.xx.46:6801 error -101
[Thu Apr 27 17:10:57 2023] libceph: osd2 (1)192.168.xx.46:6801 connect error
[Thu Apr 27 17:10:57 2023] libceph: connect (1)192.168.xx.76:6801 error -101
[Thu Apr 27 17:10:57 2023] libceph: osd22 (1)192.168.xx.76:6801 connect error
[Thu Apr 27 17:10:58 2023] libceph: connect (1)192.168.xx.77:6801 error -101
[Thu Apr 27 17:10:58 2023] libceph: osd3 (1)192.168.xx.77:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.74:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd14 (1)192.168.xx.74:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.52:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd12 (1)192.168.xx.52:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.38:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd18 (1)192.168.xx.38:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.49:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd8 (1)192.168.xx.49:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.56:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd23 (1)192.168.xx.56:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.66:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd6 (1)192.168.xx.66:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.44:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd7 (1)192.168.xx.44:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.11:6801 error -101
[Thu Apr 27 17:11:00 2023] libceph: osd19 (1)192.168.xx.11:6801 connect error
[Thu Apr 27 17:11:00 2023] libceph: connect (1)192.168.xx.3:6801 error -101
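When the log is long, it can help to summarize which addresses the kernel client failed to reach and with which error code. The following is a minimal sketch, not a supported tool: the function name summarize_connect_errors and the regular expression are assumptions derived only from the line format shown in the snippet above.

```python
import re
from collections import Counter

# On Linux, errno 101 is ENETUNREACH ("Network is unreachable").
ENETUNREACH = 101

# Matches lines such as:
#   [Thu Apr 27 17:10:57 2023] libceph: connect (1)192.168.xx.76:6801 error -101
# The pattern is an assumption based on the excerpt above, not on a
# documented log format.
CONNECT_ERR = re.compile(
    r"libceph: connect \(\d+\)(?P<addr>\S+):(?P<port>\d+) error -(?P<errno>\d+)"
)


def summarize_connect_errors(lines):
    """Count libceph connect failures per (address, errno) pair."""
    counts = Counter()
    for line in lines:
        match = CONNECT_ERR.search(line)
        if match:
            counts[(match.group("addr"), int(match.group("errno")))] += 1
    return counts


# Two lines copied from the excerpt; only the first carries an error code.
sample = [
    "[Thu Apr 27 17:10:57 2023] libceph: connect (1)192.168.xx.76:6801 error -101",
    "[Thu Apr 27 17:10:57 2023] libceph: osd22 (1)192.168.xx.76:6801 connect error",
]

for (addr, err), n in summarize_connect_errors(sample).items():
    note = "ENETUNREACH" if err == ENETUNREACH else "other"
    print(f"{addr}: errno {err} ({note}) x{n}")
```

To run it against a live node, the saved output of dmesg -T can be passed in line by line, for example via open("dmesg.txt"), where the file name is a placeholder.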
Steps to reproduce, from the upstream tracker:
1) Mount CephFS on a client node.
2) Shut down a node containing an OSD+MON, or otherwise make that node unreachable.
3) While the client is accessing the mount (for example, a simple ls on a directory), reboot the CephFS client node.
4) Observe that the client node stays stuck until it can reach the Ceph nodes again or a hard reset is performed.
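Because the hang in step 4 depends on the client being unable to reach the Ceph nodes, a rough check before rebooting a client is to test whether the Ceph endpoints accept TCP connections. The sketch below only detects that condition and does not resolve the hang. The helper names is_reachable and unreachable_endpoints are hypothetical, and the endpoint list must be supplied by the operator.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers ENETUNREACH, ECONNREFUSED, timeouts, and
        # name-resolution failures.
        return False


def unreachable_endpoints(endpoints, timeout=2.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(h, p) for h, p in endpoints if not is_reachable(h, p, timeout)]


# Hypothetical usage; replace the placeholder with real OSD/MON addresses:
#   missing = unreachable_endpoints([("<osd-or-mon-address>", 6801)])
#   if missing:
#       print("Unreachable Ceph endpoints:", missing)
```

A successful TCP connection is only a proxy for cluster health. It says nothing about whether the OSD or MON behind the port is actually serving requests.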
Environment
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x
Ceph File System (CephFS)