Gluster Brick cannot be mounted with error: failed: No space left on device - Pod in Container Creating state
Issue
The pod "my_pod" is stuck in the ContainerCreating state because of a "mount failed" error:
5m 1d 1492 my_pod.123456789 Pod Warning FailedMount kubelet, node-8.example.com (combined from similar events): MountVolume.SetUp failed for volume "pvc-11111-2222-3333" : mount failed: mount failed: exit status 1 <<----
This pod uses the PV "pvc-11111-2222-3333", which is backed by the Gluster volume "vol_112233445566778899":
[root@node-9 ~]# oc describe pv pvc-11111-2222-3333
Name: pvc-11111-2222-3333
Labels: <none>
Annotations: Description=Gluster-Internal: Dynamically provisioned PV
gluster.kubernetes.io/heketi-volume-id=112233445566778899 <<----
...
StorageClass: glusterfs-storage
The mount fails because GlusterFS cannot serve the volume: two of its three bricks are offline:
sh-4.2# gluster volume status vol_112233445566778899
Status of volume: vol_112233445566778899
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.10.10.1:/var/lib/heketi/mounts/
vg_11111111/brick_0101010101/brick N/A N/A N N/A <<---
Brick 10.10.10.2:/var/lib/heketi/mounts/
vg_22222222/brick_0202020202/brick 49153 0 Y 368
Brick 10.10.10.3:/var/lib/heketi/mounts/
vg_33333333/brick_0303030303/brick N/A N/A N N/A <<---
Self-heal Daemon on localhost N/A N/A Y 84934
Self-heal Daemon on 10.10.10.3 N/A N/A Y 10993
Self-heal Daemon on node-2.host.example.com N/A N/A Y 1908
Task Status of Volume vol_112233445566778899
------------------------------------------------------------------------------
There are no active volume tasks
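Reading the Online column by eye gets tedious when a volume has many bricks. The following is a minimal sketch, not part of the original procedure, that extracts the offline bricks from saved `gluster volume status` text. It assumes the column layout shown above, where a long brick path wraps onto a second line; the function name `offline_bricks` and the `SAMPLE_STATUS` variable are hypothetical.

```python
# Sample text copied from the status output above (annotation arrows removed).
SAMPLE_STATUS = """\
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.10.1:/var/lib/heketi/mounts/
vg_11111111/brick_0101010101/brick          N/A       N/A        N       N/A
Brick 10.10.10.2:/var/lib/heketi/mounts/
vg_22222222/brick_0202020202/brick          49153     0          Y       368
Brick 10.10.10.3:/var/lib/heketi/mounts/
vg_33333333/brick_0303030303/brick          N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       84934
"""


def offline_bricks(status_text):
    """Return the brick paths whose Online column is not 'Y'."""
    offline = []
    pending = None  # first half of a brick row that wrapped onto two lines
    for raw in status_text.splitlines():
        line = raw.strip()
        if pending is not None:
            # Re-join the wrapped path with the rest of its row.
            line = pending + line
            pending = None
        if not line.startswith("Brick "):
            continue
        fields = line.split()
        # A complete row is: "Brick", path, TCP port, RDMA port, Online, Pid.
        if len(fields) < 6:
            pending = line
            continue
        path, online = fields[1], fields[4]
        if online != "Y":
            offline.append(path)
    return offline


for brick in offline_bricks(SAMPLE_STATUS):
    print("offline:", brick)
```

On a real system the text would come from the command output captured inside a Gluster pod rather than from an inline string.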
The log /var/log/glusterfs/glfsheal-vol_112233445566778899.log shows when the problem started:
[2019-10-28 00:01:00.000287] E [MSGID: 114058] [client-handshake.c:1484:client_query_portmap_cbk] 0-vol_112233445566778899-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-28 00:01:00.000369] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-vol_112233445566778899-client-0: disconnected from vol_112233445566778899-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-28 00:01:00.000409] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol_112233445566778899-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. <<--------------!!!!
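To locate the start of the outage in a longer log, the error entries can be filtered programmatically. This is a rough sketch that assumes every line follows the `[timestamp] LEVEL [MSGID: n] ...` layout seen in the excerpt above; the helper names `error_entries` and `first_all_down` are hypothetical, and the sample lines are copied from that excerpt.

```python
import re

# Matches the "[timestamp] LEVEL [MSGID: n]" prefix seen in the excerpt above.
LOG_PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\]"
)

SAMPLE_LOG = """\
[2019-10-28 00:01:00.000287] E [MSGID: 114058] [client-handshake.c:1484:client_query_portmap_cbk] 0-vol_112233445566778899-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-28 00:01:00.000369] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-vol_112233445566778899-client-0: disconnected from vol_112233445566778899-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-28 00:01:00.000409] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol_112233445566778899-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
"""


def error_entries(log_text):
    """Return (timestamp, msgid, full line) for every error-level (E) line."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_PREFIX.match(line)
        if match and match.group("level") == "E":
            entries.append((match.group("ts"), match.group("msgid"), line))
    return entries


def first_all_down(log_text):
    """Return the timestamp of the first 'All subvolumes are down' error."""
    for timestamp, _msgid, line in error_entries(log_text):
        if "All subvolumes are down" in line:
            return timestamp
    return None


print(first_all_down(SAMPLE_LOG))
```

Matching on the message text rather than on the MSGID alone keeps the sketch tied to what the excerpt actually shows.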
We connected to both Gluster pods that host the offline bricks and confirmed with the mount command that the bricks are not mounted.
On each pod, we tried to mount them manually with this command:
mount -a -T /var/lib/heketi/fstab
On one pod, the command fails with the error "No space left on device":
sh-4.2# mount -a -T /var/lib/heketi/fstab
mount: mount /dev/mapper/vg_11111111-brick_0101010101 on /var/lib/heketi/mounts/vg_11111111/brick_0101010101 failed: No space left on device
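The check for unmounted bricks described above can also be scripted by comparing the heketi fstab with the list of mounted filesystems. The sketch below is an illustration only: it assumes the standard fstab field order (device, mount point, ...) and the `/proc/mounts` format. The sample contents are hypothetical except for the one device and mount point taken from the error message; in particular `brick_example`, the filesystem type and the mount options are invented for the example.

```python
# Hypothetical contents of /var/lib/heketi/fstab on one Gluster pod.
SAMPLE_FSTAB = """\
# device mountpoint fstype options dump pass
/dev/mapper/vg_11111111-brick_0101010101 /var/lib/heketi/mounts/vg_11111111/brick_0101010101 xfs rw,noatime 1 2
/dev/mapper/vg_11111111-brick_example /var/lib/heketi/mounts/vg_11111111/brick_example xfs rw,noatime 1 2
"""

# Hypothetical excerpt in /proc/mounts format: only the second brick is mounted.
SAMPLE_MOUNTS = """\
/dev/mapper/vg_11111111-brick_example /var/lib/heketi/mounts/vg_11111111/brick_example xfs rw,noatime 0 0
"""


def fstab_entries(fstab_text):
    """Return (device, mount point) pairs, skipping blanks and comments."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def unmounted(fstab_text, mounts_text):
    """Return fstab entries whose mount point is absent from the mount list."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [(dev, mp) for dev, mp in fstab_entries(fstab_text) if mp not in mounted]


for device, mount_point in unmounted(SAMPLE_FSTAB, SAMPLE_MOUNTS):
    print("not mounted:", device, "->", mount_point)
```

Inside a pod, the two inputs would be read from /var/lib/heketi/fstab and /proc/mounts instead of the inline samples.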
Environment
- OCS 3.11, converged mode
- glusterfs-3.12.2-47.4.el7rhgs.x86_64