Gluster Brick cannot be mounted with error: failed: No space left on device - Pod in Container Creating state

Solution Verified

Issue

We have a pod "my_pod" stuck in "ContainerCreating" state due to a "mount failed" error:

5m          1d           1492      my_pod.123456789   Pod                   Warning   FailedMount   kubelet, node-8.example.com   (combined from similar events): MountVolume.SetUp failed for volume "pvc-11111-2222-3333" : mount failed: mount failed: exit status 1  <<----

This pod uses PV "pvc-11111-2222-3333" that is related to gluster volume "vol_112233445566778899" :

[root@node-9 ~]# oc describe pv pvc-11111-2222-3333
Name:            pvc-11111-2222-3333
Labels:          <none>
Annotations:     Description=Gluster-Internal: Dynamically provisioned PV
                 gluster.kubernetes.io/heketi-volume-id=112233445566778899  <<----
...
StorageClass:    glusterfs-storage   
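
As the annotation shows, the gluster volume name is simply "vol_" followed by the heketi volume ID, so it can be derived in the shell. A minimal sketch, using the ID from this example (on a live cluster the ID could instead be read from the PV annotation with oc):

```shell
# Derive the gluster volume name from the PV's heketi-volume-id annotation.
# The ID below is the example value from this article; adjust for your PV.
heketi_id="112233445566778899"
gluster_vol="vol_${heketi_id}"
echo "$gluster_vol"    # vol_112233445566778899
```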

The mount fails because GlusterFS cannot serve the volume; we found that this is because two of its bricks are offline:

sh-4.2# gluster volume status vol_112233445566778899
Status of volume: vol_112233445566778899
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.10.1:/var/lib/heketi/mounts/
vg_11111111/brick_0101010101/brick       N/A       N/A        N       N/A  <<---
Brick 10.10.10.2:/var/lib/heketi/mounts/
vg_22222222/brick_0202020202/brick       49153     0          Y       368
Brick 10.10.10.3:/var/lib/heketi/mounts/
vg_33333333/brick_0303030303/brick       N/A       N/A        N       N/A  <<---
Self-heal Daemon on localhost            N/A       N/A        Y       84934
Self-heal Daemon on 10.10.10.3           N/A       N/A        Y       10993
Self-heal Daemon on node-2.host.example.com
                                         N/A       N/A        Y       1908

Task Status of Volume vol_112233445566778899
------------------------------------------------------------------------------
There are no active volume tasks
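
To spot offline bricks without scanning the whole table, the status output can be filtered. A minimal sketch, assuming the output was first captured to /tmp/volstatus.txt (a hypothetical path); with long brick paths the table wraps, so it matches on the trailing columns (Online = "N", Pid = "N/A"):

```shell
# List offline bricks from captured `gluster volume status` output.
# Capture first, inside a glusterfs pod:
#   gluster volume status vol_112233445566778899 > /tmp/volstatus.txt
# Offline bricks have "N" in the second-to-last field (Online) and
# "N/A" in the last field (Pid).
awk '$(NF-1) == "N" && $NF == "N/A"' /tmp/volstatus.txt
```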

From the log /var/log/glusterfs/glfsheal-vol_112233445566778899.log we can see when the problem started:

[2019-10-28 00:01:00.000287] E [MSGID: 114058] [client-handshake.c:1484:client_query_portmap_cbk] 0-vol_112233445566778899-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-28 00:01:00.000369] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-vol_112233445566778899-client-0: disconnected from vol_112233445566778899-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-28 00:01:00.000409] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol_112233445566778899-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.  <<--------------!!!!
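
The first occurrence of that message, and therefore the time the volume went fully offline, can be located with a simple grep. A minimal sketch using the log path above:

```shell
# Find the first "All subvolumes are down" entry in the heal log;
# the leading timestamp of the matching line shows when the volume
# went offline.
log=/var/log/glusterfs/glfsheal-vol_112233445566778899.log
grep -m1 -n "All subvolumes are down" "$log"
```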

Connecting to both glusterfs pods and running the mount command, we see that the bricks are not mounted.

On each pod we try to mount them manually with this command:
mount -a -T /var/lib/heketi/fstab

In one pod this fails with the error "No space left on device":

sh-4.2# mount -a -T /var/lib/heketi/fstab
mount: mount /dev/mapper/vg_11111111-brick_0101010101 on /var/lib/heketi/mounts/vg_11111111/brick_0101010101 failed: No space left on device
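
After mount -a, any bricks that still failed to mount can be listed by comparing heketi's fstab against the current mounts. A minimal sketch (the field layout follows the standard fstab format, device then mount point; mountpoint(8) is from util-linux):

```shell
# List the brick mount points recorded in heketi's fstab and flag any
# that are not currently mounted, to see which bricks still need attention.
awk '!/^#/ && NF >= 2 { print $2 }' /var/lib/heketi/fstab |
while read -r mp; do
    mountpoint -q "$mp" || echo "not mounted: $mp"
done
```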

Environment

  • OCS 3.11 , converged mode
  • glusterfs-3.12.2-47.4.el7rhgs.x86_64
