RHEVM: Critical, Low Disk Space warnings on /var/log for a hypervisor


RHEVM 3.1.0-50

RHEVH 20130318

Can anyone help me with this one? I've raised a support ticket but fear I may run out of disk space beforehand!

RHEVM is reporting "Critical, Low disk space" on /var/log for a hypervisor.

I would like to understand how to clear this down and stop the alerts.

I have seen the knowledge base article (https://access.redhat.com/site/solutions/289583), but it is not relevant as we are already running the vdsm package with the supposed fix (vdsm-4.10.2-1.6.el6).

It tells me I only have 50M of space left on /var/log.

 

Moreover, I'm slightly confused because /var/log appears to be both a memory-based (tmpfs) filesystem and a logical volume. Which is it, or is it somehow both? I'm probably misunderstanding something.

Either way, df -h on /var/log shows 1.8G used of 2.0G, but du -sh of /var/log shows only 290M.

 [root@rhphyp03a log]# df -h /var/log

Filesystem            Size  Used Avail Use% Mounted on

-                     2.0G  1.8G   92M  96% /var/log

[root@rhphyp03a log]# pwd

/var/log

[root@rhphyp03a log]# du -sh

290M    .

[root@rhphyp03a log]# cat /proc/mounts | grep "/var/log"

none /var/log tmpfs rw,rootcontext=system_u:object_r:var_lib_t:s0,seclabel,relatime 0 0

/dev/mapper/HostVG-Logging /var/log ext4 rw,seclabel,noatime,barrier=1,stripe=64,data=ordered 0 0
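A note on the double mount, in case it helps anyone: when two filesystems are mounted on the same path, the most recent mount shadows the earlier one, so it is the ext4 logical volume, not the tmpfs, that df is reporting on (the tmpfs entry is presumably left over from early boot, before the logging LV was mounted over it). The gap between df and du is usually caused by files that have been deleted but are still held open. Two quick checks, assuming /var/log is a mount point as shown above:

 # grep ' /var/log ' /proc/mounts | tail -1

(the last matching line is the active, top-most mount)

 # lsof +L1 /var/log

(+L1 limits lsof to open files with a link count of zero, i.e. deleted but still held open)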

Responses



I previously filed a support case for a similar issue. GSS provided the following workaround, which seems to have resolved the issue on our server.

1. Migrate VMs to other hosts (if possible) or schedule downtime for the VMs running on the host experiencing the problem

2. After migrating or shutting down all VMs on the affected host, place the host into Maintenance mode

3. Add 'copytruncate' to the '/var/log/vdsm/*.log {' stanza of the /etc/logrotate.d/vdsm file, like so:

 /var/log/vdsm/*.log {
    rotate 100
    missingok
    size 15M
    compress
    compresscmd /usr/bin/xz
    uncompresscmd /usr/bin/unxz
    compressext .xz
    copytruncate
 }

If you need to change the file after it has already been persisted, you will need to unpersist the file first, make the changes, and then persist the updated file again, like so:

A. Unpersist the file:

 # unpersist /etc/logrotate.d/vdsm

B. Make the changes to the vdsm file described above

C. Persist the updated file again:

 # persist /etc/logrotate.d/vdsm

Please ignore these directions if you have already placed the 'copytruncate' directive in the '/var/log/vdsm/*.log {' stanza of the /etc/logrotate.d/vdsm file.

4. Persist /etc/logrotate.d/vdsm so the file survives reboots of the hypervisor

 # persist /etc/logrotate.d/vdsm

5. Restart the vdsmd service from the command line of the affected host

 # service vdsmd restart

6. Activate the hypervisor via the RHEVM Admin Portal
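As an optional sanity check before persisting the edit in step 4, logrotate can be dry-run against the edited file. The -d flag parses the configuration and prints what would be done without rotating anything (running it against the fragment directly skips the global settings in /etc/logrotate.conf, which is fine for a syntax check), and -f forces an immediate rotation once the stanza looks right:

 # logrotate -d /etc/logrotate.d/vdsm

 # logrotate -f /etc/logrotate.d/vdsm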

Brilliant, Aram. Thanks for that. I'll give it a shot.

Why is this stuff not up on the knowledge base?

I recently had a similar issue. In my case the file /var/log/vdsm-reg/vdsm-reg.log.1 had been deleted but kept growing, since it was still open, i.e.:

 

[root@rhev3 log]# lsof | egrep '^COMMAND|deleted'
COMMAND    PID   USER  FD  TYPE  DEVICE  SIZE/OFF    NODE  NAME
vdsm-reg-  4140  root  4w  REG   253,8   1234300622  38    /var/log/vdsm-reg/vdsm-reg.log.1 (deleted)

I freed up /var/log a couple of times by truncating the deleted file with 'echo "" > /proc/4140/fd/4', but eventually gave up and rebooted the hypervisor to clear the problem. I could probably also have fixed it by restarting the vdsm-reg service, but I don't fully understand what the consequences of that would have been.
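Worth noting: echo "" still writes a newline, so the file ends up one byte long rather than empty. Redirecting nothing into the descriptor truncates it outright, and coreutils truncate does the same; a small sketch using the PID and fd number from the lsof output above:

 # : > /proc/4140/fd/4

 # truncate -s 0 /proc/4140/fd/4

Either one releases the space held by the deleted-but-open file without restarting anything.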

This is a known issue in the latest version of the vdsmd package, and a workaround is already up on the kbase. See https://access.redhat.com/site/solutions/340123

We are working on a code fix for this. Until then, it's recommended to implement the workaround of adding "copytruncate" to /etc/logrotate.d/vdsm. I am not sure you need to migrate VMs or put the host into maintenance mode to do this; the change is expected to take effect without those steps, though it does need to be persisted. This workaround prevents the problem from happening again in the future. To clear up the already consumed space, you need to restart vdsmd.
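If it helps to see it end to end, the minimal sequence on the affected host per this note would be something like the following (unpersist is only needed if the file was persisted previously; see the question below about restarting vdsmd with running VMs):

 # unpersist /etc/logrotate.d/vdsm

 # vi /etc/logrotate.d/vdsm        (add 'copytruncate' to the '/var/log/vdsm/*.log {' stanza)

 # persist /etc/logrotate.d/vdsm

 # service vdsmd restart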

That workaround filed by Aram works like a dream. Thanks again, Aram.

I strongly suggest it gets into the Knowledge base as soon as possible.

Thanks so much, everyone. An excellent example of collaboration in action!

What are the consequences of restarting vdsmd?
Can it be performed while the host is up and has running VMs, without downtime?
Or do you need to put the host in maintenance mode?

Strong suggestion noted, Rich! I'll follow up on this. Thanks.

To be fair, it would appear that I caught this one between Aram posting the solution and it being published on the KB. The important thing is that it's now up there. Well done all, and thanks again.

Shall we do this workaround while VMs are up? Please revert ASAP.