Exit message: failed to create logfile /var/log/libvirt/qemu/<name_server>.log: Read-only file system.


Hello Guys

 

I have the following environment

- 4 Nodes Dell M620 with RHEV 6.4

- 1 Node Dell M610 with RHEL and 1 Virtual Machine with RHEV-M 3.2

- 1 Storage Equallogic PS6110

Sometimes when creating a new virtual machine or performing a migration, this message appears:

Exit message: failed to create logfile /var/log/libvirt/qemu/sa_jb_01.bolitel.local.log: Read-only file system.

This message appears at random, and we must reboot the nodes to resolve it.

Does anyone know why this error occurs and how to fix it?

Thanks for your help

Regards

 

Sebastian

Responses

Typically when I see that read-only file system message, it indicates a hardware problem.  Something corrupted the file system a little bit, so Linux sets it to read-only.  And, of course, it's the file system with all your diagnostic logs, so there's no record of what went wrong.  Maddening!

Your error is with /var/log/libvirt/qemu/sa_jb_01.bolitel.local.log - so that suggests a local boot disk or other hardware problem on one of your RHEV-H hosts.  You see the error in your RHEV-M events, and then you clear the problem by rebooting all your RHEV-H hosts, right?

Next time it happens, look at which host has the SPM role, put that host into maintenance mode, and reboot that host and only that host.  See if the problem clears up.  If so, then you're a little bit closer to the problem.  If not, put each host into maintenance mode and reboot them one at a time, and keep track of which host's reboot makes the problem go away.  

Maybe repeat this a few times and see if the problem always stays with the same host.  If it does, the odds are pretty good the problem is with that host and you can drill down further on that host.
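Another quick check, once you have a shell on a host, is to scan /proc/mounts for anything flagged read-only. A sketch (the `ro_mounts` helper name is mine, and it is shown against sample data so the filter is self-contained; on a live host, feed it /proc/mounts instead):

```shell
# Print the mount point of every file system whose options include "ro".
# A healthy RHEV-H host should not list its root or /var file systems here.
ro_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ {print $2}'
}

# Sample data in /proc/mounts format: device mountpoint fstype options
# On a real host: ro_mounts < /proc/mounts
ro_mounts <<'EOF'
/dev/mapper/HostVG-Root / ext4 ro,relatime 0 0
/dev/mapper/HostVG-Logging /var/log ext4 rw,relatime 0 0
EOF
# prints: /
```

The regex matches "ro" only as a whole option, so options like "errors=remount-ro" or "rw,relatime" do not trigger false positives.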

- Greg

Or even better than host reboots - maybe just ssh into each host when the problem comes up again. One of the ssh sessions will probably fail with a "read only file system" or other error.  Put that host into maintenance mode, shut it down, then maybe swap its boot HDD and rebuild it.  

A bad HDD is not always the cause of file corruption issues - it could be a bad cable, bad controller, bad motherboard, etc.  But the part swapping always starts with the HDD.

Or if the hardware is still under warranty, call Dell and see what they can do with any hardware diagnostics on that host.  

- Greg Scott

Hello Guys

This problem appears on all 4 hosts running RHEV. We reinstalled the 4 nodes and still receive this error at random.

I connected to the hosts by ssh, but none of them showed the problem at the time of the connection.

I will reboot the hosts and see if any error messages appear during boot.

Regards

Hello Guys

I rebooted a host, and the messages that may indicate errors are:

 

CPU28: Package power limit notification (total events = 17)

Setting up Logical Volume Management:   4 logical volume(s) in volume group "HostVG" now active

cp: `/var/lib/vdsm' and `/var/lib/stateless/writable/var/lib/vdsm' are the same file

No kdump initial ramdisk found.                            [WARNING]

Rebuilding /boot-kdump/initrd-2.6.32-358.11.1.el6.x86_64kdump.img

WARNING: No module wmi found for kernel 2.6.32-358.11.1.el6.x86_64, continuing anyway

cp: cannot stat `/lib/kbd/consolefonts/LatArCyrHeb-16.psfu.gz': No such file or directory

mount: /dev/mapper/live-rw already mounted or /tmp/tmp.GxoM7Bv2AK busy

mount: according to mtab, /dev/mapper/live-rw is mounted on /

/etc/kdump.conf: Bad mount point UUID=04b71486-56be-4520-a7e4-5f00b4d8d441

Configuring libvirt for vdsm...

/bin/mv: inter-device move failed: `/tmp/tmp.31I2RSzbgO' to `/etc/logrotate.d/libvirtd'; unable to remove target: Device or resource busy

File already persisted: /etc/libvirt/libvirtd.conf

File already persisted: /etc/libvirt/qemu.conf

File already persisted: /etc/sysconfig/libvirtd

File already persisted: /etc/logrotate.d/libvirtd

 

 

Thanks for your comments.

 

Regards

Hi Sebastian,

The above error messages do not look so bad, but a file system going read-only is really not a good sign.

It would be better if you open a service request with Red Hat Technical Support. After reviewing the logs, we may be able to determine the reason for it.

 

BR,

Uday

 

Hello Uday

 

I have a case open with Red Hat but have not received an answer yet ): I sent all the logs on Friday.

 

Regards

 

Yes, absolutely, open a support ticket with Red Hat.  But you can help speed up problem resolution by further characterizing the problem and providing the Red Hat support team something to work with.  

The problem happens at random, after the whole system has been up and running for a while, correct?  And when the problem occurs, your cure so far has been to reboot all your hosts to get the system back online.  Still correct?

If so, then it is important to try to find which host or hosts are causing the problem.  Those logs you sent to Red Hat have a very large amount of information.  But only a small fraction of all that information is useful to your problem.  And if the problem is what I think, the logs may not show anything useful because the file system containing the logs becomes read-only for some unknown reason. So as soon as the problem presents itself, the host with the problem cannot log anything about it.  This will make it nearly impossible for Red Hat to find the problem without more help from you.  

So **when the problem occurs next**, that is when to ssh into each host.  When logged in to your host as admin, you can press a sequence of characters on your keyboard to get to a root prompt.  I do not remember the key sequence, but somebody else reading this post can provide it.  Press that key sequence, and once you are at the root prompt and can run Linux commands, do the following:

cd /var/log/libvirt/qemu
touch a.a
rm a.a

This will try to create and then delete an empty file named "a.a".  I think one of your hosts will give you a "read only file system" error when you try this.  When this happens, you will know which host is broken, and you can log a service call with Dell about that host.  And you can take that host out of your RHEV environment until it is repaired.  
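The same touch/rm probe can be wrapped in a small script so the check is easy to repeat on each host. A sketch (the `check_writable` helper name is mine, not a standard tool):

```shell
# Probe whether a directory accepts writes by creating and removing a
# scratch file, then report the result.
check_writable() {
    probe="$1/.rw-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "$1: writable"
    else
        echo "$1: NOT writable - possible read-only file system"
        return 1
    fi
}

# On each RHEV-H host, point it at the libvirt log directory:
#   check_writable /var/log/libvirt/qemu
check_writable /tmp    # known-good directory for comparison
```

A "NOT writable" result on /var/log/libvirt/qemu identifies the broken host.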

But remember, this diagnostic test only makes sense after the problem occurs.  Don't try it right after a reboot.  Do this test after you see an indication of the problem.  

- Greg Scott

One other thought - if the problem is happening on more than one of your RHEV-H hosts, check the electric power feeding each host.  Make sure you are feeding it clean power.

- Greg Scott

Good idea. Thanks for all your help on this question, Greg!

My file system was not read-only, but I found that when doing a forced logrotate against a new /etc/logrotate.d/syslog config I made, I had to run it as logrotate -v -f /etc/logrotate.conf, which properly rotated /var/log/messages and the other files. When I attempted logrotate -v -f /etc/logrotate.d/syslog, the rotation would not execute. A couple of sources explained that a forced rotation should be run against /etc/logrotate.conf so that logrotate's default parameters are known when it rotates the logs, even when forced.
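For anyone hitting the same logrotate behavior: a fragment in /etc/logrotate.d/ carries only its own directives, while /etc/logrotate.conf supplies the global defaults (rotation frequency, rotate count, and so on) and pulls the fragments in through its include directive. A minimal sketch of such a fragment (the paths and options here are illustrative, not the poster's actual config):

```
/var/log/messages {
    weekly
    rotate 4
    missingok
}
```

This is why the forced run is pointed at the top-level file, logrotate -v -f /etc/logrotate.conf, rather than at the fragment alone.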
