Exit message: failed to create logfile /var/log/libvirt/qemu/<name_server>.log: Read-only file system.
Hello Guys
I have the following environment:
- 4 Nodes Dell M620 with RHEV 6.4
- 1 Node Dell M610 with RHEL and 1 Virtual Machine with RHEV-M 3.2
- 1 Storage Equallogic PS6110
Sometimes, when creating a new virtual machine or performing a migration, this message appears:
Exit message: failed to create logfile /var/log/libvirt/qemu/sa_jb_01.bolitel.local.log: Read-only file system.
This message appears at random, and I have to reboot the nodes to resolve it.
Does anyone know why this error occurs and how to fix it?
Thanks for your help
Regards
Sebastian
Responses
When I see that read-only file system message, it usually indicates a hardware problem. Something corrupted the file system a little bit, so Linux set it to read-only. And, of course, it's the file system with all your diagnostic logs, so there's no record of what went wrong. Maddening!
Your error is with /var/log/libvirt/qemu/sa_jb_01.bolitel.local.log, which suggests that a local boot disk or other hardware on one of your RHEV-H hosts has a problem. You see the error in your RHEV-M events and then you clear the problem by rebooting all your RHEV-H hosts, right?
Next time it happens, look at which host has the SPM role and put that host into maintenance mode and reboot that host and only that host. See if the problem clears itself up. If so, then you're a little bit closer to the problem now. If not, put each host into maintenance mode and reboot, one at a time, and keep track of which host reboot makes the problem go away.
Maybe repeat this a few times and see if the problem always stays with the same host. If it does, the odds are pretty good the problem is with that host and you can drill down further on that host.
- Greg
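As a rough sketch of how to confirm the "Linux sets it to read only" part on a suspect host, the function below lists every mount whose options include ro. The function name list_ro_mounts and its optional file argument are made up for illustration; /proc/mounts is the standard kernel mount table. Some mounts may be read-only by design (for example on image-based hypervisor hosts), so what matters is whether the file system holding /var/log shows up in the list.

```shell
#!/bin/sh
# list_ro_mounts [MOUNTS_FILE]   (hypothetical helper name)
# Prints "mountpoint device fstype" for every mount whose option list
# contains the exact option "ro". MOUNTS_FILE defaults to /proc/mounts;
# the argument exists only so a saved copy can be inspected instead.
list_ro_mounts() {
    awk '{
        # Field 4 is the comma-separated option list, e.g. "rw,noatime".
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2, $1, $3; break }
    }' "${1:-/proc/mounts}"
}
```

Calling list_ro_mounts with no argument reads the live mount table. Options such as errors=remount-ro are not matched, since only an exact ro option counts.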
Or even better than host reboots - maybe just ssh into each host when the problem comes up again. One of the ssh sessions will probably fail with a "read only file system" or other error. Put that host into maintenance mode, shut it down, then maybe swap its boot HDD and rebuild it.
A bad HDD is not always the cause of file corruption issues - it could be a bad cable, bad controller, bad motherboard, etc. But the part swapping always starts with the HDD.
Or if the hardware is still under warranty, call Dell and see what they can do with any hardware diagnostics on that host.
- Greg Scott
Hi Sebastian,
The error messages above do not look too serious on their own, but a file system going read-only is really not a good sign.
It would be better to open a service request with Red Hat Tech Support. After reviewing the logs, we may be able to identify the cause.
BR,
Uday
Yes, absolutely, open a support ticket with Red Hat. But you can help speed up problem resolution by further characterizing the problem and providing the Red Hat support team something to work with.
The problem happens at random, after the whole system has been up and running for a while, correct? And when the problem occurs, your cure so far has been to reboot all your hosts to get the system back online. Still correct?
If so, then it is important to try to find which host or hosts are causing the problem. Those logs you sent to Red Hat have a very large amount of information. But only a small fraction of all that information is useful to your problem. And if the problem is what I think, the logs may not show anything useful because the file system containing the logs becomes read-only for some unknown reason. So as soon as the problem presents itself, the host with the problem cannot log anything about it. This will make it nearly impossible for Red Hat to find the problem without more help from you.
So **when the problem occurs next**, that is when to ssh into each host. When logged in to your host as admin, you can press a sequence of characters on your keyboard to get to a root prompt. I do not remember the key sequence, but somebody else reading this post can provide it. Press that key sequence, and once you are at the root prompt and can run Linux commands, run the following:
cd /var/log/libvirt/qemu
touch a.a
rm a.a
This will try to create and then delete an empty file named "a.a". I think one of your hosts will give you a "read only file system" error when you try this. When this happens, you will know which host is broken and you can log a service call with Dell about that host. And you can take that host out of your RHEV environment until it is repaired.
But remember, this diagnostic test only makes sense after the problem occurs. Don't try it right after a reboot. Do this test after you see an indication of the problem.
- Greg Scott
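If there are several hosts to check, the same create-and-delete test can be run against all of them in one pass. This is only a sketch under some assumptions that are not in the thread: that root can log in over ssh with key-based authentication (BatchMode disables password prompts), that the host names are passed on the command line, and that a scratch file named .rw_probe is acceptable. The probe_host function and the SSH override variable are made-up names.

```shell
#!/bin/sh
# probe_host HOST   (hypothetical helper name)
# Tries to create and delete a scratch file in the libvirt log directory
# on HOST. Assumes direct root login over ssh with keys already set up.
# SSH may be set to an alternative command; it defaults to ssh.
probe_host() {
    ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@$1" \
        'f=/var/log/libvirt/qemu/.rw_probe; touch "$f" && rm -f "$f"' \
        </dev/null >/dev/null 2>&1
    case $? in
        0)   echo "$1: log directory is writable" ;;
        # ssh exits with 255 when the connection or login itself fails.
        255) echo "$1: ssh failed (host unreachable or login refused)" ;;
        *)   echo "$1: write test FAILED, check this host" ;;
    esac
}

# Probe every host named on the command line, one after another.
for h in "$@"; do
    probe_host "$h"
done
```

Saved as, say, probe.sh, it would be run as sh probe.sh followed by your own host names. As with the manual test, the result is only meaningful after the problem has shown up, not right after a reboot.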
One other thought - if the problem is happening on more than one of your RHEV-H hosts, check the electric power feeding each host. Make sure you are feeding it clean power.
- Greg Scott
My file system was not read-only, but I ran into a related logrotate issue. After creating a new /etc/logrotate.d/syslog config, I tried a forced rotation with logrotate -v -f /etc/logrotate.d/syslog, and it would not rotate the logs. Running logrotate -v -f /etc/logrotate.conf instead properly took care of /var/log/messages and the other files. A couple of sources explained why: a forced logrotate should be run against /etc/logrotate.conf so that logrotate knows its default parameters when it rotates logs, even when forced.
