fsck on remote SAN filesystem
Hi, my RHEL servers all have a number of filesystems (used for storage) hosted on remote SAN devices. A while ago, I had to modify /etc/fstab to change the fsck parameter to 0 on these filesystems. The reason? Well, they prevented booting into RHEL (forced to a shell)!
I'm now not sure whether this course of action was correct, and I'd like to ask the experts (you!) what the best practice is for running fsck on SAN-based filesystems at boot time.
Many thanks
Responses
Hi Mark,
What do you mean by remote SAN devices? Remote as in not in the same datacenter as the server?
Is the SAN connection FC-based or iSCSI-based?
The community needs these answers to give you proper advice.
For local storage, whether DAS or SAN, I would recommend doing a "normal" fsck at boot time.
Remote storage might cause latency issues at boot time; I can't advise you in that case, as I don't have experience with such setups.
Kind regards,
Jan Gerrit Kootstra
Mark,
We set fsck on for all SAN filesystems on RHEL 3.0 and newer.
We only experience a drop to runlevel S if a filesystem needs a full fsck.
You can drop a line in this discussion if you feel you need help with an error that occurs during the boot.
Give an indication of when you will do the reboot, and please include your local timezone.
Kind regards,
Jan Gerrit
I personally wouldn't treat SAN devices any differently from SCSI / locally attached disk (although I would only boot from SAN under duress).
As Jan has mentioned, non-root filesystems should be configured with '2' for the pass number field (the last field in /etc/fstab).
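For example (the devices, mount points and filesystem type here are just placeholders, not from your setup):
# <device>                 <mount point>  <type>  <options>  <dump>  <pass>
/dev/mapper/mpath-data01   /data01        ext4    defaults   1       2
# a pass number of 0, as in your change, skips the boot-time fsck entirely
/dev/mapper/mpath-data02   /data02        ext4    defaults   1       0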
Hi Mark,
Are you using ext3 or ext4 on the SAN storage you mention?
We have some servers with direct SAN-attached storage that have fsck enabled in /etc/fstab, and the filesystems are set to be checked if they have not been rebooted for a set number of days. On occasion we've had a select few of these servers go well over six months of uptime, and then we were forced into an fsck upon reboot.
One thing I've done to help with this is to reboot the systems a bit more often, and to schedule this with the customer. That has helped me avoid, on some of those servers, tediously waiting almost 2 hours while they ran an fsck.
I notice something like this for mkfs.ext4 and mkfs.ext3:
~]# mkfs.ext4 /dev/sdb1
--output truncated--
This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
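If you want to check or adjust those thresholds yourself, something along these lines should do it (substitute your own device for /dev/sdb1, which is just an example here):
~]# tune2fs -l /dev/sdb1 | grep -Ei 'mount count|check interval'
~]# tune2fs -c 40 -i 90d /dev/sdb1     # check every 40 mounts or 90 days, whichever comes first
~]# tune2fs -c 0 -i 0 /dev/sdb1        # or disable the periodic checks entirely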
As far as a best practice goes for RHEL 5 (ext3 usually, but ext4 with e4fsprogs) and RHEL 6 (the default of course is ext4, but the SAN filesystems may not be ext4), you might want to consider opening a low-priority case with Red Hat and requesting this from their storage folks, even though you can certainly research this and reach a good consensus between your research and the discussion responses.
The benefit of a case with Red Hat support is that if your management asks for documentation on your best practice, you can cite the case number you placed with support and what they recommended. Even when you already know the outcome of your own research, it helps at times to have a case number and a recommendation from Red Hat.
Our own Principal Engineer has on occasion asked me to get Red Hat's idea of a best practice by placing a case, so that we can have it for management folks, change control boards, or other documentation. It may seem like overkill, but it can help with contracts, documentation, and expectation management (and I'm talking not just about this, but about best practices in general).
In our case, we reboot some of the servers a bit more often and that's helped in our scenario.
Hope this helps,
Remmele
Hey Mark - do you still have consistent issues, or were you seeking validation on changing the fstab?
Also - you're going to hate my response/question ;-) Did you get your multipath sorted? I have had similar issues on my hosts where multipath was not configured correctly and my SAN volumes would not start at boot time.
Ex: the system is running, we create a Volume Group/Volumes/File Systems/etc., and as the last step we reboot to make sure everything is cool when it comes back up. Things appear fine until it attempts to start the volumes. At that point I believe it would indicate that the Volume was not found and drop to single user. I could then manually run:
service multipathd restart && chkconfig multipathd on
pvscan && vgscan && vgchange -ay <volumegroup>
mount -a
To troubleshoot this, I would:
* comment out the SAN volumes in /etc/fstab
* reboot
* see if multipath is running; run multipath -ll -v2 to determine the current state
* run vgs && vgdisplay
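For what it's worth, on my hosts the fix boiled down to making sure multipathd was enabled at boot and had a sane config. A minimal /etc/multipath.conf along these lines (the blacklist pattern is just an illustration, not taken from your environment):
defaults {
    user_friendly_names yes
}
blacklist {
    devnode "^(ram|loop|fd|md|sr)[0-9]*"
}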
So - the remainder of my response is X-Files level conspiracy theory. This reminded me of one of the reasons I disliked using UDEV (in particular for Oracle ASM). It was a chicken-and-egg type issue: UDEV needed to look up users (oracle:dba), which we had in LDAP, to apply to the disk devices, but... LDAP was not available, because UDEV had to complete before the network would start before LDAP would run before.... Our solution was to put those users in local files (which was a good idea anyhow). Now - I don't know whether this particular issue even exists any longer, or whether you decided to use the UDEV method.
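For completeness, the kind of udev rule I mean looks roughly like this (the device match, rule file name, and the oracle:dba owner are placeholders for whatever your ASM disks actually use), and it only behaves at boot if those users resolve locally when udev runs:
# /etc/udev/rules.d/99-oracle-asm.rules (illustrative only)
KERNEL=="sdb1", OWNER="oracle", GROUP="dba", MODE="0660"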
Linux will attempt to mount blockdevs enumerated in /etc/fstab fairly early in the boot sequence. For FC devices, this can often happen before the hardware probes return - resulting in the symptoms seen. A couple of "tricks" can be used to avoid this:
1) Add the "_netdev" mount option to the device
2) Fiddle about with sequence options (the last two columns)
3) Encapsulate the SAN devnodes within LVM objects
4) Forceload your FC drivers into your initrd image
The first option is fine whether you absolutely want to mount bare FC blockdevs or want to mount via mpathd overlays.
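As a sketch of option 1 (the mpath device and mount point are just examples), the fstab entry would look like:
/dev/mapper/mpatha1   /san/data   ext4   _netdev,defaults   1 2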
The second option can be hit or miss.
The third option is fairly bulletproof (LVs' availability tends to be handled more gracefully by the boot sequence) and makes subsequent management of your FC devices more flexible (e.g., if you grow to the end of your blockdev, you can use LVM to tie multiple LUNs together into a larger metadevice).
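A rough sketch of option 3, assuming multipathed LUNs called mpatha and mpathb (the VG and LV names are made up):
pvcreate /dev/mapper/mpatha
vgcreate vg_san /dev/mapper/mpatha
lvcreate -n lv_data -l 100%FREE vg_san
mkfs.ext4 /dev/vg_san/lv_data
# later, to grow past the first LUN, add another LUN to the VG and extend
vgextend vg_san /dev/mapper/mpathb
lvextend -l +100%FREE /dev/vg_san/lv_data
resize2fs /dev/vg_san/lv_data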
The fourth option is good... right up until someone updates the system and fails to preload the FC drivers in the new initrd. If you've set up your environment carefully, that shouldn't happen, but I've had to recover systems where people went down this path and didn't set their environment up correctly.
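For option 4, this is the sort of thing involved (qla2xxx is only an example - substitute whatever driver your FC HBA actually uses):
# RHEL 5: rebuild the initrd with the driver forced in
mkinitrd --with=qla2xxx -f /boot/initrd-$(uname -r).img $(uname -r)
# RHEL 6: same idea with dracut
dracut --add-drivers qla2xxx -f /boot/initramfs-$(uname -r).img $(uname -r)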
