Best Practice for Setting up Partitions and LVMs
I'm installing Oracle 11gR2 on RHEL 5, which will live on a VM in vSphere, and I want to optimize performance as best I can by putting various directories such as /tmp, /var, and /opt on separate hard disks within the VM. Will the OS installer give me the option to do this? I'm also wondering how others have set this up for better performance.
I'm new to LVM and not sure how best to set it up either, so I'm looking to see what others have done, or if someone could point me in the right direction.
thanks
Responses
Hey Christopher,
I'll ask the one question (before anyone/everyone else does ;-) Why RHEL 5?
PSA: Here is the release schedule for RHEL
https://access.redhat.com/articles/3078
This is a fairly sizeable discussion, but I'll throw out some light-reading ;-)
Now - on to your question:
You will have quite a bit of flexibility regarding filesystem layout. Oracle has a fairly well-defined standard as to where they expect files to go. In my environment we typically have a single Volume Group for the OS and EVERYTHING else goes on separate disks (typically SAN) in a separate VG. We try to group the disks into VGs so that they could be exported from the host and imported elsewhere (i.e. all the binaries are in one VG).
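For what it's worth, moving a whole VG between hosts is pretty painless with LVM; a rough sketch (the VG name here is just an example):

  vgchange -an vg_orabin     # deactivate the VG on the old host
  vgexport vg_orabin         # mark it as exported
  # re-present/zone the underlying disks to the new host, then:
  pvscan                     # let LVM see the exported PVs
  vgimport vg_orabin
  vgchange -ay vg_orabin     # activate and mount as needed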
What is your goal with your LVM setup? I.e. are you trying to deploy a STIG compliant machine, have flexibility to address storage needs at a later time, compartmentalize usage of disk space for better control...
You will be able to use either UDEV or ASMlib to manage the actual Oracle devices. Both are solid options and have their own advantages/disadvantages.
Here is a guide to best practices on RHEL 6:
https://www.redhat.com/en/resources/deploying-oracle-database-11g-r2-red-hat-enterprise-linux-6
Here is one (dated 2008)
http://www.redhat.com/f/pdf/rhel/Oracle-10-g-recommendations-v1_2.pdf
There were similar docs produced by Oracle with their recommendations as well. It might be tough to find a doc that is specific to RHEL 5 and 11gR2.
As for general filesystem recommendations:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/s2-diskpartrecommend-x86.html
With moving to a virtualization environment, you'll really want to ensure that your volume elements are aligned to the underlying storage subsystem. Prior to vSphere, there was a VMware tool for doing this after the fact. Under vSphere, however, you're kinda stuck doing it from the guest level. EL6 is nice in that its default starting block is aligned to most shared storage subsystem types, but if you're sticking with EL5, you'll have to go into expert mode and do it by hand (or use parted to do it in one sweep).
Whether prepping the /dev/sd device for use via LVM or ASM, you'll want your DB/app accessing the disk from an aligned starting point. Failure to do so can incur non-trivial performance penalties in environments with high consolidation ratios.
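If you do end up doing it by hand, a rough sketch with parted looks something like this (device name is illustrative, a 2048-sector / 1 MiB starting offset lines up with most arrays, and note this relabels the disk so only do it on an empty device):

  parted -s /dev/sdb mklabel msdos
  parted -s /dev/sdb unit s mkpart primary 2048 100%
  parted -s /dev/sdb unit s print     # confirm the partition starts at sector 2048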
As someone who's watched DBAs botch servers and storage, I probably wouldn't simply give them access to wholly unmanaged disks - especially not in a shared-storage environment. You'll still want to prep the disks for them so that ASM does the right thing with respect to IO alignment. I'd probably also only give them access to Oracle-owned partitions via udev-rules (if your DBA's clueful, your DBA doesn't actually need root to install Oracle, nor does he need it to keep things running).
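As a rough illustration only (the rule file name, device match and ownership are assumptions you'd adapt to your environment - in practice you'd match on something stable like the SCSI ID rather than the kernel name), a udev rule handing a partition to the Oracle user looks something like:

  # /etc/udev/rules.d/99-oracle.rules (illustrative)
  KERNEL=="sdb1", OWNER="oracle", GROUP="dba", MODE="0660"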
As to LVM and vSphere, there isn't really a conflict. LVM doesn't particularly care that the underlying storage is presented via a virtualization layer.
That said, and independent of LVM, expanding storage on the fly - whether doing it on a physical host by expanding a SAN-presented LUN or on a virtual host through the hypervisor's storage engine - is kind of a pain on Linux. In normal circumstances, Linux only updates its in-kernel disk geometry information at boot time. You can force it to re-read the geometry for disks, but only for disks that you can fully offline. Until you've caused Linux to update its disk geometry information, you can't do the repartitioning necessary to take advantage of the additional space.
Unfortunately, you can't really fully offline the disks containing the root filesystem(s) - you're kinda stuck with a reboot if you want to grow disks hosting the root filesystem(s). Once rebooted, you're still limited in how you can re-partition the disk: fdisk/parted/etc. will let you add additional primary or extended/logical partitions, but not simply grow the existing ones.
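For a non-root disk that you can quiesce, you can usually nudge the kernel without a full reboot - roughly the following (device name is illustrative, and it only works cleanly if nothing on that disk is in use):

  echo 1 > /sys/block/sdb/device/rescan    # re-read the (grown) disk's size
  partprobe /dev/sdb                       # re-read the partition table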
To me, adding partitions to a disk - rather than growing existing partitions - isn't a "clean" way to grow the disk, primarily because adding partitions limits how many times you can grow a disk, whereas changing existing partitions' geometry doesn't limit you that way.
I personally refrain from naming volume groups after disks because it defeats some of the benefits of the volume group. I name them after the tier/type of storage the disks in that volume group are coming from, so when new disks are added from the same tier/profile they are added to the associated volume group. If you are naming your volume groups after disks and you want to expand the volume group by adding a disk/PV, that volume group name will no longer be correct (i.e. it will contain multiple disks).
I am not sure I follow the concerns of your storage admin regarding LVM/VMware. VMware/vSphere isn't aware of what's on the disk (LVM); it's just presenting a block device... are there concerns over hot spots/balance?
The configuration can be pretty straightforward. If you have tiered/profiled storage under your VM, you provide the OS disk to use in LVM volumes (slower tier) and the database disk to use for ASM volumes (faster tier).
Depending on the DBA they may want to put all their binaries on ASM (ACFS) too, which may cost you a small amount of faster disk ;)
Hi Pixel,
I appreciate the input you gave... When I use an entire disk, such as a mirror for an operating system, or a raid10 for a specific server, or a raid5 for some other use, then there is no foul in using "mirror" for a volume group name, or "raid5" or "raid10". In the times I've had to extend something like a raid5, I've made another disk, but I see your point that in the virtual world, where I might add something like a vmdk file to extend a volume group, a name like disk0 really may not be relevant.
Device (fdisk, sfdisk, gparted)
Physical (pvcreate)
Volume Group (vgcreate) - this is the closest thing to the device and physical disk. So in our environment, using "mirror" or "raid5" or "raid10" or other such relevant names is, at least for me, superior to the defaults I've seen in the past such as VolGroup01, which does nothing to immediately identify a disk. But I like the idea of using a tier group; perhaps you could give me an example if you wouldn't mind.
Using names such as I've described cuts down on hunting through useless abstract names when doing LVM operations in my environment with numerous customers. But your tiered naming approach seems to have merit; I'd like to hear more.
- thanks
I have been using VMs almost exclusively for RH workloads for the past 5 or so years (can't remember the last time I installed on bare metal) so my suggestions may not be particularly relevant in a lot of situations :)
In larger environments the storage is all managed centrally and the RAID configuration etc. is completely abstracted from view at the VMware level. Most configurations I have seen have presented their datastore clusters in VMware using a specific profile of disk or RAID configuration (tier), e.g.
Tier 1 - SSD
Tier 2 - SAS
Tier 3 - Near line SAS / SATA
These datastore clusters are made up of datastores that match a profile created in the storage system. In this scenario I would create a VMDK from Tier 2 for the OS disk in VMware, and a VMDK on Tier 1 for the DB. This would translate to VGs:
vg_tier1 (or no vg_tier1 if using ASM)
vg_tier2
Then if I needed to extend an LV in the vg_tier2 volume group, I would add an additional VMDK from the tier 2 datastore cluster and then fdisk/parted it, pvcreate, and vgextend to add it to vg_tier2, then expand the LV and resize the filesystem. This ensures all disks in the same VG match the same performance profile.
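To make that concrete, the sequence is roughly the following, assuming the new VMDK shows up as /dev/sdc and the LV is vg_tier2/lv_data (all names and sizes are illustrative):

  parted -s /dev/sdc mklabel msdos
  parted -s /dev/sdc unit s mkpart primary 2048 100%
  pvcreate /dev/sdc1
  vgextend vg_tier2 /dev/sdc1
  lvextend -L +20G /dev/vg_tier2/lv_data
  resize2fs /dev/vg_tier2/lv_data          # grow the ext3/ext4 filesystem to match the LV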
I have started to see this 'tiering' method/architecture disappear though, as most storage systems are now implementing dynamic tiering, which moves the hot data to the higher tiers automatically. In this scenario, all datastores appear equal in VMware, and the data is shuffled based on dynamic profiling of the application. I have found a single VG appears to meet the needs of most VMs in this configuration and also gives you the benefit of sharing available disk across all LVs on the system.
I personally avoid resizing disks/LUNs/partitions. Interested to hear what others are doing.
Chris, I have numerous Oracle servers working fine on RHEL 6.current. James is correct - you'll have to rebuild the servers on RHEL 6.current when you lose patch update ability for RHEL 5, and Oracle works just fine on RHEL 6 in my test/devel and production environments (unless you really want to buy extended support for RHEL 5, which only delays the inevitable).
We have not only a separate partition for /var, but also /var/log and /var/log/audit because (at least for our environment) it makes sense based on past experience/pain.
We use a bind mount so that /var/tmp goes to /tmp.
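(For anyone curious, that's just a line along these lines in /etc/fstab:)

  /tmp    /var/tmp    none    bind    0 0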
Some of our oracle storage is on a /app partition.
I cannot stand the nonsensical way volume groups/logical volumes are named by default, such as "VolGroup01" and "LogVol01", etc. Therefore, on VMware I name the volume group "disk0", and if it is a physical server, I name the OS disk volume group "mirror" because it's on a mirror. I then name the logical volume to give some clue to the name of the partition.
I'd recommend making a separate /home partition to keep "rogue" developers or others from filling up your "/" filesystem just because "/home" wasn't its own partition.
I went overkill and on my systems I even have my own /var/cache partition, and yes, again, it is overkill. I'd rather go overkill than have to deal with "/" filling up.
Those are some thoughts, and certainly take a look at this discussion on separating /var from /. Some tomcat servers will heavily (ab)use /var with "war" files and other things if not properly constrained. So if you use tomcat, consider a partition for wherever tomcat writes things. I've done this because I was dead sick of tomcat filling up /var.
Hope this helps
Heh: you've described all of the partitioning prescribed by the STIGs for EL6:
- V-38455: a separate filesystem for /tmp
- V-38456: a separate filesystem for /var
- V-38463: a separate filesystem for /var/log
- V-38467: a separate filesystem for /var/log/audit
- V-38473: a separate filesystem for /home
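If you're building these via kickstart, a minimal sketch of that layout would look something like this (VG/LV names, sizes and filesystem types are made up):

  part /boot --fstype=ext4 --size=512
  part pv.01 --size=1 --grow
  volgroup vg_os pv.01
  logvol /              --vgname=vg_os --name=lv_root  --size=8192 --fstype=ext4
  logvol /tmp           --vgname=vg_os --name=lv_tmp   --size=2048 --fstype=ext4
  logvol /var           --vgname=vg_os --name=lv_var   --size=4096 --fstype=ext4
  logvol /var/log       --vgname=vg_os --name=lv_log   --size=4096 --fstype=ext4
  logvol /var/log/audit --vgname=vg_os --name=lv_audit --size=2048 --fstype=ext4
  logvol /home          --vgname=vg_os --name=lv_home  --size=4096 --fstype=ext4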
That said, I never really bought into bind-mounting /var/tmp to /tmp: the two had very specific, historical reasons for existing in the first place, and the STIG recommendation to change that violates those rationales. It's also really bad if someone decides to apply a STIG overlay to a system that's mounting /tmp as tmpfs (which, if you're running an A/V component - as required to be STIG compliant - is one of the few ways to keep that A/V from killing your system, especially in a virtualized environment).
Interesting you mention the /tmp and /var/tmp. I have always had issues with bind mounting because of the differences defined in the LSB/FHS.
/tmp - Temporary files
/var/tmp - Temporary files preserved between system reboots
When building on physical systems or virtual hardware like VMware's we'd put /tmp as tmpfs.
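(On those builds /tmp is just a tmpfs entry in /etc/fstab, something like the following - the size is whatever suits the box:)

  tmpfs    /tmp    tmpfs    defaults,nodev,nosuid,size=2g    0 0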
When building VMs on AWS, we'd usually use the instance's instance-store for /tmp.
Still haven't decided what we'll do for OpenStack hosted VMs (still too early in our migration from vSphere to public/private hybrid cloud solutions)
Yeah. The unfortunate thing about "bind /var/tmp to /tmp" is that it doesn't account for why the two locations exist in the first damned place, nor does it take into account current trends in what /tmp is based on (RHEL finally adopting what the commercial Unix world has been doing for a decade-plus).
Rather than starting a new topic, is anyone using LVM thin provisioning in their production environments?
I am assuming it's primarily useful in situations where you can't thin provision lower in the stack (e.g. VM hypervisor / storage), or when using it on the hypervisor server itself (e.g. RHEV/KVM). Has anyone else found a use case for it in general server land?
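For reference, the kind of configuration I'm asking about is roughly this (VG, pool and LV names plus sizes are made up):

  lvcreate --size 100G --thinpool tp_data vg_data                    # carve a thin pool out of the VG
  lvcreate --virtualsize 250G --thin --name lv_app vg_data/tp_data   # over-committed thin LV backed by the pool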
In general, we've avoided most types of thin storage solutions for anything other than pre-production systems. Our funding is too unpredictable and our procurement cycles are too slow to be sure that, if we over-commit, we'll be able to keep up with the overcommit before things start grinding to a halt. When pre-prod falls over, it's annoying but not really business-impacting. Production...
Chris,
I can't really see a benefit at all to splitting mount points into separate partitions on the SAN. I think it adds unnecessary operational complexity and really provides no benefit. If you want filesystem separation, LVM can do that for you (or even basic partitioning); if it's because of performance and it's OS partitions, I'd just move all the partitions to the (same) faster datastore to avoid the hassle.
Thin provisioning has a list of benefits and potential issues; it's best to balance up what works in your environment/configuration. Most places I've seen that thin provision use the storage infrastructure to do it, and thick provision (lazy zeroed) at the VMware layer. Thin provisioning at the VMware layer can be problematic if you aren't profiling your growth and a rogue VM fills a datastore (although Storage DRS helps mitigate this).
There is a good write up on the VMware blog regarding where/when to thin provision. It's well worth the read:
http://blogs.vmware.com/vsphere/2012/03/thin-provisioning-whats-the-scoop.html
In general, putting members of the root filesystems onto different datastores from each other doesn't make a boat-load of sense. It can make sense if you're installing applications into those filesystems, particularly performance-sensitive applications... but you really should be keeping your applications and your root filesystems separate in the first place.
As to thin provisioning, it really depends on things like:
- How much of a performance hit your application is able to take when it has to extend its backing store. While this tends to be less of an issue with hardware-based thin provisioning, the hit you take when you're doing thin storage through the hypervisor can be quite significant. Hell, some applications are sensitive enough that you'll really want to use eager-zeroed thick provisioning.
- How fast your procurement and deployment cycle operates combined with how good of a storage-reporting system you have. Things can get REALLY ugly if your forecasting is not accurate - especially if your procurement and/or deployment cycles are somewhere beyond "not speedy".
In general, thin-provisioning is great for dev/test - but I'd be leery of putting production on it without an EXCEEDINGLY well-designed reporting/forecasting system tied to a speedy procurement and deployment cycle.
Performance benefits vary greatly based on the overall (end-to-end) solution-architecture, the capabilities of any given component and the I/O profiles of the workloads you're looking to support.
From a "best practices" standpoint, you're typically best served putting OS data and application data onto their own partitions and their ow vDisks. Depending on the complexity of your application, you might even gain further benefit from putting different parts of the application onto different filesystems and associated vDisks:
- Oracle's a good example here because of the different performance requirements for filesystems hosting binaries, active data files, redo logs and archive logs. This is particularly so if your Oracle installation is transactional in nature, particularly if there's lots of writes.
- Satellite, on the other hand, probably doesn't benefit as much from a complicated storage setup. Its I/O profile is heavily read-oriented. Between OS, hypervisor and storage-layer read-caching behaviors, there's likely not a lot to be gained from splitting it up six ways from Sunday. Splitting it from the OS drives mostly comes down to increasing your flexibility for growth, mobility and backups/recovery.
Also, it's good to be clear in your terms. For example, logical volumes aren't inherently interchangeable with volume groups. I could have a single volume group that contained six logical volumes and, if I were picayune enough, I could ensure that each of those logical volumes referenced its own underlying vDisk. If I were really pedantic, I could even ensure that the separation extended down through the hypervisor and into the array.
The fun thing is, depending on your solution components' capabilities, being that pedantic could actually harm your overall performance. Thus, much of any guidance needs to be couched in "it depends".
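If you did want to be that pedantic, LVM will happily pin an LV to a particular PV by naming the PV at the end of the lvcreate (names here are illustrative):

  lvcreate -L 10G -n lv_redo vg_db /dev/sdd    # allocate lv_redo only from the /dev/sdd PV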
Chris,
Is this in a VMware context? Are you referring to VMware datastores here, or to logical volumes (LVs)?
If it is VMware datastores, it seems a little strange!
As Tom has mentioned, there are many layers involved in a Linux / VMware / storage system configuration. Your environment may be very different... e.g. if you are using DAS in your VMware hosts.
Here is an example of the layers:
Linux Filesystem
Linux LVM Logical Volume
Linux LVM Volume Group
Linux LVM Physical Volume
Linux Disk Partition
VMware Virtual Hard Disk
VMware Datastore Cluster
VMware Datastore
Storage System LUN
Storage System Storage Pool / Volume
Storage System RAID groups
Storage System Physical disks
You should really be splitting your OS mount points out as LVM logical volumes. These are allocated from LVM volume groups, which are made up of LVM physical volumes (in most configurations a physical volume corresponds to a VMDK / VMware virtual hard disk). If you need to increase the size of a logical volume, you increase the space available in the volume group by adding a VMDK as a physical volume to the volume group.
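A bare-bones sketch of carving one of those out, assuming the VMDK shows up as /dev/sdb (names and sizes are illustrative; normally the installer lays this out for you):

  pvcreate /dev/sdb
  vgcreate vg_os /dev/sdb
  lvcreate -L 4G -n lv_var vg_os
  mkfs -t ext4 /dev/vg_os/lv_var    # then mount it and add a matching /etc/fstab entry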
For what it's worth, I agree with PixelDrift - it gets too complex to separate all your minor partitions out onto different datastores, and frankly those OS-based partitions "shouldn't" have very heavy I/O requirements. Our standard is to assign a 40 GB boot/root vDisk to the guest within VMware ESX. That's the system's primary disk for /, /var, /opt, /var/log/audit, etc. We assign additional vDisks for any major apps or DBs/DB logs (not Oracle). Those vDisks might have several logical volumes per vDisk to support that app. It's always a balance between too much complexity and performance requirements/dreams.
I always try to leave some percentage of unused/free space on the vDisk so in a pinch we can grow an LV if a volume fills up in the middle of the night or something. Then the day shift can attend to it and conduct a more permanent fix. This comes in handy at times - especially for out-of-control logs that developers don't attend to and that admins don't know about. YAY
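That middle-of-the-night fix is usually no more than this (names and sizes are illustrative):

  vgs vg_os                                # check how much free space is left in the VG
  lvextend -L +2G /dev/vg_os/lv_varlog     # grow the LV
  resize2fs /dev/vg_os/lv_varlog           # grow the filesystem online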
Regarding RHEL 5 vs RHEL 6: your manager has little leg to stand on for NOT going to RHEL 6. If you're a DoD customer/client, there's FINALLY a good RHEL 6 STIG benchmark out there, and it's fully supported. One could easily argue that RHEL 6 is "better" than RHEL 5 when it comes to security standards/packages. That's not to say there aren't some minor differences between 5 and 6, but they are minor. It's not like jumping from RHEL 5 to RHEL 7 - that's a big jump. We moved all our servers from Solaris 10 to RHEL 5, then RHEL 6. Let me tell you, the move from 5 to 6 was CAKE.
If you have to start learning LVM - and you must - you can always help ensure your work is done correctly by using the GUI system-config-lvm (yum install system-config-lvm). But you should occasionally make yourself and your team use the command-line LVM commands; the GUI sucks to work with on small, slow WAN pipes. It's still not as cool as ZFS (old school - but AWESOME). :-)
We always use thin provisioning, however we're very unique in that my team has complete control over every aspect of our architecture. Most groups have to rely on other groups for pieces of the architecture, and in "certain organizations" the staff for a particular group might be quite "unqualified" to be working in that group. But it is what it is. You've probably experienced that somewhere. But that's for IT therapy sessions. LOL
Good luck!
