Best Practice for Setting up Partitions and LVMs

I'm installing Oracle 11gR2 on RHEL v5 which will live on a VM in vSphere and I want to optimize performance the best I can by setting up various directories such as /tmp, /var, /opt on separate hard disks within the VM. Will the OS install give me the option to do this and I'm wondering how others have set this up for a better performance.

I'm new to LVM and not sure how to best set this up as well, so I'm looking to see what others have done or if someone could point me in the right direction.

thanks

Responses

Hey Christopher,

I'll ask the one question (before anyone/everyone else does ;-) Why RHEL 5?
PSA: Here is the release schedule for RHEL
https://access.redhat.com/articles/3078

This is a fairly sizeable discussion, but I'll throw out some light-reading ;-)

Now - on to your question:
You will have quite a bit of flexibility regarding filesystem layout. Oracle has a fairly well-defined standard as to where they expect files to go, etc. In my environment we typically have a single Volume Group for the OS, and EVERYTHING else goes on separate disks (typically SAN) in a separate VG. We try to group the disks into VGs so that they could be exported from the host and imported elsewhere (i.e. all the binaries are in one VG).
What is your goal with your LVM setup? I.e. are you trying to deploy a STIG compliant machine, have flexibility to address storage needs at a later time, compartmentalize usage of disk space for better control...

You will be able to use either UDEV or ASMlib to manage the actual Oracle devices. Both are solid options and have their own advantages/disadvantages.
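For what it's worth, a minimal ASMLib sketch might look like the lines below; the disk label and device name are placeholders, and the command path varies by ASMLib version (/usr/sbin/oracleasm vs. the /etc/init.d/oracleasm script):

  oracleasm configure -i                 # one-time setup: owning user/group, scan-on-boot
  oracleasm createdisk DATA1 /dev/sdb1   # stamp the (aligned) partition for ASM
  oracleasm listdisks                    # verify the disk was stamped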

Here is a guide to best practices on RHEL 6
https://www.redhat.com/en/resources/deploying-oracle-database-11g-r2-red-hat-enterprise-linux-6

Here is one (dated 2008)
http://www.redhat.com/f/pdf/rhel/Oracle-10-g-recommendations-v1_2.pdf

There were similar docs produced by Oracle with their recommendations as well. It might be tough to find a doc that is specific to RHEL 5 and 11gR2.

As for general filesystem recommendations
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/s2-diskpartrecommend-x86.html

My management wants to hold onto RHEL 5 till the bitter end, because it is tried and true.

The previous Linux admin used LVM for all of the VMs. I am new to LVM, and to being a Linux admin, so I'm kind of using what he did before me; however, I'm open to others' suggestions. I just wasn't sure how or why he laid out the Logical Volumes and Volume Groups the way he did.

Now we are replacing a physical Oracle machine that is EOL. When we looked at fdisk, we basically re-created what we were seeing, but then realized that going from physical to virtual means we need to approach this differently. Also, we are using ASM with Oracle and had to install Oracle Grid. We have a best practice guide from VMware about setting up Oracle, but I don't have much experience under my belt setting these up. Right now we are doing a pilot to prepare for the real thing.

So our main goal is to make sure it is secure (STIG) and highly available, while maintaining performance and allowing for future growth or change if need be.

thanks

With moving to a virtualization environment, you'll really want to ensure that your volume elements are aligned to the underlying storage subsystem. Prior to vSphere, there was a VMware tool for doing this after the fact. Under vSphere, however, you're kinda stuck doing it from the guest level. EL6 is nice in that its default starting block is aligned to most shared storage subsystem types, but if you're sticking with EL5, you'll have to go into fdisk's expert mode and do it by hand (or use parted to do it in one sweep).

Whether prepping the /dev/sd device for use via LVM or ASM, you'll want your DB/app accessing the disk from an aligned starting point. Failure to do so can incur non-trivial performance-penalties in environments with high consolidation-ratios.
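To give a rough idea, aligning a data disk by hand on EL5 with parted might look something like this (the device name and the 2048-sector starting offset are just illustrative choices; check what your storage vendor recommends):

  parted -s /dev/sdb mklabel msdos
  parted -s /dev/sdb mkpart primary 2048s 100%
  parted -s /dev/sdb unit s print    # confirm the partition starts on sector 2048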

At this point, most of the Oracle DBAs I know would be asking "why are you looking to put Oracle components onto LVMs rather than ASM?". Other than Oracle binaries, you'd be leaving your vDisks as bare devices and using ASM to manage the objects.

I wanted to revisit this.

We are setting up a number of disks in VMware/vSphere that only ASM will use; according to the DBA, ASM knows how to manage them and the DBA can stay hands off.

Now, disks that pertain to the OS (like /var, /tmp and /opt) would have LVM on them so I can grow these if need be (I like the naming convention someone else recommended: disk01 for one Volume Group, disk02 for another, and so forth).

Let me ask this. The storage admin had questions about how VMware/vSphere might come into conflict with LVM. I said I wasn't sure of any; in my experience, when I have run out of room on disks for Linux machines, as long as I have disk space available in the SAN/datastore I can add it via vSphere and then grow the volume using a range of LVM commands. So far there haven't been any problems that I know of, unless someone can correct me.

In my mind this is somewhat complex: striking a balance between VMware/vSphere, Red Hat and Oracle, and trying to make sure the setup is correct for the sake of performance and future growth.

Tomorrow I'm going to post my layout and would like to get feedback, if possible.

thanks

As someone who's watched DBAs botch servers and storage, I probably wouldn't simply give them access to wholly unmanaged disks - especially not in a shared-storage environment. You'll still want to prep the disks for them so that ASM does the right thing with respect to IO alignment. I'd probably also only give them access to Oracle-owned partitions via udev-rules (if your DBA's clueful, your DBA doesn't actually need root to install Oracle, nor does he need it to keep things running).
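For illustration only, a bare-bones udev rule along those lines might look like the following; the file name, device match and ownership are assumptions, and in real life you'd normally key off a persistent identifier (scsi_id/WWID) rather than a raw kernel name:

  # /etc/udev/rules.d/99-oracle-asm.rules
  KERNEL=="sdc1", OWNER="oracle", GROUP="dba", MODE="0660"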

As to LVM and vSphere, there isn't really conflict. LVM doesn't particularly care that the underlying storage is presented via a virtualization layer.

That said, and independent of LVM, expanding storage on the fly - whether doing it on a physical host by expanding a SAN-presented LUN or on a virtual host through the hypervisor's storage-engine - is kind of a pain on Linux. In normal circumstances, Linux only updates its in-kernel disk geometry information at boot time. You can force it to re-read the geometry, but only for disks that you can fully offline. Until you've caused Linux to update its disk geometry information, you can't do the repartitioning necessary to take advantage of the additional space.
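As a rough sketch, picking up a grown, non-root vDisk without a reboot usually boils down to something like the lines below (sdb is a placeholder, the filesystems on it need to be unmounted first, and exact behavior varies by kernel/driver version):

  echo 1 > /sys/block/sdb/device/rescan   # ask the kernel to re-read the device size
  partprobe /dev/sdb                      # re-read the partition table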

Unfortunately, you can't really fully offline the disks containing the root filesystem(s) - you're kinda stuck with a reboot if you want to grow disks hosting the root filesystem(s). Once rebooted, you're still limited in how you can re-partition the disk: fdisk/parted/etc. will let you add additional primary or extended/logical partitions, but won't simply extend the existing ones.

To me, adding partitions to a disk - rather than growing existing partitions - isn't a "clean" way to grow the disk. Primarily because adding partitions limits how many times you can grow a disk whereas changing existing partitions' geometry doesn't so limit you.

I personally refrain from naming volume groups after disks because it defeats some of the benefits of the volume group. I name them after the tier/type of storage the disks in that volume group are coming from, so when new disks are added from the same tier/profile they are added to the associated volume group. If you are naming your volume groups after disks and you want to expand the volume group by adding a disk/PV, that volume group naming will no longer be correct (i.e. it will contain multiple disks).

I am not sure I follow your storage admin's concerns regarding LVM/VMware. VMware/vSphere isn't aware of what's on the disk (LVM); it's just presenting a block device... Are there concerns over hot spots/balance?

The configuration can be pretty straightforward. If you have tiered/profiled storage under your VM, you provide the OS disk to use in LVM volumes (slower tier) and the database disk to use for ASM volumes (faster tier).

Depending on the DBA they may want to put all their binaries on ASM (ACFS) too, which may cost you a small amount of faster disk ;)

Hi Pixel,

I appreciate the input you gave... When I use an entire disk, such as a mirror for an operating system, or a raid10 for a specific server, or a raid5 for some other use, there is no foul in using "mirror" for a volume group name, or "raid5" or "raid10". In the times I've had to extend something like a raid5, I've made another disk, but I see your point that in the virtual world, where what I add to extend a volume group might be something like a VMDK file, a name like disk0 may no longer be relevant.

Device (fdisk, sfdisk, gparted)
Physical Volume (pvcreate)
Volume Group (vgcreate) - this is the closest thing to the device and physical disk. So in our environment, using "mirror" or "raid5" or "raid10" or other such relevant names is, at least for me, superior to defaults I've seen in the past such as VolGroup01, which does nothing to immediately identify the disk. But I like the idea of using a tier group; perhaps you could give me an example if you wouldn't mind.

Using names such as I've described cuts down on searching through meaningless abstract names when doing LVM operations in my environment with numerous customers. But your tiered naming approach seems to have merit; I'd like to hear more.
- thanks

I have been using VMs almost exclusively for RH workloads for the past 5 or so years (can't remember the last time I installed on bare metal) so my suggestions may not be particularly relevant in a lot of situations :)

In larger environments the storage is all managed centrally and the RAID configuration etc. is completely abstracted from view at the VMware level. Most configurations I have seen present their datastore clusters in VMware using a specific profile of disk or RAID configuration (tier), e.g.:
Tier 1 - SSD
Tier 2 - SAS
Tier 3 - Near line SAS / SATA

These datastore clusters are made up of datastores that match a profile created in the storage system. In this scenario I would create a VMDK from Tier 2 for the OS disk in VMware, and a VMDK on Tier 1 for the DB. This would translate to VGs:
vg_tier1 (or no tier 1 VG if using ASM)
vg_tier2

Then if I needed to extend an LV in the vg_tier2 volume group, I would add an additional VMDK from the tier 2 datastore cluster and then fdisk/parted, pvcreate, vgextend to add it to vg_tier2 and expand the LV and resize the filesystem. This ensures all disks in the same VG match the same performance profile.
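As a sketch of that workflow (the names - /dev/sdd1, vg_tier2, lv_data, the +20G - are all placeholders), assuming the new VMDK has already been partitioned and aligned:

  pvcreate /dev/sdd1                       # initialize the new partition as a PV
  vgextend vg_tier2 /dev/sdd1              # add it to the tier 2 volume group
  lvextend -L +20G /dev/vg_tier2/lv_data   # grow the logical volume
  resize2fs /dev/vg_tier2/lv_data          # grow the ext3/ext4 filesystem to match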

I have started to see this 'tiering' method/architecture disappear, though, as most storage systems now implement dynamic tiering, which moves hot data to the higher tiers automatically. In this scenario, all datastores appear equal in VMware, and the data is shuffled based on dynamic profiling of the application. I have found a single VG meets the needs of most VMs in this configuration and also gives you the benefit of sharing available disk across all LVs on the system.

I personally avoid resizing disks/LUNs/partitions. Interested to hear what others are doing.

Thanks much Pixel for the detailed response, appreciate it! Will examine/consider for my environments.

Chris, I have numerous Oracle servers working fine on RHEL 6.current. James is correct - you'll have to rebuild the servers on RHEL 6.current when you lose patch-update ability for RHEL 5, and Oracle works just fine on RHEL 6 in my test/devel and production environments (unless you really want to buy extended support for RHEL 5, which only delays the inevitable).

We have not only a separate partition for /var, but also /var/log and /var/log/audit because (at least for our environment) it makes sense based on past experience/pain.

We use a bind-mount of /var/tmp to go to /tmp
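For reference, the /etc/fstab line for that bind mount is a one-liner; this is just a sketch and your options may differ:

  /tmp    /var/tmp    none    bind    0 0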

Some of our Oracle storage is on a /app partition.

I cannot stand the nonsensical way volume groups/logical volumes are created, such as "VolGroup01" and "LogVol01" etc. Therefore, I name the volume group on VMware "disk0", and if it is a physical server, I name the OS-disk volume group "mirror" because it's on a mirror. I then name the logical volume to give some clue to the name of the partition.

I'd recommend making a separate /home partition to prevent "rogue" developers or others from filling your "/" filesystem just because "/home" was not its own partition.

I went overkill and on my systems I even have my own /var/cache partition, and yes, again, it is overkill. I'd rather go overkill than have to deal with "/" filling up.

Those are some thoughts, and certainly take a look at this discussion on separating /var from /. Some Tomcat servers will heavily (ab)use /var if not properly constrained, with "war" files and other things that Tomcat users dump onto that filesystem. So if you use Tomcat, consider a partition for wherever Tomcat writes things. I've done this because I was dead sick of Tomcat filling up /var.

Hope this helps

Heh: you've described all of the partitioning prescribed by the STIGs for EL6:

  • V-38455: a separate filesystem for /tmp
  • V-38456: a separate filesystem for /var
  • V-38463: a separate filesystem for /var/log
  • V-38467: a separate filesystem for /var/log/audit
  • V-38473: a separate filesystem for /home
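If you kickstart your builds, a hedged sketch of that layout might look like the snippet below; the volume group name, LV names and sizes are all made-up placeholders to adjust for your environment:

  part /boot --fstype=ext4 --size=500
  part pv.01 --size=1 --grow
  volgroup vg_os pv.01
  logvol /              --vgname=vg_os --name=lv_root  --size=8192
  logvol /tmp           --vgname=vg_os --name=lv_tmp   --size=2048
  logvol /var           --vgname=vg_os --name=lv_var   --size=4096
  logvol /var/log       --vgname=vg_os --name=lv_log   --size=2048
  logvol /var/log/audit --vgname=vg_os --name=lv_audit --size=1024
  logvol /home          --vgname=vg_os --name=lv_home  --size=2048
  logvol swap           --vgname=vg_os --name=lv_swap  --size=4096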

That said, I never really bought into bind-mounting /var/tmp to /tmp: the two had very specific, historical reasons for existing in the first place. The STIG recommendation to change that violates those rationales. It's also really bad if someone decides to apply a STIG overlay to a system that's mounting /tmp as tmpfs (which, if you're running an A/V component - as required to be STIG compliant - is one of the few ways to keep that A/V from killing your system, especially in a virtualized environment).

Heh - yes, I didn't cite the STIG, but there it was. Thanks for the info regarding the bind mount, Tom.

Interesting you mention the /tmp and /var/tmp. I have always had issues with bind mounting because of the differences defined in the LSB/FHS.

/tmp - Temporary files
/var/tmp - Temporary files preserved between system reboots

When building on physical systems or virtual hardware like VMware's, we'd put /tmp on tmpfs.
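A minimal fstab sketch for that (the size and mount options are just examples):

  tmpfs   /tmp    tmpfs   size=2g,nodev,nosuid    0 0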

When building VMs on AWS, we'd usually use the instance's instance store for /tmp.

Still haven't decided what we'll do for OpenStack hosted VMs (still too early in our migration from vSphere to public/private hybrid cloud solutions)

Thanks Tom, odd that one publicly available guide from a three-letter organization mentioned that bind mount. Appreciate the input.

Yeah. The unfortunate thing about "bind /var/tmp to /tmp" is that it doesn't account for why the two locations exist in the first damned place, nor does it take into account current trends in what /tmp is backed by (RHEL finally adopting what the commercial Unix world has been doing for a decade-plus).

Great info everyone. Thanks for taking the time to post.

Rather than starting a new topic, is anyone using LVM thin provisioning in their production environments?

I am assuming it's primarily useful in situations where you can't thin provision lower in the stack (eg. VM Hypervisor / Storage), or using it on the hypervisor server itself (eg. RHEV/KVM). Has anyone else found a use case for it in general server land?
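For anyone who hasn't played with it, a minimal sketch of what it looks like is below (names and sizes are invented, and it needs a reasonably recent lvm2 with thin-provisioning support):

  lvcreate -L 100G -T vg_data/pool0               # create a 100G thin pool
  lvcreate -V 250G -T vg_data/pool0 -n lv_thin1   # thin LV whose virtual size can exceed the pool
  lvs -a vg_data                                  # keep an eye on the pool's Data% to catch overcommit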

In general, we've avoided most types of thin storage solutions for anything other than pre-production systems. Our funding is too unpredictable and our procurement cycles are too slow to be sure that, if we over-commit, we'll be able to keep up with the overcommit before things start coming to a grinding halt. When pre-prod falls over, it's annoying but not really business-impacting. Production...

ditto

Here is another question.

Does it make sense to put various Linux partitions on different datastores of a SAN? For example, /opt would get its own datastore, /home would get its own datastore. Or will it hurt nothing if the whole OS were placed on one datastore and then just partitioned from there?

Also, my storage admin recommends using thin provisioning for all hard disks. He stated that if we take a snapshot or use vMotion, it only captures the disk space that is actually being used, vs. thick provisioning, which captures all of the hard disk space regardless of whether it is used or not.

thanks

Chris,

I can't really see a benefit at all to splitting mount points into separate partitions on the SAN. I think it adds unnecessary operational complexity and really provides no benefit. If you want filesystem separation, LVM can do that for you (or even basic partitioning); if it's because of performance and these are OS partitions, I'd just move all the partitions to the (same) faster datastore to avoid the hassle.

Thin provisioning has a list of benefits and potential issues; it's best to weigh up what works in your environment/configuration. Most places I've seen that thin provision use the storage infrastructure to do it, and thick provision (lazy zeroed) at the VMware layer. Thin provisioning at the VMware layer can be problematic if you aren't profiling your growth and a rogue VM fills a datastore (although storage DRS helps mitigate this).

There is a good write up on the VMware blog regarding where/when to thin provision. It's well worth the read:
http://blogs.vmware.com/vsphere/2012/03/thin-provisioning-whats-the-scoop.html

In general, putting members of the root filesystems onto different datastores from each other doesn't make a boat-load of sense. While it can make sense if you're installing applications into those filesystems - particularly performance-sensitive applications - you really should be keeping your applications and your root filesystems separate in the first place.

As to thin provisioning, it really depends on things like:

  • How much of a performance hit your applications are able to take when their backing store has to be extended. While this tends to be less of an issue with hardware-based thin provisioning, the hit you take when you're doing thin storage through the hypervisor can be quite significant. Hell, some applications are sensitive enough that you'll really want to use eager-zeroed thick provisioning.
  • How fast your procurement and deployment cycle operates combined with how good of a storage-reporting system you have. Things can get REALLY ugly if your forecasting is not accurate - especially if your procurement and/or deployment cycles are somewhere beyond "not speedy".

In general, thin-provisioning is great for dev/test - but I'd be leery of putting production on it without an EXCEEDINGLY well-designed reporting/forecasting system tied to a speedy procurement and deployment cycle.

So if I'm understanding you correctly, I was thinking about placing the following partitions on separate data stores:

/home datastore_1_101
/opt datastore_1_102
/usr datastore_1_103

And so forth. I've noticed that the previous admin, whom I filled in for, was doing that; however, just because we have always done it this way doesn't mean it will always work.

So if we place all partitions on one datastore and say we can put the different partitions on Volume Groups, this would be better in the long run, performance-wise and for future growth for space, correct?

thanks

I definitely would not have put those particular partitions onto separate datastores.

As to your closing question, I'm having difficulty parsing what you're asking ...and, at any rate, any answer would likely involve a heavy dose of "it depends".

Another quick example, on our previous RH Satellite server, we had the following partitions on different data stores, such as:

/var/satellite - datastore_01_100
/var/cache/rhn - datastore_01_101
/tmp - datastore_01_102
/home - datastore_01_103

If I'm understanding you, there are no performance benefits for having these partitions separated across different datastores.

And if we need to grow the storage space for these, we could put the different partitions on different volume groups/logical volumes to handle this, correct?

BTW, thanks for the URL on thin provisioning.

thanks

Performance benefits vary greatly based on the overall (end-to-end) solution-architecture, the capabilities of any given component and the I/O profiles of the workloads you're looking to support.

From a "best practices" standpoint, you're typically best served putting OS data and application data onto their own partitions and their ow vDisks. Depending on the complexity of your application, you might even gain further benefit from putting different parts of the application onto different filesystems and associated vDisks:

  • Oracle's a good example here because of the different performance requirements for filesystems hosting binaries, active data files, redo logs and archive logs. This is especially so if your Oracle installation is transactional in nature, particularly if there are lots of writes.
  • Satellite, on the other hand, probably doesn't benefit as much from a complicated storage setup. Its I/O profile is heavily read-oriented. Between OS, hypervisor and storage-layer read-caching behaviors, there's likely not a lot to be gained from splitting it up six ways from Sunday. Splitting it from the OS drives mostly comes down to increasing your flexibility for growth, mobility and backups/recovery.

Also, it's good to be clear in your terms. For example, logical volumes aren't inherently interchangeable with volume groups. I could have a single volume group that contained six logical volumes and, if I were picayune enough, I could ensure that each of those logical volumes referenced its own underlying vDisk. If I were really pedantic, I could even ensure that the separation extended down through the hypervisor and into the array.
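To make that concrete, LVM lets you pin an LV to a specific PV by naming the PV on the lvcreate line; this is just an illustration with made-up names:

  pvs                                              # see which PVs back the volume group
  lvcreate -n lv_redo -L 10G vg_oracle /dev/sdd1   # allocate this LV only from /dev/sdd1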

The fun thing is, depending on your solution components' capabilities, being that pedantic could actually harm your overall performance. Thus, much of any guidance needs to be couched in "it depends".

Chris,

Is this in a VMware context? Are you referring to VMware datastores here, or logical volumes (LVs)?

If it is VMware datastores, it seems a little strange!

As Tom has mentioned, there are many layers involved in a Linux / VMware / storage system configuration. Your environment may be very different, e.g. if you are using DAS in your VMware hosts.

Here is an example of the layers:
Linux Filesystem
Linux LVM Logical Volume
Linux LVM Volume Group
Linux LVM Physical Volume
Linux Disk Partition
VMware Virtual Hard Disk
VMware Datastore Cluster
VMware Datastore
Storage System LUN
Storage System Storage Pool / Volume
Storage System RAID groups
Storage System Physical disks

You should really be splitting your OS mount points out as LVM logical volumes. These are allocated from LVM volume groups, which are made up of LVM physical volumes (which in most configurations correspond to a VMDK / VMware virtual hard disk). If you need to increase the size of a logical volume, you increase the space available in the volume group by adding a VMDK as a physical volume to the volume group.

For what it's worth, I agree with PixelDrift - it gets too complex to separate all your minor partitions out onto different datastores, and frankly those OS-based partitions "shouldn't" have very demanding I/O requirements. Our standard is to assign a 40 GB boot/root VM disk to the guest host within VMware ESX (a vDisk). That's the system's primary disk for /, /var, /opt, /var/log/audit, etc. We assign additional vDisks for any major apps or DBs/DB logs (not Oracle). Those vDisks might have several logical volumes per vDisk to support that app. It's always a balance between too much complexity and performance requirements/dreams.

I always try to leave some percentage of unused/free space on the vDisk so that in a pinch we can grow an LV if a volume fills up in the middle of the night or something. Then the day shift can attend to it and implement a more permanent fix. This comes in handy at times - especially for out-of-control logs that developers don't attend to and admins don't know about. YAY

Regarding RHEL 5 vs. RHEL 6: your management has little leg to stand on for NOT going to RHEL 6. If you're a DoD customer/client, there's FINALLY a good RHEL 6 STIG benchmark out there, and it is fully supported. One could easily argue that RHEL 6 is "better" than RHEL 5 when it comes to security standards/packages. That's not to say there aren't some minor differences between 5 and 6, but they are minor. It's not like jumping from RHEL 5 to RHEL 7 - that's a big jump. We moved all our servers from Solaris 10 to RHEL 5 and then RHEL 6. Let me tell you, the move from 5 to 6 was CAKE.

If you have to start learning LVM - and you must - you can always help ensure your work is done correctly by using the GUI, system-config-lvm (yum install system-config-lvm). But you should occasionally make yourself and your team use the command-line LVM tools. The GUI sucks to work with on small, slow WAN pipes. It's still not as cool as ZFS (old school - but AWESOME). :-)

We always use thin provisioning; however, we're very unique in that my team has complete control over every aspect of the architecture. Most groups have to rely on other groups for pieces of the architecture, and in "certain organizations" the staff for a particular group might be quite "unqualified" to be working in that group. But it is what it is. You've probably experienced that somewhere. But that's for IT therapy sessions. LOL

Good luck!

I've never used the GUI for LVM, always has been CLI.

I always use the CLI; I had the GUI betray me once.
