Any LVM Wizards Out There?
Right now, it seems like my Google-fu is failing me. A task that used to be fairly trivial on Solaris is utterly eluding me on Linux. Specifically, how does one extract a filesystem, intact, from an LVM container so as to be able to mount it directly from the underlying /dev/sd device?
Basically, I'm looking to avoid the whole Towers of Hanoi exercise of moving the data from disk to disk. If I knew which blocks on disk the filesystem's LVM structures pointed to, I could probably re-fdisk the device and mount the filesystem directly from the /dev/sd device. I know Linux uses a different partition type (8e) for LVM versus native Linux (83) partitions - I'm just hoping that's a label rather than a geometry change.
Ideas? Am I clear in what I'm looking to accomplish?
Responses
Linux LVM is a volume manager, like various Veritas storage solutions, Microsoft LDM, AIX volume management, etc...
There are things stored in the meta-data areas of a Physical Volume (PV), and there are not always 1:1 block mappings between Physical Extents (PEs) in PVs and Logical Extents (LEs) in a Logical Volume (LV). This is especially the case where an LV seems like it has contiguous LEs when it's actually spread over several sets of PEs. So even if you delete the meta-data and somehow "redefine" the slices/partitions to address where the PEs actually are for an LV, there's no guarantee they will be contiguous and usable.
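If you want to see how an LV's logical extents actually map onto physical extents, something like this should show it (the VG/LV names and device below are just placeholders):
# lvdisplay --maps /dev/myvg/mylv   # per-segment listing of the PE ranges backing the LV
# pvdisplay --maps /dev/sda2        # how the PV's extents are allocated, and to which LVs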
Also, coming from a Solaris background, I don't see how you would do this on Solaris - unless you're thinking LVM is like a Sun disk label. It is not. Disk labels are completely different (and PC/Linux LVM often resides on the legacy PC BIOS/DOS disk label or the newer GPT disk label). LVM is not MD either, so don't confuse LVM with Solaris MD capabilities.
But more importantly ... why are you doing this?
LVM on kernel 2.6 has *0* overhead. LVM merely leverages the integrated DeviceMapper (DM) facilities of the kernel. I.e., whether something is in LVM or in a "slice" of the underlying disk label (e.g., a partition in the legacy BIOS/DOS "MBR" partition table format), it performs the same.
I.e., the kernel accesses the blocks directly via the ranges DM defines for an LV, just as it does for the underlying slices/partitions, etc.
There's no need to ever "undo" LVM. LVM has all sorts of advantages, while I don't know of any disadvantages ... other than GRUB being unable to directly boot it (requiring a separate /boot).
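You can see exactly what DeviceMapper is doing for an LV with dmsetup; for a simple LV it is just a single "linear" target pointing at a range of blocks on the underlying device (the name below is a placeholder):
# dmsetup table myvg-mylv   # prints something like "0 2097152 linear 8:2 384": start, length in sectors, target, device major:minor, offset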
Typically, when you create a single logical volume spanning an entire physical volume, it will use a contiguous range of physical extents. However, what Bryan was getting at is that if you've extended the LV over time to take up the PV, and possibly had other logical volumes in the mix on that PV at some point, then your current LV may have multiple segments that are not necessarily in order on the disk.
I can't imagine any scenario in which your LVM layout is contributing to the high memory consumption. Have you checked top to see what's eating the memory? device-mapper and LVM are simply remapping I/O that is destined for a certain area of a logical device to a specific area on disk, and outside of LVM commands and metadata changes, there really shouldn't be any overhead.
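A quick way to see where the memory is actually going (no LVM involved) is something like:
# free -m                         # overall usage, including buffers/cache
# ps aux --sort=-rss | head -15   # processes holding the most resident memory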
That said, if you really want to remove LVM from the picture, you can do it as you described, by repartitioning the disk so that a partition starts where the logical volume previously started. This isn't trivial, but it can be done.
Note: This can destroy your data. Proceed at your own risk. A backup is strongly recommended. Or better yet, just copy the data from one disk to another and use that instead.
In this example my PV is /dev/sda, my VG is test, and my LV is a 1G volume named lv1.
First you need to know the range of extents used by the LV. This will only work if it's a single contiguous range.
# lvs -a -o +seg_pe_ranges test
LV VG Attr LSize Origin Snap% Move Log Copy% Convert PE Ranges
lv1 test -wi--- 1.00g /dev/sda:0-255
Next you need the extent size:
# vgs -a -o +extent_size test
VG #PV #LV #SN Attr VSize VFree Ext
test 1 1 0 wz--n- 10.00g 9.00g 4.00m
The PE range 0-255 is 256 extents, so the new partition will have to be at least 4MiB*256, or 1024MiB, large.
Now you need to know the offset to create it at on disk, which is the same as the start of the first PE:
# pvs -a -o +pe_start /dev/sda
PV VG Fmt Attr PSize PFree 1st PE
/dev/sda test lvm2 a- 10.00g 9.00g 1.00m
So: a partition of at least 1024MiB that starts 1MiB into the disk. First you should back up your metadata in case you need it again (there's probably already a copy under /etc/lvm, but best to be safe):
# vgcfgbackup test
Now remove the LVM metadata. Note: everything on the volume group must be unmounted first. Commands:
# vgremove test
# pvremove /dev/sda
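If you ever need to go back, the PV can be recreated with its old UUID from that backup and the VG metadata restored; the UUID placeholder below has to come from the backup file:
# pvcreate --uuid <PV-UUID-from-the-backup-file> --restorefile /etc/lvm/backup/test /dev/sda
# vgcfgrestore test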
Now I create my partition with parted:
# parted /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Error: /dev/sda: unrecognised disk label
(parted) mklabel
New disk label type? msdos
(parted) p
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sda: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
(parted) mkpart
Partition type? primary/extended? primary
File system type? [ext2]? ext3
Start? 1MiB
End? 1025MiB
(parted) p
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sda: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 1075MB 1074MB primary ext3
(parted) quit
Information: You may need to update /etc/fstab.
# partprobe
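Before mounting, a read-only fsck is a cheap sanity check that the filesystem really does start where we expect on the new partition:
# e2fsck -n /dev/sda1   # -n opens the filesystem read-only and answers "no" to everything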
Now I'm able to mount it via the partition:
# mount /dev/sda1 /mnt/nfs/
#
Again, I recommend you look closer at what's actually using the memory rather than go to these lengths to remove something that may benefit you in the future and probably isn't having any impact. But if you must, that's how it can be done.
Regards,
John Ruemker, RHCA
Red Hat Software Maintenance Engineer
Online User Groups Moderator
John -- that was exactly what I was referring to. I was afraid the LV could have been extended at different points, and had ranges of extents that may not be completely continuous (even if still contiguous). I didn't want to jump to such conclusions, and leave the poster upset with me as a result. ;)
Nice to know about the "seg_pe_ranges" output field. I had never tried it before. I didn't want to send the poster down the more "raw" DeviceMapper route, which is all I could think of. Thanx for that tidbit, and it's definitely much better.
To conclude, I think you and I see eye-to-eye. As much as I wanted to go into the details of how PEs are laid out, how DeviceMapper uses the ranges, etc... and get into the DM tools, I figured the poster had a reason why he was asking. Hence my question.
As it turns out, and I had a hunch (I thought I remembered the poster from the 32-bit PAE thread ;) ), it was about performance. And I came to the same conclusion as yourself: the performance issue is elsewhere.
Because whether you use LVM or not, the kernel always accesses the disk as ranges of blocks; whether those ranges come from PE mappings via DeviceMapper or from partition/slice tables, the I/O path is essentially the same.
I can't think of any tunables that would change performance on LVM. Before DM-LVM2, back on kernel 2.4 (when LVM did have overhead), read-ahead was set separately on an LV versus on the underlying block device. But it's all the same block-device path now in the DeviceMapper era of LVM2.
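If you want to double-check, the read-ahead settings are easy to compare (the device and VG names below are just examples):
# blockdev --getra /dev/myvg/mylv   # read-ahead for the LV, in 512-byte sectors
# blockdev --getra /dev/sda1        # same for a plain partition
# lvs -o +lv_read_ahead myvg        # what LVM itself has recorded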
Anything you could think of? Otherwise, I think the performance issue is in another area.
If you have Oracle on Ext3, you need to be tuning Ext3 and ensuring all systems have the same tunables. Are you using Direct I/O?
Most of the recommendations I'm making, like Direct I/O, are standard practice when you have Oracle atop Ext3. If it's not enabled, it could explain your performance issues, as it makes a significant difference.
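At minimum, compare how the Ext3 filesystems are mounted and tuned across the systems (Direct I/O itself is requested by Oracle via its filesystemio_options parameter, not by a mount option; the device name below is just an example):
# grep ext3 /proc/mounts                                     # compare noatime, data=ordered/writeback, etc.
# tune2fs -l /dev/sdb1 | egrep -i 'features|mount options'   # on-disk features and default mount options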
I guess what I'm saying is that there could be several, major settings/configuration differences that explain your performance issue, and are far more likely (and actually significant in impact). These are things that are easily changed and tested.
Don't know what to tell you at this point, just trying to give you possibilities that actually impact performance.
All I was saying is that it can't hurt to check if all the tunables match. The lack of Direct I/O on one system versus another is the biggest red flag I've seen in my history of Oracle implementations when the storage is Ext3.
-- Bryan
P.S. I gotta ask ... these aren't Exadata systems, are they? @-ppp
I know Exadata was shipping with Oracle Linux 5.5 (can't remember if stock or Unbreakable) just a couple of months ago when a client of mine was looking at them. I know they are often sold as an "appliance," but once a customer gets them, they set them up and manage them like any old EL box, with some NAND flash acceleration and per-disk storage licensing of Oracle Linux.
Jan --
Just FYI, the OP actually posted an OOM thread first, and I do believe it is related (as I noted in an earlier response, which he confirmed) ...
`oom-killer` problems on RHEL 5.5 32bit (w/ PAE kernel)
The OP is very constrained here, and yet is trying various things under those constraints, although some seem like they would void the 3rd-party support anyway.
I still do not believe this is LVM related at all. On kernel 2.4, yes, LVM impacted performance. But on kernel 2.6, LVM2 is just a thin layer over the kernel's DeviceMapper (DM), and I/O to a logical volume hits the same block ranges as I/O to a "raw" slice (partition) on the disk label (BIOS/DOS partition table) - the performance is exactly the same.
Hi,
I saw a lot of OOM (out of memory) problems on RHEL 5, on 15-17 servers, I think, but I haven't seen an OOM error on RHEL 6 (so far). We have resorted to rebooting the servers weekly to try to stop the OOM errors from crashing them. It's a crude fix, but the customer did not want to try the hugemem kernel on RHEL 5.
I'm surprised to hear folks trying to get rid of LVM. I was hoping to do the opposite, i.e. move data from a non-LVM ext4 filesystem to an LVM ext4 filesystem. This allows dynamic expansion of the underlying LV and then of the ext4 filesystem - although I'm not sure whether that still holds when SAN storage is used. One of the biggest headaches is running out of filesystem space and needing to extend a filesystem. I try to leave some room for expansion by creating an LV, then making a filesystem about 90% the size of the LV, leaving 10% for "emergencies".
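For example, growing an LV and the ext4 filesystem on it later would look something like this (the vg01/data names and the size are purely illustrative):
# lvextend -L +5G /dev/vg01/data   # grow the LV by 5GiB (needs free extents in the VG)
# resize2fs /dev/vg01/data         # grow the ext4 filesystem to fill the LV; works online
Newer lvm2 versions can combine the two steps with lvextend -r.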
Lina
Lina --
I actually tracked the OP's OOM problem first, before this thread ...
`oom-killer` problems on RHEL 5.5 32bit (w/ PAE kernel)
In a nutshell, he's got 48GiB on EL x86 (PAE36) release 5 (and not release 4 or earlier, which had the 32-bit 4/4G split option). That's just a recipe for disaster, and is not supported by Red Hat on release 5, because one is always exhausting low memory.
EL x86-64 (PAE52) must be used for more than 16GiB, as Red Hat only supports up to 16GiB on EL x86 (PAE36) for release 5+.
Red Hat Enterprise Linux Technology capabilities and limits (supported[/theoretical])
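You can watch low memory being exhausted on a 32-bit kernel with something like:
# egrep -i 'lowtotal|lowfree' /proc/meminfo   # these fields only appear on 32-bit highmem kernels
# free -lm                                    # -l splits out low and high memory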