RHEL 5: B110i RAID failed after kernel upgrade

Hello,

I have an HP DL320 G6 (Smart Array B110i SATA) server with two 250 GB SATA disks. It was running RHEL 5.7 i386, and I upgraded it to 5.10 but forgot to upgrade the hpahcisr kernel module. After the reboot the server boots with no RAID working: one disk (/dev/sdb) serves the LVM volumes (root and /home) and /dev/sda serves /boot. fdisk -l shows two disk drives.

How can I restore the RAID without losing data? How can I resynchronize the disks from newest to oldest?

Responses

This is a production server, please help.

Tomasz - I recommend opening a case. The people on the Red Hat Customer Portal are all volunteers, so our assistance may not be timely.

The issue you presented is confusing. If you are using hardware RAID, then your Red Hat installation should have no impact on the RAID (unless you are using FAKERAID). If you were able to install the OS, then the drivers for that controller are already available.

I can't open a case; my subscription is self-support, so this community is my only support.

What do you mean by FAKERAID?
This is hardware RAID, not software. But to install RHEL on that server I had to boot the kernel with:
linux text dd
Then a USB stick is needed with the floppy-disk image from the HP download page.
See hpahcisr.img on the hp.com web page.
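
For reference, a sketch of that installer procedure as described above (the image name is the one from hp.com; the exact driver-disk prompt wording may vary):

# at the RHEL 5 installer boot prompt:
linux text dd
# the installer then prompts for a driver disk; point it at the USB stick
# that holds the hpahcisr.img floppy image downloaded from hp.com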

Tomasz,

Are you able to boot into the previous kernel temporarily? (When you reboot, make sure you know the GRUB password if your /boot/grub/grub.conf has a 'password' directive in it.)

Does your RAID array require a kernel module for some unusual reason? If it does need a kernel module to be added, you may have to re-add it after every kernel upgrade. This may not be the case.
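
Two quick, read-only checks for both points (generic commands; nothing HP-specific is assumed):

grep password /boot/grub/grub.conf     # is GRUB password-protected?
grep '^title' /boot/grub/grub.conf     # which kernels can you still select?
find /lib/modules/ -name 'hpahcisr*'   # which installed kernels have the module?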

I agree with James, open a case with Red Hat especially since it is a production server.

Kind Regards,
Remmele

Remmele,

My RAID needs the module to work. RHEL does not see the RAID without it. If you boot the RHEL installation with the defaults, the disk drive is not present to the kernel.

I do not have a password in GRUB.

I will try to explain the case:

After upgrading the kernel I did not upgrade the kernel module before rebooting.

The server boots with a seemingly random choice of partitions. In my case /boot was taken from /dev/sda1 and / from /dev/sdb2, so /dev/sda2 holds old data that is no longer mirrored. After another reboot the server will probably pick partitions at random again. It can be dangerous.

My idea is to:

  1. Halt the system.
  2. Eject the sda disk.
  3. Boot from sdb.
  4. Update the kernel module.
  5. Halt.
  6. Re-insert sda.
  7. Boot again.
  8. Let the mirror disks synchronize.

Hi Tomasz,

You have not answered the question: can you still boot from the old kernel, or is it broken too?

Please try and let us know.

I suggest checking on the HP website what to do during a kernel update to get the RAID controller module into the new kernel.

Also, please explain the setup of the controller's logical drives, since you speak of /dev/sda and /dev/sdb.

Kind regards,

Jan Gerrit Kootstra

Thank you all for great support!!!

Jan, I do not want to try a reboot; I am 99% sure it will not boot up again.
Why?

This server is the production Master, and I have a production Slave with identical hardware and software. The Slave did not boot up with any kernel in the same situation; I had to reinstall it from zero.

I have checked the HP website, and it says I have to download the kernel module as hpahcisr-***.rpm and install it. I did that, but by mistake I installed the wrong module: the 5.9 version instead of 5.10. After the reboot the Slave would not boot from any kernel, although the HP web page suggests it should still boot from the previous kernel.

At the BIOS level the RAID was 100% working and operational.

Now I will explain the issue with /dev/sda.

On the reinstalled Slave, fdisk -l now displays:

Disk /dev/sda: 250.0 GB, 250023444480 bytes
255 heads, 63 sectors/track, 30396 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 30396 244051447+ 8e Linux LVM

This is the normal working state: /dev/sda1 is /boot, and /dev/sda2 is the LVM with /home, /var and /.

Moving on to the Master server, which is not working well right now, fdisk shows:

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 30396 244051447+ 8e Linux LVM

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 83 Linux
/dev/sdb2 14 30396 244051447+ 8e Linux LVM

You see the two disks as separate devices; the RAID is not working.

Thanks for the additional info Tomasz (and I apologize for still not understanding).

If you have hardware RAID, there is nothing to be done from the OS perspective (someone please correct me if there is a situation I am not familiar with). FAKERAID is a combined software/hardware RAID that typically does not work well with Linux. But since you don't have that, we'll move on ;-)

The part I really struggle with is where you mention mirroring the disk. Do you have a RAID controller that presents LUNs, which you then mirror with software as well?

You have a few great people from the forum responding already - so, hopefully we can get to the bottom of this.

Please explain why you feel the system is booting from "random" partitions. They likely have a different letter assigned after the reinstall, but they will not likely be "random".

If you were not expecting 2 drives from your RAID controller, I would suggest you stop right away and figure that out.

Also, please run the following and let us know what you see:

fdisk -l              # partition layout of every disk the kernel can see
lsmod | grep -i hp    # is an HP storage module currently loaded?
rpm -qa | grep hpa    # which hpahcisr packages are installed?
blkid                 # labels and UUIDs of every filesystem

hpahcisr-1.2.6-7.rhel5.i686.rpm
http://h50146.www5.hp.com/products/software/oe/linux/mainstream/bin/support/doc/general/mgmt/psp/v870/psp870_rhel5_x86/hpahcisr-1.2.6-7.rhel5.i686.txt

Thank you James for great support.

I have no idea what kind of RAID it is. The HP web site says: "The HP Smart Array B110i SATA RAID Hot Plug Advance Pack provides the RAID support for the embedded SATA controller. The Hot Plug Advance kit is a license to enable the RAID support on Hot Plug models. It supports up to six 3.0G SATA hard disk drives. It will support a maximum of two (2) logical drives. It supports RAID 0, 1 and 1+0."

It is probably FAKERAID.

"Do you have a RAID controller that presents LUNs and then mirror with Software also?"
I do not understand the question. Anyway, I can log in to the RAID BIOS at boot time and monitor/set up the RAID disks as LUNs.

On the Master server the commands return this:

[root@wawa-slev5-b ~]# lsmod | grep -i hp
[root@wawa-slev5-b ~]#

[root@wawa-slev5-b ~]# rpm -qa | grep hpa
[root@wawa-slev5-b ~]#

[root@wawa-slev5-b ~]# find /lib/modules/ | grep hpahci
/lib/modules/2.6.18-238.el5PAE/updates/hpahcisr.ko

[root@wawa-slev5-b ~]# ls /lib/modules/
2.6.18-238.el5PAE 2.6.18-274.12.1.el5PAE 2.6.18-371.4.1.el5PAE

[root@wawa-slev5-b ~]# uname -r
2.6.18-371.4.1.el5PAE

[root@wawa-slev5-b ~]# blkid
/dev/sdb1: LABEL="/boot" UUID="f59b672a-a5c4-4f64-9720-321ed20c057d" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol01: TYPE="swap"
/dev/mapper/VolGroup00-LogVol04: UUID="a0b19fd1-4488-4a48-9d8b-016e29df91a8" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol03: UUID="1d30b1b2-a7e0-4431-a1c4-500fa9667b32" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol02: UUID="bfef3c3f-7887-4c1a-9ce7-bf916f13b4a4" TYPE="ext3"
/dev/mapper/VolGroup00-LogVol00: UUID="72900b8e-a7a2-4a84-9f22-61f9df664c31" TYPE="ext3"
/dev/sr0: LABEL="RHEL/5.6 i386 DVD" TYPE="iso9660"
/dev/sda1: LABEL="/boot" UUID="f59b672a-a5c4-4f64-9720-321ed20c057d" TYPE="ext3" SEC_TYPE="ext2"
/dev/VolGroup00/LogVol00: UUID="72900b8e-a7a2-4a84-9f22-61f9df664c31" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap"

[root@wawa-slev5-b ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/mapper/VolGroup00-LogVol02 on /home type ext3 (rw)
/dev/mapper/VolGroup00-LogVol03 on /var type ext3 (rw)
/dev/mapper/VolGroup00-LogVol04 on /opt type ext3 (rw)
/dev/sdb1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[root@wawa-slev5-b ~]# cat /etc/fstab
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
/dev/VolGroup00/LogVol02 /home ext3 defaults 1 2
/dev/VolGroup00/LogVol03 /var ext3 defaults 1 2
/dev/VolGroup00/LogVol04 /opt ext3 defaults 1 2
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0

The newest kernel modules for RHEL 5.7 and 5.10 are:
kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm
kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u7.i686.rpm

My server was updated but not rebooted.

[root@wawa-slev5-b ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

vgdisplay shows:
--- Physical volumes ---
PV Name /dev/sda2
PV UUID Llavx8-aQYP-ITwo-Hr4r-e1ss-TU4s-G0G5a8
PV Status allocatable
Total PE / Free PE 7447 / 0

As you see, LVM is working from /dev/sda2 while /boot is mounted from /dev/sdb1. It is random.
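
A quick, read-only way to double-check which physical device each logical volume is actually using (standard LVM2 tools already present on the system):

lvs -o +devices VolGroup00   # shows the backing device of each LV
pvs                          # shows which disk the PV was taken from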

The question remains: if I install the kernel module kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm and reboot the server, will the disks be synchronized?
I want the copy to go from the newest (/dev/sda2) to the oldest (/dev/sdb2); if it goes the other way it will be a disaster.

Tomasz

See the questions/tips from James and Jan above; the output of the commands James mentions will be highly useful!

Added: you mentioned you did the kernel module update before the reboot. There is a good chance the kernel module update did not affect the new kernel, because you were not yet running it.

Kind Regards...
Remmele

I see what your concern is, and I'll summarize just so you can validate we are on the same page.
You were running 5.7 and the OS would see one disk device (sda) with two partitions (sda1 = /boot, sda2 = LVM [/, swap, /home]).

When you "updated" your OS to 5.10, you now see two disk devices (sda, sdb) which are seemingly the same. However, they -should- be one device, mirrored. Further alarming is that the box booted from /dev/sdb1 (/boot) while LVM is using /dev/sda2. The fact that /dev/sdb1 was selected is odd, but it is not random: GRUB was installed and it points to that device. (Note also from your blkid output that /dev/sda1 and /dev/sdb1 carry the same LABEL="/boot" and the same UUID, so a mount by label can resolve to either disk.)

Your biggest concern is that when you fix the RAID, one device will become the SOURCE (current) copy and the other the DESTINATION, potentially overwriting the data that has been written since the upgrade.

Assumptions:
* Booting back to the old kernel may not even be possible, and you don't want to do that anyway, for fear of how the RAID mirroring will behave (overwriting your newer data).
* You performed an "in-place" upgrade from 5.7 to 5.10 (as opposed to wiping the / volume).
* I believe LVM now sees two copies of the same Volume Group and will, of course, invalidate one copy.

Since we are only talking about 250 GB of data, I recommend getting an external disk of some sort and taking a copy of the volumes as they currently are. I would approach this as though you ARE going to lose data (even though you may not).
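
A minimal sketch of what that backup could look like, assuming an external disk is mounted at /mnt/backup (a hypothetical path) with enough free space:

# copy the mounted filesystems file by file (each stays within its own filesystem)
tar czf /mnt/backup/root.tar.gz --one-file-system /
tar czf /mnt/backup/home.tar.gz --one-file-system /home
tar czf /mnt/backup/var.tar.gz --one-file-system /var
tar czf /mnt/backup/opt.tar.gz --one-file-system /opt
tar czf /mnt/backup/boot.tar.gz /boot
# also save the partition table and LVM metadata for reference
sfdisk -d /dev/sda > /mnt/backup/sda-partitions.txt
vgcfgbackup -f /mnt/backup/VolGroup00-metadata VolGroup00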

This is quite perplexing and I need more time to process all this.

I now have the same fears as you. Specifically: IF you fix the RAID, somehow sda could become the SOURCE copied over sdb. Or: which drive is GRUB on, and which copy of GRUB will survive? I believe you CAN get out of this without data loss, but I'm not sure it will be doable via posts on a forum, unfortunately. Again, I would approach this as though you ARE going to lose data and be extra cautious (which I believe you are doing). If you have a Red Hat partner in your metro area, you may want to consider engaging them for a few hours of consultation and assistance.
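
As for which drive GRUB is on: one read-only way to check is to look for GRUB legacy's stage1 signature in each MBR (a generic trick; nothing system-specific is assumed):

dd if=/dev/sda bs=512 count=1 2>/dev/null | strings | grep -i grub
dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep -i grub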

I found a Q&A page for your controller; it appears to actually be a software/hardware RAID (commonly called FAKERAID):
http://h18004.www1.hp.com/products/servers/proliantstorage/arraycontrollers/smartarrayb110i/questionsanswers.html

FINALLY: I would present your case to the HP forum.
http://h30499.www3.hp.com/t5/ProLiant-Servers-ML-DL-SL/DL320-G6-B110i-RAID-Controller-Rebuild-Raid-1-0/td-p/5624453#.U2Ou-jmyi-I
Even though you are running RHEL, I think the expertise from the hardware side would be beneficial here.

James,
thank you for the additional information and URLs. I have a fixed date to recover the RAID: Saturday, May 9th. Before that date I need to understand all the issues.

  1. I will back up all data before the job.
  2. I will not disconnect one disk drive, because it would take too long to synchronize the disks, based on the HP forum thread you posted.
  3. I will upgrade the hpahcisr*.rpm kernel module and try to reboot (see the sketch after this list).
  4. If it fails to boot, there is no choice but to reinstall the server from ZERO.
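
A minimal sketch of step 3, assuming the 5.10 kmod package named earlier is the correct one and that the running kernel is 2.6.18-371.4.1.el5PAE (as your uname -r showed). Whether the package rebuilds the initrd for you is an assumption; verify it against HP's documentation:

rpm -ivh kmod-hpahcisr-PAE-rhel5-1.2.6-16.rhel5u10.i686.rpm
# confirm the module landed in the new kernel's tree
find /lib/modules/2.6.18-371.4.1.el5PAE -name 'hpahcisr.ko'
# confirm the initrd now contains the module; if it does not, rebuild it:
zcat /boot/initrd-2.6.18-371.4.1.el5PAE.img | cpio -t 2>/dev/null | grep hpahcisr
# mkinitrd -f --with=hpahcisr /boot/initrd-2.6.18-371.4.1.el5PAE.img 2.6.18-371.4.1.el5PAE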

Do you have any other ideas to support my case and prevent a from-zero installation?
