RHEL 5: B110i RAID failed after kernel upgrade
Hello,
I have an HP DL320 G6 (Smart Array B110i SATA) server with two 250GB SATA HDDs, running RHEL 5.7 i386. I upgraded RHEL to 5.10 and forgot to upgrade the hpahcisr kernel module. After the reboot, the server comes up with no RAID working: /dev/sdb carries the LVM volumes (root and /home) and /dev/sda carries /boot. fdisk -l shows two separate disk drives.
How can I restore the RAID without losing data? How do I resynchronise the disks from the newest copy to the oldest?
Responses
Tomasz - I recommend opening a case. The people on the Red Hat Customer Portal are all volunteers, so our assistance may not be timely.
The issue you presented is confusing. If you are using hardware RAID, then your Red Hat installation should have no impact on the RAID (unless you are using FAKERAID). If you were able to install the OS, then the drivers for that controller are already available.
Tomasz,
Are you able to boot into the previous kernel temporarily? (when you reboot, make sure you know the grub password if your /boot/grub/grub.conf has a 'password' directive in it).
Does your RAID array require a kernel module for some unusual reason? If it does, you may have to re-add that module after any kernel upgrade. This may not be the case.
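If you want to check before rebooting which kernels are still installed and which one grub will pick, something like this is safe to run (read-only; adjust the path if your grub.conf lives elsewhere):
grep -E '^(default|title)' /boot/grub/grub.conf   # 'default=' is a 0-based index into the 'title' entries
rpm -q kernel                                     # kernel versions still installed
At the grub menu you can then select the older entry by hand for a one-off boot.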
I agree with James, open a case with Red Hat especially since it is a production server.
Kind Regards,
Remmele
Hi Tomasz,
You have not answered the question: can you still boot from the old kernel, or is it broken too?
Please try and let us know.
I suggest checking the HP website for what needs to be done after a kernel update to get the RAID controller module into the new kernel.
Also, please explain the setup of the controller's logical drives, since you speak of both /dev/sda and /dev/sdb.
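For example, a read-only way to see whether the hpahcisr module even exists for the kernel you are now running (just a sketch, using the module name Tomasz gave):
uname -r                                          # the kernel currently running
find /lib/modules/$(uname -r) -name 'hpahcisr*'   # is the module present for this kernel?
modinfo hpahcisr                                  # shows version and path if the module is known at all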
Kind regards,
Jan Gerrit Kootstra
Thanks for the additional info Tomasz (and I apologize for still not understanding).
If you have hardware RAID, there is nothing to be done from the OS perspective (someone please correct me if there is a situation I am not familiar with). FAKERAID is a software/hardware RAID that typically does not work well with Linux. But.. since you don't have that - we'll move on ;-)
The part I really struggle with is where you mention mirroring the disk. Do you have a RAID controller that presents LUNs, and then also mirror with software on top?
You have a few great people from the forum responding already - so, hopefully we can get to the bottom of this.
Please explain why you feel the system is booting from "random" partitions. They likely have a different letter assigned after the re-install - but they are not likely to be "random".
If you were not expecting 2 drives from your RAID controller, I would suggest you stop right away and figure that out.
Also - please run the following and let us know what you see
fdisk -l                 # list every disk and partition the OS sees
lsmod | grep -i hp       # which HP kernel modules are loaded
rpm -qa | grep hpa       # which hpahcisr driver packages are installed
blkid                    # filesystem UUIDs/labels on each partition
hpahcisr-1.2.6-7.rhel5.i686.rpm
http://h50146.www5.hp.com/products/software/oe/linux/mainstream/bin/support/doc/general/mgmt/psp/v870/psp870_rhel5_x86/hpahcisr-1.2.6-7.rhel5.i686.txt
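If - and only if - you verify this is the correct driver for the B110i and your exact kernel, the install would look roughly like the sketch below. Treat it as an outline, not tested instructions; HP's own install notes in that txt file take precedence:
rpm -qpi hpahcisr-1.2.6-7.rhel5.i686.rpm               # inspect the package before installing anything
rpm -ivh hpahcisr-1.2.6-7.rhel5.i686.rpm               # install the driver package
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)   # rebuild the initrd so the module loads at boot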
Tomasz
See the questions/tips from James and Jan above - and the output of the commands James mentions will be highly useful!
- Please try to boot the system with the previous kernel and let us know how that goes!
- Use the blkid command to determine the RAID mounts you wish to mount.
- Examine your /etc/fstab and see if you are using a node name (like /dev/hpa1 or /dev/sda1, for example) instead of a UUID or label to mount with.
-- Are you using just a node name for the RAID mounts in /etc/fstab? If a disk drops out and you were not using labels or LVM, your fstab device names can reorder and cause issues (a quick check is sketched after this list).
- See the other good comments made above.
- VERIFY whether this link has your driver - you must verify this for yourself before using that driver! SEE the link James provided above, too!
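A quick way to run the fstab check from the list above (read-only):
grep -v '^#' /etc/fstab   # raw node names like /dev/sda1 in the first column are the fragile ones
blkid /dev/sda1           # shows the UUID/LABEL you could mount by instead
An fstab entry by UUID would look like 'UUID=<value from blkid> /boot ext3 defaults 1 2' - but do not change fstab until the RAID question is settled.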
Added - you mentioned you did the kernel module update before the reboot. There is a good chance that update did not take effect for the new kernel, because you were not yet running it when the module was installed.
Kind Regards...
Remmele
I see what your concern is - and I'll summarize just so you can validate we are on the same page
You were running 5.7 and the OS saw one disk device (sda) with two partitions (sda1 = /boot, sda2 = LVM [/, swap, /home]).
When you "updated" your OS to 5.10, you now see two disk devices (sda, sdb) which are seemingly the same. However, they -should- be one device, mirrored. Further alarming is that the box booted from /dev/sdb1 (/boot) and that LVM is using /dev/sda2. The fact that the installer selected /dev/sdb1 is odd, but it is not random - grub was installed and now points to that device.
Your biggest concern is that when you fix the RAID issue, one device will become the SOURCE (current) copy and the other the DESTINATION, potentially overwriting the data that has been written since the upgrade.
Assumptions:
* Booting back to the old kernel may not even be possible, and you don't want to do that anyway for fear of how the RAID mirroring will behave (overwriting your new(er) data).
* you performed an "in-place" upgrade from 5.7 to 5.10 (as opposed to wiping the / volume)
* I believe LVM now sees two copies of the same Volume Group and will, of course, invalidate one copy (a quick read-only check is sketched below).
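To see what LVM currently thinks, these report-only commands are safe (a sketch; they change nothing on disk):
pvs -o pv_name,pv_uuid,vg_name   # duplicate PV UUIDs usually produce warnings here
vgs                              # does the Volume Group show up once or twice?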
Since we are only talking about 250GB of data, I recommend getting an external disk device of some sort and getting a copy of the volumes as they are currently. I would approach this as though you ARE going to lose data (even though you may not).
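For the copy itself, a minimal sketch, assuming the external disk is mounted at /mnt/external (that path and the device names are assumptions - verify everything against your own fdisk -l output first):
dd if=/dev/sda of=/mnt/external/sda.img bs=1M conv=noerror,sync   # raw image of the first disk
dd if=/dev/sdb of=/mnt/external/sdb.img bs=1M conv=noerror,sync   # raw image of the second disk
The external disk needs room for both 250GB images, and doing this from single-user mode or rescue media is safer than from the running system.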
This is quite perplexing and I need more time to process all this.
I now have the same fears as you. Specifically - IF you fix the RAID, somehow sda could become the SOURCE that copies over sdb. Or consider which drive grub is on, and which copy of grub will survive. I believe you CAN get out of this without data loss, but I'm not sure it will be doable via posts on a forum, unfortunately. Again - I would approach this as though you ARE going to lose data, and be extra cautious (which I believe you are doing). If you have a Red Hat partner in your metro area, you may want to consider engaging them for a few hours of consultation and assistance.
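One read-only check you can do right now is to see which disk(s) have grub stage1 in the MBR - this reads only the first 512 bytes of each disk and changes nothing:
dd if=/dev/sda bs=512 count=1 2>/dev/null | strings | grep -i grub
dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep -i grub
If 'GRUB' appears in the output for a disk, its MBR carries the grub boot code.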
I found a Q&A document for your controller; it appears to actually be a software/hardware RAID (commonly called FAKERAID):
http://h18004.www1.hp.com/products/servers/proliantstorage/arraycontrollers/smartarrayb110i/questionsanswers.html
FINALLY: I would present your case to the HP forum.
http://h30499.www3.hp.com/t5/ProLiant-Servers-ML-DL-SL/DL320-G6-B110i-RAID-Controller-Rebuild-Raid-1-0/td-p/5624453#.U2Ou-jmyi-I
Even though you are running RHEL, I think the expertise from the hardware side would be beneficial here.
