Politics and Technology.

Thursday, July 5, 2007

GRUB, RHEL and a corrupted MBR


That's all the screen said.

Just "GRUB." It's almost like seeing "PC LOAD LETTER." My first reaction is "what the #$%! does that mean?" to borrow the famous "Office Space" line.

Well, I know what GRUB is and what it does. What strikes me is why, when it gets screwed up early in the bootstrap process, that's all it says on the screen. Couldn't it say "loading GRUB", or something more meaningful, so I'd know at what stage of the bootstrapping it coughed blood?

But alas, no.


Well, I've seen this several times before, and today saw it again: when booting, and after the BIOS work is done, the system just shows "GRUB" and hangs. Nothing else happens. Pretty much, this means that there is a problem with, well, GRUB.

Unfortunately, I've never made a serious mental note of how the problem was fixed (I handed it off to other SAs to fix in the past), so I spent several hours today doing the same thing over and over again, expecting different results.


In the end, the fix was as follows.

First, boot from the RHEL boot CD. When prompted, go into rescue mode.

boot: linux rescue

Follow through on the instructions to choose your language and keyboard. You don't need the network up and running, so opt not to use it. When asked whether it should find an installed copy of RHEL, choose "CONTINUE". You want it to find and mount that install.

Next, when presented with a shell, chroot that RHEL image.

# chroot /mnt/sysimage

This will come back with another shell. Run a "df" to confirm it made the root change so that it appears you booted from the internal disks.
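
If you'd rather not eyeball the "df" output, here is a small sketch of the same check. The function name "device_for" is my own invention for illustration, not part of RHEL or GRUB. It just prints the device behind a mount point, so you can see that "/" and "/boot" come from the internal disks rather than the CD.

```shell
# Hypothetical helper: print the device that holds a given mount point.
# Usage: device_for /boot
device_for() {
    # -P forces the POSIX output format: one line per filesystem,
    # so the device name is always field 1 of the second line
    df -P "$1" | awk 'NR == 2 { print $1 }'
}
```

Inside the chroot you would run "device_for /" and "device_for /boot" and compare the results with what you expect for your disks.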

Next, invoke GRUB with the right options.

# grub --batch --device-map=/boot/grub/device.map \
    --config-file=/boot/grub/grub.conf --no-floppy

From the GRUB shell, re-install the MBR.

grub> root (hd0,0)
grub> setup (hd0)

grub> quit

It is important to "quit" out of GRUB so that anything cached gets dumped.
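
The same three GRUB commands can also be fed to the grub shell on standard input instead of typing them at the prompt. What follows is only a sketch under assumptions: "grub_mbr_commands" is a made-up helper name, and the disk and partition arguments ("hd0" and "0") are the values from my install, not something to copy blindly.

```shell
# Hypothetical helper: print the GRUB shell commands that reinstall the MBR.
# $1 = GRUB disk name (e.g. hd0)
# $2 = zero-based index of the /boot partition on that disk (e.g. 0)
grub_mbr_commands() {
    disk="$1"
    part="$2"
    printf 'root (%s,%s)\nsetup (%s)\nquit\n' "$disk" "$part" "$disk"
}

# Inside the chroot, the output could then be piped into the grub shell:
#   grub_mbr_commands hd0 0 | grub --batch \
#       --device-map=/boot/grub/device.map --no-floppy
```

Printing the commands first lets you read exactly what will be sent to GRUB before anything touches the disk.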

My mistakes were mainly these: re-installing GRUB without specifying the options, installing GRUB on the boot block of the first partition instead of on the master boot record, and not using the rescue CD to run the grub shell.

You must pay attention to your devices. For me, and the typical RHEL install, "hd0" is the root disk, and "hd0,0" is the "/boot" partition. If you have "/boot" installed on, say, your 2nd partition, you'd use "hd0,1" as your "root". You might not even have "hd0" mapped out. Review your "/boot/grub/device.map" file. For this particular RHEL install on an IBM 366 with a hardware mirrored internal drive, our file maps "sda" to "hd0". Our Dell boxes map the same way, too.
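
To make the numbering concrete, here is a rough sketch that turns a Linux partition name into GRUB's notation by reading the device map. The function name "grub_name_for" is hypothetical, and it assumes plain disk names like "/dev/sda" or "/dev/hda" and device.map lines of the form "(hd0)  /dev/sda". It won't cope with names like "/dev/cciss/c0d0p1".

```shell
# Hypothetical helper: translate a Linux partition such as /dev/sda2 into
# GRUB notation such as (hd0,1), using the mappings in a device.map file.
# Usage: grub_name_for /dev/sda2 [path-to-device.map]
grub_name_for() {
    part="$1"
    map="${2:-/boot/grub/device.map}"
    disk=$(printf '%s\n' "$part" | sed 's/[0-9]*$//')    # /dev/sda2 -> /dev/sda
    num=$(printf '%s\n' "$part" | sed 's/^.*[^0-9]//')   # /dev/sda2 -> 2
    # device.map lines are assumed to look like: (hd0)   /dev/sda
    hd=$(awk -v want="$disk" '$2 == want { print $1 }' "$map" | tr -d '()')
    # Give up if the disk is not in the map or no partition number was given
    [ -n "$hd" ] && [ -n "$num" ] || return 1
    # GRUB counts partitions from 0, Linux counts them from 1
    echo "($hd,$((num - 1)))"
}
```

With "sda" mapped to "hd0" as on our boxes, "/dev/sda1" comes out as "(hd0,0)" and "/dev/sda2" as "(hd0,1)".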


Linux said...

Gud one Dude!!

nitinr0cks said...

cool one !

sreejith said...

This has helped alot... We were fully ready to a fresh installation of our damaged RHEL server. As a last option tried it... Thanks u so much...

gapazza said...

thanks dude!