Super Linux Guru required with strong boot-fu!

Peon

Evening guys,

I'm having an issue with a server that is disrespecting my authority.

On boot it throws the error message shown in the attached photo.

It's a server running Debian Stretch. One of the drives had bad sectors and I removed it. Even with the drive in, it produced this error. It's a RAID 1 array.

I've checked lilo.conf and made sure root (/) is md0. I have also edited fstab to use md0's UUID. This sucker just won't boot. Could someone please point me in the right direction?

 
Not sure about LILO, but with GRUB I've had to edit the UUID into the GRUB config before.

I'd reckon it's a similar thing here.
 
No joy with the boot-repair utility. It doesn't pick up the single RAID drive.
 
Your bootloader (LILO? I would think it would be GRUB unless this is a very old install) can find the /boot directory, since it loads the kernel just fine. /boot is probably sitting on its own partition.

The kernel cannot find the root (/) filesystem once it has finished initialising the hardware and wants to hand over to init. Or it can find / but the /etc/fstab file is stuffed. /root needs to be on the same partition as the root (/) directory, otherwise it won't boot. Same with /sbin.

I suspect the / filesystem is stuffed, or the root filesystem given to the kernel in the boot loader config is wrong. Usually it will say it can't find the root filesystem, though.
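A quick way to test the "fstab is stuffed" theory from a live environment is to parse the file and look for the mount points mentioned above. This is a minimal Python sketch, assuming the root filesystem is already mounted somewhere you can read its etc/fstab; the sample UUID is a made-up placeholder, and escaped characters such as `\040` are not decoded.

```python
def parse_fstab(text):
    """Map each mount point in fstab-formatted text to its source field."""
    mounts = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line; a stricter check would report it
        source, mountpoint = fields[0], fields[1]
        mounts[mountpoint] = source  # e.g. "UUID=...", "/dev/md0", "LABEL=..."
    return mounts


def fstab_problems(text):
    """Return warnings for a missing / entry or a separately mounted /sbin or /root."""
    mounts = parse_fstab(text)
    problems = []
    if "/" not in mounts:
        problems.append("no entry for the root (/) filesystem")
    for path in ("/sbin", "/root"):
        if path in mounts:
            problems.append(f"{path} is a separate filesystem ({mounts[path]})")
    return problems


# Placeholder fstab; the UUID is made up for illustration
SAMPLE_FSTAB = """
# <file system>  <mount point>  <type>  <options>          <dump>  <pass>
UUID=1234-abcd   /              ext2    errors=remount-ro  0       1
/dev/md1         none           swap    sw                 0       0
"""

print(fstab_problems(SAMPLE_FSTAB))  # prints []
```

On the real system you would pass it the contents of the mounted copy, e.g. `open("/mnt/etc/fstab").read()` if the array is mounted at /mnt.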

Best course of action:

* Check the root filesystem passed to the kernel in your boot loader config file. If that doesn't solve it:
* Boot off a live CD.
* Reassemble the software RAID device.
* Run fsck on the RAID filesystem.
* Then try to mount it to see if everything is there.
* Lastly, inspect the /etc/fstab file to see if it needs to be updated.
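The live-CD steps in that list could be scripted roughly as follows. This is a hedged Python sketch, not a tested recovery tool: it assumes the array is /dev/md0, that /mnt is free to use as a mount point, and that `mdadm --assemble --scan` can find the arrays from their superblocks. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess

# Assumed names; adjust to the actual system
MD_DEVICE = "/dev/md0"
MOUNT_POINT = "/mnt"


def recovery_commands(md_device=MD_DEVICE, mount_point=MOUNT_POINT):
    """Return the live-CD recovery steps, in order, as argument lists."""
    return [
        ["mdadm", "--assemble", "--scan"],    # reassemble the software RAID device(s)
        ["fsck", "-f", md_device],            # force a check while it is still unmounted
        ["mount", md_device, mount_point],    # mount it to see if everything is there
        ["cat", f"{mount_point}/etc/fstab"],  # inspect fstab last
    ]


def run(dry_run=True):
    for cmd in recovery_commands():
        print("$", " ".join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd)
        # fsck exits with 1 when it corrected errors, so only treat > 1 as fatal for it
        allowed = {0, 1} if cmd[0] == "fsck" else {0}
        if result.returncode not in allowed:
            raise RuntimeError(f"step failed (exit {result.returncode}): {' '.join(cmd)}")
```

Calling `run()` just prints the four commands. The first step in the list (checking the root passed by the boot loader) stays manual.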

If you can't boot off a live CD, then try using the GRUB shell.


Tip: you can identify your device by device name, such as /dev/sda1 or more likely /dev/md0p1, or use the safer approach of UUIDs, since kernel-assigned device names can change over time. To get the UUID of the root filesystem you can run "ls#-l#/dev/disk/by-uuid" (replace the # with a space). It seems to set off Cloudflare's security filters if I post with spaces, so I had to obscure it. :(
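The same lookup as that `ls -l` can be done in Python, which is handy when you want to paste a UUID into fstab or a boot loader config. A minimal sketch, assuming a udev-managed /dev where /dev/disk/by-uuid holds one symlink per filesystem UUID (`blkid` reports similar information).

```python
import os


def uuid_map(by_uuid_dir="/dev/disk/by-uuid"):
    """Map each filesystem UUID to the device node its symlink points at."""
    mapping = {}
    for name in sorted(os.listdir(by_uuid_dir)):
        # Entries are symlinks such as ../../md0; realpath resolves them to /dev/md0
        mapping[name] = os.path.realpath(os.path.join(by_uuid_dir, name))
    return mapping


def uuid_of(device, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the UUID whose symlink resolves to `device`, or None if there is none."""
    target = os.path.realpath(device)
    for uuid, dev in uuid_map(by_uuid_dir).items():
        if dev == target:
            return uuid
    return None
```

For example, `uuid_of("/dev/md0")` on the running system gives the value to use in a `UUID=...` fstab entry.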

Good luck. I reckon you've got a good chance of coming right.
 
Hi Guys,

@mxc: you were right. fsck had last run 1428 days ago. I had to repair the filesystem, which took almost 20 minutes. I was able to boot the system successfully using a SuperGRUB tool.
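The "days since the last fsck" figure can be read from the ext2/3/4 superblock with `tune2fs -l` (for example `tune2fs -l /dev/md0`). A small Python sketch that pulls the `Last checked:` line out of that output and converts it to days; it assumes the English/C-locale date format tune2fs normally prints, and the sample line is invented.

```python
from datetime import datetime


def days_since_last_check(tune2fs_output, now):
    """Return whole days between the 'Last checked:' time in `tune2fs -l` output and `now`."""
    for line in tune2fs_output.splitlines():
        line = line.strip()
        if line.startswith("Last checked:"):
            # Split on the first colon only; the time of day contains colons too
            value = line.split(":", 1)[1]
            # Collapse padding such as "Jan  1" to "Jan 1" before parsing
            stamp = " ".join(value.split())
            checked = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            return (now - checked).days
    return None  # no such line, e.g. not ext2/3/4 output


SAMPLE = "Last checked:              Fri Jan  1 00:00:00 2016\n"
print(days_since_last_check(SAMPLE, datetime(2016, 1, 11)))  # prints 10
```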

I'm just a bit confused now as to why there is no boot partition or reserved boot area. It's a RAID 1 array and there is no boot partition like you'd have with ReiserFS. I could donate a flash stick and make it bootable, but I don't want to do that. I have checked the data and folders and /boot is there. Running grub-install on a flash stick works and boots the server, but I don't want to do that. Or do I have no choice?

Any ideas how I can make LILO boot this sucker without a boot partition? This has been a very interesting challenge.
 
When it boots off the USB stick, what does "ls /boot" say? Is it an empty directory, and is there an entry for /boot in /etc/fstab? You can also run "mount", which lists the mounted filesystems. It definitely could read the /boot directory when you first reported your problem, without needing to boot off a USB stick.
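Whether /boot is its own filesystem can be answered from /proc/mounts, which holds the same information `mount` prints. A hedged Python sketch, assuming the standard whitespace-separated /proc/mounts layout (mount points containing spaces appear escaped as `\040` and are not decoded here).

```python
def mount_source(mounts_text, path):
    """Return the device mounted at exactly `path` in /proc/mounts-style text, or None."""
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == path:
            source = fields[0]  # a later mount on the same path hides earlier ones
    return source


def boot_is_separate(mounts_text):
    """True if something other than the root device is mounted on /boot."""
    boot = mount_source(mounts_text, "/boot")
    return boot is not None and boot != mount_source(mounts_text, "/")
```

On the running box: `with open("/proc/mounts") as f: print(boot_is_separate(f.read()))`.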

It could well be that there is no separate boot partition and everything is on the root partition. With RAID 1 this is possible, as it's simply mirroring of drives, so there's no need for fancy RAID drivers in the boot loader. With ReiserFS you needed a separate boot partition because LILO/GRUB didn't have drivers for that filesystem.

Maybe your filesystem isn't corrupted even if fsck was last run 4 years ago. If the boot directory is on the same partition as root, then it sounds like there is just a problem with the LILO config. So if there is no separate boot partition, check the LILO configuration. Unfortunately it's been so long since I used LILO that I can't remember its syntax. Every time you change the LILO config you need to reinstall it to the MBR. I doubt LILO will understand UUIDs.
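If it does turn out to be the LILO config, it helps to see which global options it will actually use, such as `boot=` (where LILO installs itself) and `root=`; these sit before the first `image=` or `other=` section. A minimal Python sketch; the sample config is hypothetical, and per-image overrides are deliberately ignored. After any edit, `/sbin/lilo` still has to be rerun to rewrite the boot sector.

```python
def lilo_globals(conf_text):
    """Return global lilo.conf options: key=value pairs before the first image=/other= section."""
    options = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if key in ("image", "other"):
            break  # per-image sections start here
        # Flags such as `lba32` have no value
        options[key] = value.strip() if sep else True
    return options


# Hypothetical config for illustration only
SAMPLE_LILO = """
boot=/dev/md0
raid-extra-boot=mbr-only
root=/dev/md0
lba32
image=/boot/vmlinuz
    label=Linux
    read-only
"""
```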
 
What about dumping LILO and just switching to Grub?
 
If you're using RAID 1 you can just tell GRUB to find /boot on any of the underlying block devices, e.g. /dev/sda1 or /dev/sdb1. This only works with RAID 1 and not other RAID levels, since it's a simple mirror of one disk to another. So just try grub-install /dev/sdb (or whatever your boot device is defined as in the BIOS) and set the GRUB root to (hd1,0) or whatever is appropriate. (Is it GRUB Legacy or GRUB 2? There are some differences. It's probably GRUB Legacy if it can't understand RAID devices.)
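To translate a Linux device name into the GRUB Legacy `(hdX,Y)` notation, note that GRUB Legacy counts both drives and partitions from 0, so /dev/sdb1 is `(hd1,0)` only if the BIOS presents sdb as its second disk. The sketch below makes that mapping explicit; `bios_order` is a hypothetical input you would have to confirm (for example against GRUB's device.map), since Linux and BIOS disk order need not match. GRUB 2 differs: it numbers partitions from 1, e.g. `(hd1,msdos1)` or `(hd1,gpt1)`.

```python
import re


def grub_legacy_name(device, bios_order):
    """Translate a name like /dev/sdb1 into GRUB Legacy notation like (hd1,0).

    bios_order: whole-disk names in the order the BIOS presents them,
    e.g. ["/dev/sda", "/dev/sdb"]. This is an assumption to verify.
    """
    m = re.fullmatch(r"(/dev/sd[a-z]+)(\d+)?", device)
    if m is None:
        raise ValueError(f"unsupported device name: {device}")
    disk, part = m.group(1), m.group(2)
    drive = bios_order.index(disk)  # drives counted from 0; ValueError if unlisted
    if part is None:
        return f"(hd{drive})"  # the whole disk, e.g. a grub-install target
    return f"(hd{drive},{int(part) - 1})"  # partitions are also counted from 0
```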

btw, I hope you have a backup, and you do all this at your own risk :)
 
Hey mxc,

I've tried that. I tried grub-install to /dev/sda1 and it errors out. What also annoys me is that I've removed LILO, yet on reboot I still hit a LILO loader. Going to try lilo -u now to see if it removes it.

This is what frustrates me here. I'm not a total Linux newbie, and booting is normally very simple.

edit:
grub-install on sda: this GPT partition label contains no BIOS Boot Partition; embedding won't be possible.

grub-install on sda1: File system ext2 doesn't support embedding.

This is why I think booting GRUB 2 from a flash stick is the only option here. The RAID array has already been created, so I can't break it to create a small boot partition.
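Those two grub-install errors fit together: on a GPT disk booted in legacy BIOS mode, GRUB 2 wants a small partition of type "BIOS boot" (GPT type GUID 21686148-6449-6E6F-744E-656564454649, shown as `EF02` in gdisk) to embed its core image, and the second error is GRUB reporting that it can't embed into the ext2 filesystem on sda1 instead. A minimal Python sketch that looks for such a partition in `lsblk -rno NAME,PARTTYPE` output; it assumes a util-linux version whose lsblk supports the PARTTYPE column, and the sample output is invented.

```python
# GPT partition type GUID for a "BIOS boot" partition (gdisk type code EF02)
BIOS_BOOT_GUID = "21686148-6449-6e6f-744e-656564454649"


def bios_boot_partitions(lsblk_output):
    """Return names of BIOS boot partitions from `lsblk -rno NAME,PARTTYPE` output."""
    found = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        # Whole disks have no PARTTYPE, so their lines have a single field
        if len(fields) == 2 and fields[1].lower() == BIOS_BOOT_GUID:
            found.append(fields[0])
    return found


# Invented sample: two Linux RAID partitions, no BIOS boot partition
SAMPLE_LSBLK = """sda
sda1 a19d880f-05fc-4d3b-a006-743f0f84911e
sda2 a19d880f-05fc-4d3b-a006-743f0f84911e
"""
```

On the live system the input could come from `subprocess.run(["lsblk", "-rno", "NAME,PARTTYPE"], capture_output=True, text=True).stdout`; an empty result matches the first error message.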
 
Sounds like you're running a really old Linux install (LILO/ext2) on some modern hardware (UEFI/GPT-partitioned disks)? What is the history behind the install? It will probably help to understand what is going on.

The only thing I can suggest is to try putting the motherboard firmware into UEFI mode, i.e. not legacy BIOS mode, and then see if you can install GRUB. Your disks have been partitioned with GPT instead of MBR, but you're still booting via the BIOS, so there seem to be some compatibility issues. What I don't get is how the disks could be partitioned with GPT while everything else is legacy. I think I am missing something.
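Before and after switching that firmware setting, you can confirm which mode Linux was actually started in: the kernel exposes /sys/firmware/efi only when it was booted through UEFI. A minimal Python sketch; the `sysfs_root` parameter exists only so the check can be pointed at a test directory.

```python
import os


def booted_via_uefi(sysfs_root="/sys"):
    """True if the running kernel was started by UEFI firmware rather than legacy BIOS."""
    # The kernel only creates firmware/efi under sysfs on a UEFI boot
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))
```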
 
I left the LILO MBR as is because I'm concerned I won't be able to install GRUB 2 with the RAID in its current state. Theoretically, issuing the grub-install command should overwrite LILO, but it's not.
 
It's a server I was asked to fix. I would like to format it and redo the RAID using the motherboard's RAID function, but it's a production server that was screwed up from the start.
 
I would suggest migrating to new hardware, or even a VM if possible.
These older Linux boxes are great, but will eventually fail (mostly due to old hardware).
I have seen plenty of these old Sun boxes run for years without a problem, and then one day you install an OS patch that requires a reboot, and suddenly you're sitting without a working CPU/RAM module, etc.
 
I agree. This will be my advice to my client. The RAM modules also sometimes don't POST with the correct amount: one boot it's 6 GB and the next it's 8 GB.
 
I reckon the person who asked you to fix it isn't telling you the whole story, because it doesn't add up. They probably tried to fix it themselves first, moving to new hardware etc., and messed it up. It's weird that people like to hide things from their IT support because they think it somehow saves costs, but it's like going to a doctor and not telling them about the pain in your chest because you reckon it will be cheaper if you make them guess where the problem is.
 
You are correct, mxc. That's why there is LILO in the MBR/boot area and GRUB on the md0 partition. I noticed someone also backed up the boot folder.

Nonetheless, it's either the little flash stick or destroying the array and recreating it.

I'm finishing up now and noticed that on the good working drive the swap partition (sda2) is not mounting, as it does not contain a superblock. I've searched for the backup superblocks but was unable to restore them to assemble md1 (swap) with sda2.

Any advice?
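Before trying to reassemble md1, it may help to see what the kernel currently thinks each array looks like. /proc/mdstat lists every md device with its state, RAID level and member partitions (failed or spare members carry an `(F)` or `(S)` suffix). A rough Python parser for the first line of each entry; the sample text is invented, and the following lines (block counts, `[U_]` status, resync progress) are ignored.

```python
import re


def md_arrays(mdstat_text):
    """Map each array in /proc/mdstat text to (state, level, member partitions)."""
    arrays = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
        m = re.match(r"(md\d+)\s*:\s*(\S+)\s*(.*)$", line)
        if m is None:
            continue
        name, state, rest = m.groups()
        tokens = rest.split()
        # Inactive arrays have no level token; "(auto-read-only)" may precede it
        level = next((t for t in tokens if t.startswith("raid")), None)
        # Members look like "sda1[0]", "sdb1[1](F)" or "sda2[0](S)"
        members = [re.sub(r"\[\d+\].*$", "", t) for t in tokens if "[" in t]
        arrays[name] = (state, level, members)
    return arrays


# Invented sample for illustration
SAMPLE_MDSTAT = """Personalities : [raid1]
md1 : inactive sda2[0](S)
      4190208 blocks super 1.2

md0 : active raid1 sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""
```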
 