Proxmox Freezing

acidrain

Hi guys,

Hopefully someone with more Proxmox knowledge can assist with this. Recently the system has been freezing with the following errors:
EXT4-fs error (device dm-1) in ext4_do_update_inode:5112: Journal has aborted
EXT4-fs error (device dm-1): ext4_journal_check_start:84 comm pmxcfs: Detected aborted journal
EXT4-fs (dm-1): Remounting filesystem read-only

The only thing that fixes it is a power cycle, but yesterday was the second time in a week, so it is likely to recur. It obviously has something to do with the filesystem. Googling similar errors gave mixed results, such as:
1. It may be a failing hard drive.
2. It could be a failing controller (PERC H700).
3. Something about passthrough and IOMMU, but I would think this would've surfaced in the beginning already.

I also went through all the logs but found nothing really relating to errors as such.
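
For anyone wanting to scan a saved kernel log for messages like the ones above, here is a minimal Python sketch. The function name `find_ext4_events` and the regular expression are my own illustration, based only on the message format quoted in this post; other kernel versions may format these lines differently. The sample lines are pasted in by hand rather than read from a live system.

```python
import re

# Matches EXT4 kernel messages in the two shapes quoted in this thread:
#   EXT4-fs error (device dm-1): ext4_journal_check_start:84 ...
#   EXT4-fs (dm-1): Remounting filesystem read-only
EXT4_RE = re.compile(
    r"EXT4-fs(?: (?P<level>error|warning))? \((?:device )?(?P<dev>[\w-]+)\)"
)


def find_ext4_events(lines):
    """Return (device, level, line) tuples for every EXT4-fs message found."""
    events = []
    for line in lines:
        match = EXT4_RE.search(line)
        if match:
            # Messages without an explicit level are labelled "info" here.
            level = match.group("level") or "info"
            events.append((match.group("dev"), level, line.strip()))
    return events


# Hand-pasted sample; the last line is an unrelated message that should be skipped.
sample = [
    "EXT4-fs error (device dm-1): ext4_journal_check_start:84 comm pmxcfs: Detected aborted journal",
    "EXT4-fs (dm-1): Remounting filesystem read-only",
    "usb 1-1: new high-speed USB device number 2 using ehci-pci",
]

for dev, level, line in find_ext4_events(sample):
    print(dev, level, line)
```

One thing to keep in mind: once the filesystem has been remounted read-only, later log entries probably cannot be written to that disk, which may be why the on-disk logs show nothing. Capturing the console output or logging to another host would likely be needed to see what happened just before the freeze.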
 
Looks like a failing drive from the info given.
That seems to be the assumption. I tried to run smartctl remotely, but it seems the controller is only exposing one drive.

I will have to run diagnostics in the RAID BIOS to see which drive has issues.
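
On the smartctl point: smartmontools has a `-d megaraid,N` device type for querying physical disks behind MegaRAID-based controllers, and as far as I know the PERC H700 falls in that family. Below is a minimal Python sketch that only builds the candidate command lines. It assumes the controller's logical volume shows up as `/dev/sda` and that the physical disk IDs fall in a small range starting at 0; both are guesses and would need checking on the actual box. The helper name `megaraid_smartctl_commands` is hypothetical.

```python
import shlex


def megaraid_smartctl_commands(block_device, max_ids=8):
    """Build smartctl argument lists for physical disks behind a MegaRAID controller.

    block_device: the logical volume exposed by the controller (assumed, e.g. /dev/sda).
    max_ids: how many device IDs to try, starting at 0 (assumed range).
    """
    return [
        ["smartctl", "-a", "-d", f"megaraid,{n}", block_device]
        for n in range(max_ids)
    ]


# Print the commands so they can be reviewed before running them as root.
for cmd in megaraid_smartctl_commands("/dev/sda", max_ids=4):
    print(shlex.join(cmd))
```

IDs that do not correspond to a disk should simply make smartctl report an error, so trying a small range is a cheap way to find the real ones.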
 
So the RAID manager was not much help. There is no way to run tests, but the SMART report shows no errors for any of the disks.

I am trying to install OpenManage, but that also does not seem to work. Perhaps someone has working instructions for OpenManage on a Dell R510 running Proxmox VE 8.1.
 
Do you not have iDRAC access? It should be able to provide more info.
 
My money is also on one of the drives.

I have had several Proxmox servers running for years now without issues.

The only issues I have had were caused by either my own stupidity or hardware (usually hard drives).

Do you have another server you can migrate the VMs onto so you can test those drives properly?
 
I do, but I was hoping to avoid that since it was a bit of work migrating them to Proxmox.

And if I do migrate, I might consider dumping the PERC controller and going with something like TrueNAS with ZFS.
 
I managed to migrate everything and pull the drives. It turns out two of them are SAS drives, so there goes the idea of dumping the PERC controller.

The two SATA drives are in my Synology. A quick SMART test gave no errors. I am running an extended test now to get a better idea, but it is possibly looking like the drives are fine.
 
Non-Dell drives on a PERC can cause lots of issues. Even when the drive is a Seagate/HGST model similar to what Dell sells, the firmware will be different.

It has to do with how cache is handled. The filesystem corruption can happen very slowly, or rapidly in specific scenarios like power loss or lockups caused by other components failing. It is possible that these drives will work okay elsewhere.

While I've done it before with some success, I wouldn't recommend that anyone use a drive that isn't from Dell on a PERC. iDRAC not showing the drive info means you are living on the edge.
 