Help diagnosing & fixing desktop server fault

Brawler

Hello.

I have been asked to diagnose and fix (myself or send it somewhere) our server. Our "IT" has already tried a few things and failed to fix it so I have been asked to take a look as I know a thing or two about HW.

Spec:
LGA 1155 Sandy Bridge Xeon
32GB RAM
5x 1TB WD enterprise drives in RAID 5 on a RAID card (not sure which one)
Unrecognised mobo - doesn't look like anything too great.
800W server PSU (one of those long, flat ones that slide in)

Problem:
The thing just randomly reboots. Sometimes at the desktop and other times whilst it's busy doing something. I tried Hiren's BootCD with Mini XP and Prime95: one time it ran fine for 10 minutes, the other time it rebooted within a few seconds. I have also tried Linux and got the same thing, so I have ruled software out.
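
To put numbers on "randomly", a heartbeat log can show how long the box survives under load before each reset. The following is only a minimal sketch: the function names and the log path are made up for illustration, and it assumes Python is available on the live CD and that the log sits on something that survives a reboot, such as a USB stick.

```python
import os
import time
from datetime import datetime


def append_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} {note}".rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # push the line to the device so a sudden reset is less likely to lose it
    return line


def run(path, beats, interval_s=1.0):
    """Write a 'start' line, then `beats` 'alive' lines, one every interval_s seconds."""
    append_heartbeat(path, "start")
    for _ in range(beats):
        time.sleep(interval_s)
        append_heartbeat(path, "alive")
```

Something like run("/mnt/usb/heartbeat.log", beats=3600) started just before Prime95 would leave the last "alive" timestamp as a rough time of death; the path here is hypothetical. Comparing a few runs should show whether the reboots track load or are truly random.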

Apparently IT has tried a different PSU and it was doing the same thing. I have also swapped out the RAM modules with others (and tested individual RAM slots).

I haven't tested the RAID setup yet (worried I'm going to mess with the config), but as it is also rebooting off live CDs I don't think it's a problem with the HDDs. The data is all backed up, so this is my next step.

Temperatures are good.

What else? I'm thinking the motherboard is poked. I tried SeaTools first, and when it got to the main menu there were visual artifacts on the screen (onboard GPU), so does that further point to the mobo?

I'd appreciate some advice or suggestions on where to take it to be fixed. Somewhere that really knows what they are doing and has components to swap out and test, reconfigure the RAID 5 et cetera if need be.

Thanks
 
Open Device Manager and see if any devices have a yellow exclamation mark.
 
You say temps are good, and I'd like to believe you, but I don't think they're good. Try re-applying thermal paste between the CPU and its heatsink/fan, and do the same for any other heatsinks that have fans on them, to make sure it's fine. A mobo's temperature sensor isn't always in the right spot to pick up the real temperature.
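
If you want a second opinion on the temps while it is under load, here is a rough sketch for polling them from the Linux live CD. It assumes the kernel exposes sensors under /sys/class/thermal, where each zone's temp file holds millidegrees Celsius; not every board does, in which case the `sensors` command from lm-sensors is the usual alternative. The function names are mine, purely for illustration.

```python
import glob
import time


def read_thermal_zones(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {sysfs path: degrees Celsius} for every readable thermal zone."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, encoding="ascii") as fh:
                # sysfs reports millidegrees Celsius as a plain integer
                readings[path] = int(fh.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return readings


def print_temps(samples, interval_s=2.0,
                pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Print `samples` timestamped readings, one every interval_s seconds."""
    for _ in range(samples):
        temps = read_thermal_zones(pattern)
        shown = " ".join(f"{t:.1f}C" for t in temps.values()) or "no zones found"
        print(time.strftime("%H:%M:%S"), shown, flush=True)
        time.sleep(interval_s)
```

Running print_temps(300) in one terminal while the stress test runs in another would show whether anything climbs right before a reset. These zones may still be fed by the same badly placed sensor, so treat it as a cross-check and not as proof.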

However, I doubt your I.T department tried a different PSU, so I'd try that myself and see. The only reason I say this, is because your I.T department seems very incompetent if you were asked to have a look at it after they couldn't figure it out. (no offense to you or your skills, more offense to their lazy-assed skill-less nature)
 
this

@ OP those Idiot Tantrum throwers need to be raked over the coals.

An IT tech who cannot find hardware problems?
If they were programmers it would be different, but they're employed to keep the hardware running smoothly.
 
Which raises the question: are they even CompTIA A+ certified?
 
Also, run the PC/server with a panel open so you can see whether all the fans turn. Replace any that don't.
Remember to clean off the old thermal paste, as suggested, and replace it with a fresh layer.

If the unit has a GPU, try replacing it.
 
Well, you say you tested with a different PSU and RAM... those would be the two I'd check first.

Make sure there are no swollen capacitors on your board. It does sound a little dated. Yeah, your next bet is to get a replacement mobo and do some testing with that. If it still continues, test with a different CPU.
 
+1 to re-applying the thermal paste and trying a different PSU yourself.
 
Why? That certification (like many others) means very little.


Come on now, this is possibly a hardware fault and what is A+? Hardware.

And CompTIA is the Computing Technology Industry Association. You can't just say it means very little.

The A+ exam is intended for information technology professionals who have the equivalent of 500 hours of hands-on experience.
 
Thanks all, I will try again in the morning as I have some of my actual job to do now, lol.

I work for a massive organisation and IT did come and have a look and tried a few things, but as that particular server isn't theirs they were a bit hands off. I'm sure if it was theirs they would have escalated it. My dept has some IT equipment of its own and some belonging to the organisation. I think the person who came to look was a noob who usually sets up Outlook and such.
 
If the machine is not server grade, there is most likely a problem with the amount of RAM in the machine. One stick at a time may prove fine in memtest. Put them together and you get issues on nearly every board: the voltages are not correct and the timings give issues. Mid-range gaming boards are the worst at this. Re-apply the thermal paste and make sure that CPU heatsink is on properly, not just kinda sturdy, it must be on solidly. Re-seat any cards going into the board...
 
I don't care what CompTIA says, it's crap. Years ago the entire IT section was forced to do it; it was a two-week course from what I remember. After three days we said fsck this and went back to work as it was mind-destroyingly boring. The first Friday we all went to write the exam and everybody passed. It's a course for people who know absolutely jack about computers; most school kids with an interest in computers know more.
 
I'm doing the A+ now and I'm pretty jacked up about computers, and I must say there are some things in there that I have never seen in my life.
 