Hard Drive Failure: Advice and Tips Thread

AfricanTech

Honorary Master
Joined
Mar 19, 2010
Messages
40,418
I looked for a thread or sticky with a similar title and couldn't find one (I confess I didn't look very hard, but so there :p)

Anyhow, having recently suffered the all-too-common misfortune of a drive going bad, I started looking for tools that help you recover from the situation as fast as possible, i.e. get as much of your good stuff off the dodgy drive as quickly as possible with the least hassle.


Please read the other replies below - I am not qualified in this area and am only offering advice based on my own experience - other, more competent people have commented extensively below :D

Hopefully this can serve as help to others and a 'first line' data recovery FAQ of sorts.

Tools Used:
- Hard Disk Sentinel
- Roadkil's Unstoppable Copier

1. Used Hard Disk Sentinel to determine the extent of the problem (SMART showed some errors; a non-destructive scan then showed many, many more).

2. Used Roadkil's Unstoppable Copier. This application is incredible: it does exactly what the desperate owner of a failing hard drive needs. It copies the good stuff off the dodgy drive, and when it hits a bad file it retries a set number of times (a user parameter; I set mine to try once only, with "fastest recovery") and then moves on to the next file. Those of you who have had to do this will understand what I mean when I say that it solves the single most frustrating problem: it does not get stuck in a loop. Once it encounters a read error, it moves on.
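To illustrate the "try a few times, then move on" behaviour described above, here is a minimal Python sketch. It is not how Unstoppable Copier is implemented (that tool also salvages the readable parts of damaged files); the function name `copy_tree_skipping_errors` and its parameters are made up for this example.

```python
import os
import shutil


def copy_tree_skipping_errors(src_root, dst_root, retries=1, copy_file=shutil.copyfile):
    """Copy every file under src_root into dst_root, never stalling on a bad file.

    Each file is attempted up to `retries` times (at least once). Files that
    still fail are skipped and returned as (source_path, error_message) pairs.
    """
    failed = []
    for folder, _subdirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        target_folder = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_folder, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_folder, name)
            last_error = None
            for _attempt in range(max(1, retries)):
                try:
                    copy_file(src, dst)
                    last_error = None
                    break
                except OSError as exc:  # read errors surface as OSError
                    last_error = exc
            if last_error is not None:
                failed.append((src, str(last_error)))  # note it and move on
    return failed
```

The returned list is the "stuff it couldn't get off in the first pass", which you can then decide whether to chase with other tools.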

Good luck all - will keep this thread updated as I discover more useful tips/tricks/applications.
 

SauRoNZA

Honorary Master
Joined
Jul 6, 2010
Messages
47,910
Testdisk is my first stop.

Fortunately I've never needed a second stop.
 

AfricanTech

Yeah - Testdisk is great :thumbsup:

It's my second stop after I've retrieved the good stuff off the drive. Where Roadkil excels is getting the stuff that's undamaged off quickly so that you can do a quick assessment and determine if you care about the stuff it couldn't get off in the first pass.


PS: Off topic - :wtf: do we have to have such a poor selection of emoticons on MyBB - most unbecoming of such a premier forum
 

ponder

Honorary Master
Joined
Jan 22, 2005
Messages
92,883
gnu ddrescue
testdisk/photorec

On a failing drive I would rather image the drive with GNU ddrescue, mount the image, and then recover from the image.
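The idea behind imaging first can be sketched in a few lines of Python: read the source block by block, and when a block cannot be read, zero-fill it in the image, record where it was, and carry on. This is only an illustration of the principle. GNU ddrescue itself does far more (it keeps a mapfile and makes several passes over the bad areas), and `image_with_skips` is a hypothetical name.

```python
import io


def image_with_skips(source, image, total_size, block_size=4096):
    """Copy total_size bytes from source to image, one block at a time.

    Blocks that raise OSError (or come back short) are zero-filled in the
    image and recorded, so one bad area never stalls the whole copy.
    Returns a list of (offset, length) for the unreadable blocks.
    """
    bad_blocks = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            source.seek(offset)
            data = source.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            data = b"\x00" * length
            bad_blocks.append((offset, length))
        image.seek(offset)
        image.write(data)
        offset += length
    return bad_blocks
```

Once the image exists, all further recovery attempts work on the copy, so the failing drive is read only once.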
 

SouthBit

Dealer
Joined
Jan 6, 2011
Messages
1,170
There's a huge amount to say on this topic of hard drive failure and data loss, but I'll give a very brief contribution as it's my field:

Logical failure

This implies that the hard drive itself is perfect and the hardware is functioning 100%, but the data loss is on a software level. For example, the user has formatted the drive, deleted something by mistake, reinstalled the OS, or suffered a virus attack. I'd say that this is a DIY situation; there are many very capable software packages that can help you here. R-Studio and Data Rescue are two we use on a daily basis.

Hardware failure

This implies there is something wrong with the hard drive itself. There is a huge amount that can go wrong with a hard drive as it's a highly complex bit of kit. The most common failures are:

- Head failure: You'll hear a ticking or clicking noise as the heads cannot locate themselves and fly around the surface of the disk, hitting the limiters and creating that noise most of you have probably heard at some stage. Solution is to swap heads in the lab and image on special hardware.

- PCB failure: This usually only ever happens when a user plugs the wrong power supply into their external hard drive. Sure, PCBs do fail by themselves, but very rarely. And no, you can't just find a similar PCB and attach it to the drive and think it will work. The PCB contains unique info pertaining to the drive it's mated to.

- Bad sectors: This is very common. Hard drives have a certain number of 'spare sectors' that can be used as the drive gets old and becomes degraded. As a sector becomes bad, the drive's firmware will remap that bad sector to a spare. This is normal and you won't notice anything. However, once the pool of spares is used up, the firmware doesn't know what to do and the drive can do a number of things. It might freeze your machine, or the drive might not even be detected because the firmware has locked up due to the G-list (the firmware module that lists defective sectors) being full, etc. Solution is to rectify the firmware issue/s and image on special hardware.

- Stiction: This is when the heads, that normally fly nanometers over the surface, fail to park properly and end up stuck to the platter/s when the drive powers off. If your drive does not spin up and you hear a buzzing or beeping noise, your drive most likely has stiction. It happens sometimes by chance, other times by not properly disconnecting your drive. Solution is to open in the lab and free the heads (sometimes they need to be replaced) and image on special hardware.

- Firmware issues: Hard drives run a mini OS in order to work, and this OS is in the form of firmware modules which are stored on the hard drive surface. If there is a problem with these modules (corrupt, unable to be read) then the drive will not function correctly. Resolution is to repair/replace firmware modules using special hardware.
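As a rough picture of the spare-sector remapping described under "Bad sectors" above, here is a toy Python model. It is not how any real firmware works (vendors differ and the details are not public); the class `RemappingDrive` and its fields are invented purely to show why trouble starts once the spare pool is empty.

```python
class RemappingDrive:
    """Toy model of a drive with a fixed pool of spare sectors."""

    def __init__(self, spare_count):
        self.spares_left = spare_count
        self.remap_table = {}  # bad logical sector -> index of the spare used

    def mark_bad(self, sector):
        """Remap a failing sector. Returns True if a spare was available."""
        if sector in self.remap_table:
            return True  # already remapped, nothing more to do
        if self.spares_left == 0:
            return False  # pool exhausted: the defect can no longer be hidden
        self.remap_table[sector] = len(self.remap_table)
        self.spares_left -= 1
        return True


drive = RemappingDrive(spare_count=2)
print(drive.mark_bad(100), drive.mark_bad(200), drive.mark_bad(300))
```

While spares remain the user notices nothing; the third call in the usage lines is the point at which a real drive would start misbehaving.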

There are other problems that can occur, and the sections above can be broken down into much finer detail.

I would recommend that if your data is important to you or not replaceable, you book your drive in for professional data recovery. Sure, there are some DIY routes, such as ddrescue for drives with bad sectors, but these DIY routes can turn a problem that was easily recoverable into a drive that is no longer recoverable by any means.

Here is some info, with pictures, about the basics of what a hard drive looks like inside and how it works:
http://www.southbit.co.za/inside-a-hard-drive/

And some recovery type FAQs
http://www.southbit.co.za/faq/

Checking the SMART status is an easy way to keep an eye on your drive's health; there are many free software tools available to do this.

I'm happy to answer any hard drive or recovery related questions, hope it helps.
 

AfricanTech

Any chance a mod could sticky this thread - it certainly is a worthwhile sticky to have on a tech site.
 

AfricanTech

What do you guys reckon is the best free, Windows-based drive imaging application?

I know Hiren's BootCD contains some of the best non-windows utilities, but some people are only comfortable within a Windows environment.
 

SouthBit

For healthy drives or damaged drives? Windows is terrible at handling any drives with errors.
 

AfricanTech

@Southbit, my external that gave extensive read errors is in the process of being 'refreshed' by Hard Disk Sentinel - so far the refresh appears to be successful (62% of the way through with 1 sector classified as damaged).

What's the best way of testing if the drive is still usable?

I of course have now lost all confidence in the drive (it's an LG branded drive with a WD inside) - I have the original box and noted that it has a two year warranty (can't find the original invoice/receipt though) - going to see if I can get it replaced under warranty.
 

sajunky

Honorary Master
Joined
Nov 1, 2010
Messages
13,124
You didn't mention whether these tests involved reading only, or reading and writing. If the latter, and you still get bad sectors, you have already done your testing.
On modern drives, if even a single damaged sector remains after a read/write scan, the firmware is unable to hide defects. That happens when the firmware is corrupt or when there are too many defects (all spare sectors have already been used). In either case the drive qualifies for replacement.

In many cases, after being filled with zeros a drive shows extensive delays during reading but still does not report errors. No utility will tell you straight away that such a drive qualifies for replacement; you have to make your own decision based on the SMART report and the delays reported during a full surface scan. Utilities capable of such tests are MHDD (DOS), Victoria (DOS/Windows) or Testdisk. You can find all of them on Hiren's BootCD. Boot from Hiren's CD with only one drive connected, to avoid zeroing the wrong device. :)
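A surface scan "for delays" boils down to timing every block read and counting how many are slow or unreadable. The Python sketch below shows the idea only; it is not what MHDD or Victoria do internally. The `read_block` callable is a stand-in you would have to supply, and the 150 ms threshold is an arbitrary value chosen for the example.

```python
import time


def scan_for_delays(read_block, block_count, slow_ms=150.0, clock=time.perf_counter):
    """Read every block once and sort it into 'ok', 'slow' or 'error'.

    read_block(index) is a caller-supplied function that reads one block and
    raises OSError when the block is unreadable.
    """
    report = {"ok": 0, "slow": [], "error": []}
    for index in range(block_count):
        started = clock()
        try:
            read_block(index)
        except OSError:
            report["error"].append(index)
            continue
        elapsed_ms = (clock() - started) * 1000.0
        if elapsed_ms >= slow_ms:
            report["slow"].append(index)  # readable, but suspiciously slow
        else:
            report["ok"] += 1
    return report
```

A drive with no entries under "error" but a growing "slow" list is exactly the case described above, where no tool gives a verdict and you have to judge for yourself.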
 

NomNom

Executive Member
Joined
Oct 23, 2009
Messages
5,018
Any chance a mod could sticky this thread - it certainly is a worthwhile sticky to have on a tech site.

Yeah agreed, the other ones either need an update or should be unstickyied* IMO.

*New word.
 

AfricanTech

You didn't mention whether these tests involved reading only, or reading and writing. If the latter, and you still get bad sectors, you have already done your testing.
On modern drives, if even a single damaged sector remains after a read/write scan, the firmware is unable to hide defects. That happens when the firmware is corrupt or when there are too many defects (all spare sectors have already been used). In either case the drive qualifies for replacement.

In many cases, after being filled with zeros a drive shows extensive delays during reading but still does not report errors. No utility will tell you straight away that such a drive qualifies for replacement; you have to make your own decision based on the SMART report and the delays reported during a full surface scan. Utilities capable of such tests are MHDD (DOS), Victoria (DOS/Windows) or Testdisk. You can find all of them on Hiren's BootCD. Boot from Hiren's CD with only one drive connected, to avoid zeroing the wrong device. :)

Personally I use Winhex, very useful tool and works wonders for imaging.

Thanks.

re: tests, it's the destructive write test - it appears to have gotten stuck on Block 6331 - have restarted from that block, but based on that, I think it's a goner.

I have my original LG box that has the warranty sticker and serial number on it but now I can't find the darn invoice and can't remember who I bought it from....:(
 

sajunky

re: tests, it's the destructive write test - it appears to have gotten stuck on Block 6331 - have restarted from that block, but based on that, I think it's a goner.
Are you saying the drive hangs on this block, or that it reports an error and the scan continues? What happened next? Does it get stuck in other places, or continue to the end? Hanging might be due to incorrect error handling by the USB bridge controller or by the drive itself. A direct SATA connection would give a better picture.
What is the result of the surface scan after the destructive write? Still extensive errors? Can you post a summary report?

Single hard errors (or the device hanging) can be fixed by regenerating the translator. Multiple errors or delays usually indicate weak or damaged heads; there is no cure for that.
 

AfricanTech

Are you saying the drive hangs on this block, or that it reports an error and the scan continues?

So far it seems to be hanging on this block (I've had to interrupt the scan today to attend to other things) - I've re-initiated what HDS [Hard Disk Sentinel] calls a 'refresh' at the offending block location to see what happens next. The seven blocks (per the HDS picture map) immediately preceding this one also gave errors but were eventually passed over.

What happened next?

It gave an HDS Error 51 which led me to discover your conclusion below re: the bridge controller

Does it get stuck in other places, or continue to the end?

So far, this is the first place where it has gotten stuck to such a degree

Hanging might be due to incorrect error handling by the USB bridge controller or by the drive itself. A direct SATA connection would give a better picture.

It's a sealed external unit - don't want to take it apart if I can still return it under warranty

What is the result of the surface scan after the destructive write?

So far it hasn't completed yet :(

Still extensive errors?

Seems to have a bad patch of extensive errors in the 'centre' of the drive

Can you post summary report?

Will do when I get one

Single hard errors (or the device hanging) can be fixed by regenerating the translator. Multiple errors or delays usually indicate weak or damaged heads; there is no cure for that.

Thanks for all the input - much appreciated.
 

AfricanTech

No longer proceeds - I reckon this drive is toast


Failure Predicted - Attribute: 1 Raw Read Error Rate, Errors occurred while reading raw data from a disk. Indicate problem with the disk surface or the read/write heads.
There are 374 bad sectors on the disk surface. The contents of these sectors were moved to the spare area.
Based on the number of remapping operations, the bad sectors may form continuous areas.
There are 38 weak sectors found on the disk surface. They may be remapped any time in the later use of the disk.
Replace hard disk immediately.

It is recommended to backup immediately to prevent data loss.
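If you want to track these numbers over time, a report shaped like the one above can be reduced to a few figures with a short Python sketch. The patterns are matched against the wording shown in this post only; whether other Hard Disk Sentinel versions or languages use the same wording is an assumption, and `summarise_report` is a made-up helper name.

```python
import re


def summarise_report(text):
    """Pull the bad and weak sector counts out of a report like the one above."""
    bad = re.search(r"There are (\d+) bad sectors", text)
    weak = re.search(r"There are (\d+) weak sectors", text)
    return {
        "bad_sectors": int(bad.group(1)) if bad else 0,
        "weak_sectors": int(weak.group(1)) if weak else 0,
        "failure_predicted": "Failure Predicted" in text,
    }
```

Comparing the counts from two reports taken a few days apart shows whether the damage is still spreading.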
 

sajunky

OK, you can stop scanning now. There is a good chance that the heads are still in good condition.
Seems to have a bad patch of extensive errors in the 'centre' of the drive
There is a bad patch, but it generally runs smoothly, right?
Translator regeneration might help, but if you have the invoice, better to find it.
I can send you instructions on how to regenerate the translator; I know Southbit will stay quiet on this issue, as it enters the DR (data recovery) area. In short, it requires connecting the drive to a SATA controller, changing the BIOS setting to ATA compatibility mode and executing an MHDD 4.5 script. Then power off and on, and fill the drive with zeros. If it doesn't lock up again, it is fine. Afterwards (also in MHDD), do the usual health check: a full read scan for delays, monitoring SMART before and after.
 

AfricanTech

Thanks. Let me look for that invoice first.
 

SouthBit

Yes I will stay quiet here :) The drive is no longer useful to an end user as a reliable means of storage.
 