Understanding HDD SMART info / Filesystem corruption..

redarrow · Dec 30, 2013

Hey,

I have a Linux machine on my local network which I use primarily as a NAS box, running Ubuntu Server 12.04.

It has three 1.5 TB Seagate drives in a RAID 5 array. A few days ago something caused one of the drives to get knocked out of the array.
This is not the first time this has happened; once before it happened due to a faulty SATA port.

Anyway, in the meantime I re-added the drive to the array and it's been running OK for the last couple of days.
Smartmon, however, reported some issues with the drive: "8 Offline uncorrectable sectors" and "8 Currently unreadable (pending) sectors".
I instructed smartctl to run an "extended offline" test a few times, which resulted in:

Code:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20764         176965896
# 2  Extended offline    Completed: read failure       90%     20763         176965896
# 3  Extended offline    Completed: read failure       90%     20762         176965896
# 4  Short offline       Completed without error       00%     20761         -
# 5  Short offline       Completed: read failure       90%     20728         606232
# 6  Short offline       Completed without error       00%      4794         -
# 7  Short offline       Completed without error       00%       139         -

Clearly something is screwy on the drive; the problem is I don't really know much about HDD SMART info or how serious this is.
The SMART info does state that the "overall-health self-assessment test result" is "passed".

Can anyone tell me if this is something to be worried about?
Should I look into replacing the drive ASAP?
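One way to read the self-test log above is to tally which LBAs keep failing. The following is only a minimal Python sketch: the function names are made up for illustration, and the 512-byte sector size is an assumption taken from the "Sector Size" line that smartctl prints for this drive later in the thread.

```python
from collections import Counter

# Self-test log rows copied from the smartctl output above.
SELF_TEST_LOG = """\
# 1  Extended offline    Completed: read failure       90%     20764         176965896
# 2  Extended offline    Completed: read failure       90%     20763         176965896
# 3  Extended offline    Completed: read failure       90%     20762         176965896
# 4  Short offline       Completed without error       00%     20761         -
# 5  Short offline       Completed: read failure       90%     20728         606232
# 6  Short offline       Completed without error       00%      4794         -
# 7  Short offline       Completed without error       00%       139         -
"""


def failing_lbas(log_text):
    """Count how often each LBA appears in the LBA_of_first_error column."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.lstrip().startswith("#"):
            continue  # skip headers and blank lines
        last_field = line.split()[-1]
        if last_field.isdigit():  # "-" means the test found no error
            counts[int(last_field)] += 1
    return counts


def lba_to_byte_offset(lba, sector_size=512):
    """Byte offset of an LBA on the raw device (sector size is an assumption)."""
    return lba * sector_size


for lba, hits in failing_lbas(SELF_TEST_LOG).most_common():
    print(f"LBA {lba}: {hits} failed test(s), byte offset {lba_to_byte_offset(lba)}")
```

A test that stops at the same LBA every time suggests one stubborn sector (or a small cluster) rather than errors scattered across the disk, but it says nothing about what lies beyond that LBA, because the test aborts at the first read failure.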

The other issue I have is that the filesystem had some corruption somehow (I don't get how it happened, as RAID 5 is supposed to work just fine with only 2 drives). It trashed my MySQL database as luck would have it; no big deal though, as I back it up like a madman, so recovery wasn't an issue.

The worrying part, though, is that these log messages are still appearing (even though fsck finds no further issues), and I can't figure out why:

Code:

Dec 30 08:22:02 redserver kernel: [54990.908116] EXT4-fs error (device md3): __ext4_ext_check_block:475: inode #111413832: comm linuxdcpp: bad header/extent: invalid extent entries - magic f30a, entries 41, max 340(340), depth 0(0)
Dec 30 08:22:02 redserver kernel: [54990.908579] EXT4-fs error (device md3): __ext4_ext_check_block:475: inode #111413832: comm linuxdcpp: bad header/extent: invalid extent entries - magic f30a, entries 41, max 340(340), depth 0(0)
Dec 30 08:22:02 redserver kernel: [54990.909048] EXT4-fs error (device md3): __ext4_ext_check_block:475: inode #111413832: comm linuxdcpp: bad header/extent: invalid extent entries - magic f30a, entries 41, max 340(340), depth 0(0)
Dec 30 08:22:02 redserver kernel: [54990.909510] EXT4-fs error (device md3): __ext4_ext_check_block:475: inode #111413832: comm linuxdcpp: bad header/extent: invalid extent entries - magic f30a, entries 41, max 340(340), depth 0(0)

Thanks!

redarrow · Dec 30, 2013

Here's a full dump of "smartctl -a" (had to truncate the last error in the log to fit it within the post length limit):

Code:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		(  643) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x30b7)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   085   006    Pre-fail  Always       -       15840
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1721
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       357828478
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20781
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       335
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       229
188 Command_Timeout         0x0032   100   097   000    Old_age   Always       -       4295032960
189 High_Fly_Writes         0x003a   057   057   000    Old_age   Always       -       43
190 Airflow_Temperature_Cel 0x0022   061   051   045    Old_age   Always       -       39 (Min/Max 32/39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1623
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1721
194 Temperature_Celsius     0x0022   039   049   000    Old_age   Always       -       39 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   031   006   000    Old_age   Always       -       15840
197 Current_Pending_Sector  0x0012   100   097   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   097   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       53867479847179
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1104442962
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2353825296

SMART Error Log Version: 1
ATA Error Count: 59 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 59 occurred at disk power-on lifetime: 20728 hours (863 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 18 40 09 00  Error: UNC at LBA = 0x00094018 = 606232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 41 09 40 00      00:47:29.029  READ FPDMA QUEUED
  60 00 00 00 3d 09 40 00      00:47:29.025  READ FPDMA QUEUED
  60 00 00 00 5d 06 40 00      00:47:27.654  READ FPDMA QUEUED
  60 00 00 00 59 06 40 00      00:47:27.654  READ FPDMA QUEUED
  60 00 08 00 50 06 40 00      00:47:26.699  READ FPDMA QUEUED

Error 58 occurred at disk power-on lifetime: 20728 hours (863 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 18 40 09 00  Error: UNC at LBA = 0x00094018 = 606232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 3d 09 40 00      00:40:29.802  READ FPDMA QUEUED
  60 00 00 00 39 09 40 00      00:40:29.798  READ FPDMA QUEUED
  60 00 00 00 71 06 40 00      00:40:28.655  READ FPDMA QUEUED
  60 00 00 00 6d 06 40 00      00:40:28.654  READ FPDMA QUEUED
  60 00 02 00 50 06 40 00      00:40:27.889  READ FPDMA QUEUED

Error 57 occurred at disk power-on lifetime: 20728 hours (863 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 18 40 09 00  Error: UNC at LBA = 0x00094018 = 606232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 85 06 40 00      00:33:07.022  READ FPDMA QUEUED
  60 00 00 00 81 06 40 00      00:33:07.018  READ FPDMA QUEUED
  60 00 00 00 7d 06 40 00      00:33:07.006  READ FPDMA QUEUED
  60 00 00 00 79 06 40 00      00:33:07.006  READ FPDMA QUEUED
  60 00 08 00 50 06 40 00      00:33:05.999  READ FPDMA QUEUED

Error 56 occurred at disk power-on lifetime: 20727 hours (863 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 18 40 09 00  Error: UNC at LBA = 0x00094018 = 606232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 3d 09 40 00      00:23:53.205  READ FPDMA QUEUED
  60 00 00 00 85 06 40 00      00:23:52.117  READ FPDMA QUEUED
  60 00 00 00 81 06 40 00      00:23:52.113  READ FPDMA QUEUED
  60 00 00 00 7d 06 40 00      00:23:52.102  READ FPDMA QUEUED
  60 00 00 00 79 06 40 00      00:23:52.102  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20764         176965896
# 2  Extended offline    Completed: read failure       90%     20763         176965896
# 3  Extended offline    Completed: read failure       90%     20762         176965896
# 4  Short offline       Completed without error       00%     20761         -
# 5  Short offline       Completed: read failure       90%     20728         606232
# 6  Short offline       Completed without error       00%      4794         -
# 7  Short offline       Completed without error       00%       139         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
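The "ER ST SC SN CL CH DH" rows in the error log above can be decoded by hand. The sketch below is illustrative only: the function name is made up, and it assumes the 28-bit register layout used in this summary log (LBA bits spread over SN, CL, CH and the low nibble of DH), with bit 6 of the error register meaning UNC and bit 0 of the status register meaning ERR.

```python
def decode_error_row(row):
    """Decode one 'ER ST SC SN CL CH DH' register row from the SMART error log."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    # 28-bit LBA: low nibble of DH, then CH, CL, SN (most to least significant).
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "lba": lba,
        "uncorrectable": bool(er & 0x40),  # UNC bit in the error register
        "error_flag": bool(st & 0x01),     # ERR bit in the status register
    }


# Register row copied from "Error 59" above.
decoded = decode_error_row("40 51 00 18 40 09 00")
print(decoded)
print(hex(decoded["lba"]))
```

For the row shown, this reproduces the LBA that smartctl itself reports (0x00094018 = 606232), which is also the LBA where short self-test # 5 failed.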

Tinuva · Dec 30, 2013

redarrow said:
The other issue I have is the filesystem had some corruption somehow (don't get how it happened as raid5 is supposed to work just fine with only 2 drives). Trashed my mysql database as luck would have it, no big deal though as I back it up like a madman so recovery wasn't an issue.

Just to point out, RAID5 doesn't protect against data corruption, only against the loss of a drive. If the data is corrupted as it is written to disk, then RAID5 won't help at all. This is where something like ZFS shines over hardware RAID, because it checksums the data and verifies that checksum when the data is read back. It also checks this again when you run a SCRUB to check the data on all drives, and it can actually correct it on the disks with the corrupted data.

The thing, however, is that for home users this silent data corruption happens so rarely that it mostly goes unnoticed.

That said, that drive is probably dying slowly; at some point you will have to replace it. Sooner rather than later is usually better; however, in my experience, most shops will only replace totally failed drives.
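The difference between parity and checksums described above can be shown with a toy example. This is only a sketch of the general idea, not how md or ZFS are actually implemented: it uses a single 3-drive stripe, plain XOR parity, and CRC32 as a stand-in checksum, and all names and data are made up.

```python
import zlib


def xor_blocks(*blocks):
    """XOR equally sized blocks together (RAID 5 style parity)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe on a 3-drive RAID 5: two data blocks plus one parity block.
d0 = b"netflow row 0001"
d1 = b"netflow row 0002"
parity = xor_blocks(d0, d1)

# Case 1: a whole drive is lost. The missing block is rebuilt exactly.
rebuilt_d1 = xor_blocks(d0, parity)
print("rebuilt after drive loss:", rebuilt_d1 == d1)

# Case 2: d0 is silently corrupted on disk but the drive reports no error.
bad_d0 = b"netflow row 9991"
print("parity mismatch detected:", xor_blocks(bad_d0, d1) != parity)
# Parity shows the stripe is inconsistent, but not WHICH block is wrong.

# With a stored per-block checksum the bad block can be identified,
# and then rebuilt from the remaining block and parity.
stored_crc = {"d0": zlib.crc32(d0), "d1": zlib.crc32(d1)}
on_disk = {"d0": bad_d0, "d1": d1}
bad = [name for name, data in on_disk.items() if zlib.crc32(data) != stored_crc[name]]
print("blocks failing checksum:", bad)
repaired_d0 = xor_blocks(on_disk["d1"], parity)
print("repaired from parity:", repaired_d0 == d0)
```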

redarrow · Dec 30, 2013

Tinuva said:
Just to point out, RAID5 doesn't protect against data corruption, only against the loss of a drive. If the data is corrupted as it is written to disk, then RAID5 won't help at all. This is where something like ZFS shines over hardware RAID, because it checksums the data and verifies that checksum when the data is read back. It also checks this again when you run a SCRUB to check the data on all drives, and it can actually correct it on the disks with the corrupted data.

I realise this; I wasn't expecting RAID to prevent corrupted data being written to the disk as such. I was more thinking that if all the data is meant to be recoverable from any 2 drives then one dropping out shouldn't cause any loss, but I guess there's some delay in the syncing process which would create a window during which data could be lost. That would explain why it was my MySQL database which got trashed, as among other things I have a netflow monitor which is pretty much constantly dumping data via MySQL.

Tinuva said:
That said, that drive is probably dying slowly, at some point you will have to replace it. Sooner than later is usually better, however in my experience, most shops will only replace total failed drives.

I hadn't even thought about warranty actually.. not sure if it's even covered still. But yea I doubt they'd replace it while it's actually still running anyway.

ginggs · Dec 30, 2013

redarrow said:
Smartmon however reported some issues with the drive: "8 Offline uncorrectable sectors" and "8 Currently unreadable (pending) sectors".

Download the ISO and burn a CD of SeaTools for DOS, then boot up and run the extended test on the drive. This will correct the errors by mapping the bad sectors out of the usable pool and mapping new sectors in from the spare pool. You probably won't be able to get a warranty replacement if the drive does not fail the SeaTools extended test.

I've had some drives that developed a couple of bad sectors and then continued to work for years, but generally once they start developing bad sectors they just keep getting worse.

redarrow · Dec 30, 2013

ginggs said:
Download the ISO and burn a CD of SeaTools for DOS, then boot up and run the extended test on the drive. This will correct the errors by mapping the bad sectors out of the usable pool and mapping new sectors in from the spare pool.

Thanks, I'll give this a go..

Nod · Jan 8, 2014

A fsck command for ext4, found somewhere ...

Code:

fsck.ext4 -cDfty -C 0 /dev/sdxx

Run an ext4 filesystem check and badblocks scan with progress info.
Nothing fancy, just a regular filesystem scan that calls the badblocks program and shows some progress info. The options used are:
-c: check for bad sectors with the badblocks program
-D: optimize directories if possible
-f: force check, even if the filesystem seems clean
-t: print timing stats (use -tt for more)
-y: assume the answer "yes" to all questions
-C 0: print progress info to stdout
/dev/sdxx: the partition to check (e.g. /dev/sda1 for the first partition on the first hard disk)
NOTE: Never run fsck on a mounted partition!
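To make the "never on a mounted partition" rule harder to forget, a wrapper can refuse to build the command when the device shows up in the mount table. The following Python sketch is illustrative only: the function names, the sample mount table, the /srv/data mount point and /dev/sdb1 are all hypothetical, and it only matches the device name exactly as it appears in the first column (it does not resolve symlinks such as /dev/disk/by-uuid/...).

```python
# Sample text in the format of /proc/mounts (contents are made up).
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/md3 /srv/data ext4 rw,relatime 0 0
"""


def is_mounted(device, mounts_text):
    """True if `device` is the first field of any line in the mount table."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def fsck_command(device):
    """Argument list for the fsck.ext4 invocation quoted above."""
    return ["fsck.ext4", "-cDfty", "-C", "0", device]


def safe_fsck_command(device, mounts_text):
    """Return the command only if the device is not mounted."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} is mounted; unmount it or use a rescue disk")
    return fsck_command(device)


print(safe_fsck_command("/dev/sdb1", SAMPLE_MOUNTS))
```

On a real system the mount table text would come from reading /proc/mounts, and the returned list could be handed to subprocess.run; the sketch deliberately stops short of running anything.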

Gnome · Jan 8, 2014

ginggs said:
You probably won't be able to get a warranty replacement if the drive does not fail the SeaTools extended test.

The S.M.A.R.T offline extended test and the SeaTools extended test are the same thing.

Seagate just put a fancy GUI on the completely standard S.M.A.R.T tests.

The SeaTools "quick" test is the S.M.A.R.T Short offline test which takes 60 seconds.

The short test should also fail if any of your S.M.A.R.T parameters have triggered a failure.

That said, most people don't know that the SeaTools tests = S.M.A.R.T tests so you generally need to run the tool to get the print-out they want for warranty return.

ginggs · Jan 9, 2014

Gnome said:
The S.M.A.R.T offline extended test and the SeaTools extended test are the same thing.

I beg to differ. If after running the SeaTools long test on a Seagate drive it finds correctable errors, it offers to repair them. I don't get this option with the SMART extended test.

Running the SeaTools long test on a non-Seagate drive may well be equivalent to the SMART extended test though.

Gnome · Jan 9, 2014

ginggs said:
I beg to differ. If after running the SeaTools long test on a Seagate drive it finds correctable errors, it offers to repair them. I don't get this option with the SMART extended test.

Running the SeaTools long test on a non-Seagate drive may well be equivalent to the SMART extended test though.

The extended S.M.A.R.T test gives the LBA where the error occurred. The tool probably then tries to access the LBA in question repeatedly.

Other than constantly trying to access an LBA, there really is nothing else a piece of software can do to "correct" an error. The LBA was identified as bad because it fails the checksum.

The tool probably follows this up by writing to the LBA in question to "clear" it, so to speak (or to move it to the spare space reserved for bad sectors). It will be marked bad if the checksum on an LBA fails enough times, IIRC.

EDIT: I just realized, the simplest way to be sure is to run SeaTools followed by listing all S.M.A.R.T information. If they indeed use the same tests it will show up in the S.M.A.R.T log.

I don't really have time to do that but I would be surprised if their tests are anything special.

ginggs · Jan 9, 2014

Gnome said:
Other than constantly trying to access an LBA there really is nothing else a piece of software can do to "correct" an error.

Seagate write SeaTools and the firmware for their hard drives. Together, their software can pretty much do whatever they want it to do.

Gnome said:
EDIT: I just realized, the simplest way to be sure is to run SeaTools followed by listing all S.M.A.R.T information. If they indeed use the same tests it will show up in the S.M.A.R.T log.

IIRC, SeaTools test runs are not listed in the SMART self-test log.

sajunky · Jan 18, 2014

It might be late for advice in this case, but it is important.
Neither the SMART test nor SeaTools will fix hard errors or 'pending' sectors, as 'pending' always means that the HDD firmware was unable to deal with the case. We don't know whether the error results from an intermittent cause (such as a power surge, vibration, or power-off during internal maintenance) or from a permanent condition due to firmware corruption or weak heads (G-list overflow). I run SeaTools only when I already know that the HDD is due for replacement.

In this case two other parameters show serious problems:
187 - reported UNC (how many different sectors were reported unreadable to the user)
195 - H/W ECC recovered (situation where reading depends on ECC)
Plenty of UNCs; the HDD has been failing for some time (hidden in the array).

Action I suggest: 1. Remove the drive from the array. 2. Fill it with zeros (or secure erase it). 3. Check SMART. 4. If the pending sectors have dropped to zero, continue with a surface read scan (checking for errors and delays). The following programs can be used to examine delays: MHDD for DOS, Victoria for DOS and Windows, TestDisk(?) for Windows. All of them are included on Hiren's Boot CD.
Checking for delays, combined with a SMART report before and after the operation, is the only way to examine the condition of modern drives.

redarrow · Jan 19, 2014

I have to admit that due to, erm, reasons (read: laziness) I haven't actually done much about the situation aside from re-affirming that all valuable data is backed up.

I did get rid of the corruption on the filesystem, simply by running fsck from a rescue disk; for some reason running it at Ubuntu bootup was not correcting all the issues, go figure. Furthermore, I am now fairly certain that the reason the array actually fell apart is a bad SATA port, because I swapped the ports between that drive and one of the others and, sure enough, about a week later the other drive got kicked out of the array (i.e., a different drive in the same SATA port). Due to my hack 'n patch nature, my (temporary) fix was to attach a fan blowing onto the SATA card, figuring that it's overheating. It hasn't broken again, but not to worry, I have a new SATA card on order; it should arrive any day now.

The odd thing is that the uncorrectable sector errors on the SMART report seemingly repaired themselves, or at least they disappeared:

Code:

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.0-45-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST1500DL003-9VT16L
Serial Number:    5YD0L2VA
LU WWN Device Id: 5 000c50 02afa71a2
Firmware Version: CC32
User Capacity:    1 500 301 910 016 bytes [1,50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Jan 19 12:39:18 2014 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  643) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30b7) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   085   006    Pre-fail  Always       -       40536
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1726
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       370722103
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21251
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       340
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       230
188 Command_Timeout         0x0032   100   097   000    Old_age   Always       -       4295032961
189 High_Fly_Writes         0x003a   057   057   000    Old_age   Always       -       43
190 Airflow_Temperature_Cel 0x0022   066   051   045    Old_age   Always       -       34 (Min/Max 28/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1625
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1726
194 Temperature_Celsius     0x0022   034   049   000    Old_age   Always       -       34 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   037   006   000    Old_age   Always       -       40536
197 Current_Pending_Sector  0x0012   100   097   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   097   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       164127880270561
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       471552505
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1153012166

That said, I am considering this as an advance warning of impending doom.

From looking around at HDD prices, the 1.5 TB drives seem quite uncommon now and not the best priced. I'm not sure if it's worth buying a new 1.5 TB disk only to have another one die in due course; the whole array is almost 3 years old, so perhaps it's time to budget for some new drives.

sajunky · Jan 20, 2014

redarrow said:
That said, I am considering this as an advance warning of impending doom.

Rather progressive deterioration, but that's not my last word.
#1 & #195: quick increase from 16k to 40k
#187: yet another reported UNC
The pending sector count dropped to zero; this is expected and consistent with my previous post. A pending sector gets repaired during a write operation to the affected area. The question is whether the errors will eventually come back. A drive with that many reported UNCs would cause severe system corruption if used standalone. Reformatting a drive gives it a chance to self-repair (hide defects), but while it operates in an array you are not forced to reformat it every couple of months. I would give it a chance to repair itself instead of depending on the array's error correction.

If you remove the drive from the array, zero-fill it, and check its condition, you will know much more and in many cases extend its lifespan.

There is an alternative (non-destructive) option with MHDD: a scan with the option to erase delays. It will scan and force relocation of all sectors with delays. Since the array will detect errors when you put the drive back into it, you can also select the destructive option to repair bad blocks during the scan. I recommend the latter.
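The before/after comparison of attributes #1, #187, #195, #197 and #198 can also be done mechanically. The sketch below is a minimal Python illustration with made-up function names; the rows are copied from the two smartctl dumps posted in this thread, and only the leading number of each RAW_VALUE is compared.

```python
# Selected attribute rows from the Dec 30 dump.
BEFORE = """\
  1 Raw_Read_Error_Rate     0x000f   099   085   006    Pre-fail  Always       -       15840
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20781
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       229
195 Hardware_ECC_Recovered  0x001a   031   006   000    Old_age   Always       -       15840
197 Current_Pending_Sector  0x0012   100   097   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   097   000    Old_age   Offline      -       8
"""

# The same rows from the Jan 19 dump.
AFTER = """\
  1 Raw_Read_Error_Rate     0x000f   109   085   006    Pre-fail  Always       -       40536
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21251
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       230
195 Hardware_ECC_Recovered  0x001a   037   006   000    Old_age   Always       -       40536
197 Current_Pending_Sector  0x0012   100   097   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   097   000    Old_age   Offline      -       0
"""


def parse_attributes(table_text):
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in table_text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may contain spaces
        if len(fields) == 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9].split()[0]))
    return attrs


def raw_deltas(before, after):
    """Change in raw value for every attribute present in both snapshots."""
    return {
        attr_id: (name, after[attr_id][1] - raw)
        for attr_id, (name, raw) in before.items()
        if attr_id in after
    }


deltas = raw_deltas(parse_attributes(BEFORE), parse_attributes(AFTER))
for attr_id, (name, delta) in sorted(deltas.items()):
    print(f"{attr_id:>3} {name:<24} {delta:+d}")
```

Raw values are vendor-specific, so the deltas are best read as trends for this one drive over the interval given by Power_On_Hours rather than as absolute counts.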
