Posted 2009-05-19 10:43 by Nate

There has been a lot of talk about the failure rate of the Seagate 1.5TB hard drives since they launched, and it's time for me to share my experiences. I've handled about 15 of these drives personally, and I've talked with Seagate about the issues with reallocated sectors and error rates. Let's start off by demonstrating how to see if your drive is affected.

SMART Status

This is a screenshot of a piece of software called "CrystalDiskInfo", which can, and should, be downloaded here. The installation package works pretty flawlessly, but the portable option is really nice to throw on a flash drive and use in the field. Notice in the screenshot where it has a yellow caution bubble? That means your drive is failing.

The Reallocated Sectors Count is the number of sectors that the hard drive could not reliably read from or write to, so it used its error correction to figure out what was in there and then rewrote the data to a sector in its spare sectors area. This is actually a really neat trick, and can keep a failing drive from ruining your day. This is also how some people make their hard drives larger than they ship from the factory, by unlocking that spare space.

On a healthy drive, this count should be zero; however, I still consider drives that have fewer than 10 Reallocated Sectors to be healthy. If your drive has more than 10 of these, you should replace it as soon as possible. The absolute highest I've ever seen this count is 2000, which was on a drive that had obviously crashed. The platter bearing was squealing when it spun up, and no data was retrievable without a data recovery service. Fortunately, that drive was part of a RAID-5 array, so it was just replaced and the array rebuilt itself with no data loss.
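
As a minimal sketch, the rule of thumb above can be written as a small Python function. The function name and the labels it returns are made up for illustration, and since the rule does not say what to do at exactly 10, treating 10 the same as "more than 10" is an assumption.

```python
def classify_reallocated(count: int) -> str:
    """Map a raw Reallocated Sectors Count to the rule of thumb above."""
    if count < 0:
        raise ValueError("count cannot be negative")
    if count == 0:
        return "healthy"
    if count < 10:
        # Still treated as healthy, but worth checking again later.
        return "watch"
    # Assumption: exactly 10 is handled like "more than 10".
    return "replace"


for raw in (0, 1, 9, 10, 2000):
    print(raw, classify_reallocated(raw))
```

The count passed in should be the raw value reported for the attribute, not the normalized score.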

The experience that pushed me to write this piece was the building of an iSCSI box for a client. We started with a Thecus N5200, which is a pretty neat little NAS device. We decided to go with the biggest 7200 RPM drives on the market, which are the Seagate 1.5TB Barracudas. We purchased a total of 4 of these drives for the new NAS and started by using one of them in an open-air environment to move some data. This was the start of the trouble.

When an associate went to pick up the drive after having nearly 1.3TB of data written to it, his finger nearly blistered from the burn caused by the drive. This was just sitting on top of an aluminum case, which would be wicking some heat away, in a fairly warm server room. Needless to say, that drive failed shortly afterwards, so we shipped it back and ordered another one to replace it.

Once we began building the actual RAID-5 array in the Thecus N5200, we looked at the temperatures of the drives. They were staying in the 45 degrees Celsius range, which is pretty high for any drive, but shouldn't necessarily cause failure. We decided to give it a go, and sure enough, drive 3 was the first to show the dreaded Reallocated Sectors Count. Drive 2 followed shortly afterwards, so we decided to move from the Thecus to a cooler working environment.

Enter the Lian-Li PC-K7B case with two 120mm fans blowing directly over the hard drive cage. I designed and built this custom NAS machine around the excellent Asus P5Q-EM motherboard and a dual-core Celeron E1400 with 2GB of DDR2-800. For the software, we decided to use Openfiler, which brings Novell's iSCSI solution to an easy-to-configure web interface. Amazingly, this solution wound up being considerably less expensive than the Thecus while having about twice the horsepower and a lot more airflow.

So we loaded up the two remaining good drives and two more brand-spankin'-new drives, only to be greeted by my least favorite noise from one of the new drives: the sound of heads clicking. A DOA drive can be expected when you have to sift through this many drives, so we boxed it up and sent it back. We had a spare drive that we were waiting to get back from RMA, so we decided to build a smaller array with the three remaining drives, just to make sure they would be solid, and then wait on the other drive to actually start moving data. We threw about 300GB of data at the array to benchmark it, and then I came back a couple of days later to install the new, new drive.

I decided to check on the status of the drives with a handy little utility that comes built into most Linux distributions these days called smartctl. Run "smartctl -A /dev/sda" and you'll get a list of SMART data for the drive /dev/sda. I was only looking for two things: temperature and the Reallocated Sector Count. Temperatures were awesome, with these drives staying around 35C, but the Reallocated Sector Count had reared its head again. This time, of the three drives, one had failed out completely and the other two were showing fewer than 10 Reallocated Sectors, and that was after throwing just 300GB at this array. So we got rid of the heat, but the failures were still coming.
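
For anyone who would rather script this check than read the list by eye, here is a rough Python sketch that pulls those two numbers out of text in the column layout that "smartctl -A" prints. The function names and the sample rows are illustrative assumptions, the attribute names Reallocated_Sector_Ct and Temperature_Celsius can differ between drives, and running smartctl itself normally needs root.

```python
import re
import subprocess

# Sample rows in the "smartctl -A" column layout (illustrative values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
194 Temperature_Celsius     0x0022   032   044   000    Old_age   Always       -       32 (0 14 0 0)
"""


def parse_attributes(text):
    """Return {attribute_name: leading integer of RAW_VALUE} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def read_device(device="/dev/sda"):
    """Run smartctl on a device (needs smartmontools installed, usually as root)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)


attrs = parse_attributes(SAMPLE)
print("Reallocated sectors:", attrs.get("Reallocated_Sector_Ct"))
print("Temperature (C):", attrs.get("Temperature_Celsius"))
```

On a live system you would call read_device("/dev/sda") instead of parsing the sample text, once per drive in the array.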

I wish I had kept better records of this whole process, but I didn't. If I had, I would've seen that we killed around 7 or so hard drives, and never did find a good one. I was very frustrated at this point, so I called up Seagate and explained that I have been in the Seagate Partner Program for nearly 2 years and was very concerned about this issue. The representative looked up the failure rate on these drives, 0.3%, and claimed that I might just have gotten drives from a bad batch. I said that makes sense and I'm sure that Seagate would be happy to send me some new drives so that we could get this situation behind us. He did not agree, and was unable to authorize sending me new drives.

At this point, I decided to check in on a couple of other drives that I had sold over the past few months, the first of which was the drive whose failing status is pictured above. I also checked in on another drive that probably was purchased within the first month of their production, and that one has not produced any Reallocated Sectors yet. However, it does have a scary high "Read Error Rate" and "Seek Error Rate", which seem to be par for the course for these drives. Luckily, the hardware error correction has saved his data to this point, but would you trust a drive that is relying on error correction to keep your data? I wouldn't.

So, needless to say, to remedy the NAS server, we purchased 5 Western Digital Caviar Black 1TB drives and installed them yesterday. We threw about 500GB of data at them, and they are still showing zeros for Read Errors and Seek Errors, no hardware ECC needed. They are sitting pretty at 35C, and will hopefully run forever. At least they have a 5-year warranty, which is something Seagate can't deliver anymore, apparently.

I also have a Western Digital Green Power 1.5TB drive on the way to replace my failing Barracuda, and I'll try to share my thoughts on that solution once it arrives and I have some time with it. I understand that they are not really acceptable for RAID solutions, but the new RE4-GP is, and it weighs in at 2TB. It's quite costly compared to the Seagates, but I'd be willing to bet that your data is much safer on that drive if you need big capacity.

As always, Seagate is more than welcome to email me their thoughts that I will gladly put at the end of this piece, but I hope that I might have saved some data with these ramblings.

I have three of these:

Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000340AS

These are part of the infamous family that has issues with playing video and data loss. There is a firmware patch, which I haven't applied yet. I have them in RAID 5 on a Debian GNU/Linux server and haven't had any noticeable issues. However, I decided to look at the SMART data. I have no high reallocated sector counts, but all of them seem to have high "Read Error Rate" and "Seek Error Rate". Have a look:

 

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       209861720
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       107496971103
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10336
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       31
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       8590065666
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   056   045    Old_age   Always       -       32 (Lifetime Min/Max 28/33)
194 Temperature_Celsius     0x0022   032   044   000    Old_age   Always       -       32 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   029   022   000    Old_age   Always       -       209861720
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0


SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   099   006    Pre-fail  Always       -       9982673
  3 Spin_Up_Time            0x0003   092   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       68
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       25892725261
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10341
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       3
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       68
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       8590065666
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   064   045   045    Old_age   Always   In_the_past 36 (Lifetime Min/Max 32/37)
194 Temperature_Celsius     0x0022   036   055   000    Old_age   Always       -       36 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   032   029   000    Old_age   Always       -       9982673
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       144336664
  3 Spin_Up_Time            0x0003   092   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       68
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   058   030    Pre-fail  Always       -       51664173256
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10347
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       4
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       68
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       8590065666
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   045   045    Old_age   Always   In_the_past 34 (Lifetime Min/Max 31/36)
194 Temperature_Celsius     0x0022   034   054   000    Old_age   Always       -       34 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   034   029   000    Old_age   Always       -       144336664
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

Should I be looking at the raw values? If so, those are terrible, aren't they?

I really should upgrade the firmware on these drives.

FWIW, these drives were installed in March of last year and have been running fine since.

 

JB

Posted by JBstrikesagain on Wed, 2009-06-03 12:41

There is one reallocated sector on the first drive, which indicates it is in the process of dying, and should be replaced. While it will probably run for a while without trouble, performance will certainly degrade as more sectors are pushed to the spare space.

After I wrote this piece, I started checking all the Seagate drives I have access to, and all of them are at least 7200.10's. They all exhibit this strange error behaviour, even though some haven't actually failed:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   092   076   006    Pre-fail  Always       -       201975484
  3 Spin_Up_Time            0x0003   097   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   062   058   030    Pre-fail  Always       -       1619850632058
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11479
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       69
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   042   045    Old_age   Always   In_the_past 39 (Lifetime Min/Max 36/43)
194 Temperature_Celsius     0x0022   039   058   000    Old_age   Always       -       39 (0 24 0 0)
195 Hardware_ECC_Recovered  0x001a   063   057   000    Old_age   Always       -       12076313
 

That's from my 750GB drive. Seek errors and read errors are insanely high, but ECC has recovered all of them. I'm starting to think that this is just how Seagate rolls, so I'm not going to be a part of it anymore, and am switching out all my drives to WD's.

Posted by Nate on Wed, 2009-06-03 12:49

BAH! This is annoying! I thought Seagate was supposed to be one of the best.

According to http://support.seagate.com/customer/warranty_validation.jsp my warranty is valid until December 2012. I may upgrade my RAID array to something else (WD) and use these for other purposes (desktop usage?), and when/if they become unusable, get them replaced?

This is disappointing.

Posted by JBstrikesagain on Thu, 2009-06-04 08:48

It really pissed me off when I figured out what was going on also. First they dropped their warranty from 5 to 3 years, then this. Ohh well, live and learn, right?

Posted by Nate on Thu, 2009-06-04 08:57

I loaded CrystalDiskInfo at Nate's suggestion only to discover that my C: drive was not recognized by the software:

I guess I don't have a Seagate system disk. What surprised me was that the software found and reported on one of my USB drives:

Am I right in thinking that this drive is on the brink?

Cheers... Brian

Posted by bms44974 on Fri, 2009-06-12 13:20

Hi Brian,

  The drive's a little warm, but that's pretty standard for external drives, since they don't have much airflow over them. It still shows that strange behavior with the high seek errors and high read errors. However, the reallocated sector count is zero, so it should be alright.

  I'm not sure why the C: drive isn't showing up. Usually this software can talk to just about any drive. You can try another utility called "Speedfan" which has a new version supporting more chipsets. It's available here.

-Nate

Posted by Nate on Fri, 2009-06-12 13:33

Nate, The way I read it, the reallocated sector count is 100. What should I be looking for?

I downloaded speedfan and here are the results for my C: drive.

Posted by bms44974 on Fri, 2009-06-12 13:57

Brian,

  It's showing zeros in the Raw Value field, so you're alright. All of your other values look like the ones we've seen in the functional drives. Be sure to check up on it every now and then though.
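
  To make the difference between the columns concrete, here is a small Python sketch. It assumes a row in the smartctl -A layout shown earlier in this thread (the sample row comes from the 750GB drive above, not from Brian's drive), and the function name is made up for illustration. The VALUE, WORST, and THRESH columns are normalized scores where higher is better, and an attribute only counts as failed once VALUE drops to THRESH or below. The actual number of reallocated sectors lives in RAW_VALUE.

```python
def summarize_row(row):
    """Split one attribute row into its normalized scores and its raw value."""
    # Ten columns; RAW_VALUE is last and may contain spaces.
    fields = row.split(None, 9)
    value, worst, thresh = (int(f) for f in fields[3:6])
    return {
        "name": fields[1],
        "value": value,    # normalized score, higher is better
        "worst": worst,    # lowest normalized score seen so far
        "thresh": thresh,  # the attribute has failed once value <= thresh
        "raw": fields[9].strip(),
        "failed": value <= thresh,
    }


ROW = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0"
info = summarize_row(ROW)
print(info["name"], "normalized:", info["value"],
      "raw:", info["raw"], "failed:", info["failed"])
```

  So a 100 in the normalized column with a 0 in the raw column is the good case, not 100 bad sectors.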

 

-Nate

Posted by Nate on Fri, 2009-06-12 14:07