
Posted 2009-05-19 10:43 by Nate
There has been a lot of talk about the failure rate of the Seagate 1.5TB hard drives since they launched, and it's time for me to share my experiences. I've handled around 15 of these drives personally, and I've talked with Seagate about the issues with reallocated sectors and error rates. Let's start by showing how to tell whether your drive is affected.

This is a screenshot of a program called "CrystalDiskInfo", which can (and should) be downloaded here. The installer works flawlessly, but the portable version is really nice to throw on a flash drive and use in the field. Notice the yellow caution bubble in the screenshot? That means your drive is failing.
The Reallocated Sectors Count is the number of sectors that the hard drive could not reliably read from or write to on its platter, so it used error correction to recover what was in them and then rewrote that data to its spare-sector area. This is actually a really neat trick, and it can keep a failing drive from ruining your day. It's also how some people make their hard drives larger than they shipped from the factory: by unlocking that spare space.
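To make the remapping idea concrete, here is a toy model in Python. It is a sketch under loudly stated assumptions, not how any real drive firmware works: the `ToyDrive` class, its `damage()` hook and the rule that ECC always manages to recover the data are all hypothetical. What it does show is why the count climbs by one for each sector that ends up in the spare area, and why the trick stops working once the spares run out.

```python
class ToyDrive:
    """Toy model of sector reallocation (hypothetical; real firmware is proprietary)."""

    def __init__(self, num_sectors, num_spares):
        self.platter = [b""] * num_sectors  # user-visible sectors, indexed by LBA
        self.spares = [b""] * num_spares    # hidden spare-sector area
        self.remap = {}                     # LBA -> index into self.spares
        self.weak = set()                   # LBAs whose physical sector has gone bad

    @property
    def reallocated_sector_count(self):
        # Plays the role of the Reallocated Sectors Count: sectors now in the spare area.
        return len(self.remap)

    def damage(self, lba):
        # Simulate a sector going bad while its data is still ECC-recoverable (assumption).
        self.weak.add(lba)

    def read(self, lba):
        if lba in self.remap:
            return self.spares[self.remap[lba]]
        data = self.platter[lba]
        if lba in self.weak:
            # Assume ECC recovered the contents; move them to a spare sector.
            self._reallocate(lba, data)
        return data

    def write(self, lba, data):
        if lba in self.weak and lba not in self.remap:
            self._reallocate(lba, data)
        elif lba in self.remap:
            self.spares[self.remap[lba]] = data
        else:
            self.platter[lba] = data

    def _reallocate(self, lba, data):
        slot = len(self.remap)
        if slot >= len(self.spares):
            raise IOError("spare area exhausted; bad sectors can no longer be hidden")
        self.remap[lba] = slot
        self.spares[slot] = data


drive = ToyDrive(num_sectors=100, num_spares=8)
drive.write(5, b"hello")
drive.damage(5)
print(drive.read(5), drive.reallocated_sector_count)  # first read triggers the remap
print(drive.read(5), drive.reallocated_sector_count)  # already remapped, count unchanged
```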
On a healthy drive this count should be zero, although I still consider drives with fewer than 10 reallocated sectors to be healthy. If your drive has more than 10, you should replace it as soon as possible. The highest count I've ever seen is 2000, on a drive that had obviously crashed: the platter bearing squealed when it spun up, and no data was retrievable without a data recovery service. Wisely, that drive had been put in a RAID-5 array, so it was simply replaced and the array rebuilt itself with no data loss.
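The rule of thumb above fits in a tiny function. This is a hypothetical helper (`reallocated_verdict` is not from any tool), and it reads the raw Reallocated Sectors Count, not the normalized value. The text doesn't say what to do at exactly 10, so this sketch assumes 10 already counts as "replace".

```python
def reallocated_verdict(raw_count):
    """Rule of thumb from the paragraph above, applied to the raw Reallocated Sectors Count."""
    if raw_count < 0:
        raise ValueError("a sector count cannot be negative")
    if raw_count == 0:
        return "healthy"
    if raw_count < 10:
        # Nonzero but under 10: still treated as healthy, worth re-checking.
        return "healthy, keep an eye on it"
    # The text says "more than 10"; this sketch also flags exactly 10 (an assumption).
    return "replace as soon as possible"


for count in (0, 3, 10, 2000):
    print(count, "->", reallocated_verdict(count))
```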
The experience that pushed me to write this piece was building an iSCSI box for a client. We started with a Thecus N5200, which is a pretty neat little NAS device. We decided to go with the biggest 7200 RPM drives on the market, the Seagate 1.5TB Barracudas. We bought four of these drives for the new NAS and started by using one of them in an open-air environment to move some data. That was the start of the trouble.
When an associate went to pick up the drive after nearly 1.3TB of data had been written to it, the drive was hot enough to nearly blister his finger. It was just sitting on top of an aluminum case, which should have wicked some heat away, in a fairly warm server room. Needless to say, that drive failed shortly afterwards, so we shipped it back and ordered a replacement.
Once we began building the actual RAID-5 array in the Thecus N5200, we looked at the drive temperatures. They were staying around 45°C, which is pretty high for any drive but shouldn't necessarily cause failure. We decided to give it a go, and sure enough, drive 3 was the first to show a nonzero Reallocated Sectors Count. Drive 2 followed shortly afterwards, so we decided to move from the Thecus to a cooler environment.
Enter the Lian-Li PC-K7B case with two 120mm fans blowing directly over the hard drive cage. I designed and built this custom NAS machine around the excellent Asus P5Q-EM motherboard and a dual-core Celeron E1400 with 2GB of DDR2-800. For the software, we decided to use Openfiler, which brings Novell's iSCSI solution to an easy-to-configure web interface. Amazingly, this solution wound up being considerably less expensive than the Thecus while having about twice the horsepower and a lot more airflow.
So we loaded up the two remaining good drives and two more brand-spankin'-new drives, only to be greeted by my least favorite noise from one of the new drives: the sound of heads clicking. A DOA drive is to be expected when you have to sift through this many, so we boxed it up and sent it back. We had a spare drive that we were waiting to get back from RMA, so we decided to build a smaller array with the three remaining drives, just to make sure they were solid, and then wait for the other drive before actually moving data. We threw about 300GB of data at the array to benchmark it, and then I came back a couple of days later to install the new, new drive.
I decided to check on the status of the drives with a handy little utility called smartctl, which comes with most Linux distributions these days. Run "smartctl -A /dev/sda" (as root) and you'll get a list of SMART data for the drive /dev/sda. I was only looking for two things: temperature and the Reallocated Sector Count. Temperatures were awesome, with these drives staying around 35°C, but the Reallocated Sector Count had reared its head again. This time, of the three drives, one had failed outright and the other two were showing fewer than 10 reallocated sectors, and that was after throwing just 300GB at the array. So we got rid of the heat, but the failures kept coming.
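If you want to check those two attributes across several machines, the table that `smartctl -A` prints is easy to parse. The sketch below assumes the column layout shown in the readout later in this thread (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and keeps only the leading integer of RAW_VALUE; rows whose raw value doesn't start with a digit are skipped. `parse_smartctl_attributes` and `SmartAttribute` are hypothetical names. To feed it real data you could capture the command's output with `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` (as root); here it uses a two-row sample copied from that readout.

```python
import re
from typing import Dict, NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int        # normalized VALUE column (vendor scale, higher is better)
    worst: int
    thresh: int
    when_failed: str
    raw: int          # leading integer of RAW_VALUE


# One attribute row: ID, name, hex flag, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, then RAW_VALUE (only its leading integer is captured).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)


def parse_smartctl_attributes(text: str) -> Dict[str, SmartAttribute]:
    """Parse the attribute table printed by `smartctl -A`; non-matching lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr = SmartAttribute(int(m[1]), m[2], int(m[3]), int(m[4]),
                                  int(m[5]), m[6], int(m[7]))
            attrs[attr.name] = attr
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
194 Temperature_Celsius     0x0022   039   058   000    Old_age   Always   -           39 (0 24 0 0)
"""

attrs = parse_smartctl_attributes(SAMPLE)
print("Temperature (C):", attrs["Temperature_Celsius"].raw)
print("Reallocated sectors:", attrs["Reallocated_Sector_Ct"].raw)
```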
I wish I had kept better records of this whole process, but I didn't. My best estimate is that we killed around seven hard drives and never did find a good one. I was very frustrated at this point, so I called Seagate and explained that I had been in the Seagate Partner Program for nearly 2 years and was very concerned about this issue. The representative looked up the failure rate on these drives, 0.3%, and suggested that I might just have gotten drives from a bad batch. I said that made sense, and that I was sure Seagate would be happy to send me some new drives so that we could put this situation behind us. He did not agree, and was unable to authorize sending me new drives.
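For a sense of scale, here is the arithmetic on that 0.3% figure. This is a back-of-the-envelope sketch, not a reliability analysis: it assumes 0.3% is the chance that any one drive fails over the period in question, that drives fail independently, and it reads the story above as 7 failures out of 7 drives. Under those assumptions the chance of that outcome is 0.003^7, about 2.2e-18. A bad batch is exactly the kind of shared cause that breaks the independence assumption, so the number only shows that the headline rate can't describe these particular drives. `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance that k or more of n drives fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical reading of the story: 7 drives, all 7 failed, 0.3% per-drive rate.
print(prob_at_least(7, 7, 0.003))  # about 2.2e-18
```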
At this point, I decided to check in on a couple of other drives that I had sold over the past few months, the first of which is the drive whose failing status is pictured above. I also checked on another drive that was probably purchased within the first month of production, and that one has not produced any reallocated sectors yet. However, it does have a scarily high "Read Error Rate" and "Seek Error Rate", which seems to be par for the course for these drives. Luckily, the hardware error correction has saved its owner's data so far, but would you trust a drive that relies on error correction to keep your data? I wouldn't.
So, to fix the NAS server, we bought five Western Digital Caviar Black 1TB drives and installed them yesterday. We threw about 500GB of data at them, and they are still showing zeros for Read Errors and Seek Errors, with no hardware ECC needed. They are sitting pretty at 35°C and will hopefully run forever. At least they have a 5-year warranty, which is something Seagate apparently can't deliver anymore.
I also have a Western Digital Green Power 1.5TB drive on the way to replace my failing Barracuda, and I'll share my thoughts on it once it arrives and I've had some time with it. I understand the Green Power drives aren't really suitable for RAID, but the new RE4-GP is, and it weighs in at 2TB. It's quite costly compared to the Seagates, but I'd be willing to bet your data is much safer on that drive if you need big capacity.
As always, Seagate is more than welcome to email me their thoughts, which I will gladly add to the end of this piece. In the meantime, I hope these ramblings save somebody's data.


I have three of these:
Model Family: Seagate Barracuda 7200.11
Device Model: ST31000340AS
These are part of the infamous family that has had issues with video playback and data loss. There is a firmware patch, which I haven't applied yet. I have them in RAID 5 on a Debian GNU/Linux server and haven't had any noticeable issues. However, I decided to look at the SMART data. None of them has a high reallocated sector count, but all of them seem to have high "Read Error Rate" and "Seek Error Rate" values. Have a look:
Should I be looking at the raw values? If so, those are terrible, aren't they?
I really should upgrade the firmware on these drives.
FWIW, these drives were installed in March of last year and have been running fine since.
JB
There is one reallocated sector on the first drive, which indicates it is in the process of dying, and should be replaced. While it will probably run for a while without trouble, performance will certainly degrade as more sectors are pushed to the spare space.
After I wrote this piece, I started checking all the Seagate drives I have access to, and all of them are at least 7200.10s. They all exhibit this strange error behaviour, even the ones that haven't actually failed:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   092   076   006    Pre-fail  Always   -           201975484
  3 Spin_Up_Time            0x0003   097   093   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           69
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   062   058   030    Pre-fail  Always   -           1619850632058
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always   -           11479
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           69
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   061   042   045    Old_age   Always   In_the_past 39 (Lifetime Min/Max 36/43)
194 Temperature_Celsius     0x0022   039   058   000    Old_age   Always   -           39 (0 24 0 0)
195 Hardware_ECC_Recovered  0x001a   063   057   000    Old_age   Always   -           12076313
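The WHEN_FAILED column in that readout comes from comparing the normalized columns, where lower is worse. As I understand smartctl's logic (an assumption worth checking against the smartmontools documentation), an attribute is failing now if VALUE is at or below THRESH, failed in the past if WORST ever reached THRESH, and a THRESH of 0 never fails. That accounts for the In_the_past flag on Airflow_Temperature_Cel (WORST 042 vs THRESH 045), and for why Seek_Error_Rate isn't flagged despite its huge raw value (VALUE 062 vs THRESH 030). `when_failed` is a hypothetical helper.

```python
def when_failed(value, worst, thresh):
    """Sketch of how smartctl's WHEN_FAILED column relates VALUE/WORST to THRESH."""
    if thresh == 0:
        return "-"             # a threshold of 0 is treated as "never fails"
    if value <= thresh:
        return "FAILING_NOW"   # normalized value is at or below threshold right now
    if worst <= thresh:
        return "In_the_past"   # the worst recorded value once reached the threshold
    return "-"


rows = {
    "Seek_Error_Rate": (62, 58, 30),
    "Airflow_Temperature_Cel": (61, 42, 45),
    "Temperature_Celsius": (39, 58, 0),
}
for name, (value, worst, thresh) in rows.items():
    print(f"{name}: {when_failed(value, worst, thresh)}")
```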
That's from my 750GB drive. Seek errors and read errors are insanely high, but ECC has recovered all of them. I'm starting to think this is just how Seagate rolls, so I'm not going to be a part of it anymore, and I'm switching all my drives out for WDs.
BAH! This is annoying! I thought Seagate was supposed to be one of the best...
According to http://support.seagate.com/customer/warranty_validation.jsp, my warranty is valid until December 2012. Maybe I should move my RAID array to something else (WD), use these for other purposes (desktop use?), and get them replaced if and when they become unusable?
This is disappointing.
It really pissed me off too when I figured out what was going on. First they dropped their warranty from 5 to 3 years, and now this. Oh well, live and learn, right?
I loaded CrystalDiskInfo at Nate's suggestion, only to discover that it did not recognize my C: drive:
I guess I don't have a Seagate system disk. What surprised me was that the software found and reported on one of my USB drives:
Am I right in thinking that this drive is on the brink?
Cheers... Brian
Hi Brian,
The drive's a little warm, but that's pretty standard for external drives, since they don't have much airflow over them. It still shows that strange behavior with high seek errors and high read errors; however, the reallocated sector count is zero, so it should be alright.
I'm not sure why the C: drive isn't showing up; this software can usually talk to just about any drive. You can try another utility called "Speedfan", which has a new version that supports more chipsets. It's available here.
-Nate
Nate, the way I read it, the reallocated sector count is 100. What should I be looking for?
I downloaded speedfan and here are the results for my C: drive.
Brian,
It's showing zeros in the Raw Value field, so you're alright. All of your other values look like the ones we've seen in the functional drives. Be sure to check up on it every now and then though.
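The 100 Brian read is the normalized value, a vendor-scaled health score where higher is better, while the raw value is the actual count of reallocated sectors. The hypothetical `value_and_raw` helper below pulls both out of one row in the smartctl layout quoted earlier in the thread. Tools such as CrystalDiskInfo or Speedfan label and order their columns differently, so treat the column positions here as an assumption tied to smartctl's output.

```python
def value_and_raw(row):
    """Split one smartctl attribute row into (normalized VALUE, RAW_VALUE text)."""
    cols = row.split()
    # Columns: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
    # WHEN_FAILED, then RAW_VALUE (which may itself contain spaces).
    return int(cols[3]), " ".join(cols[9:])


row = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0"
normalized, raw = value_and_raw(row)
print("normalized:", normalized)       # vendor health score, not a count
print("reallocated sectors:", raw)     # the raw count to keep an eye on
```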
-Nate