I work for Seagate. I was there when the fit hit the shan, and I saw everything going on internally as well as externally. I really love my job, so please excuse the sock-puppet impression I must give by creating a brand-new account and claiming to be an authority on the subject. But I am a geek, and I really think you all need to know the true story behind the scenes.
This whole thing started with the 1.5TB drives. They had a stuttering issue, which at first we all thought was a simple bad implementation of SATA on common chipsets. Seagate engineers promptly jumped in and worked to duplicate the issue and prove where the problem was. This wasn't a massive rush, as 1.5TB drives are what, 5% of the drives on the market? When it became obvious that the issue was more widespread, they buckled down and put out a couple of firmware revisions to fix it.
Now, in the 1.5TB drives, there are two main revisions: the product line that gets the CC* firmware, and the line that gets the SD* firmware. They came out with firmware CC1H and SD1A to fix these issues and started issuing them.
But Seagate has always been restrictive about handing out its firmware, so such updates required calling in with your serial number, so that the people who had access to hand out the firmware could check a) model, b) part number, and c) current firmware, just to make absolutely sure that they were giving the right firmware out. This is a procedure that has worked for YEARS - up until now.
Then the bricking issue came to their attention. It took so long because it's an issue that's hard to track down - pretty much, the journal or log space in the firmware is written to when certain events occur. IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS.
This is a rare, but still obviously bad issue. Up until now, we all figured it was just some standard type of failure, as it was such a rare event, so we'd RMA the drives.
So, for whatever reason, middle management started freaking out (it could be a liability for Seagate, I suspect - on top of the already potentially actionable stuttering problem causing drives to fail out of RAIDs). So, they pushed the release of the SD1A firmware to the general public. They took a few days to 'test', though that mostly amounted to adding some code to the batch file that kicks off the firmware updater, checking that the target is a BRINKS drive with the proper model number. Then it was kicked out to the public.
Please understand, before now this firmware had to go through five different checks to make sure it applied to the specific conditions before being sent to a customer. Five chances for us to say "your drive needs the other firmware update (or none at all)." Suddenly, it's down to ONE check, and even that was designed more as a contingency in case the wrong firmware was sent out.
Of course, it starts bricking drives.
Right now, the engineers are crapping themselves, the firmware's been pulled, the support agents are told to say "The firmware will be released soon", and no real procedure to fix this issue is in place. Our phones are flooded so badly that the system locks up when too many calls are in queue, and emails are coming in at hundreds an hour.
We simply cannot keep up.
The good news is, the chance of your drive simply not spinning up one day is very low. And for those of you who flashed the wrong firmware - be patient. It's not bricked, just unable to write data to the platters properly. When they have a *GOOD* firmware out, a new flash should un-brick the drives. If not, flashing it back to SD15 should make it work again.
Seagate really pushes the idea of being open and honest as much as we can without being sued to hell. They let agents make choices and use their skills instead of scripting us to death. They worked hard to bring their support back to the USA.
Seagate does care about their customers. They just got caught with their pants down, twice in a very short period of time! So, they're wanting to double, triple, and quadruple check the firmware so it doesn't brick any more drives.
As for why it takes so long before an issue is reported and before Seagate makes an announcement - we get a dozen 'reports' a day of issues that are really just one-off problems. It takes time for an issue to become 'significant' enough to escalate to the product teams, and time before they can provide a fix.
I hope this clears up a few things. I may or may not be able to answer questions if you have any.
Good luck! I live near a town where Seagate is the largest local employer - and they already had to lay off a portion of their staff this past week. I really hope you guys are able to solve this quickly, both for your customers and for yourselves, of course.
Thank you!
"Right now, the engineers are crapping themselves"
Shitting bricks, no doubt. ;)
Haha, good one. Thanks for your informative post. I mean the original one.
I did lose some very close coworkers in the last round of layoffs. I think that's another reason this is hitting so hard - we are at our highest support volume, with low sales and a rough stock price. I believe Seagate realizes they need to keep every customer possible. But pressure can, and in this case has, led to some bad decisions by management to ignore their engineers' recommendations... but even then, this issue was hard to see coming, as this firmware had a month of field proof that it works... they ju
Sounds like the ease of hacking around the batch file might be an issue.
So stop running the checks through a batch file. Make it an .exe so people don't bypass it.
As I've noted below, it was an emergency release that shouldn't have been, and was never designed for release to the general public.
They should have redesigned the delivery system, but there was too much public pressure on them to get a fix out *now*...
But then again, it was somewhat their own damn fault - if they had just come out and explained the details of the issue to everyone instead of keeping it in-house, people would have realized quickly that it wasn't as dangerous a situation as it seems at first glance - just inconvenient, more than anything, to the few who run into it. But the ambulance-chasing lawyers smelled blood during the 1.5TB issue and backed management into a corner.
There's a lesson to be learned here. DON'T FARKING LET MIDDLE MANAGEMENT BYPASS YOUR TRIED AND TRUE TEST/RELEASE PROCEDURE. Yes, the initial problem was bad, but the rush to get a fix out made it much much worse. Upper management is at fault here for allowing middle management pencil pushing idiots to do this to the company's reputation. Procedures are in place for a damn good reason.
You say that now, but you have to admit, with all the screaming and carpet-clawing that went on about the 1.5TB issue, some of the fault rests on the mob mentality pushing Seagate management to get a fix out ASAP for an issue that had only recently been proven.
I'm not saying it's okay - but the exact same situation that can force a large and lumbering company to move faster, can force management to push really hard and cause quality systems to break down.
You can whip the bull to get it to run, but you may just cause it to run right off the cliff. :)
Hahaha. That's so true, but the whole point of middle management is to make bad engineering decisions for political (read: "stupid") reasons, because the people who know enough wouldn't, and the people above them think it'll save money to have a political layer (that is, a stupidity layer) like that in between. The people at the top can't make those decisions directly, because when they screw up, someone has to take the blame. So these people are pushed to make "the hard decisions", then get blamed for it when it blows up.
There's a lesson to be learned here. DON'T FARKING LET MIDDLE MANAGEMENT BYPASS YOUR TRIED AND TRUE TEST/RELEASE PROCEDURE.
Indeed. You could say NASA learned the same sort of lesson through Challenger - http://en.wikipedia.org/wiki/Space_Shuttle_Challenger_launch_decision [wikipedia.org]
Maybe I'm naive, but why can't the firmware updater first check that it is being applied to the proper drive hardware? Surely it has a way to ask the drive exactly what hardware it has in it. But routers seem to be just as stupid, accepting whatever file you send them without checking anything, so what do I know.
It was never designed to be a public release. The script checks two things: that it's a BRINKS or a MOOSE drive, and the model number. If you get the firmware from the torrents (it's out there) and tear it apart with uniextract, you can see the batch file and what it checks for. It's a program that was built back in the '90s and used ever since! Remove those two checks, and it'll happily flash that IBM or Western Digital drive with the Seagate firmware as well.
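For readers wondering what that kind of gate looks like, here's a minimal sketch of the same two-check idea in shell - purely illustrative, NOT Seagate's actual script; hdparm is a standard Linux tool, and the model list and SD* pattern are my assumptions based on this thread:

    #!/bin/sh
    # Illustrative pre-flash gate: refuse to run the flasher unless the
    # drive's reported model and firmware revision look like they're in scope.
    DEV=${1:-/dev/sda}
    MODEL=$(hdparm -I "$DEV" | sed -n 's/.*Model Number: *//p')
    FW=$(hdparm -I "$DEV" | sed -n 's/.*Firmware Revision: *//p')
    case "$MODEL" in
      ST31000340AS|ST3500320AS) ;;  # models mentioned in this thread
      *) echo "refusing: $MODEL is not a known affected model"; exit 1 ;;
    esac
    case "$FW" in
      SD*) ;;                       # only the SD* firmware line is in scope
      *) echo "refusing: firmware $FW is not in scope"; exit 1 ;;
    esac
    echo "checks passed for $MODEL ($FW); the flasher would run here"

The point being: two string comparisons were all that stood between the updater and the wrong drive.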
I don't think I've ever seen a hardware firmware-protection mechanism, but I agree that it should be commonplace these days. I lost a Lite-On DVD-ROM drive after a rogue program (buggy rather than malicious, I think) lashed out and flipped one particular bit in all the 16-bit words of the firmware, causing it to forget how to do pretty much everything.
Maxtorman, I'd mod you up if I had the points. Your comments are the first ones to alleviate a very significant knot that formed in my stomach after reading this. I'm still a little concerned though, and have some questions at the bottom I hope you could answer.
I'm a little late to the party because I only use these for non-critical stuff like home office and family PCs, but the prospect of having all my drives inevitably die really scares me. I've bought 18 drives (ST31000340AS and ST3500320AS all w/
I'll answer your questions to the best of my ability, and as honestly as I can!
I'm no statistician, but the 'drive becoming inaccessible at boot-up' is a very slim chance - but when you have 10 million drives in the field, it does happen. The conditions have to be just right - you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. This is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc... might happen every few days on a computer in a non-RAID home-use situation... and if that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. It'll only stop responding IF the drive is powered up with log file #320 being the latest one written... a perfect storm situation.
If this is the case, then Seagate is trying to put in place a procedure where you can simply ship them the drive, they hook it up to a serial controller, and it gets re-flashed with the fixed firmware. That's all it takes to restore the drive to operation!
As for buying new drives, that's up to you. None of the CC firmware drives were affected - only the SD firmware drives. I'd wait until later in the week, maybe next week, until they have a known working and properly proven firmware update.
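If you want to check which family you're in before deciding, the model and firmware revision are easy to read from a running Linux box (the device name here is just an example; smartctl -i shows the same fields):

    hdparm -I /dev/sda | grep -E 'Model Number|Firmware Revision'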
If you were to have flashed the drives with the 'bad' firmware, it would disable any read/write functions on the drive, but the drive would still be accessible in the BIOS, and there's a very good chance that flashing it back to a previous SD firmware (or up to the yet-to-be-released proven firmware) would make it all better.
Oh, and RAID0 scares me by its very nature... it's not an 'if' but a 'when' the RAID0 craps out and all data is lost - but I'm a bit jaded from too much tech support! :)
Yeah, I'm not so much a fan of RAID0 either. It works for what it does - speed at the cost of reliability - on a system that'll be reformatted a couple times a year as hardware's rearranged.
Again, you've put my concerns to rest. I'll wait the week for a patch, content that there's a low chance of hitting that magic number.
Thank you!
-Matt
Does anyone know if drives with firmware SD04 are affected?
I have a 500 GB drive (ST3500320AS) that appears on the "affected" list, but as this is one of the first 7200.11 drives made, it has firmware SD04. This firmware supposedly has an issue where the 32 MB cache does not work correctly, but as far as I can tell I don't have that problem.
This issue of the drive dying on reboot is the one that scares me, but I can't tell if SD04 has this problem. Are there other reasons why I might want to upgrade from SD04?
If the problem was analyzed so well, why is the update so f-ed up? This is a failure of both the SCM and SQA processes and possibly in the professionalism of the firmware development and test teams, if this new error was caused by a failure to follow process.
If they don't get it right on the third try, heads should roll. The original defect you describe sounds suspiciously like an untested boundary condition (or more accurately a double-boundary condition - the last log file in the circular queue, combined with a power-down at exactly that point).
The conditions have to be just right - you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. This is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc... might happen every few days on a computer in a non-RAID home-use situation... and if that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. It'll only stop responding IF the drive is powered up with log file #320 being the latest one written.
So, in other words, if your computer is rebooted regularly, and you use it for long enough, it's very likely to happen. If the log file is written to once every few days on average, and you reboot daily or more frequently, after a few months it seems likely that the magic log entry #320 will be the last one written when you shut down. No wonder there were so many issues. (The thing I'm wondering is if events causing log writes tend to cluster together? That'd make the odds better, but not by enough.)
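As a sanity check on that reasoning, here's a toy Monte Carlo of the scenario. All the numbers are illustrative guesses (one log write every ~3 days on average, one reboot per day, a uniformly random starting position in a 320-slot journal) - not Seagate's field data:

    awk 'BEGIN {
      srand(); trials = 10000; days = 365; bricked = 0
      for (t = 0; t < trials; t++) {
        pos = int(rand() * 320)                     # journal position at day 0
        for (d = 0; d < days; d++) {
          if (rand() < 1/3) pos = (pos + 1) % 320   # occasional log write
          if (pos == 0) { bricked++; break }        # daily reboot lands on the fatal slot
        }
      }
      printf "estimated brick chance in a year: %.1f%%\n", 100 * bricked / trials
    }'

Under those assumptions the yearly odds come out substantial, which squares with the intuition above; if log writes cluster, the real-world odds would shift accordingly.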
As far as I know, if your drive has the CC1G, CC1H, CC1J or any of the CC firmwares really, it is completely unaffected by this issue. However, it may need an update if you experience 'stuttering' (the drive pausing for more than a few seconds during data transfer). The CC1H and CC1J firmwares are *fine* and will absolutely not brick your drive.
I'd still wait a little while though - support is overwhelmed, and mistakes are being made as no one is used to these changes. Once everyone gets a routine down (once there -is- a routine at all), they'll be better able to help reliably.
I imagine Maxtorman is referring to connecting to the on-board controller using its RS232 interface. I don't think there's any practical way of making that a suitable end-user tool or process.
If the drive is properly-bricked (i.e. the problem that the current round of firmware updates is designed to fix - not showing up in the BIOS, etc), then it's necessary to connect to the RS232 interface and reset the drive's on-board controller in order to make it show up again. At that point, flashing the firmware is easy. I imagine he was suggesting, similarly, the log file is only accessible via the RS232 interface and not using the ATA command set.
I'll see what I can dig up! People are reporting good results with that method, though.
Very possibly! It could have forced a new write of the log and bypassed the bug. Other people are reporting that the drive comes up again if power-cycled a few times... so I think you may have gotten lucky in that respect.
The log, if my information is correct, is written each time a SMART check is done. This will always happen on drive init, but can also happen at regularly scheduled events during normal usage, as the drive has to go through various maintenance functions to keep it calibrated and working properly.
The log, if my information is correct, is written each time a SMART check is done.
Ah, nice. I run these (the long variant) as cron jobs every 14 days in order to increase disk reliability and get early fault detection. Fortunately, I moved to better brands than Seagate some time ago. (Got a visibly factory-damaged Chinese Seagate ES drive and decided to move away...)
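For anyone who wants the same setup, a root crontab entry along these lines kicks off the extended self-test roughly every two weeks (smartctl is part of smartmontools; the device name is an example):

    # extended SMART self-test on the 1st, 15th and 29th at 03:00
    0 3 */14 * * /usr/sbin/smartctl -t long /dev/sda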
If all the drives in an array write log entries at the same rate, then they have non-independent failure rates. You lose either no drives, or all drives at the same rate as losing a single drive. Sort of defeats the purpose of RAID, but that's a separate issue.
but the prospect of having all my drives inevitably die really scares me
Uh, all drives die at some point; the question is just when. The specific problem here is that they die so frequently that two or more drives in your RAID5 might die at the same time, or your backup drive and the original drive might die at the same time (or before you can get a replacement for the first to die).
Just remember, everything dies
I sincerely smile in relief every time I hear someone other than myself say that phrase. It's a sign of someone who truly 'gets it' as far as hardware is concerned.
Thing is, I know Seagate really does try to push for high manufacturing standards. (For example, did you know that every last refurb drive *must* go through the full new-drive qualification before it's sent out? That's something only a percentage of actual new drives go through, because it's time consuming.)
I had a 500GB Seagate drive in an external enclosure. I always bought Seagate drives, because in the past I've seen 2 Maxtor drives fail and no Seagate ones. Anyway, after a while the external enclosure didn't work anymore, and the disk made clicking spinup sounds all the time (trouble with the controller of the enclosure, maybe; it happened both with USB and eSATA). So I moved the drive inside my PC, and it's been working without problems for almost a year in there now. I hope it keeps doing so.
That's the same place I'm in. Business stuff has redundant backups (sometimes doubly so ;) ). I'm just concerned about this particular case because I have a lot of this exact model/firmware of drive out there, purchased and installed as system drives for friends and relatives in the past 6 months. The worst-case scenario on business stuff is mirrored, oftentimes remotely, with versioning for important data.
What's worse is Seagate's own forum seemed to be constantly proving them wrong on the scope of affected drives. This [seagate.com]
I've been a denizen of Slashdot for many years - I just wish all these mod points were on my main account! :) But it is nice to be able to contribute knowledge and experience back to the community for once.
Thing is, this issue -is- rare. But it manifests itself in a way that's hard to distinguish from a normal drive failure (suddenly no detection in the BIOS; spins up but is never seen by the computer - and this can happen for a dozen reasons, including a loose or bad cable, physical drive failure, etc.), so a whole lot of these just looked like ordinary dead drives.
Since I had not heard of massive numbers of Seagate drives failing, I already suspected that this is a rare occasion in which the drives would not spin up. I was wondering why Seagate announced this bug before they had a fix ready. Looks like they announced it very early. Maybe they also should have put more emphasis on the fact that it is a very rare bug.
It was announced. And people were freaking out about a bug from Seagate without a fix ready. What happens when customers freak? Right: tons of pressure on Seagate to get a fix out.
I was wondering one thing. I got two Seagate Deskstar USB drives last year. I don't have them here to check them, but I believe there is no way to check for or update firmware through USB. Or is there? What are the plans for external enclosures?
I use Linux, so I can't use the Windows updating software. I know that there is a boot floppy image available, but I don't have a floppy drive anymore. I guess Seagate won't be making any firmware upgrading software for Linux soon, but a USB stick boot image with FreeDOS or something would be nice.
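In the meantime, one workaround is to boot the floppy image from a USB stick using syslinux's memdisk - a sketch, assuming a FAT-formatted stick on /dev/sdc1 mounted at /mnt/usb, with the image filename as a placeholder:

    syslinux /dev/sdc1                  # install the bootloader onto the stick
    cp /usr/lib/syslinux/memdisk /mnt/usb/
    cp seagate-flash.img /mnt/usb/      # the floppy image from Seagate
    cat > /mnt/usb/syslinux.cfg <<'EOF'
    default flash
    label flash
      kernel memdisk
      append initrd=seagate-flash.img floppy
    EOF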
Many thanks, Maxtorman. Yours is the first useful information I've had out of Seagate so far, and is much more reassuring than the official KB articles and the 'support' I've received from most of the first line techs I've dealt with at Seagate. I only wish you could show this to your management and take credit for it. I hope that they have the sense to keep you and those like you through the coming upheaval.
Now, a few questions, if I may...
The wording on the 207931 KB article keeps changing; sometimes it'
Thank you! I wish this information could have been made public so that I didn't have to create a new account to avoid being fired for releasing 'confidential information' - but what can you do with jerkoff lawyers already tearing at your corporate heels?
Now, to your questions!
1) It keeps changing because the scope of the issue keeps changing. I'm pretty sure it's a range of drives within the families noted in the KB article - but also, there are some external drives affected, because they contain an internal drive with the problem, that aren't on the article yet. Your best bet would be to compare your drive to the list of models, and then wait a little while... around Friday, I *think*, they should have most issues sorted out and the information accurate. But I can't promise anything.
2) That could very well be it. I'm not privy to the nitty-gritty details, as engineering clammed up pretty quickly - I'm just geek enough to understand what I hear in passing, or the few technical details I come across when I go looking for information. But the mysterious death log being a SMART self-test log would absolutely make sense, and is consistent with what I'm hearing.
3) Unofficially, I've seen more than just the 1.5TB drives display symptoms similar to the stuttering issue, but none so blatant or as impactful as in the 1.5TB drives.
As far as the firmware fixing both the stuttering issue and the unresponsive-drive issue: yes. The changes for the stuttering issue were made in the CC1H and SD1A firmwares. Any firmware equal to or more recent than those two will have the fix for both issues.
4) I have no idea. SMART characteristics can vary from part number to part number - or even sometimes drive-to-drive; so what is 'out of tolerances' for one part number could be just fine for a different p/n (even though they are the same model number).
One word: lawsuits. If they gave incorrect information, it could open them up to liability if people acted on that information. When a business' data could be worth millions, one slip-up could cost them dearly. The only reason this firmware isn't such an issue is the disclaimers all over the place when you flash a drive.
Yes, the 1.5TB drives both stutter and are at risk of bricking due to the journal issue. The stuttering issue is fairly recent and mostly confined to the 1.5TB drives - but the journal issue is older and exists across many 7200.11, ES.2, and DiamondMax drives.
SD1A fixes both of these problems in the 1.5TB drives.
Apologies can be construed as acknowledgment of an issue, and could carry weight in a civil lawsuit.
And given the rabid scaremongering happening on /. and other enthusiast sites, do you really think that people will stop spamming the support threads with useless whining and me-toos? While some posts may get deleted, I bet a large number of deleted posts are just people whipping the mob into a frenzy or making unfounded allegations.
If the flash utility will flash the drive at all (even if the data is inaccessible), then you have a very good chance of restoring functionality to the drive with an older firmware, or with the new 'good' firmware yet to be published.
I personally have not been affected by any of the recent firmware issues on Seagate drives, but I have long been a Seagate fan and have been reading your posts with interest. First, I just wanted to say that the great deal of transparency that you personally and your company as a whole have both shown through these recent ordeals is more likely than not going to counter any possible fear that might have resulted in me leaving the brand for another. So, kudos.
First, let me apologize: I'm going to withhold employment details such as tenure and experience, mostly due to the fact that many of us at Seagate (including some in management) are Slashdot regulars.
That said, I really do enjoy my time at Seagate, and it has been an absolutely wonderful company to work for.
As far as "BRINKS" "MOOSE" "GALAXY" etc.. are concerned, they are pretty much the internal development names of the drive family. There can be overlap, but most "BRINKS" drives are 7200.11, I believe, while "MOOSE" drives are almost all 7200.10, and "GALAXY" drives are 7200.9. Generally, those names don't make it out into public, but if you were to tear into the SD1A firmware, you'll notice that it looks for the "BRINKS" drive before it flashes the firmware to the drive. There can be different internal names for different revisions of the drive itself, but generaly they stick to one revision per family - a new internal name would only be used for a MAJOR revision on the drive.
I don't have my documentation handy, but I'll look that up later in the week and try to give you a better answer.
You have absolutely no need or cause for apology. My curiosity often gets the better of me and I end up asking questions the answers to which I have absolutely no entitlement.
If you happen to somehow manage to read over your docs and then furthermore have time to flesh out your answer to me in more detail, I'll consider that a bonus. You pretty much confirmed what I guessed about the different names being internal codenames for entire series, so I guess the more interesting information at this point would
Thank you for that clarification. I knew it had to do with the internal configuration of the drive, but wasn't sure about the details. I figured they were pretty family specific, and could vary even with the same model number.
Two of my beloved 80GB Seagates died 6 months ago. Luckily, I was able to retrieve their data onto a 320GB Seagate before they became unreadable. The 80GB drives were 5 years old. The 320GB is 1 year old. Now, the 320GB is dying, making noisy clicks, and the symptoms are that it's impossible to copy files larger than 500MB, it locks up XP, and I get a BSOD. Luckily, I was able to retrieve the data, because the clicks disappear if I keep the drive po
What about reports of forum threads and postings being repeatedly deleted? To me that was the worst aspect of this whole issue. And due to Seagate's slowness in responding, many will assume the worst - e.g. deliberate hiding of the issue, hatred of the customers, ignorance, etc. - when the actual cause appears to be overloading of the support process.
Yeah. I have no idea what's going on with the forum. I don't work directly with AlanM but I imagine he has a set of policies he has to enforce, and that sucks for doing actual dev work on the forums.
The forums are known as dangerous waters for support people to venture in, and forbidden to do so in any official capacity as support agents. But we do read them, especially when things go crazy like this.
I am one of the victims, and your report confirmed all the problems which I expected were occurring inside your company. I previously worked for an electronics giant, and the problems are just too similar.
The catastrophic problems which Seagate is facing now could have been prevented - if there had been one single person in customer service who cared and had pushed the issue, which was known for months, up to the right people. A little googling some months ago would have proven that this issue is far bigger than a "one time" incident.
After all, it doesn't happen every day that data recovery companies announce with joy that they are able to handle widespread 7200.11 firmware problems. Or that the two major companies which provide recovery solutions race to be the first to have a two-click solution for this cash cow.
Data recovery companies were flooded with drives. They figured out an easy way to fix the firmware and kept it secret. They made a great profit, charging prices as if it was a hardware failure.
Seagate Data Recovery did the same by quoting up to 1800 USD for a 10-minute fix. Although I am sure that they were the only ones not aware of the easy fix.
The problem with the undetectable-in-BIOS drives really isn't new. Your customer service knew about it for a long time, but they are paid so little and probably have such strict procedures that they don't care about Seagate's customers, and no one dared to report the drive failures as a major incident. Everyone kept quiet about it, and the people who are responsible and do care only learned about it months later, when (or shortly before) it got out to the press.
Seagate had months to fix it. Two months ago, when my drive broke, there was already plenty of information about the problem on the net. The only one who would deny any problem was Seagate.
I warned your board moderator months ago of the disaster which would strike Seagate. I tried to show him that these were not normal failure rates, but the poorly paid guy didn't care.
The email support, which takes two weeks to respond, and the phone and live support were just as ignorant.
There were people reporting how 4 out of 6 drives broke within weeks, and Seagate would only respond that such failure rates are normal.
People on the Seagate boards were constantly reporting the problem, but your board moderator shut them up. Threads were getting deleted and locked, including a big thread where the community was working on a fix. The reason, according to Seagate, was that it "added nothing to the community."
The board moderator would consistently tell everyone that there is no known problem with the drive - the same message as your customer service.
It went as far as blocking links in private messages to a posting on another board which could help the victims. So how can Seagate now expect those people to actually believe that the company cares?
The posting on the other board had 10,000 views within a short time. That's when things started to get out of hand for Seagate.
People were pissed off at Seagate for months. Everyone knew that the firmware was broken, but the company denied any problems. We knew that it was not that difficult to recover the data if you had the tools and know-how, but the company wouldn't give any assistance. Many would have accepted their fate if the drives had truly been broken - but not when they were inaccessible because of a firmware bug which makes every single drive a -clicking- time bomb.
People everywhere were calling Seagate hard drives junk - so unreliable that they would never buy them again.
So I, like many others, went on to warn every single person I knew about the problem with Seagate drives. The hilarious/sad thing is that before this, I would recommend Seagate to everyone I knew. If someone asked me which drive to buy, I would reply without hesitation: Seagate.
This could have been prevented if Seagate had acknowledged the problem much earlier. I wasted day after day,
The problem with the undetectable-in-BIOS drives really isn't new. Your customer service knew about it for a long time, but they are paid so little and probably have such strict procedures that they don't care about Seagate's customers, and no one dared to report the drive failures as a major incident. Everyone kept quiet about it, and the people who are responsible and do care only learned about it months later, when (or shortly before) it got out to the press.
----
You say that as though it was willful ignorance on
Seagate is STILL refusing to give a list of drives that have the problem (what's so hard about a matrix of model/firmware/etc.?) - calls to their support do not get through, and emails are for the most part being ignored.
I personally have in excess of 50 of the 500GB 7200.11 drives in semi-critical operation here, on machines that get restarted several times a day - and for a while we have been losing drives at a rate of about one a week.
Now, Seagate's response so far has been to RMA them onc
I'm not here to explain what *should* happen, or to lick the boots of those who have been wronged; I'm here to try and explain what *has* happened and *why* things are the way they are.
Make your choices as you will. I'm just trying to help get some much needed information out that can't seem to make it through 'proper' channels.
Also, here [custkb.com] is a list of affected models. If your drive is on that list and has the SD* firmware, it's affected. It is that simple.
As for service issues - the facts are thus: the phones are slammed
And a whole lot more people are sitting with bricked drives and inaccessible data because they didn't wait a few extra days to design a good procedure for dealing with the influx of people and drives. 3 or 4 days wouldn't have made much of a difference (especially since this issue has apparently been around for months anyway), and it would have saved many dozens of people from losing access to drives that were working fine beforehand.
That article on MSFN is about the best reference I've seen yet - I am really not familiar with working on drives via a serial interface, though I may pick up a cheap, low-capacity drive off eBay and play around! I know of no special commands beyond what that article describes. However, I'll look around and see if I can get that information (though I don't know how much I can disclose if I do find it). We'll see what I can dig up.
Also, I cannot say for sure it's EXACTLY 320 entries. That was the number bandied about.
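From what I've read in those community writeups, the drives expose a 3.3V TTL serial console on the diagnostic pins, generally reported to run at 38400 8N1, so once an adapter is wired up, opening the console is just (device name illustrative):

    screen /dev/ttyUSB0 38400

The commands beyond that point aren't something I can vouch for, so follow a trusted writeup rather than improvising.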
Unrelated to the new drives, but I'm hoping you can help. I have an old Maxtor drive (manufactured 2001, 60GB D540X-4D) with some bad sectors on it. I would *love* to be able to perform a best-effort, dirty read on them, instead of getting back nothing but a CRC error. Is this possible? I looked high and low on the net and couldn't find any free utilities. The best I found were references to expensive crap like SpinRite, which looks like snake oil. At this point I'm looking into hacking the Linux IDE driver.
What specifically are you trying to achieve? Do you know that (parts of) the files you wish to recover are specifically stored in the blocks that are giving read errors? Or are you just trying to get a good copy of the whole disc? If the latter, then you might well be able to get away with using something like ddrescue [gnu.org], which can ignore the bad sectors if they don't read correctly after a number of retries. If the former, then I imagine you'll need to look into whether the drive has an interface to the onboard controller.
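A typical ddrescue session for the whole-disc case looks like this (device, image, and map-file names are examples; the map file lets you stop and resume, and the second pass retries only the bad areas):

    ddrescue -n /dev/sdb disc.img disc.map    # first pass: grab the easy areas, skip bad spots
    ddrescue -r3 /dev/sdb disc.img disc.map   # second pass: retry bad sectors 3 times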
I got a copy of the disk -- actually I managed to get all of it except for about 22 megs. It's just that the 22 megs has some of the crucial pieces. If I could read something like 480/512 bytes that would be much better than nothing.
Thanks for the RS232 tip. I'll look around and see if my Maxtor drive has something similar.
A bad block is a bad block - it's marked corrupted, and that block's LBA address is reassigned to a replacement block at the end of the track. Once a block is reallocated, it's as though the old block doesn't exist, according to the firmware. The driver won't help you at all - the old block no longer even has an LBA address. I guess it still has a physical address that may be accessed if you know what you're doing, but that's a bit over my head at this point.
Thanks for your reply. I know the data is still accessible, because sometimes a read will work after repeated attempts, but this happens so infrequently and it takes so long to read from bad blocks that it's no longer worth trying any more. As for bad blocks being re-allocated, the drive has already used all the spares, so any read that fails is from the data I need. I just need a way to tell the firmware to ignore the CRC and give me whatever data it sees.
That's not quite right. I'm no expert, but as I understand it, drives don't generally actually reallocate bad blocks until one of two things happens:
1) The drive manages to recover the contents of the bad block successfully.
2) The block is written to.
Until then, the block is put on a list of pending blocks that need to be reallocated. (Basically, modern discs try really hard to handle unreadable data.)
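You can actually watch this from the SMART attributes: Current_Pending_Sector counts unreadable blocks waiting on one of those two events, while Reallocated_Sector_Ct counts blocks already remapped (device name is an example):

    smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'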
I believe there may be ways of reading raw blocks, but I'm not sure that'll help you. Modern drives us
Thanks for that very clear, non-bull explanation. I was all set to go on a rant about QA and release management, but I should have known it was middle management cutting corners.
(Let's face it, that does tend to be behind a vast number of faults in any technical company.)
I ordered a pair of 1TB ES.2 drives a little while back, and the drives report themselves as GB1000EAFJLs, which as I understand it is an HP relabel of the ST31000340NS. The firmware version reported is HPG6, which seems to be the latest version from HP. Would you happen to know if there's any correlation between the HP firmware and Seagate firmware versions?
OEM drives like that often have a special firmware designed by the OEM themselves based on Seagate's stock firmware.
It may or may not have the problem, but all the OEMs have been given details about this issue and are responsible for checking their firmware and updating it as necessary.
It's a shame Seagate is/was so secretive about the actual problems with the 1.5TB drives. I specifically went looking on your website for information about the problem, so I could assess how bad it was and whether or not we'd be affected (yes, we run Linux). The only thing I found was mass censorship on the forums by moderators, with any discussion of the problem locked away. Since I couldn't *easily* find out whether or not we'd be affected, I opted to buy a set of drives from another manufacturer.
"Then the bricking issue came to their attention. It took so long because it's an issue that's hard to track down - pretty much the journal or log space in the firmware is written to if certain events occur. IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report it's information to the BIOS."
Can we view and clear this log ourselves to avoid the bricking in the first place?
As far as I know, no. There may be a way using the controller some people have posted instructions on building, but your best bet would be just to watch the KB article like a hawk and update the firmware as soon as a good release is out.
Thanks Maxtorman! I did subscribe to the notifications for those KB documents and the forum threads. And of course my /. replies to your /. comments. I want this resolved before it is too late. :)
Hmm, and I'm supposed to believe that when your nickname is "maxtorman"? Your plan was cunning and clever, but your subconscious knew of your evil plot and forced you to reveal your hand!
I absolutely agree with you. If we had been allowed proper development and proving time, this may not have been an issue at all. But the moment Seagate even admitted there was a firmware issue with the 1.5TB drives, lawyers began recruiting for class action suits.
Wait a few days - Seagate will have a procedure in place for drives bricked by the bad firmware. Once they do, you should just be able to send them the drive, and it'll be reflashed with good firmware and sent back. I can't say this for absolute certain, but that's what they're telling us now.
If you have confidential data on the drive, you have two options:
a) if you send it in for a reflash, there will be a tech who flashes the drive using a serial interface, and then verifies good read/writes
I've worked tech support for a while now, and that's something that's struck me working for Seagate. Just about everyone there actually.. well, -cares- about what they do. The atmosphere inside is completely different than, say, Dell or Cox or AT&T (from what I hear - I've never worked at any of those places, but many of my coworkers have).
The management lets support agents have license to use their experience and skills in fixing issues, instead of reading from a script. They don't heavy-hand them, and t
That fact is a shame, since it is the most easily recorded process of interaction. In particular, I note the representative's lack of capitalization, punctuation, and other grammar issues.
Frankly, it puts a face of ignorance on Seagate. Support is, and I apologize for the cliche, the "Director of First Impressions." For this reason I have as little to do as possible with the likes of Dell and Comcast (though my experience with AT&T has been mixed).
Seriously, you do not want to upset people who come to you for help.
The reason for the RMA was I physically broke off the data connector myself (I was rammed by a 120lb dog while installing a new Raptor 150GB), and Seagate didn't have a problem with that whatsoever. They've been really good to me.
Nice to know Seagate's warranty covers Acts of Dog.
You can ask that a CD be mailed to you with the firmware flash on it. We don't publicize it, but we already do that with our normal drive software. So if you absolutely can't go to a friend's house, download and burn the ISO and flash it that way, usually the agent is happy to send you a CD in the mail. If not, ask for a team lead and they'll do it.
1% of 14 million is still 140,000 drives dead.
Sorry, I was using "One in a Million" as more of an expression than a valid statistic. :) But yeah, your math adds up, though actual field results I'm sure are much lower. 30% of 14 million drives (the number of drives potentially affected) is 4.2 million - we'd be overrun with dead drives.
The drives have to go through a calibration and burn-in as part of the manufacturing process, which should have already detected any bad sectors and reallocated them, with the SMART counters zeroed out before going into the field. It's always a possibility that a few sectors were just on the tipping point and weren't detected during burn-in, or later went bad for other reasons.
Having a few reallocated sectors like that is a pretty consistent event across all drives, no matter the make, model or manufacture date or
There are Windows programs that will do such tests.. otherwise in Linux you can just "dd if=/dev/urandom of=/dev/sdX bs=1M" or something like that (dd stops on its own when it reaches the end of the drive), and just run your SMART check after the drive's finished being written to. Then do the same and zero it out (/dev/zero instead of /dev/urandom) and check again. This will obviously wipe any data on the drive, of course, so don't do it on a drive whose data you care about, as it won't be recoverable.
There is a set of part numbers that it regularly bricks, but I don't have access to the details why and what just yet. When I do, I'll try to update with details.
I do know that swapping the PCB won't help a firmware-bricked drive, as the firmware is not kept on the PCB - it's mostly kept on the drive itself.
Are your drives recognized in the BIOS? If so, then it's better to just wait until they have the good firmware. If not, then wait until they have the process in place for returning the drives for a physical fix.
Everyone has their personal manufacturer blacklist due to personal experience or anecdote. I can't say I blame you!
To make matters worse, the list needs constant maintenance. Nothing else will help. I agree that reliability in "normal times" is pretty similar across vendors. There may be issues specific to one vendor or drive model, like older Maxtors reacting badly to heat or WDs having interface compatibility issues, but these are longer term and you can find out about them beforehand. For the one that is
We had an old WD 30GB fail in a RAID-1 array. While awaiting approval to replace the drive, the IBM Deskstar 40GB mirror failed. (Yeah, old drives, right?)
As an act of desperation - against reloading the server and paying a software vendor to reinstall their system (and, no, this server was not frequently backed up, for reasons and discussion I wish to avoid) - I put the Deathstar in the freezer overnight. I figured, what the hell, the worst that could happen is I would still have a dead drive.
Then yeah, if you don't care about warranty, PCB replacement would be your best bet, I think (other than data recovery). Match up part numbers and firmware revisions, though!
The update is on a bootable FreeDOS CD image. I'm not sure if you can persuade an x86 Mac to boot DOS natively, but at least you don't have to use Windows.
"Everything should be made as simple as possible, but not simpler."
-- Albert Einstein
THE FACTS (Score:5, Interesting)
I work for Seagate. I was there when the fit hit the shan, and I saw everything going in internally, as well as externally.
I really love my job, so please excuse the sock-puppet nature that creating a brand new account and claiming to be an authority on the subject I must seem to be. But I am a geek, and I really think you all need to know the true story behind the scenes.
This whole thing started with the 1.5 Terabyte drives. It had a stuttering issue, which at first we all thought was a simple bad implementation of SATA on common chipsets. Seagate engineers promptly jumped in and worked to try to duplicate the issue and prove where the problem was. This wasn't a massive rush as 1.5tb drives are what? 5% of the drives on the market. When it became obvious that the issue was more widespread, they buckled down and put out a couple of firmware revisions to fix it.
Now, in the 1.5tb drives, there are 2 main revisions. the the product line that gets the CC* firmware, and the line that gets the SD* firmware. They came out with firmware CC1H and SD1A to fix these issues and started issuing them.
But, seagate has always been restrictive of handing out their firmware, so such updates required calling in with your serial so that the people who had access to hand out the firmware could check a) model, b) part number, and c) current firmware just to make absolutely sure that they were giving the right firmware out. This has been a procedre that has worked for YEARS up until now.
Then the bricking issue came to their attention. It took so long because it's an issue that's hard to track down - pretty much the journal or log space in the firmware is written to if certain events occur. IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report it's information to the BIOS.
This is a rare, but still obviously bad issue. Up until now, we all figured it was just some standard type of failure, as it was such a rare event, so we'd RMA the drives.
So, for whatever reason, mid management started freaking out (as it could be a liability for seagate, I suspect - ontop of the already potentially liable issue of the stuttering problem causing drives to fail in RAIDs). So, they pushed the release of the SD1A firmware to the general public. They took a few days to 'test', though it was mostly just including some code in the batch file that kicks off the firmware updater, to check that it is a BRINKS drive, and the proper model number. Then it was kicked out to the public.
Please understand, this firmware had to go through five different checks to make sure it applies to the specific conditions to qualify sending to a customer, before now. 5 chances for us to go your drive needs the other (or none) firmware update. Suddenly, it's down to ONE check, and even that was more designed for a contingency just incase the wrong firmware was sent out.
Of course, it starts bricking drives.
Right now, the engineers are crapping themselves, the firmware's been pulled, the support agents are told to say "The firmware will be released soon" and no real procedure to fix this issue is in place. Our phones are flooded so bad that it locks the system up when there are too many calls in queue, and emails are coming in at hundreds an hour.
We simply cannot keep up.
The good news is, the chance of your drive simply not spinning up one day is very low. And for those of you who flashed the wrong firmware - be patient. It's not bricked, just unable to write data to the platters properly. When they have a *GOOD* firmware out, a new flash should un-brick the drives. If not, flashing it back to SD15 should make it work again.
Seagate really pushes the idea of being open and honest as much as we can without being sued to hell. They let agents make choices and use their skills instead of scripting us to death. They worked hard to bring their support back to the USA.
Seagate does care about their customers. They just got caught with their pants down, twice in a very short period of time! So, they're wanting to double, triple, and quadruple check the firmware so it doesn't brick anymore drives.
As for why it takes so long before an issue is reported and before seagate makes an announcement - we get a dozen 'reports' of issues that are really just one-off problems a day. It takes time for an issue to be 'significant' enough to escalate to the product teams, and time before they can provide a fix.
I hope this clears up a few things. I may or may not be able to answer questions if you have any.
Re: (Score:2)
Good luck! I live near a town where Seagate is the largest local employer and already had to fire a portion of their staff this past week. I really hope you guys are able to solve this quickly both for your customers and yourselves of course.
Re: (Score:3, Interesting)
Re: (Score:2)
Re:THE FACTS (Score:5, Interesting)
As I've noted below, it was an emergency release that shouldn't have been, and was never designed for release to the general public.
They should have redesigned the delivery system, but there was too much public pressure on them to get a fox out *now*...
But then again, it was somewhat their own damn fault - if they had just came out an explained the details of the issue to everyone instead of keeping it in-house, people would have realized quickly it wasn't as dangerous a situation as it seems at first glance. Just inconvenient to the few who run into it more then anything. But the ambulance chasing lawyers smelled blood during the 1.5Tb issue and forced management into a hole.
Re:THE FACTS (Score:5, Insightful)
There's a lesson to be learned here. DON'T FARKING LET MIDDLE MANAGEMENT BYPASS YOUR TRIED AND TRUE TEST/RELEASE PROCEDURE. Yes, the initial problem was bad, but the rush to get a fix out made it much much worse. Upper management is at fault here for allowing middle management pencil pushing idiots to do this to the company's reputation. Procedures are in place for a damn good reason.
Re:THE FACTS (Score:4, Insightful)
Re: (Score:2)
There's a lesson to be learned here. DON'T FARKING LET MIDDLE MANAGEMENT BYPASS YOUR TRIED AND TRUE TEST/RELEASE PROCEDURE.
Indeed. You could say NASA learned the same sort of lesson through Challenger - http://en.wikipedia.org/wiki/Space_Shuttle_Challenger_launch_decision [wikipedia.org]
Re: (Score:3, Interesting)
Hahaha. That's so true, but the whole point of middle management is to make bad engineering decisions for political (read: "stupid") reasons, because the people who know enough wouldn't, and the people above them think it'll save money to have a political layer (that is, a stupidity) like that inbetween. The people at the top can't make those decisions directly, because when they screw up, someone has to take the blame. So these people are pushed to make "the hard decisions", then get blamed for it when
Re: (Score:2, Interesting)
Re:THE FACTS (Score:5, Interesting)
Re: (Score:2)
I don't think I've ever seen a hardware firmware protection mechanism, but I agree that it should be commonplace these days. I lost a Lite-On DVD-Rom drive after a rogue program (buggy, rather than malicious, I think) lashed out and flipped one particular bit in all 16 bit words of the firmware, causing it to forget how to do pretty much everything.
Re:THE FACTS (Score:5, Funny)
"Right now, the engineers are crapping themselves"
Shitting bricks no doubt. ;)
Re:THE FACTS (Score:4, Funny)
Re: (Score:2)
Haha, good one.
Thanks for your informative post. I mean the original one.
A thank-you! (and some questions) (Score:3, Interesting)
Maxtorman, I'd mod you up if I had the points. Your comments are the first ones to alleviate a very significant knot that formed in my stomach after reading this. I'm still a little concerned though, and have some questions at the bottom I hope you could answer.
I'm a little late to the party because I only use these only for non-critical stuff like home office and family PC's, but the prospect of having all my drives inevitably die really scares me. I've bought 18 drives (ST31000340AS and ST3500320AS all w/
Re:A thank-you! (and some questions) (Score:5, Informative)
Re: (Score:2)
Yeah, I'm not so much a fan of RAID0 either. It works for what it does - speed at the cost of reliability - on a system that'll be reformatted a couple times a year as hardware's rearranged.
Again, you've put my concerns to rest. I'll wait the week for a patch content there's a low chance of hitting that magic number.
Thank you!
-Matt
Re: (Score:2)
Does anyone know if drives with firmware SD04 are affected?
I have a 500 GB (ST3500320AS) that appears in the "affected" list but as this is one of the first 7200.11 drives made it has firmware SD04. This firmware supposedly has an issue where the 32 MB cache does not work correctly but as far as I can tell I don't have that issue.
This issue of the drive dieing on reboot is the one that scares me but I can't tell if SD04 has this problem. Are there other reasons why I might want to upgrade from SD04?
Re: (Score:2)
If the problem was analyzed so well, why is the update so f-ed up? This is a failure of both the SCM and SQA processes and possibly in the professionalism of the firmware development and test teams, if this new error was caused by a failure to follow process.
If they don't get it right on the third try, heads should roll. The original defect you describe sounds suspiciously like an untested boundary condition (or more accurately a double-boundary condition - the last log file in the circular queue, combined
Re: (Score:2)
The conditions have to be just right - you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. this is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc... might happen every few days on a computer in a nin-RAID home use situation.. and if that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. It'll only stop responding IF the drive is powered up with log file #320 being the latest one written.
So, in other words, if your computer is rebooted regularly, and you use it for long enough, it's very likely to happen. If the log file is written to once every few days on average, and you reboot daily or more frequently, after a few months it seems likely that the magic log entry #320 will be the last one written when you shut down. No wonder there were so many issues. (The thing I'm wondering is if events causing log writes tend to cluster together? That'd make the odds better, but not by enough.)
Re:A thank-you! (and some questions) (Score:4, Informative)
As far as I know, if your drive has the CC1G, CC1H, CC1J or any of the CC firmwares really, it is completely unaffected by this issue.
However, it may need an update if you experience 'stuttering' (the drive pausing for more then a few seconds during data transfer). The CC1H and CC1J firmwares are *fine* and will absolutely not brick your drive.
I'd still wait a little while though - support is overwhelmed and mistakes are being made as noone is used to these changes. Once everyone gets a routine down (once there -is- a routine at all), they'll be better able to help reliably.
Re: (Score:2)
I imagine Maxtorman is referring to connecting to the on-board controller using its RS232 interface. I don't think there's any practical way of making that a suitable end-user tool or process.
Re: (Score:2)
If the drive is properly-bricked (i.e. the problem that the current round of firmware updates is designed to fix - not showing up in the BIOS, etc), then it's necessary to connect to the RS232 interface and reset the drive's on-board controller in order to make it show up again. At that point, flashing the firmware is easy. I imagine he was suggesting, similarly, the log file is only accessible via the RS232 interface and not using the ATA command set.
Re: (Score:2)
I'll see what I can dig up!
People are reporting good results with that method though.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
The log, if my information is correct, is written each time a SMART check is done.
Ah, nice. I run these (long variant) as cron-jobs every 14 days in order to increase disk reliability and get early fault detection. Fortunately I have moved to better brands than Seagate some time ago. (Got a visibly factory damaged Chinese Seagate ES drive and decided to move away...)
Re: (Score:2)
1% of 14 Million is still 140,000 drives dead.
Re: (Score:2)
If all the drives in an array write log entries at the same rate, then they have non-independent failure rates. You lose either no drives or all drives, at the same rate as losing a single drive. Sort of defeats the purpose of RAID, but that's a separate issue.
Re: (Score:2)
but the prospect of having all my drives inevitably die really scares me
Uh, all drives die at some point; the question is just when. The specific problem here is that they die so frequently that two or more drives in your RAID5 might die at the same time, or your backup drive and the original drive might die at the same time (or before you can get a replacement for the first to die).
Re: (Score:2, Interesting)
I sincerely smile in relief every time I hear someone other than myself say that phrase. It's a sign of someone who truly 'gets it' as far as hardware is concerned.
Thing is, I know Seagate really does try to push for high manufacturing standards (for example, did you know that every last refurb drive *must* go through the full new-drive qualification before it's sent out? - something only a percentage of actual new drives go through, because it's time-consuming).
Re: (Score:2)
Re: (Score:2)
That's the same place I'm in. Business stuff has redundant backups (sometimes doubly so ;)). I'm just concerned about this particular case because I have a lot of the exact model/fw drives out there, purchased and installed as system drives for friends and relatives in the past 6 months. The worst case on business stuff is mirrored, often remotely, with versioning for important data.
What's worse is Seagate's own forum seemed to be constantly proving them wrong on the scope of affected drives. This [seagate.com]
Re: (Score:2, Informative)
I've been a denizen of slashdot for many years - I just wish all these mod points were on my main account! :)
But it is nice to be able to contribute knowledge and experience back to the community for once.
Thing is, this issue -is- rare. But it manifests itself in a way that's hard to distinguish from a normal drive failure. (suddenly no detection in the BIOS; spins up but never is seen on the computer - this can happen for a dozen reasons including a loose or bad cable, physical drive failure, etc) so a who
I saw this coming (Score:3, Insightful)
Since I had not heard of massive numbers of Seagate drives failing, I already suspected that this is a rare occasion in which the drives would not spin up. I was wondering why Seagate announced this bug before they had a fix ready. Looks like they announced it very early. Maybe they also should have put more emphasis on the fact that it is a very rare bug.
It was announced. And people were freaking out about a bug from Seagate without a fix ready. What happens when customers freak? Right: Tons of pressure o
Thx for answering questions here on /. (Score:2)
I was wondering one thing. I got two Seagate Deskstar USB drives last year. I don't have them here to check them, but I believe there is no way to check for or update firmware through USB. Or is there? What are the plans for external enclosures?
Linux user - USB stick boot image? (Score:2)
I use Linux, so I can't use the Windows updating software. I know that there is a boot floppy image available, but I don't have a floppy drive anymore. I guess Seagate won't be making any firmware-upgrading software for Linux soon, but a USB stick boot image with FreeDOS or something would be nice.
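In the meantime, I may try rolling my own with syslinux's memdisk, which can boot a floppy image from a USB stick. A sketch of what I have in mind, assuming the stick's partition is /dev/sdX1 and the Seagate floppy image is saved as seagate.img (the memdisk path varies by distro):

    syslinux /dev/sdX1                      # install the bootloader on the stick
    mount /dev/sdX1 /mnt
    cp /usr/lib/syslinux/memdisk /mnt/      # memdisk ships with syslinux
    cp seagate.img /mnt/
    cat > /mnt/syslinux.cfg <<'EOF'
    DEFAULT seagate
    LABEL seagate
      KERNEL memdisk
      APPEND initrd=seagate.img
    EOF
    umount /mnt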
Re: (Score:2)
Many thanks, Maxtorman. Yours is the first useful information I've had out of Seagate so far, and is much more reassuring than the official KB articles and the 'support' I've received from most of the first line techs I've dealt with at Seagate. I only wish you could show this to your management and take credit for it. I hope that they have the sense to keep you and those like you through the coming upheaval.
Now, a few questions, if I may...
Re:THE FACTS (Score:5, Informative)
Thank you! I wish this information would have been public and I didn't have to create a new account to avoid being fired for releasing 'confidential information' - but what can you do with jerkoff lawyers tearing at your corporate heels already?
Now, to your questions!
1) It keeps changing because the scope of the issue keeps changing. I'm pretty sure it's a range of drives within the families noted in the KB article - but also, there are some external drives affected, because they contain an internal drive with the problem, that aren't on the article yet. Your best bet would be to compare your drive to the list of models, and then wait a little while... around Friday, I *think* they should have most issues sorted out and the information accurate. But I can't promise anything.
2) That could very well be it. I'm not privy to the nitty-gritty details, as engineering clammed up pretty quickly - I'm just geek enough to understand what I hear in passing, or the few technical details I come across when I go looking for information. But the mysterious death log being a SMART self-test log would absolutely make sense, and is consistent with what I'm hearing.
3) Unofficially, I've seen more than just the 1.5Tb drives display symptoms similar to the stuttering issue, but none so blatant or as impacting as in the 1.5Tb drives.
As far as the firmware fixing both the stuttering issue and the unresponsive-drive issue, yes. The changes for the stuttering issue were made in the CC1H and SD1A firmwares. Any firmware equal to or more recent than those two will have the fix for both issues.
4) I have no idea. SMART characteristics can vary from part number to part number - or even sometimes drive-to-drive; so what is 'out of tolerances' for one part number could be just fine for a different p/n (even though they are the same model number).
Re: (Score:2)
Thanks again, Maxtorman! All very useful stuff. I hope your answers to my questions are useful to others as well. :-)
Re:THE FACTS (Score:4, Informative)
One word: lawsuits. If they gave incorrect information, it could open them up to liability if people acted on that information. When a business' data could be worth millions, one slip-up could cost them dearly. The only reason this firmware isn't such an issue is the disclaimers all over the place when you flash a drive.
Yes, the 1.5Tb drives both stutter and are at risk of bricking due to the journal issue. The stuttering issue is fairly recent and mostly runs in the 1.5Tb drives - but the journal issue is older and exists across many 7200.11, ES2, and DiamondMax drives.
SD1A fixes both of these problems in the 1.5Tb drives.
Re: (Score:2)
Apologies can be construed as acknowledgment of an issue, and could carry weight in a civil lawsuit.
And given the rabid scaremongering happening on /. and other enthusiast sites, do you really think that people will stop spamming the support threads with useless whining and me-toos? While some posts may get deleted, I bet a large number of deleted posts are just people whipping the mob into a frenzy or making unfounded allegations.
Re: (Score:2)
Re: (Score:2)
FYI, SD1B appears to work fine on my two ST31000333AS drives (P/N 9FZ136-300).
Re: (Score:2, Funny)
Our phones are flooded so bad that it locks the system up when there are too many calls in queue
You should look for a firmware update for your phones too!
Re: (Score:2)
Hey Maxtorman,
I personally have not been affected by any of the recent firmware issues on Seagate drives, but I have long been a Seagate fan and have been reading your posts with interest. First, I just wanted to say that the great deal of transparency that you personally and your company as a whole have both shown through these recent ordeals is more likely than not going to counter any possible fear that might have resulted in me leaving the brand for another. So, kudos.
Mostly I'm just curious about som
Re:THE FACTS (Score:4, Interesting)
First, let me apologize: I'm going to withhold employment details such as tenure and experience, mostly because many of us at Seagate (including some in management) are Slashdot regulars.
That said, I really do enjoy my time at Seagate, and it has been an absolutely wonderful company to work for.
As far as "BRINKS" "MOOSE" "GALAXY" etc.. are concerned, they are pretty much the internal development names of the drive family. There can be overlap, but most "BRINKS" drives are 7200.11, I believe, while "MOOSE" drives are almost all 7200.10, and "GALAXY" drives are 7200.9. Generally, those names don't make it out into public, but if you were to tear into the SD1A firmware, you'll notice that it looks for the "BRINKS" drive before it flashes the firmware to the drive. There can be different internal names for different revisions of the drive itself, but generaly they stick to one revision per family - a new internal name would only be used for a MAJOR revision on the drive.
I don't have my documentation handy, but I'll look that up later in the week and try to give you a better answer.
Finally, thank you for your kind comments.
Re: (Score:2)
You have absolutely no need or cause for apology. My curiosity often gets the better of me and I end up asking questions the answers to which I have absolutely no entitlement.
If you happen to somehow manage to read over your docs and then furthermore have time to flesh out your answer to me in more detail, I'll consider that a bonus. You pretty much confirmed what I guessed about the different names being internal codenames for entire series, so I guess the more interesting information at this point would
Re: (Score:2)
Re: (Score:2)
This is how customer support should work. Posting on slashdot!
I have 2x500Gb and 1x1000Gb drives installed now, running for ~18 months without any trouble yet, unlike various WD drives I've had in past years.
Maybe I will buy a new Seagate in a few months, depending on how this pans out.
Re: (Score:2)
Thank you for your insider information!
2008 has been a bad year for me.
Two of my beloved 80Gb Seagates died 6 months ago.
Luckily, I was able to retrieve their data onto a 320Gb Seagate before they became unreadable. The 80Gb drives were 5 years old. The 320Gb is 1 year old.
Now the 320Gb is dying, making noisy clicks, and the symptoms are that it's impossible to copy files larger than 500Mb, it locks up XP, and I get BSODs.
Luckily, I was able to retrieve the data, because the clicks disappear if I keep the drive po
Re: (Score:2)
What about reports of forum threads and postings being repeatedly deleted? To me that was the worst aspect of this whole issue. And due to Seagate's slowness in responding, many will assume the worst - e.g. deliberate hiding of the issue, contempt for the customers, ignorance, etc. - when the actual cause appears to be overload of the support process.
Re: (Score:2)
Yeah. I have no idea what's going on with the forum. I don't work directly with AlanM, but I imagine he has a set of policies he has to enforce, and that sucks for anyone trying to do real work on the forums.
The forums are known as dangerous waters for support people to venture into, and we're forbidden to do so in any official capacity as support agents. But we do read them, especially when things go crazy like this.
A victims point of view (Score:5, Informative)
I am one of the victims, and your report confirmed all the problems I expected to be occurring inside your company. I previously worked for an electronics giant, and the problems are just too similar.
The catastrophic problems Seagate is facing now could have been prevented - if there had been one single person in customer service who cared and pushed the issue, which was known for months, up to the right people. A little googling some months ago would have proven that this issue is far bigger than a "one time" incident.
After all, it doesn't happen every day that data recovery companies announce with joy that they are able to handle widespread 7200.11 firmware problems. Or that the two major companies which provide recovery solutions race to be the first to have a two-click solution for this cash cow.
Data recovery companies were flooded with drives. They figured out an easy way to fix the firmware and kept it secret. They made a great profit, charging prices as if it was a hardware failure.
Seagate Datarecovery did the same by quoting up to 1800 USD for a 10 minute fix. Although I am sure that they were the only ones not aware of the easy fix.
The problem of drives becoming undetectable in the BIOS really isn't new. Your customer service knew about it for a long time, but they are paid so little and probably have such strict procedures that they don't care about Seagate's customers, and no one dared to report the drive failures as a major incident. Everyone kept quiet about it, and the people who are responsible and do care only learned about it months later, when (or shortly before) it got out to the press.
Seagate had months of time to fix it. Two months ago, when my drive broke, there was already plenty of information about the problem on the net. The only one who would deny any problem was Seagate.
I warned your board moderator months ago of the disaster that would strike Seagate. I tried to show him that these were not normal failure rates, but the poorly paid guy didn't care.
Email support, which takes two weeks to respond, and the phone and live support were just as ignorant.
There were people reporting how 4 out of 6 drives broke within weeks, and Seagate would only respond that such failure rates are normal.
People on the Seagate boards were constantly reporting the problem, but your board moderator shut them up. Threads were getting deleted and locked, including a big thread where the community was working on a fix. The reason, according to Seagate, was that it added nothing to the community.
The board moderator would consistently tell everyone that there is no known problem with the drive - the same message as your customer service.
It went as far as blocking links in private messages to a posting on another board that could help the victims. So how could Seagate expect those people to actually believe that the company cares?
The posting on the new board got 10,000 views within a short time. That's when things started to get out of hand for Seagate.
People were pissed off at Seagate for months. Everyone knew that the firmware was broken, but the company denied any problems. We knew that it is not that difficult to recover the data if you have the tools and know-how, but the company wouldn't give any assistance. Many would have accepted their fate if the drive were truly broken - but not if it is inaccessible because of a firmware bug that makes every single drive a -clicking- time bomb.
People everywhere were calling Seagate harddrives junk drives which are so unreliable that they will never buy them again.
So I, like many others, went on to warn every single person we knew about the problem with Seagate drives. The hilarious/sad thing is that before this, I would recommend Seagate to everyone I knew. If someone asked me which drive to buy, I would reply without hesitation: Seagate.
This could have been prevented if Seagate had acknowledged the problem much earlier. I wasted day after day,
Re: (Score:2)
----
You say that as though it was willful ignorance on
Re: (Score:2)
I think you miss the issue.
Seagate is STILL refusing to give a list of drives that have the problem (what's so hard about a matrix of model/firmware/etc.?) - calls to their support do not get through, and emails are for the most part being ignored.
I personally have in excess of 50 500GB 7200.11 drives in semi-critical operation here, on machines that get restarted several times a day - for a while we have been losing drives at the rate of about one a week.
Now, Seagates response so far has been to RMA them onc
Re: (Score:2)
I'm not here to explain what *should* happen, or to lick the boots of those who have been wronged; I'm here to try and explain what *has* happened and *why* things are the way they are.
Make your choices as you will. I'm just trying to help get some much needed information out that can't seem to make it through 'proper' channels.
Also, Here [custkb.com] is a list of affected models. If your drive is on that list and has the SD* firmware, it's affected. It is that simple.
As for service issues - the facts are thus: Phone is slam
Re: (Score:2)
SD1A is the firmware that was being pushed out to *fix* the stuttering and bricking issues.
However, it was pulled from the KB article as it was bricking some of the 500Gb drives worse than the original problem it was pushed out to solve.
See the actual article these comments are tagged onto for details.
Re: (Score:2)
Re: (Score:2)
That article on MSFN is about the best reference I've seen yet - I am really not familiar with working on drives via a serial interface, though I may pick up a cheap, low-capacity drive off eBay and play around! I know of no special commands beyond what that article describes. However, I'll look around and see if I can get that information (though I don't know how much I can disclose if I do find it). We'll see what I can dig up.
Also, I cannot say for sure it's EXACTLY 320 entries. That was the number bandied
Re: (Score:2)
Unrelated to the new drives, but I'm hoping you can help. I have an old Maxtor drive (manufactured 2001, 60GB D540X-4D) with some bad sectors on it. I would *love* to be able to perform a best-effort, dirty read on them, instead of getting back nothing and a CRC error. Is this possible? I looked high and low on the net and couldn't find any free utilities. Best I found were references to expensive crap like SpinRite, which looks like snake oil. At this point I'm looking into hacking the Linux IDE driv
Re: (Score:2)
Re: (Score:2)
I got a copy of the disk -- actually I managed to get all of it except for about 22 megs. It's just that the 22 megs has some of the crucial pieces. If I could read something like 480/512 bytes that would be much better than nothing.
Thanks for the RS232 tip. I'll look around and see if my Maxtor drive has something similar.
Re: (Score:2)
I bet you'll find that hard drive firmware won't even let you read bad blocks in the first place, so hacking the IDE driver will get you nothing.
Dude just re-download your pr0n. 60GB isn't that much :-)
Re: (Score:2)
I bet you'll find that hard drive firmware won't even let you read bad blocks in the first place, so hacking the IDE driver will get you nothing.
That's what I'm hoping to find out from "maxtorman" :)
Dude just re-download your pr0n. 60GB isn't that much :-)
If it was porn I wouldn't care. These are personal documents that I lazily didn't back up.
Re: (Score:2)
Re: (Score:2)
A bad block is a bad block
Thanks for your reply. I know the data is still accessible, because sometimes a read will work after repeated attempts, but this happens so infrequently, and it takes so long to read from bad blocks, that it's no longer worth trying. As for bad blocks being re-allocated, the drive has already used all the spares, so any read that fails is from the data I need. I just need a way to tell the firmware to ignore the CRC and give me whatever data it sees.
Thanks anyways. I'll continue looking on the ne
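One tool that looks promising for this kind of best-effort imaging is GNU ddrescue - it copies everything it can, logs the bad spots, and retries them on later passes. A sketch of how I understand its use (device and file names are mine):

    # First pass: copy everything readable, logging bad areas to the mapfile
    ddrescue -d /dev/sdb rescue.img rescue.map
    # Follow-up passes: retry the remaining bad sectors a few times each
    ddrescue -d -r3 /dev/sdb rescue.img rescue.map

It won't hand back the partial contents of a sector that fails its CRC, but since my reads occasionally succeed, enough retries might eventually capture the stragglers.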
Re: (Score:2)
A pending bad block is only reallocated when one of two things happens:
1) The drive manages to recover the contents of the bad block successfully.
2) The block is written to.
Until then, the block is put on a list of pending blocks that need to be reallocated. (Basically, modern discs try really hard to handle unreadable data.)
I believe there may be ways of reading raw blocks, but I'm not sure that'll help you. Modern drives us
Re: (Score:2)
Thanks for that very clear non-bull explanation.
I was all about to go on a rant about QA and release management, but I should have known it was middle-management cutting corners.
(Let's face it, that does tend to be behind a vast number of faults in any technical company.)
Re: (Score:2)
Re: (Score:2)
OEM drives like that often have a special firmware designed by the OEM themselves based on Seagate's stock firmware.
It may or may not have the problem, but all the OEMs have been given details about this issue and are responsible for checking their firmware and updating it as necessary.
Re: (Score:3, Interesting)
Re: (Score:2)
I work for Seagate. I was there when the fit hit the shan
Aha! Azathoth-worshipping insects from Shaggai are running the company. I knew there must have been a reasonable explanation for all this.
Re: (Score:2)
The word is 'delta green.'
And get an update out on the Hermes network while you're at it. I'll initiate the call in to Argus.
Fnord.
Journal/Log Space... (Score:2)
"Then the bricking issue came to their attention. It took so long because it's an issue that's hard to track down - pretty much the journal or log space in the firmware is written to if certain events occur. IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report it's information to the BIOS."
Can we view and clear this with ourselves to avoid the bricking in
Re: (Score:2)
Re: (Score:2)
Thanks Maxtorman! I did subscribe to notifications for those KB documents and the forum threads - and of course to /. replies to your /. comments. I want this resolved before it is too late. :)
Re: (Score:2)
Hmm, and I'm supposed to believe that when your nickname is "maxtorman"? Your plan was cunning and clever, but your subconscious knew of your evil plot and forced you to reveal your hand!
Re: (Score:2)
DRAT! My carefully laid plans have been revealed!
This is just my ploy to get you to buy Maxtor DiamondMax drives instead! :D
CURSES! *shakes fist*
Re: (Score:2)
If someone named "Maxtorman" works at Seagate, we know where the problem comes from :)
Sorry, I couldn't help remembering my 450 MB Maxtor drive *g*
Re: (Score:2)
*laughs*
I just thought it sounded like a cool nickname.
Re: (Score:2)
750G too? (Score:2)
Hi
Thanks for all this info.
I have a question. Got a new as yet unused Barracuda 7200.11 750GB ST3750330AS, date code 09103, P/N 98X156-303, firmware SD15.
Will this experience the issues mentioned in this thread (stutter, bricking)?
If so, would a firmware upgrade fix it?
If so, which firmware level?
Thanks.
Re: (Score:2)
Seagate's knowledge base article [custkb.com] says it's affected and that you should email in if you're running older than SN06.
Re: (Score:2, Insightful)
I absolutely agree with you.
If we had been allowed proper development and proving time, this may not have been an issue at all.
But the moment Seagate even admitted there was a firmware issue with the 1.5Tb drives, lawyers began recruiting for class action suits.
Disgusting ambulance-chasing fecal sniffers.
Re: (Score:2, Informative)
Wait a few days - Seagate will have a procedure in place for drives bricked by the bad firmware. Once they do, you should just be able to send them the drive, and it'll be reflashed with good firmware and sent back. I can't say this for absolute certain, but that's what they're telling us now.
If you have confidential data on the drive, you have two options:
a) if you send it in for a reflash, there will be a tech who flashes the drive using a serial interface, and then verifies good read/writes
Re: (Score:2, Interesting)
I've worked tech support for a while now, and that's something that's struck me working for Seagate.
Just about everyone there actually... well, -cares- about what they do. The atmosphere inside is completely different than, say, Dell or Cox or AT&T (from what I hear - I've never worked at any of those places, but many of my coworkers have).
The management lets support agents have license to use their experience and skills in fixing issues, instead of reading from a script. They don't heavy-hand them, and t
Re: (Score:2)
Yeahh.. about that.
Chat is *not* where we keep our brightest agents.
I'll leave it at that.
Re: (Score:2)
That fact is a shame, since chat is the most easily recorded form of interaction. In particular, I noted the representative's lack of capitalization, punctuation, and other grammar issues.
Frankly, it puts a face of ignorance on Seagate. Support is, and I apologize for the cliche, the "Director of First Impressions." For this reason I have as little to do as possible with the likes of Dell and Comcast (though my experience with AT&T has been mixed).
Seriously, you do not want to upset people who come to you fo
Re: (Score:2)
Nice to know Seagate's warranty covers Acts of Dog.
Thank you, I'll be here all week...
Re: (Score:2)
Re: (Score:2)
Sorry, I was using "One in a Million" as more of an expression than a valid statistic. :)
But yeah, your math adds up, though I'm sure actual field results are much lower. 30% of 14 million drives (the number of drives potentially affected) is 4.2 million - we'd be overrun with dead drives.
Re: (Score:2)
The drives have to go through a calibration and burn-in as part of the manufacturing process, which should have already detected any bad sectors and reallocated them, with the SMART data zeroed out before going into the field. It's always possible that a few sectors were just on the tipping point and weren't detected during burn-in, or later went bad for other reasons.
Having a few reallocated sectors like that is a pretty consistent event across all drives, no matter the make, model, or manufacture date or
Re: (Score:2)
There are Windows programs that will do such tests... otherwise, in Linux you can just run something like "dd if=/dev/urandom of=/dev/sdX bs=1M" (dd stops on its own when it reaches the end of the drive), and then run your SMART check after the drive's finished being written to. Then do the same to zero it out (/dev/zero instead of /dev/urandom) and check again. This will obviously wipe any data on the drive, of course, so don't do it on a drive with data you care about, as it won't b
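(Pulling that together into a minimal sketch - assuming the target is /dev/sdX and smartmontools is installed; the device name is a placeholder, and this is destructive:)

    # DESTROYS all data on /dev/sdX (hypothetical device name - triple-check it!)
    dd if=/dev/urandom of=/dev/sdX bs=1M    # fill with random data; dd stops at the end of the device
    smartctl -t short /dev/sdX              # run a SMART self-test when the write finishes
    smartctl -a /dev/sdX                    # check reallocated/pending sector counts
    dd if=/dev/zero of=/dev/sdX bs=1M       # zero it out
    smartctl -a /dev/sdX                    # and check again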
Re: (Score:2)
There is a set of part numbers that it regularly bricks, but I don't have access to the details why and what just yet. When I do, I'll try to update with details.
I do know that swapping the PCB won't help a firmware bricked drive as the firmware is not kept on the PCB - it's kept on the drive itself mostly.
Are your drives recognized in the BIOS? If so, then better to just wait until they have the good firmware. If not, then wait until they have the process in place for returning the drives for a physical fi
Re: (Score:2)
Everyone has their personal manufacturer blacklist due to personal experience or anecdote. I can't say I blame you!
To make matters worse, the list needs constant maintenance. Nothing else will help. I agree that reliability in "normal times" is pretty similar across vendors. There may be issues specific to one vendor or drive model, like older Maxtors reacting badly to heat or WDs having interface compatibility issues, but these are longer term and you can find out about them beforehand. For the one that is
Re: (Score:2)
Man, I wish it was that easy.
We had an old WD 30GB fail in a RAID-1 array. While awaiting approval to replace the drive, the IBM Deskstar 40GB mirror failed. (Yeah, old drives, right?)
As an act of desperation, rather than reloading the server and paying a software vendor to reinstall their system (and no, this server is not frequently backed up, for reasons I'd rather not discuss), I put the Deathstar in the freezer overnight. I figured, what the hell - the worst that could happen is I would still have
Re: (Score:2)
Re: (Score:2)
The update is on a bootable FreeDOS CD image. I'm not sure if you can persuade an x86 Mac to boot DOS natively, but at least you don't have to use Windows.