
Harddrive woes

N/A

Harddrive woes

I have posted our setup in the past, but in a nutshell, we have one system with two identical drives used for central storage.

Last August, right in the middle of the Blaster virus problem, one failed, and it took 3 weeks to get an engineer to come out and fix it.

We had backups anyhow, but we were able to do a full transfer to an old disk, from which we could provide services in the meantime.

The system has started acting up again over the past two weeks, with 5 freezes: 1 two weeks ago, 1 at the weekend, 1 yesterday and 2 today.

We are waiting to book an engineer to come out and do a swap out. However, we need to determine which disk it is first, so that we are in a position to run an instant rollback once completed.

Does anybody know a method to work out which disk is causing problems at all?
13 REPLIES
Community Veteran
Posts: 14,469
Registered: 30-07-2007

Harddrive woes

No, but it may not be the disk at all. Lockups can be caused by so many things. PSU failing (not supplying enough power, one of the supply rails not up to voltage/current), driver conflicts, CPU overheating - what are the CPU temps? etc.

Lockups are one of the most difficult problems to solve and may involve replacing individual components until it remains stable i.e. stops locking up.

How old is the PC and what PSU rating are you using (Watts)?

I assume this is not a RAID setup (striped/mirrored) - maybe you need to consider this for future reliability.
N/A

Harddrive woes

RAID is in the pipeline, however, I am a one man team and my job isn't dedicated to our own IT platform (but by hell it is when it fails).

I am stuck with what we have until that time however.

Planning is almost done (2 lots of mirrored disks, 4 disks in total), approval is a month or so away, then I have to cost it.

Anyway, back to the original issue.

It is most definitely the disk that is the problem. Hard disks are thankfully the easiest problem to spot, but if there is more than one, locating which one it is becomes a bit of a problem.

As noted, it is the same issue that happened to us last August.
The HD light is permanently lit when it happens
The damned clock-of-death looms in the air
You see the light come on solid and can use the computer for a few seconds. However, as soon as you move the mouse and hover over an icon (within the few seconds you can) for a screentip, it freezes (disk access).

As noted, it happened twice today, and on both occasions, literally after large COD's.

Last time, it was happening often enough to hold the drive and work out which one it was. Without spending a lot of wasted hours to locate which one of the two it is this time, I am looking for an easier method.
Community Veteran
Posts: 14,469
Registered: 30-07-2007

Harddrive woes

Is it possible to run a scandisk or Norton Disk Doctor on each drive and see if it causes the lockup? If you're running XP and have NTFS then that is not possible while XP is running.

One alternative is to run chkdsk from a cmd prompt on each drive. The default mode is to not fix anything, but it will exercise the disk and may cause a lockup.

Other than finding another app that will cause lots of disk accesses I can't think of a way to prove which one it is.
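
For what it's worth, an "app that causes lots of disk accesses" can be knocked together in a few lines. The following is only a rough sketch in Python (nobody in this thread has said Python is installed, so that is an assumption, and the function names are made up for illustration). It reads every file under a folder from start to finish and writes the name of each file to a log before touching it, so if the machine freezes, the last line of the log points at the drive and file being read at the time.

```python
import os
import time


def read_file_timed(path, block_size=1024 * 1024, slow_seconds=2.0):
    """Read one file start to finish; return (bytes_read, slow_offsets)."""
    bytes_read = 0
    slow_offsets = []
    with open(path, "rb") as handle:
        while True:
            started = time.monotonic()
            chunk = handle.read(block_size)
            elapsed = time.monotonic() - started
            if not chunk:
                break
            # A healthy disk returns a block quickly; note any that drag.
            if elapsed > slow_seconds:
                slow_offsets.append(bytes_read)
            bytes_read += len(chunk)
    return bytes_read, slow_offsets


def stress_read(root, log_path, slow_seconds=2.0):
    """Read every file under root, logging each path BEFORE reading it."""
    total = 0
    with open(log_path, "a") as log:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                log.write("reading " + path + "\n")
                # Force the line to disk so it survives a hard freeze.
                log.flush()
                os.fsync(log.fileno())
                try:
                    size, slow = read_file_timed(path, slow_seconds=slow_seconds)
                except OSError as error:
                    log.write("error " + path + ": " + str(error) + "\n")
                    continue
                total += size
                if slow:
                    log.write("slow " + path + " at offsets " + str(slow) + "\n")
    return total
```

Run it against one drive at a time and keep the log on the other drive (or a floppy), outside the folder being read. Bear in mind that a second pass over the same files may be served from the operating system's cache rather than the disk, so it only really loads the drive on files that have not been read recently.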
N/A

Harddrive woes

On Monday I ran a full scandisk (NTFS disks) in XP, without the repair option, such that the whole disk is scanned.

It didn't cause it.

The failure yesterday was actually Monday. Talk about time flying.

Anyhow, I had MySQL running along with Convea to use as our Intranet application. We already had IIS running to provide localised AVG updates. I first thought it was down to them, so disabled them on Monday.

That left us problem free yesterday, then we had two lockups today, within a 3 hour space.

Needless to say, I am performing backups overnight (praying for no system crash), and will ghost them tomorrow too.

Luckily I already have our second hand backup disk from last time, and will start populating it. I also have a disk waiting for me to fit in my brother's system (long time to wait there though), so I will nick it and rebuild the Operating System too.

This will have to be done on Windows 2K. I will place that system into full time operation next week.

At that point, I can start saturation tests on the XP system and diagnose which disk it is.

Once I know which disk it is, I can put a further plan into operation.

Should it be the Secondary (data) disk, I can use the data restored from the 2K system once we move everything back. Should it be the Primary, the secondary will take a ghost restore of the primary.

Effectively, once identified, I can set the thing up so it is the secondary failing.

If we lose the Primary completely, we are talking 24 hours downtime, though overtime will be the bonus for me.

Trust me, I can't wait to get RAID in place too. These issues will be non-existent, or at least be down to hot-swap and regeneration.

I did ask some time ago, but do you know of any SATA RAID cards that provide hot-swap abilities?

The idea will be to have two cards, 4 drives and a mirrored setup.
N/A

Harddrive woes

There's a free version of 'Tufftest', Tufftest Lite.
It's similar to memtest, run via a boot floppy, but it checks the
entire PC including hard drives, by carrying out Hysteresis &
Surface analysis. It scans each drive and lists any errors.
I'm not sure if the tests are rigorous enough to reveal the problem
drive you have but I'd give it a go.
New hard drives have 'SMART' monitoring technology built in.
You can enable this feature in the BIOS if the motherboard supports
it, and it will warn when there are problems.
Another Utility is DataAdvisor,
this uses both basic and SMART tests. There's also an evaluation version,
but I don't know what features are available for evaluation.
Active SMART
is a pretty good utility to have when you eventually upgrade, it utilizes
the SMART features of the disks to carry out comprehensive checks
and reports on each disk in the system.
Also there's the usual online tests and downloadable utilities,
from the major manufacturers, Fujitsu, Maxtor, Western Digital etc.
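
As a rough illustration of what these SMART utilities look at: each drive reports a set of attributes, each with a normalised value and a failure threshold set by the manufacturer, and an attribute counts as failed once its value drops to or below a non-zero threshold. A minimal sketch in Python follows (the function name and the sample figures are invented for illustration, and getting the attributes off a real drive is left to whichever utility you use):

```python
def failing_attributes(attributes):
    """Return the names of SMART attributes at or below their threshold.

    `attributes` maps an attribute name to a (value, threshold) pair of
    normalised numbers. A threshold of 0 marks an attribute as purely
    informational, so it is never reported as failing.
    """
    return sorted(
        name
        for name, (value, threshold) in attributes.items()
        if threshold > 0 and value <= threshold
    )


# Hypothetical readings for one drive, NOT taken from any real disk.
sample_drive = {
    "Reallocated_Sector_Ct": (30, 36),   # below threshold: failing
    "Spin_Up_Time": (97, 21),            # comfortably healthy
    "Power_On_Hours": (88, 0),           # informational only
}

print(failing_attributes(sample_drive))
```

Running the same check against both drives would be one way to tell them apart, though a drive can misbehave without tripping any threshold, so a clean result doesn't prove a disk is healthy.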
Alec.
N/A

Harddrive woes

If you haven't already you could try taking the machine down for half an hour (less if you're faster than me), remove the drives, cables and RAM chips, then put everything back.

I guess it's the equivalent of a kick, but it would be my first step. I've not yet had a drive fail, but cable connectors etc do seem vulnerable to heat, dust etc over time and simply taking a machine apart and putting it back together has sometimes worked wonders for me.
N/A

Harddrive woes

I already know 100% that it is a hard disk failure. I am constantly replacing them at a rate of one a month. However, given the number of machines I deal with, that isn't much, and it still works out at an average life of over 3 years.

Anyhow, today we are suffering a MSO because of it. We have no documents available to us.

The backup failed at 43% after I left the office, and today's attempts to restore service have so far failed.

I have traced the failing disk (not 100% sure, but some saturation tests after will clear that up) to the Secondary, which is good news, however, the broken segment resides in the document area of the disk.

I am beginning to wish I hadn't defragged the system last week now, as that would have meant the broken data was likely not in the document region of the disk.

I will have to get back to this topic later though, as I am only waiting for the whole system to cool down, before attempting to grab the files again.

Once I have the files, the disk can go out of order and service will be restored until it is time to fit the new disk.
N/A

Harddrive woes

Talk about pear shaped objects.

After letting the system cool down, it refused to boot. Fans are spinning but there is no power light.

Anyhow, after a full refit, it still refuses, and it isn't the HD causing it this time.

However, I am still right about the drive. It is now in a backup 2K machine, where it took an annoying amount of time to get the system back in place. I am still mopping up the bits on that.

It looks like many parts of the system are cooked with multiple heat fractures. AKA, only the system being on before was keeping it alive.

I am now gonna push for this RAID setup, though that won't help the heat fracture issue.

The strange part is, system temperatures are only 7°C above temperatures at off (measured and recorded prior to original install).

Anyhow, I have just been informed of another job, involving a possible HD failure. Hopefully it is just a boot sector corruption.
N/A

Harddrive woes

Multiple heat fractures - yuk. I've definitely not had any of those. I lead a very sheltered life.

Hope they buy you some new drives (computers?) sometime soon ...
N/A

Harddrive woes

These systems are only 11 months old.

The issue is with our backup computer. We don't want a full blown server, as there is little point, just a place where all our files can be at any one time, then backed up easily.

The second system that went was on another part of my job, and was total toast.

To make matters worse, I haven't eaten, because Manweb made my bank balance read a juicy figure. Though I don't understand the "-" character :P

The drives are all under warranty, which is good.

I would say multiple heat fractures is probably over the top. It only takes one tiny fracture to prevent a system operating.

The idea being that thermal expansion keeps the electrical bond, and only when it cools and contracts do you see the system fail.
N/A

Harddrive woes

LOL - Here is the news of the day.

I have used the wrong backup system :P

As such, it isn't our secondary disk that is failing, but two primary ones, leaving the secondary OK.

For some reason the system wants to play ball now, suggesting a possible intermittent heat fracture on the Primary disk. However, disconnecting all drives presented problems.

Regardless, the backup system will be in use for some time, and I will likely push for the RAID setup to be put in place before going back onto the normal platform.

The hard drives I have no trouble with. However, I am still after a recommendation for a quality SATA bus card that supports RAID and hotswap functionality.

That is one of the things delaying the rollout of this, as we want to plan it such that hotswap is built in.
Community Veteran
Posts: 14,469
Registered: 30-07-2007

Harddrive woes

Quote
I am still after a recommendation for a quality SATA bus card that supports RAID and hotswap functionality.

That is one of the things delaying the rollout of this, as we want to plan it such that hotswap is built in.


Lots of questions here:

- Is this a brand new system? It probably needs to be to support hotswap
- 32 or 64bit PCI (32bit PCI will limit your choice as most RAID controllers are 64bit now)
- How many disks including possible future expansion
- Hot spare support?
- Internal or external hotswap disk cage
- rackmount or free standing

A useful review/benchmark article from Tom's Hardware: http://www.tomshardware.com/storage/20031114/index.html The RAIDCore looks like a good candidate.

This US web site gives you some ideas even if it's just for reference: http://www.scsi4me.com/index.php?menu=menu_sata&cat=S-ATA

Cheers

Peter
N/A

Harddrive woes

If I wanted SATA RAID I would probably go for something like this.

The hotswap capability seems to be a high-end feature I'd find hard to justify unless 24/7 availability was essential.

But I'd definitely eat something, whatever Manweb have done. :)