Hard crashes, instability, please help find the cause

Discussion in 'CPU, Motherboards and Memory' started by Cogitation, Mar 28, 2009.

  1. Cogitation

    Cogitation Geek Trainee

    Please, if you're a technician/professional/skilled user, take a look at my story, as I've exhausted what I can figure out on my own. (Also, if I should be putting the post somewhere else, let me know.)

    I built my own power machine about a year and a half ago, and I've done nothing but suffer with it. I'll try to keep the description of the issues as brief as I can, but there's a lot of backstory/specific issues.

    First, I'm an A+ certified tech, and I built the machine myself, so I've done a bunch of investigating already myself. I've become convinced the motherboard has to be the problem, but I've been burned multiple times, and I'd really like to be certain before I shell out *another* 200+ bucks.

    Here's the system specs:

    Gigabyte DQ6-X48 mobo
    2 x 2 GB OCZ Platinum PC3-10666 RAM (i.e. DDR3-1333), back at stock speeds
    Intel Core 2 QX9450 2.66GHz, was at 3.40GHz, back at stock
    Radeon 4870 1GB video card (was previously nVidia 9600GT)
    Sound Blaster X-Fi Elite Pro sound card
    2 x 750 GB WD SATA II drives in RAID 1 (part of the story)
    1 x 250 GB IDE WD
    1 x 200 GB IDE Seagate
    LG Blu-ray/HD-DVD reader/DVD burner
    2nd DVD/CD burner
    Corsair TX650W power supply
    CoolerMaster Cosmos Case with 4x 120mm fans

    Legit Windows XP Pro Service Pack 3 with all updates
    Latest update of DirectX 9 installed

    Tests I've done on my own:

    CPU Burn-in x 4 - Passed with flying colors
    MemTest86 - Passed all tests twice
    Video card stability test - immediate failure with both ATi and nVidia cards
    HDD Health on drives - All drives in excellent condition

    Now to the problems. First, the two 750GB drives used to be set up in a RAID 0 array instead of RAID 1. Several months back, my machine hard froze in the middle of a Blu-ray movie, I heard the PC speaker beep once, and after rebooting, the RAID 0 array showed as failed.

    I thought one of the hard drives had failed. After testing in another machine, I learned both hard drives still worked fine, and HDD health reported them as in excellent condition.

    I tried reinstalling Windows on the drives in RAID 0 using the RAID boot disc for the Intel ICH9R RAID controller, and Windows blue screened during installation (more than once). The motherboard has two SATA/RAID controllers: the Intel one with 6 ports and the Gigabyte one with 2. Using the Gigabyte controller, which is exactly what the drives were on when the RAID 0 array failed, I was able to install Windows without problems, resulting in the current RAID 1 setup.
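
    For readers wondering why a single glitch took out the whole RAID 0 array while the same drives are fine in RAID 1, here is a minimal Python sketch of how the two layouts map logical blocks onto two drives. It is an illustration only, not how the Intel or Gigabyte controllers work internally: it assumes a one-block stripe size (real controllers use larger stripes), and the function names are made up for this example.

```python
def raid0_location(block: int, drives: int = 2) -> tuple[int, int]:
    """RAID 0 (striping): return (drive index, offset on that drive) for a block."""
    return block % drives, block // drives


def raid1_locations(block: int, drives: int = 2) -> list[tuple[int, int]]:
    """RAID 1 (mirroring): every drive holds a copy of every block at the same offset."""
    return [(d, block) for d in range(drives)]


def readable(block_locations: list[tuple[int, int]], failed_drives: set[int]) -> bool:
    """A block survives if at least one copy lives on a drive that is still present."""
    return any(d not in failed_drives for d, _ in block_locations)


# Simulate one member (drive 1) dropping out of the array.
failed = {1}
raid0_ok = sum(readable([raid0_location(b)], failed) for b in range(8))
raid1_ok = sum(readable(raid1_locations(b), failed) for b in range(8))

# In RAID 0 every other block is gone, so any file spanning more than one
# stripe is damaged; in RAID 1 the surviving drive still has a full copy.
print(f"RAID 0: {raid0_ok}/8 blocks readable")
print(f"RAID 1: {raid1_ok}/8 blocks readable")
```

    With one member missing, the striped array cannot reconstruct its data, which is why a controller can mark the whole array failed even when the drive that dropped out later tests healthy.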

    Since then I've had repeated crashes identical to the first (always under load on the hard drives from uTorrent, or load on the video card from a game or something), where my system hard freezes, sometimes with a beep and sometimes not. The last few crashes resulted in blue screens, with Windows reporting either a non-specific device driver error or, the most recent time, a display driver error.

    I'd also been having some non-fatal but noticeable video issues: any time I run a game at 1600x1200 resolution (and only this resolution), I get green speckles that flicker along the edges of polygons and spread over the screen. Running a game at 1920x1200 (or any other resolution) does *not* produce this speckling, to my great confusion.

    Even so, with the device driver/display driver blue screen messages, the in-game distortion, and the fact that I got the 9600GT cheaply and its PCB is a little warped, I became convinced it was the video card. So I figured I'd fix the problem and upgrade the card at the same time by getting the Radeon 4870. I uninstalled the old drivers, used Driver Cleaner in safe mode to clean everything out, then installed the ATI card with downloaded Catalyst 9.3 drivers rather than the ones on the disc.

    Playing a game at 1600x1200 *still* produces the exact same green speckling all over the screen (while 1920x1200 still does not), and it still freezes quickly, even in something as old as Diablo II. The only positive is that ATI's VPU Recover seems to help to some degree: now when it crashes, I don't have to reboot; the driver just resets.

    So, if both video cards produce the same results, that means it's the motherboard, right? I can't see what else it could be, save maybe the power supply. But though the Radeon 4870 draws serious watts, the 9600GT didn't, and this was happening well before the Radeon was installed. 650W (and it was not a cheap model) should be plenty for a single video card, yes?

    What I want badly to know is whether I can be certain it's a bad motherboard, because if that's the case, I'll just replace it and rebuild the machine (even though I really loved the features the board offered and you can't even buy a DQ6-X48 anymore.)

    Is there something I'm missing though? If I replace the motherboard for $240 for like an Asus X48 model and I still have problems I might go bald at an exceptionally young age.
     
  2. Cogitation

    Cogitation Geek Trainee

    For anyone who's thinking tl;dr, please, I really need help with this. Short version: My computer is crashing and I don't know if it's the motherboard or the power supply or something else. Please help!
     
  3. BoBBYI986

    BoBBYI986 Geek

    Have you run Prime95? You are running quite a lot of stuff (4 hard drives, 4x 120mm fans, 2x optical drives, a Core 2 Quad processor, 1x ATI 4870). That's a lot for a 650W PSU; it must be pushing close to its limits. When you overclocked, did you run any stability programs, e.g. Prime95 or SiSoft Sandra, to check that it was 100% stable?

    BTW: were you using the stock CPU heatsink when overclocking? What are your temps? CPU, northbridge & southbridge?
     
  4. alexcmia

    alexcmia Geek Trainee

    Having A+ means nothing, my friend. I have A+, Network+, Security+, MCP, MCSA, MCSE, and CCNA, and I'm still dumb. Those certs are what you can show to companies so they'll hire you, though even that does not guarantee you will be successful. Social skills, which most of us nerds lack, are what it's all about. If you have those social skills, you do not need certs and you can get 80+ k a year at your first job. On the other hand, if you're a blub, they'll offer you 20k, treat you like crap, and when they're done they'll discard you with extreme prejudice.

    I've found that those who think they know the most are often the ones with the most problems. In my own life, I have found that taking it easy is the best approach.

    A) I had a similar problem for two months. Hard crashes; I thought it was the Flash plugin, because it would happen when I watched Flash movies. But NO, it was SCSI.

    I have two SCSI drives, and they have to be configured with jumpers for different SCSI IDs. Somehow, during one of the reassemblies, one of the jumpers disappeared and I didn't notice. Furthermore, I stuck a hard drive LED connector where the jumper should go. This gave me all sorts of weird stuff, like Firefox and IE crashing when viewing Flash movies, or problems with other, non-SCSI hard drives. I found my problem by accident; it had nothing to do with my training and certs.

    B) Go into the BIOS and press CTRL + F1 for advanced settings (in Award BIOS); the screen will blink once. There is a performance setting somewhere. Change it from Turbo to Normal. In fact, use Load IDE Defaults, and make sure that particular setting is set to Normal. TURBO will OC.

    C) memtest86 is a free memory testing program. Test your memory in dual-channel mode (if you have it, and I bet you do). Run it overnight. (Note: I too had intermittent memory problems, but it turned out to be misconfigured SCSI.)

    D) Forget that you're all that; instead, think that you are new, and take it easy. Use your A+ training instead of your brain, which can lie to you. If the above advice does nothing, start removing components.

    -Alex

    P.S. If you're near Aventura, FL, I can take a look at it.
     
  5. Cogitation

    Cogitation Geek Trainee

    BoBBYI986, I have not run Prime95, basically because I know for certain it's not stable. At all. It's constantly hard crashing under any kind of load.

    As for the load on the PSU... I did some research on this after it was suggested to me as the possible problem. Under 100% load on the whole system, a benchmarking site measured 553W with a line meter on a filtered power source for an almost identical system. Also, all the problems existed back when I was running the 9600GT, which uses less power. I'm not saying I know it's not the power supply, but from what I've read, unless it's malfunctioning, 650W should be plenty. I just don't know that it's not malfunctioning (though the voltages don't dip at all under load).
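
    The wattage argument can be checked with a little arithmetic. This is a rough Python sketch under stated assumptions: a line meter reads AC power drawn from the wall, while the 650W rating is DC output, so the DC load is roughly the wall reading times the PSU's efficiency. The 80% efficiency figure is an assumption, not a measurement of this unit.

```python
def dc_load_watts(ac_watts: float, efficiency: float) -> float:
    """Estimate the DC power delivered to the components from an AC wall reading."""
    return ac_watts * efficiency


def headroom_percent(dc_watts: float, rated_watts: float) -> float:
    """Share of the PSU's rated DC output that is left unused, in percent."""
    return 100.0 * (rated_watts - dc_watts) / rated_watts


wall_reading = 553.0  # W at the wall, from the benchmark of a near-identical system
rated_output = 650.0  # W, the Corsair TX650W's rated DC output
efficiency = 0.80     # ASSUMED efficiency at heavy load, not measured

dc = dc_load_watts(wall_reading, efficiency)
print(f"Estimated DC load: {dc:.0f} W")
print(f"Headroom: {headroom_percent(dc, rated_output):.0f}% of rated output")
```

    Under those assumptions the estimated DC load is about 442 W, leaving roughly a third of the rating spare. That supports the "650W should be plenty" reasoning, but total headroom says nothing about whether this particular unit is failing, which is the open question here.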

    When I had it overclocked, I used CPUBurn-in running in four threads for several hours initially, without any reported problems or even slowness (despite 100% usage.) Didn't test the video because I wasn't overclocking the card.

    For temperatures, the motherboard idles at 90°F, and under load it gets up to 105°F. Those are from the motherboard utility, not SpeedFan. The CPU idles around 112°F and doesn't get above 120°F. It used to reach 138~140°F at most when I had it overclocked. I've got a huge Zalman cooler on it, and the case has good airflow. As for the northbridge/southbridge, the utility doesn't report them separately, but they're copper heatsinked.
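
    CPU and chipset temperature limits are usually quoted in Celsius, so converting the readings above makes them easier to judge. A minimal Python sketch using the standard formula °C = (°F - 32) x 5/9; the labels are just the readings from this post.

```python
def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


# Readings from the motherboard utility, in °F.
readings_f = {
    "board idle": 90,
    "board load": 105,
    "CPU idle": 112,
    "CPU load (stock)": 120,
    "CPU load (overclocked, max)": 140,
}

for label, temp_f in readings_f.items():
    print(f"{label:28s} {temp_f:4d} °F = {f_to_c(temp_f):5.1f} °C")
```

    The readings work out to roughly 32 °C to 41 °C for the board and 44 °C to 49 °C for the CPU at stock, with 60 °C as the overclocked peak.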

    alexcmia:

    About mentioning I have an A+... I was saying that in case anyone had a technical question, or wanted to suggest a complicated/delicate procedure, so they would know I have at least some basic level of competence and they could suggest it. Certainly not to brag! My track record with this machine is abysmal and embarrassing. I'd *welcome* someone pointing out a dumb but simple error I've made.

    About the bios, I know exactly what setting you're talking about, and it's never been off normal. I manually overclocked the system back when it was, and everything's at stock speeds again. I mentioned in my first post a list of tests I had already done:

    "
    CPU Burn-in x 4 - Passed with flying colors
    MemTest86 - Passed all tests twice
    Video card stability test - immediate failure with both ATi and nVidia cards
    HDD Health on drives - All drives in excellent condition
    "

    And I'm in CA. Really, I know it's a long post, but I listed the history of what's happened above. Based on what I described, what should I be checking? They're SATA drives, not SCSI, so there are no terminators or anything... not even jumpers to set wrong for master/slave.

    Or, what should I be removing? There are two expansion cards installed: the Sound Blaster and the Radeon 4870.

    The fact that the video card test immediately fails with *both* cards makes me believe something is wrong hardware-wise. Either both cards are bad, or something in the motherboard/RAM/CPU system has to be bad. Right? Or the PSU... (I really wish I had a different one to test with to be sure; I don't.) But these problems seem... hardware related... and I've *never* had a sudden reboot, beeps on bootup, or a failure to boot like the PSU failures I have seen personally on other systems.
     
  6. alexcmia

    alexcmia Geek Trainee

    I'm suspicious of the RAID. It's worth trying with one hard drive only, or a Linux live CD with the hard drives unplugged. Also know that memory can have intermittent problems that will not show up in Memtest. Memtest has a test #9 which runs for 3 hours and must be selected manually; try it.

    A+ training tells us to try a basic setup and remove a few things.

    Gigabyte DQ6-X48 mobo
    2 x 2 GB OCZ Platinum PC3-10666 ram (only one if possible)
    Intel Core 2 QX9450 2.66Ghz, was at 3.40Ghz, back at stock
    Radeon 4870 1GB video card (was previously nVidia 9600GT)
    Sound Blaster X-Fi Elite Pro soundcard
    2 x 750 GB WD Sata II drives in Raid 1 (part of the story)
    1 x 250 GB IDE WD
    1 x 200 GB IDE Seagate
    LG Blu-ray/HD-DVD reader/dvd burner
    2nd DVD/CD burner
    Corsair TX650W power supply


    Order of suspicion is like this, with the most likely suspect at the top.

    Misconfiguration
    SCSI, RAID
    Memory
    Power
    Motherboard


    It could be as simple and as stupid as a wrong jumper. On the other hand, a leaky cap on the board or PSU could cause all kinds of weird UFO stuff. Also software: try Linux or a Linux live CD; those can refuse to boot and alert you to a problem if something fishy is going on. None of this has to make sense or reason; it's just a process of elimination. When you throw reason and thinking into it, that's when you screw up. It's the way our brains work.
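
    The process of elimination described above can be written down as a loop: change one suspect at a time, in order of likelihood, and rerun the same failing test after each change. This is a minimal Python sketch; `run_stress_test` is a hypothetical stand-in for whatever test reliably reproduces the crash, and the simulation at the bottom simply pretends the power supply is the bad part.

```python
from typing import Callable, Optional

# Suspects in the order of suspicion above, most likely first.
SUSPECTS = ["misconfiguration", "SCSI/RAID", "memory", "power supply", "motherboard"]


def find_culprit(
    suspects: list[str],
    run_stress_test: Callable[[set[str]], bool],
) -> Optional[str]:
    """Return the first suspect whose change makes the stress test pass.

    run_stress_test(changed) should return True if the system passes with the
    items in `changed` removed, reset to defaults, or swapped for known-good parts.
    """
    for suspect in suspects:
        # Change exactly one thing per run; everything else stays as it was.
        if run_stress_test({suspect}):
            return suspect
    # Nothing isolated: the fault may involve more than one part.
    return None


def simulated_test(changed: set[str]) -> bool:
    """Illustrative stand-in only: pretend the power supply is the bad part."""
    return "power supply" in changed


print(find_culprit(SUSPECTS, simulated_test))
```

    The point of changing exactly one thing per run is that a pass can then be pinned on that one change; swapping several parts at once can make the crash go away without telling you which part was responsible.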
     
  7. Cogitation

    Cogitation Geek Trainee

    Before my post, I'd just like to say thanks for any help you've tried and continue to try to provide. If I tend to disagree in my post at all, it's just because I'm trying to figure out what's wrong and going by what I've seen personally. If you think otherwise, please tell me and why.

    "None of this has to make sense or reason, it's just a process of elimination. " That's a good point, and worth keeping in mind.

    Still... why would raid cause the video card stability test to immediately fail for both cards? It doesn't write anything to the hard drives at all.

    Additionally, I backed up the system over the last week using 51 fully filled DVD-Rs. That was directly from the RAID 1 array to the DVD-RW burner, 51 times in a row without a glitch, with some files split into multi-part RAR archives to get them onto disc. I should have mentioned that, but my first post was already really long.

    The moment I do something video intensive, however (like a video test or a game), it could crash at any moment, and it reports device driver errors/display driver errors and all manner of hardware alarm-bell signals. Then there's the green speckling *only* at 1600x1200 fullscreen resolution, but not at 1920x1200 or any other. That's not in one game, but in every 3D fullscreen program that runs at 16x12.

    Further, if neither the ICH9R controller works (I couldn't get it to work when installing XP, with multiple attempts), nor the Gigabyte controller (since you're saying that might be the problem)... doesn't that mean the motherboard is the problem? I mean, the controllers are on the motherboard. Are you saying I have the RAID misconfigured? Because I don't see how that could be. It's a step-by-step process, and if you fail at a step, it doesn't work. They're not SCSI drives, nor IDE... there aren't any jumpers or terminators or anything to set at all. Two color-coded SATA ports on the motherboard go to the Gigabyte controller.

    The Gigabyte controller is configured in the BIOS as RAID/IDE, out of three choices: RAID/IDE, AHCI, IDE. The first runs the SATA ports in RAID mode and the IDE ports in IDE mode. The second runs the SATA ports in AHCI mode (neither the motherboard documentation nor the BIOS says what happens to the IDE ports, though they still work). The third runs everything in legacy IDE mode. Putting both controllers in legacy IDE mode, which I did during one of my attempts to install XP, resulted in 5 IDE channels, each with master/slave, showing up in the BIOS during bootup. In XP, the DPC latency was so terrible (checked with Process Explorer and dpclat.exe) that every second the system would seem to freeze for half a second. It was completely unusable, so I reconfigured into RAID 1 on the Gigabyte controller.

    The ram is dual-channel, unfortunately, at least for testing purposes. What's a leaky cap? Do you mean a SCSI terminator?

    It's going to take a while, but since I agree with you that it can't hurt to narrow it down, I'll go ahead and take everything apart tonight and get it to barebones. I'm going to have to back up the smaller seagate drive and use that one though, because the WD is filled from a previous backup that I have not yet been able to put on DVDs.

    I will report results with a clean install and minimum hardware as soon as I have them.
     
  8. Cogitation

    Cogitation Geek Trainee

    MAJOR UPDATE:

    Okay... so as I said, I followed your advice, alexcmia. Instead of immediately stripping everything, I did something a little different, with the intention of stripping everything out afterwards.

    Without disconnecting anything, I installed XP on the 200GB IDE drive, planning to run tests on it with the full system still loaded, to isolate your hypothesis that RAID was the problem. If there were no errors this time, that would say that either the previous XP install or the RAID specifically was the issue. If there were still errors, I could then strip stuff out to see whether it was the power supply or a piece of hardware.

    XP installed fine on the 200GB IDE drive. No hitches installing motherboard drivers, avast, Catalyst drivers, .NET 2.0, sound card drivers and Firefox, in that order.

    Trying the video card stability test produced different results. This time, upon starting the test, the same several-second system freeze immediately occurred, with a motherboard beep almost immediately. Just as quickly as it froze, the test jumped from 0:02 completed to 0:14 completed and started running. I let the test go on for 10 minutes, then stopped it. During the run, nothing seemed abnormal, and checking the video card temps in Catalyst Control Center showed average, even low, temperatures with low fan speed throughout.

    So something is still definitely up, but it's an improvement. I don't know whether it was just chance that it ran better this time, or the difference of a new XP install/IDE instead of RAID.

    Then, thinking I was being clever, I decided to overclock the system to the same levels it was initially completely stable at, to test the power supply (425 x 4 front-side bus, 3.40GHz processor, 1415MHz RAM). The system booted, got to the Windows startup screen and froze. Shutting down, waiting and restarting resulted in the PC failing to get into the BIOS, with no beeps. It automatically shut down and tried again before I realized it hadn't booted and hit the switch on the back of the PSU.

    At this point, I disconnected all non-essential things, including both optical drives, removed the sound card and disconnected the raid hard drives. Turned system on, failed to boot into bios. Turned off, waited, turned on, got into the bios, reset to stock speeds, let it reboot, and got into windows, whether coincidentally or as a result, without trouble this time.

    Despite my earlier thinking, this does seem to scream "Power supply!" in a rather loud, hard-to-ignore voice. General agreement? Or is it just my stubbornness/worrying that makes me think there's still a possibility it's a motherboard issue, or potentially both the motherboard and the power supply, with a different problem in each?
     
  9. alexcmia

    alexcmia Geek Trainee

    There is an easy way to test the PSU. Attach a meter to ground and +5V for test one, then to ground and +12V for test two. Take the readings while running the tests that make everything fail. The voltage should be stable.
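
    Here is a minimal Python sketch of what "stable" means for those meter readings, assuming the ATX tolerance of ±5% on the +5V and +12V rails (4.75 to 5.25 V and 11.40 to 12.60 V). The readings are made up for illustration; only the tolerance comes from the ATX spec.

```python
NOMINAL_V = {"+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05  # ±5% of nominal, per the ATX12V power supply design guide


def in_spec(rail: str, measured: float) -> bool:
    """True if a single reading is within ±5% of the rail's nominal voltage."""
    nominal = NOMINAL_V[rail]
    return abs(measured - nominal) <= nominal * TOLERANCE


def check(samples: dict[str, list[float]]) -> dict[str, bool]:
    """A rail passes only if EVERY sample taken under load is in spec,
    so a brief sag is not averaged away."""
    return {rail: all(in_spec(rail, v) for v in vs) for rail, vs in samples.items()}


# Hypothetical readings taken while the failing stress test is running.
readings = {
    "+5V": [5.04, 5.02, 4.98],
    "+12V": [12.10, 11.92, 11.31],  # last sample sags below 11.40 V
}
print(check(readings))
```

    A handheld multimeter averages over its sampling window, so it may miss very short sags or ripple; a clean result is reassuring but not proof that the PSU is good.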

    But we really aren't getting anywhere here, and giving you further advice is absolutely pointless. (You still haven't eliminated a memory problem, run everything in a basic configuration, or even used a Linux CD.) It's as if your mind is fighting you over it.


    5 Ways Your Brain Is Messing With Your Head | Cracked.com
     
  10. Cogitation

    Cogitation Geek Trainee

    Okay.......

    As for getting the memory checked... I've run memtest86 twice now, both times without error. I'm not sure exactly in what respect I've failed here. Did you mean checking it in hardware? I'm using DDR3 memory; I don't think the local computer shop is going to have a hardware DDR3 memory tester. The fact that I didn't report trying a Linux live CD? I get why you suggested it, but I'm not interested in running Linux on this machine as my OS. Not because I have a problem with it, but because I want the machine to run XP. As a tester or boot disc, sure... If you're saying I can't test the memory without a Linux live CD, in what way is that true? Haven't run everything in basic? Do you mean with most of the components removed? Because that's precisely what I'm doing right now, and what I said in my post.

    I pulled everything out of the system, as you suggested, and the problems remain... in fact I'd say they got worse, as the system failed to boot, in a way that sounds exactly like a power supply problem.

    ...I read the link you posted the first time. I don't see any relation whatsoever to my PC. Saccadic masking, change blindness, proprioception, etc. are not related to this at all, and you just sound like you're trying to be condescending. I don't understand why; I was appreciative of your suggestions so far and said so.
     
  11. BoBBYI986

    BoBBYI986 Geek

    Your temps are fine, so I can't see them causing any malfunctions. So it's either the PSU or the motherboard. It could be that your ICH9R southbridge RAID controller has malfunctioned.
    It could also be the graphics (PCI Express) controller on the northbridge that's malfunctioned.
    It could also be a PWM controller on your motherboard, which monitors the voltage regulator output, that's malfunctioned. It could even be a dodgy MOSFET in the voltage regulator or its driver, or even a messed-up ferrite core choke.
     
  12. Cogitation

    Cogitation Geek Trainee

    BoBBYI986:

    Wow... all of those possibilities sound... like they would be causing exactly the problems I was experiencing. I'm amazed, as everyone's just told me "no, almost certainly not the motherboard." Thank you for the reply...

    I picked up a PSU, based on my own experience and the latest symptoms, so I'm hoping it was that.

    Would you agree with me that the memory does not seem a likely culprit? If after replacing the PSU, nothing changes, what should I do from there?
     
  13. BoBBYI986

    BoBBYI986 Geek

    If you have run memtest numerous times for hours and it passed every time, I can't see it being memory. You replaced your graphics card with a Radeon 4870, so it ain't the graphics card. It could have been that first overclock you did that caused the damage. You claim it was stable, but it might not have been. I overclocked my machine and ran SiSoft Sandra CPU burn-in 5x, and it passed with flying colours, but I was experiencing crashes, freezes, etc. Then I ran Prime95, it picked up an instability error within the first 30 seconds, I had to alter my vcore to make it stable, and now everything is fine. So when overclocking you should always run Prime95 (Prime is the king of all CPU/RAM stability programs). Running unstable like that could have damaged the motherboard and caused malfunctions in the chipset controllers.

    If you replace the PSU and have no luck, I would say it's most likely the motherboard. But you should always go for the cheapest option first (RAM, PSU, etc.) and work from there.

    BTW: I also noticed you pushed your FSB too far when overclocking: you ran a 1700MHz FSB, and your board handles up to 1600MHz. That could also be another reason why it's hard crashing; it most likely damaged your board.
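
    The FSB figures come from simple multiplication. A minimal Python sketch, assuming the Core 2's quad-pumped bus (effective rate = 4 x base clock) and the QX9450's 8x multiplier; the 1600 MT/s figure is the X48 chipset's official FSB support. It shows how far past spec the bus was running, not whether that caused any damage.

```python
QUAD_PUMP = 4           # Core 2 FSB transfers four times per base clock cycle
MULTIPLIER = 8          # QX9450: 8 x ~333 MHz = 2.66 GHz at stock
BOARD_RATED_FSB = 1600  # MT/s, the X48 chipset's official FSB support


def effective_fsb(base_mhz: float) -> float:
    """Effective front-side bus rate in MT/s."""
    return base_mhz * QUAD_PUMP


def cpu_clock(base_mhz: float) -> float:
    """CPU core clock in MHz."""
    return base_mhz * MULTIPLIER


for label, base in [("stock", 333.33), ("overclock", 425.0)]:
    fsb = effective_fsb(base)
    over = fsb - BOARD_RATED_FSB
    print(f"{label:9s} base {base:6.2f} MHz -> FSB {fsb:5.0f} MT/s, "
          f"CPU {cpu_clock(base):5.0f} MHz, {over:+.0f} MT/s vs rated")
```

    At stock the bus runs at about 1333 MT/s; the 425 MHz overclock puts it at 1700 MT/s, 100 above the rated 1600, while producing the 3.40 GHz CPU clock mentioned earlier in the thread.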
     
  14. Dwarfer

    Dwarfer Guest

    If your motherboard fails to detect a drive from your RAID, then it's your ICH9R RAID controller. Remove the RAID or buy another RAID controller.
     
