PC Temperature and Blue Screens of Death (BSOD)



1 Overview
The Problem

Ordinarily I write about things directly related to CAD, AutoCAD, AutoCAD Architecture or other software for architecture, but this article will be a little different.  I recently had one of those computer experiences where the system becomes so unstable that it is basically unusable and you are left looking at a stupid blue screen of death wondering what the $#@@ those failure codes are supposed to mean.  I have been working with CAD on PCs since DOS so I figure that I know a little about what goes on inside the case of a computer.  I even got to the point where I could build one from parts, though they never ran quite as well as professional systems.  When I got older and life became more complicated, I stopped keeping up with the inner workings of the latest machines beyond what was necessary to keep them going.  Over the last decade I have noticed that personal computers, including Macs, appear to be failing earlier and earlier in their life cycles.  Hard drive failures occur at what I would consider an alarming rate, and I would even go so far as to say that PCs should come with two drives in a RAID configuration by default.  I don't know if there are problems with the Chinese manufacturing quality or if greater complexity is just leading to greater failure rates, but when you lose a $1200 Nvidia Quadro card four years after buying it, it doesn't seem right.

Getting back to the main point of this article, my primary production machine started crashing and going right to the BSOD, leading to a two-week adventure in troubleshooting that I felt I should share in the hope that it will help someone else.

2 Body
My First Step - Save Your Data

The first BSOD occurred while I was testing AutoCAD Architecture 2011's new render material improvements (see story) so I was inclined to think that the software was the cause.  What is that mantra, "correlation does not imply causation" or something to that effect?  I actually ignored the first few BSODs because they didn't occur every time I rendered.  However, one day my system had numerous BSODs when trying to render just a small test vignette, so I tried rendering in an older release and got another BSOD.

By testing two different releases I eliminated the software as a problem because I knew the older version had worked fine in the past.  I also realized that I might be experiencing the beginning of a far greater problem, so I told myself that I would work on it the next morning.  I had work to get done that day and I didn't need to render anyway.

Wouldn't you know it, the next morning when I attempted to boot up my system it got all of the way to displaying my desktop, I could even move my mouse, and then it popped right to a BSOD.  I fired up the system again and the same thing happened.  Now panic set in because I realized that I didn't have all of my data backed up.  To me, the first step anyone should take when facing system crashes is to back up everything.  If your hard drive is failing, however, you could damage it further by trying to access it (a tough call to make).

I decided to try my best to get to my data and back it up in any way that I could, so I restarted my machine and booted to "Safe Mode with Networking".  This seemed to work fine so I pulled out a couple of DVDs and got ready to burn some backup discs, only to discover that the drivers for doing this weren't loaded in Safe Mode.  I had networking running so I figured that I would just transfer everything to another machine, but none of my machines had enough space for that.  My external backup drive, less than a year old, was failing (another example of hard drive failure rates) so I didn't trust using it, and my piece of junk Dell Precision 350 had died over a year earlier.  If you have networking and this happens to you, you could look into using one of the numerous online backup services, but even at high speed, pushing through a couple of gigabytes won't be fun.

I decided to use my network connection to find a new hard drive for my piece of junk Dell Precision 350 so I could not only transfer everything over to it but also have a backup system to work on if needed.  I cover replacing a hard drive on a Dell Precision 350 in a separate story.

My Second Step

Because I wanted to get my data backed up and I really wanted to get to the heart of my PC's problems, I decided to continue working on this problem. 

I opened the case and took a look at everything.  There was a bit of dust here and there, especially around the numerous fans so I got out a small vacuum and proceeded to do some cleaning.  I also released and reseated all of my cards even though I think it's a silly act.

I fired up the system and decided to see if I could get it to boot in Normal Mode.  It worked.  I immediately stuck in a DVD, formatted it and dragged over my "Clients" folder.  After about fifteen minutes, BSOD.  I get a bit stubborn at times, to my own detriment, so I did this all over again and got the same results.  Then I tried a CD with fewer files, which actually worked.  In frustration, I connected my piece of junk Maxtor external drive and managed to get everything copied over to it.  That drive is failing so it was only a minor comfort, but it gave me the courage to really begin the work of trying to figure out why my system was crashing.

My Third Step - Review and Collect Error Codes for Google Searches

One of the biggest problems with a crashing PC is that it is almost impossible to remain unemotional about it.  In my youth I once got so mad at a crashing PC that I kicked it really hard, with a steel-toed boot, and reset the BIOS.  For a short period in my life, I worked on other people's PCs and found that I was less emotional when the machine wasn't my own.  I did find that, like me, other people get emotional and start to let anger and frustration drive their decisions.  A PC is a binary machine; it can't get more logical than that, so problem solving should be easy.

If I had paid attention, I would have realized that the slight improvement I got out of my system occurred after I opened the case.  I had not replaced the cover figuring that I may need to get back inside it.

So, all emotional and almost certain, based on previous experience, that my problems were based on a Video Card failure I went searching for proof.  My reasoning was also based on the fact that my system started crashing when rendering.

I think in this day and age everyone knows that you can go to the Device Manager in a Windows Operating System and look for those caution or warning icons (see illustration upper right).  You might hope that the cause of something as gruesome as a BSOD is simply a driver problem, but in my experience it only gets this easy and obvious if it comes right after you installed a new Video Card or updated your drivers, for example.  Checking the Device Manager is still a good place to start.  I found nothing for my problems here despite the image I show to the right - done just for effect.

Another good tool in Windows is the Event Viewer, which displays Warnings and Errors that may have occurred right before a system crash.  The Event Viewer can lead you down a long path of irrelevant issues, though, so use sound logic when reading the information.  In some cases, for example, one device failure may lead to numerous Warnings and Errors that aren't the cause but the casualty.  In other cases you may find information that states the obvious such as, "The previous system shutdown...was unexpected".  I found numerous error codes that I combined with the error codes displayed on my BSOD and used them in numerous Google searches.
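Gathering the codes by hand gets tedious, so here is a minimal Python sketch of one way to pull hexadecimal codes out of pasted notes and turn them into search strings.  The function names, the sample text and the search hint are my own placeholders for illustration, not anything Windows or the Event Viewer provides.

```python
import re

# Hexadecimal codes such as 0x0000007E, as they appear on a stop screen
# or in text copied out of the Event Viewer.
HEX_CODE = re.compile(r"\b0x[0-9A-Fa-f]{1,16}\b")


def collect_codes(text):
    """Return the unique hex codes found in text, in order of first appearance."""
    codes = []
    for match in HEX_CODE.findall(text):
        # Upper-case the digits so 0x0000007e and 0x0000007E count as one code.
        code = "0x" + match[2:].upper()
        if code not in codes:
            codes.append(code)
    return codes


def search_queries(codes, context="BSOD"):
    """Build one quoted search string per code; context is a free-form hint."""
    return ['"{}" {}'.format(code, context) for code in codes]


if __name__ == "__main__":
    # Made-up notes for illustration only; these are not the codes from my machine.
    notes = "STOP: 0x0000007e ... later STOP: 0x0000007E, then 0x00000050"
    for query in search_queries(collect_codes(notes), "BSOD Vista 64"):
        print(query)
```

Removing duplicates matters more than it sounds, because the same code tends to show up many times once a machine starts crashing repeatedly.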

The Fourth Step - Pick Hardware or Software to Eliminate

I spent the better part of a day using Google and Microsoft's website in an effort to turn the jumble of error codes I had into something meaningful.  At some point I realized that none of the error codes provided me with something definitive and I still didn't know if my problems were based on hardware or software.  With the right tools, I think it is easier to eliminate hardware as a possible cause of system failures than software.  I hadn't checked hardware in many years, so I changed my Google searches to things like "ram testing software" and "video card testing software".

Though my custom built Xi Computer is past its warranty, I also decided to see if I could get some tech support out of them.

Listed below are some of the hardware testing utilities I downloaded and used to test my machine.

Video Card Test

Video Cards are getting more and more sophisticated and thus more demanding in terms of system resources and power (as in power supply).  They can run quite hot and thus have their own potential for cooling issues.  You can even over-clock them like you can with CPUs.

If you suspect your Video Card may be causing failures, you can use a Video Stability Testing program such as the very basic one from http://freestone-group.com/ or check your Video Card manufacturer's website.  For the most sophisticated work with video cards, Nvidia cards specifically, you can get RivaTuner at http://www.guru3d.com/category/rivatuner/ but you will need to know what you are doing in order to use this product.

This site offers great information about troubleshooting Video Cards: http://www.playtool.com/pages/troubleshooting/intro.html

RAM Test

Windows Vista has a Memory Diagnostics Tool that you can use to check your RAM but apparently it isn't as thorough as other system tools on the market.  I found "Memtest86+" to be pretty decent and you can download it here: http://forum.corsair.com/forums/showthread.php?t=38201

The two irritating problems with "Memtest86+" for novices, like me, are that it comes in an .iso format and that you have to burn that image to a CD or other bootable device because it runs as a boot utility.  As the website link above states, don't just copy the .iso file to your bootable device.

I found "ISORecorderV3RC1x64.zip" at "http://isorecorder.alexfeinman.com/Vista.htm" and it worked just fine to create the bootable CD I needed from Memtest86+.iso.  If you need something like this, you may already have CD image burning software on your system, so check that first.  Also, this utility is for Vista only - there is a 32 bit and a 64 bit version.

Hard Drive Test

Most of the Windows Operating Systems being used these days have a built-in utility to check your hard drive for errors.  There are utilities on the market for this task as well but I have never found the need to use them.  In addition, my system is preconfigured to run a checkdisk and defrag at regular intervals, so if these tools were to find something I would have already been alerted.

If you want to run a checkdisk, you can simply right-click a drive in Explorer and use the Properties menu option - see illustration to the right.
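The same check can also be started from the command line with the Windows chkdsk command.  Below is a small Python sketch that builds the command and, on Windows only, runs it.  The helper names are hypothetical, and I am assuming an elevated (administrator) prompt, which chkdsk normally needs.

```python
import platform
import subprocess


def chkdsk_command(drive="C:", fix=False, scan_bad_sectors=False):
    """Build the chkdsk argument list; with no switch chkdsk only reports."""
    cmd = ["chkdsk", drive]
    if scan_bad_sectors:
        # /r locates bad sectors and recovers readable data; it includes /f.
        cmd.append("/r")
    elif fix:
        # /f fixes errors found on the disk.
        cmd.append("/f")
    return cmd


def run_chkdsk(drive="C:"):
    """Run a report-only check and return the text chkdsk prints."""
    if platform.system() != "Windows":
        raise RuntimeError("chkdsk is only available on Windows")
    result = subprocess.run(chkdsk_command(drive), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be used for a report-only check of C:.
    print(" ".join(chkdsk_command("C:")))
```

I would start with the report-only form, since a repair pass on a drive that is physically failing is the same tough call mentioned earlier about accessing a dying disk.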

 

System Temperature Test

If your system has temperature issues, the most likely candidate for system shutdown is the CPU.  It's amazing how fast a CPU can go from 30° C to 75° C when running at 100%.  CPUs have temperature sensors so they will shut down before being damaged, but I don't know if the shutdown temperature is the same for all CPUs or if it varies.  My CPU, an Intel Core 2 6700 over-clocked from 2.66 GHz to 2.90 GHz, was showing temperatures of about 74° C before shutting down.  In the BIOS, it was idling at around 43° C.

By the time I had gotten around to testing my GPU and CPU temperatures, a tech support person at Xi Computer called me and, after I explained my problems, he asked me to check the fluid levels in the hoses of my water-cooled CPU.  My hoses had some air gaps in them and thus I set out to get some more fluid.  Supposedly this stuff is "engineered", but I get the feeling it's distilled water.  I ordered some and while I waited for it to arrive, I went to work on my piece of junk Dell Precision 350 because the new hard drive I had ordered for it had arrived.
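If you jot down readings from a monitoring utility, a few lines of Python can summarize them.  This is only a sketch: it treats the readings as degrees Celsius, which is what such utilities normally report, and the limit is a placeholder that you would need to replace with the figure for your own CPU.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def summarize(readings_c, limit_c):
    """Return the lowest and highest readings and those at or above limit_c."""
    if not readings_c:
        raise ValueError("no readings to summarize")
    return {
        "idle_c": min(readings_c),
        "peak_c": max(readings_c),
        "over_limit": [t for t in readings_c if t >= limit_c],
    }


if __name__ == "__main__":
    # 43 and 74 are the idle and pre-shutdown figures quoted above; 55 is a
    # made-up middle reading, and the limit of 70 is a placeholder only.
    report = summarize([43, 55, 74], limit_c=70)
    print(report)
    print(round(c_to_f(report["peak_c"]), 1), "degrees Fahrenheit at peak")
```

The conversion is there as a sanity check: a reading in the 70s only makes sense for a struggling CPU if the scale is Celsius.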

When the cooling fluid arrived, I took my Xi Computer to a work table and inspected my radiator.  I decided that not only was I going to fill the radiator but I was going to pull it out and completely clean it.  I tell you, I could not believe how solid the dust was across the inside face of the radiator.  I don't even know if any air was getting through it.  It was so thick I had to use a toothbrush to break it free.

Real Temp, a CPU temperature monitoring utility, is available here: http://www.techpowerup.com/realtemp/

3 Conclusion
Problem Solved?

After all the cleaning and fluid refilling I felt confident that I had finally solved my shutdown issues.  When I ran the machine I checked my system temperature and found that, sure enough, it was now running at 30° C and maybe reaching the upper 50s at full load.  However, during the day I experienced several BSODs and I just couldn't believe my bad luck.  For some reason I decided to check my CPU usage as opposed to temperature, and I noticed that it was running at 100% even when I wasn't doing anything.  The cause was Explorer.
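The idea behind that check, looking at what is busy while the machine should be idle, can be sketched in Python.  I am assuming you have copied a few per-process CPU percentages out of Task Manager by hand; the function name, the threshold and the sample numbers are all hypothetical.

```python
def idle_hogs(samples, threshold=50.0):
    """Average each process's CPU percent across the samples and return
    (name, average) pairs above threshold, highest first.

    samples is a list of dicts mapping process name to CPU percent, taken
    while the machine is idle.  A process missing from a sample counts as 0.
    """
    totals = {}
    for sample in samples:
        for name, percent in sample.items():
            totals[name] = totals.get(name, 0.0) + percent
    averages = {name: total / len(samples) for name, total in totals.items()}
    hogs = [(name, avg) for name, avg in averages.items() if avg > threshold]
    return sorted(hogs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up samples for illustration; not measurements from my machine.
    samples = [
        {"explorer.exe": 98.0, "acad.exe": 1.0},
        {"explorer.exe": 100.0, "acad.exe": 0.5},
    ]
    print(idle_hogs(samples))
```

Averaging over a few samples avoids blaming a process for a single momentary spike.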

 

After using Google to see if other people had ever observed Explorer on Vista 64 go crazy like that, I found that not only had others experienced it, but some had found that resizing the Explorer window would fix it.  I tried that trick until I noticed that covering up the default MSN.com icon and my own ARCHIdigm.com icon links dropped the CPU usage from 100% to 2% or less.  When I deleted both links, the problem was completely solved.

 

© Copyright 2010 ARCHIdigm. All rights reserved.