Debugging Blue Screen of Death (BSOD) problems

The dreaded Blue Screen of Death (BSOD) has existed at least as long as Windows, and baffles even fairly advanced computer users. It is triggered when Windows hits a low-level error it cannot safely recover from, which in turn is usually caused by driver problems (e.g. illegal pointer references and such) or hardware problems (e.g. defective RAM or CPU overheating). The errors listed are usually not very descriptive, which is just one of the obstacles to actually making use of them. Whenever I've begun experiencing frequent BSODs, I've always just formatted my computer and reinstalled Windows, thinking that whatever is happening is probably a software or driver issue so obscure I'd never track it down. However, during the most recent outburst of BSODs on my new laptop (a 10-month-old Dell), I decided to take a serious crack at analyzing the BSOD details.

 

[Image: Blue Screen of Death]

 

 

What follows is an outline of what I did, including the results. This is not meant to be an extensive debugging guide but rather a mix of general techniques and specific steps that will likely shed a little more light on the situation. Note that the instructions are all written for Windows 7 (specifically Professional x64), but should extend easily to at least Windows XP.

 

My Problem

My BSOD problem was encountered on a 10-month-old Dell Inspiron 1564 with a Core i3-m330 processor, 4GB RAM, 500GB HDD, and 1GB ATI Radeon HD4330, running Windows 7 Pro 64-bit (from MSDNAA). My last clean install of Windows was about 8 months before the BSODs began. They in fact started immediately after I flew with my laptop, though it now looks like this was pure coincidence.

 

I began encountering BSODs on a daily basis, usually within 1-2 hours of reboot, sometimes in as little as 5 minutes. There was no clear correlation with any particular program; I uninstalled a couple of recently installed programs and printer drivers to no avail. Sometimes a BSOD seemed to follow an event, such as opening a new tab in Firefox; other times I could reboot, do nothing, and get one with my computer just sitting idle for a while.

 

I'll return to the saga of my BSODs throughout this post.

 

Hardware versus Software BSODs

The most common explanation for BSODs I've seen in online forums is defective RAM, regardless of the actual BSOD error codes. I suspect that in reality RAM is rarely the cause, particularly because in most cases people are experiencing BSODs on a computer that has previously worked fine for a long time (months to years); defects in RAM and other HW are likely to be discovered very early in your computer's life span. The usual recommended tests are to run either the Windows Memory Diagnostic or Memtest86 overnight. I tried both and everything came out clean.

 

A quick test that, if it passes, can largely rule out hardware issues is to simply run your computer in Safe Mode for a while. For example, if you're reliably getting a BSOD within 1-2 hours of run time in normal mode, reboot into Safe Mode (Windows should give you the option when it reboots after a BSOD). Try Safe Mode with Networking, which will allow you to do most tasks (e.g. internet browsing and such) in the meantime. In this mode, try using your computer as normally as you can, and for much longer than it usually takes to hit a BSOD. If the problem is defective RAM or CPU overheating or some other HW issue, it should still crop up in Safe Mode. If you can run in Safe Mode indefinitely and not have problems, this is a good indication (though not proof) that your problem is software related, and specifically related to one or more drivers or background programs that are disabled in Safe Mode.

 

Dump Files

So, if the Safe Mode test passes, the next thing you want to try is analyzing the Windows "Dump Files." When a BSOD occurs, Windows saves a Dump file to disk before shutting down. This is basically a record of some portion of the computer's memory (RAM) right before shutdown and tends to include information about which drivers were loaded, what threads were running, some of the last operations before the crash, etc.

 

There are several different dump file options you can set, as well as options for what Windows does when a BSOD occurs. To change the options in Windows 7, right-click on Computer and choose Properties, then go to Advanced system settings > Advanced > Startup and Recovery > Settings and look under the heading System failure. You have three things you can adjust:

 

  • Write an event to the system log. Checked by default, but not very useful.
  • Automatically restart. May be useful. By default, Windows reboots after a BSOD. So if your situation is that you frequently come back to your computer and find it has rebooted inexplicably, uncheck this. That way, the BSOD stays on screen until you hit the power button to reboot your computer. This allows you to look at the error codes, but more importantly, lets you know you're getting BSODs as opposed to some other type of reboot (e.g. automatic SW update resets).
  • Write debugging information. This is the important option. The choices are (none) (no dump file written), Small memory dump (a compact minidump, one file per crash), or Kernel memory dump (a much larger dump of kernel memory, written by default to %SystemRoot%\MEMORY.DMP). Set this to Small memory dump. You can also set an alternate location for the dump files, but the default of %SystemRoot%\Minidump is fine. If you don't know where this leads, just click on the Windows Start button and paste this into the search field; this will open a file browser at that location.

 

Now, the next time you get a BSOD, Windows will save a dump file.

 

Note: for my problem, I was not able to get a diagnosis from the Kernel memory dump file, but after changing to Small memory dump, I got enough info to diagnose it. Also, in some versions of Windows there may be an option for a Complete memory dump; this is not recommended as it saves your entire RAM to disk, i.e. on my computer each dump file could take up to 4GB of disk space.
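
If you'd rather check these settings without clicking through the dialog, they are also stored in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl. Below is a minimal Python sketch of how you might read them. The function names are my own inventions for illustration, and the registry read only works on Windows.

```python
# Meaning of the CrashDumpEnabled registry value, per Microsoft's documentation.
DUMP_TYPES = {
    0: "(none)",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
    7: "Automatic memory dump",  # only offered on newer Windows versions
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_dump_type(value):
    """Translate a CrashDumpEnabled value into the name shown in the dialog."""
    return DUMP_TYPES.get(value, "unknown (%d)" % value)


def read_crash_control():
    """Return (dump type name, auto-restart flag) from the registry. Windows only."""
    import winreg  # standard library, but only present on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        dump_type, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
    return describe_dump_type(dump_type), bool(auto_reboot)
```

Calling read_crash_control() on the affected machine is a quick way to confirm that the change you made in the dialog actually stuck before you wait around for the next BSOD.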

 

Analyzing Dump Files

Once you have a record of what happened right before a BSOD in the form of a dump file, you want to "read" this file. Unfortunately, dump files can't be read in a normal text viewer, so you need to install a dump file reader of some kind. Two popular options are WinDbg and Blue Screen View. I used WinDbg.

 

WinDbg is part of the Microsoft Windows SDK, and you can get it here. If that link doesn't work, search for Microsoft Windows SDK and find a version compatible with your version of Windows. You can find basic installation instructions here. Those tell you how to install and configure WinDbg so that it can access the necessary debugging symbols online. Note that you probably won't be able to install it in Safe Mode in Windows 7; I tried this and failed, so you need to reboot into normal mode at least long enough to do the install. If you can't run in normal mode long enough to even install the SDK, you may want to try Blue Screen View instead.

 

More detailed instructions on using WinDbg are available here, but the basic steps are as follows:

  • Once WinDbg is installed, open it via Start > All Programs > Debugging Tools for Windows > WinDbg.
  • Choose File > Symbol File Path and enter "SRV*c:\symbols*http://msdl.microsoft.com/download/symbols" (without the quotes).
  • Now open your minidump file by choosing File > Open Crash Dump. Remember that the location is %SystemRoot%\Minidump and you can type this directly into the File name field to go there. Also notice the naming convention of mmddyy-xxxxx-xx.dmp, which should make it clear which dump file is the most recent.
  • Choose No if asked to save the workspace.
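
One catch with the mmddyy naming is that a plain alphabetical sort puts December 2010 after January 2011. If you have a pile of dumps, something like the following Python sketch can order them properly. The function names are hypothetical, and I'm assuming two-digit years mean 20yy and that the trailing number can serve as a tie-breaker for dumps from the same day.

```python
import re
from datetime import date

# Minidump names follow mmddyy-xxxxx-xx.dmp
MINIDUMP_NAME = re.compile(r"^(\d{2})(\d{2})(\d{2})-(\d+)-(\d+)\.dmp$", re.IGNORECASE)


def minidump_key(filename):
    """Return a sortable (date, sequence) key, or None if the name does not fit."""
    match = MINIDUMP_NAME.match(filename)
    if match is None:
        return None
    month, day, year, _, sequence = match.groups()
    try:
        # Assumption: two-digit years are 20yy.
        crash_date = date(2000 + int(year), int(month), int(day))
    except ValueError:
        return None  # e.g. month 13; not a real minidump name
    # Assumption: the trailing number distinguishes dumps from the same day.
    return crash_date, int(sequence)


def newest_first(filenames):
    """Sort minidump file names from most recent to oldest, skipping other files."""
    keyed = [(minidump_key(name), name) for name in filenames]
    valid = [pair for pair in keyed if pair[0] is not None]
    return [name for _, name in sorted(valid, reverse=True)]
```

You could feed it the result of os.listdir() on your Minidump folder. For dumps written on the same day, the file modification time is probably a more trustworthy ordering than anything in the name.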

 

Now WinDbg will show you some output including a BugCheck code and a "Probably caused by" field.

 

In my case the output looks like:

 

BugCheck 7F, {8, 80050031, 6f8, fffff80003249ec0}

Unable to load image aswTdi.SYS, Win32 error 0n2
*** WARNING: Unable to verify timestamp for aswTdi.SYS
*** ERROR: Module load completed but symbols could not be loaded for aswTdi.SYS
Probably caused by : NETIO.SYS ( NETIO!CompareSecurityContexts+6a )
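
If you end up going through several dumps, it can help to pull the two interesting lines out of this output automatically. Here is a rough Python sketch that does so. The function name and the tiny table of stop code names are mine, and the table is deliberately incomplete, so treat it as an illustration rather than a reference.

```python
import re

BUGCHECK_LINE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
CAUSE_LINE = re.compile(r"Probably caused by\s*:\s*(\S+)")

# A small, incomplete table of stop code names for illustration.
KNOWN_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def summarize(output):
    """Extract the stop code, its arguments and the suspected module from WinDbg text."""
    bug = BUGCHECK_LINE.search(output)
    if bug is None:
        return None
    cause = CAUSE_LINE.search(output)
    code = int(bug.group(1), 16)
    # The bug check arguments are printed in hex without a 0x prefix.
    args = [int(a.strip(), 16) for a in bug.group(2).split(",") if a.strip()]
    return {
        "code": code,
        "name": KNOWN_CODES.get(code, "unknown"),
        "args": args,
        "module": cause.group(1) if cause else None,
    }
```

Run on my output above, this reports stop code 0x7F (UNEXPECTED_KERNEL_MODE_TRAP) with a first argument of 8, which Microsoft's bug check reference lists as a double fault, and NETIO.SYS as the suspected module.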

 

You can click the !analyze -v link, or type !analyze -v into the field at the bottom of the debug window, to get verbose output (nerd speak for more details). Mine looks like:

 

Debugging Details:
------------------

BUGCHECK_STR:  0x7f_8

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from fffff80003280ca9 to fffff80003281740

STACK_TEXT:  
(lots of stack text here)

STACK_COMMAND:  kb

FOLLOWUP_IP:
NETIO!CompareSecurityContexts+6a
fffff880`0154dc5a 448b442470      mov     r8d,dword ptr [rsp+70h]

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  NETIO!CompareSecurityContexts+6a

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: NETIO

IMAGE_NAME:  NETIO.SYS

DEBUG_FLR_IMAGE_TIMESTAMP:  4bbe946f

FAILURE_BUCKET_ID:  X64_0x7f_8_NETIO!CompareSecurityContexts+6a

BUCKET_ID:  X64_0x7f_8_NETIO!CompareSecurityContexts+6a

 

If you've read this far, and looked at some other How-Tos about debugging BSODs, you're probably wondering what new points I'm bringing to the table, and here they are (don't get too excited). Most of the guides I've seen either direct you at this point to a laundry list of stop codes and their supposed meanings (usually very ambiguous or just meaningless to anyone other than computer HW engineers), or they'll just tell you that any bug check code means RAM errors and you should start swapping your RAM in and out of various slots and other computers to test it.

 

What I recommend you do at this point is copy the bug check string and some or all of the probable cause string and just google it. Chances are someone else, or in fact many other people, have already had the same problem and come to some consensus about its meaning in forums, which will point you to the problem SW or HW.

 

For my error, I googled "0x7f_8 NETIO.SYS" without quotes. I immediately found many forum posts with the same problem, and they all pointed towards ZoneAlarm as the culprit and recommended uninstalling it. I did so and haven't had a BSOD since (about 4 days and counting now).
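
The search string I used is just the BUGCHECK_STR and IMAGE_NAME fields from the verbose output glued together. If you want to build it automatically, a Python sketch along these lines should do, assuming the output uses the same "NAME:  value" layout as mine. The function names are hypothetical.

```python
import re

# Matches lines such as "BUGCHECK_STR:  0x7f_8" in the !analyze -v output.
FIELD_LINE = re.compile(r"^([A-Z_]+):[ \t]*(.*)$", re.MULTILINE)


def parse_fields(verbose_output):
    """Collect the upper-case NAME: value pairs from !analyze -v output."""
    fields = {}
    for match in FIELD_LINE.finditer(verbose_output):
        fields[match.group(1)] = match.group(2).strip()
    return fields


def search_query(verbose_output):
    """Join the bug check string and image name into a search engine query."""
    fields = parse_fields(verbose_output)
    parts = [fields.get("BUGCHECK_STR"), fields.get("IMAGE_NAME")]
    return " ".join(part for part in parts if part)
```

If the first search comes up empty, swapping IMAGE_NAME for SYMBOL_NAME or adding the MODULE_NAME field is an easy variation to try.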

 

Conclusion

So, to quickly review, if you're having frequent BSODs, a good plan of attack is:

  • Try running in Safe Mode with Networking. If you seem to be able to do so indefinitely, you're probably (though not definitely) not looking at a HW issue. If you still get BSODs, dial it down another notch by running in just Safe Mode.
  • Turn on Small memory dump saving.
  • Get WinDbg and analyze your dump files. If you have several or many, analyze a handful of recent ones to see if the errors are consistent or not.
  • Google the stop code and probable cause strings that WinDbg gives you to see if there's a consensus in forums about the guilty driver/SW/device, and try uninstalling/disabling/removing said item. If you don't get a good result with your search, try adding or removing some of the details from WinDbg.
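
For the "are the errors consistent" step, one simple approach is to jot down the FAILURE_BUCKET_ID from each dump you analyze and count how often each one shows up. A minimal Python sketch, with a function name of my own choosing:

```python
from collections import Counter


def most_common_failure(bucket_ids):
    """Return (bucket id, share of dumps) for the most frequent failure, or None."""
    if not bucket_ids:
        return None
    bucket, hits = Counter(bucket_ids).most_common(1)[0]
    return bucket, hits / len(bucket_ids)
```

If one bucket accounts for nearly all of your dumps, that is the string to search for. If every dump blames something different, I'd be more inclined to take the hardware explanation seriously.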

 

Thanks to user Adrynalyne at Majorgeeks for this post which provided most of the details on installing and configuring WinDbg.

 

Hope that helps!

 
