Sysinternals Freeware - Mark Russinovich & Bryce Cogswell

Mark's Sysinternals Blog

Unkillable Processes

Have you ever terminated an application only to see in your favorite task manager (Process Explorer, of course) that the process still exists? Or have you tried logging out or shutting down only to have the logoff or shutdown stall indefinitely for no apparent reason? These scenarios are usually the result of buggy device drivers that don’t properly handle the cancellation of outstanding I/O requests.

Over the last few years I’ve developed a tool called Notmyfault that demonstrates a number of common device driver bugs, including accessing freed memory, overrunning buffers, and leaking memory. The crashes generated by Notmyfault are featured in the crash analysis chapter of the Windows Internals book I coauthored with Dave Solomon. I’ve recently added a new error selection, Hang Irp, to show the effects of drivers that don’t cancel I/O requests.

When you run Notmyfault and select the Hang Irp bug, Notmyfault sends an I/O request to its helper driver, Myfault.sys, and Myfault.sys never completes it. The names of the executable and driver reinforce the fact that user-mode code can never directly cause a Windows crash: Notmyfault relies on the Myfault driver to do the dirty work. The Notmyfault thread that issues the request never continues executing because it ends up stuck in the kernel waiting for the I/O request to complete. However, because Notmyfault issues the request from a second thread, the UI remains responsive, and you can trigger other bugs, issue more hanging IRPs, or try to terminate the process.
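The design choice of issuing the request from a second thread, so the UI stays responsive while that thread is stuck, can be illustrated with a toy model. This is a minimal Python sketch, not Notmyfault’s actual code: ToyIrp, issue_synchronous_request, and run_demo are hypothetical names, a threading.Event stands in for completion of the I/O request, and the wait has a timeout only so the demo finishes (the real kernel wait has none).

```python
import threading
import time


class ToyIrp:
    """Hypothetical stand-in for an I/O request; not a real Windows structure."""

    def __init__(self):
        # The "driver" is supposed to set this event when the request completes.
        self.completed = threading.Event()


def issue_synchronous_request(irp, results):
    # Blocks like Notmyfault's worker thread waiting in the kernel. The timeout
    # exists only so the demo ends; the real wait has no timeout.
    results.append(irp.completed.wait(timeout=0.5))


def run_demo():
    irp = ToyIrp()  # a Myfault-style "driver" never calls irp.completed.set()
    results = []
    worker = threading.Thread(
        target=issue_synchronous_request, args=(irp, results), daemon=True
    )
    worker.start()
    ui_events_handled = 0
    while worker.is_alive():  # the "UI" thread keeps doing work meanwhile
        ui_events_handled += 1
        time.sleep(0.05)
    return ui_events_handled, results


if __name__ == "__main__":
    handled, results = run_demo()
    print(f"UI handled {handled} events; request completed: {results[0]}")
```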

Terminating Notmyfault reveals the effect of a hung IRP. Even after you close the Notmyfault window, the Notmyfault process still shows in Process Explorer’s process list. Logging off and back in, even into a different account, does not cause the zombied process to exit. So what’s going on under the hood? If you’ve configured Process Explorer to take advantage of Microsoft’s symbol support (steps for doing so are documented in Process Explorer’s help file), you can view the stack of the hung thread by double-clicking the Notmyfault process, navigating to the Threads tab of the resulting Process Properties dialog, and double-clicking the thread:

A stack reflects a history of subroutine invocation and reads from top to bottom, most recent to least recent. The stack above indicates that Notmyfault called DeviceIoControl, which called ZwDeviceIoControlFile. ZwDeviceIoControlFile transitioned into kernel mode (the frames prefixed with “ntkrnlpa.exe”), where the kernel’s system call dispatcher executed NtDeviceIoControlFile. Since the I/O request was synchronous, the I/O Manager waits for the driver targeted by the request to complete it.

When a process terminates, the Process Manager performs process rundown, which includes terminating all the threads in the process, closing handles to opened system resources (e.g., files and registry keys), and tearing down the address space of the process. When the Process Manager sees that a terminating thread has outstanding I/O requests, it informs the drivers processing the requests that the requests should be cancelled. You can see that in the stack as the call to IopCancelAlertedRequest. Because completing an I/O request requires access to the address space of the owning thread’s process, the system can’t finish tearing down a process until all its I/O requests have completed or been cancelled. The I/O Manager has no choice but to wait indefinitely, which you can see in the stack as the call to KeWaitForSingleObject.
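The rundown rule in the previous paragraph (ask every outstanding request to cancel, then wait until each one has actually finished) can be sketched as a toy model. This Python sketch is illustrative only and uses hypothetical names (ToyIrp, cancel_by_completing, run_down, \Driver\Good). In a real WDM driver the cancel routine is registered with IoSetCancelRoutine and invoked through IoCancelIrp, and the I/O Manager’s wait has no timeout. A driver that neither handles cancellation nor completes the request, as Myfault.sys does in Hang Irp mode, is the one that leaves rundown stuck.

```python
import threading


class ToyIrp:
    """Hypothetical stand-in for an IRP owned by a named driver."""

    def __init__(self, driver_name, cancel_routine=None):
        self.driver_name = driver_name
        # None models a driver that never handles cancellation (like Hang Irp).
        self.cancel_routine = cancel_routine
        self.done = threading.Event()


def cancel_by_completing(irp):
    # A well-behaved cancel routine completes the request as cancelled.
    irp.done.set()


def run_down(outstanding_irps, wait_seconds):
    """Ask every IRP's driver to cancel, then wait for each one to finish.

    Returns the names of drivers whose IRPs never completed. The real wait is
    indefinite; wait_seconds exists only so this sketch can return.
    """
    for irp in outstanding_irps:
        if irp.cancel_routine is not None:
            irp.cancel_routine(irp)
    return [irp.driver_name for irp in outstanding_irps
            if not irp.done.wait(timeout=wait_seconds)]


if __name__ == "__main__":
    irps = [ToyIrp(r"\Driver\Good", cancel_by_completing), ToyIrp(r"\Driver\Myfault")]
    print(run_down(irps, wait_seconds=0.2))  # only the Myfault IRP stays stuck
```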

If you run across this type of problem in the real world, you’ll need to run a kernel debugger to look at the outstanding I/O requests of any hung threads and then determine the driver that owns them. If the system is hung, you need to debug it from a second computer running a kernel debugger. Since the system as a whole isn’t hung when you create a hung thread with Notmyfault, you can use local kernel debugging with LiveKd or, if you’re running Windows XP or higher, the built-in local kernel debugging support in Debugging Tools for Windows. If you’ve never used a kernel debugger, the easiest approach is to download Debugging Tools for Windows and then run LiveKd from the directory in which you installed the tools.

The first kernel debugger command to execute is one that displays the hung process and its threads. Look at the IRP List area of each listed thread, which lists that thread’s outstanding I/O requests. Here’s the command to dump the hung process, along with partial output that includes the IRP list for the Notmyfault thread:

kd> !process 0 7 notmyfault.exe
PROCESS 8183ad18 SessionId: 0 Cid: 02dc Peb: 7ffdf000 ParentCid: 04e4
    DirBase: 08b40280 ObjectTable: e107cd10 HandleCount: 23.
    Image: NotMyfault.exe
    VadRoot 817d8d68 Vads 44 Clone 0 Private 98. Modified 1. Locked 0.

        THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
            81821d0c NotificationEvent
        IRP List:
            82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000
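When a process has many threads, copying IRP addresses out of the !process output by hand gets tedious. A minimal Python sketch, assuming each entry under “IRP List:” starts with the IRP address followed by a colon and a parenthesized pair of numbers, as in 82370f68: (0006,0094). The names irps_from_process_dump and PROCESS_SAMPLE are hypothetical, and the sample reuses the IRP address examined in the next step.

```python
import re

# An entry looks like "82370f68: (0006,0094) Flags: 40000000  Mdl: 00000000".
# The backtick allows 64-bit addresses written as fffffa80`01234567.
IRP_ENTRY = re.compile(r"^\s*([0-9a-fA-F`]+):\s+\([0-9a-fA-F]+,[0-9a-fA-F]+\)")


def irps_from_process_dump(text):
    """Return IRP addresses listed under "IRP List:" headers, in order."""
    irps = []
    in_irp_list = False
    for line in text.splitlines():
        if line.strip() == "IRP List:":
            in_irp_list = True
            continue
        match = IRP_ENTRY.match(line) if in_irp_list else None
        if match:
            irps.append(match.group(1))
        else:
            in_irp_list = False  # any other line ends the current IRP list
    return irps


PROCESS_SAMPLE = """\
        THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
            81821d0c NotificationEvent
        IRP List:
            82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000
"""

if __name__ == "__main__":
    for address in irps_from_process_dump(PROCESS_SAMPLE):
        print(f"!irp {address}")  # paste each line into the debugger
```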

The next step is to examine each IRP (I/O Request Packet) you find:

kd> !irp 82370f68
Irp is active with 1 stacks 1 is current (= 0x82370fd8)
 No Mdl Thread 81810560: Irp stack trace.
     cmd flg cl Device File Completion-Context
>[ e, 0] 5 0 8172daa8 81821cb0 00000000-00000000
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
               \Driver\Myfault
                        Args: 00000000 00000000 83360020 00000000

The output reports that \Driver\Myfault, the internal name of the Myfault driver, owns the IRP and is therefore the driver that’s guilty of not completing the I/O and not responding to the system’s cancellation request. The error regarding missing symbols for myfault.sys is expected, since Microsoft stores symbols only for its own drivers and components.
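The same reading of the !irp output can be automated: find the current stack location (the line marked with >) and take the driver name listed beneath it. This is a hedged Python sketch; current_driver and IRP_SAMPLE are hypothetical names, and it assumes the driver name appears on its own line after the current location and before that location’s Args: line, as in the Myfault output. The \FileSystem\ prefix is accepted too, because file system drivers such as \FileSystem\Ntfs can also own IRPs.

```python
def current_driver(irp_dump):
    """Return the driver name listed under the current ('>') stack location."""
    lines = irp_dump.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(">["):
            for following in lines[i + 1:]:
                name = following.strip()
                if name.startswith("\\Driver\\") or name.startswith("\\FileSystem\\"):
                    return name
                # Stop at this location's Args line or the next stack location.
                if name.startswith("Args:") or name.startswith("["):
                    break
    return None


IRP_SAMPLE = r"""Irp is active with 1 stacks 1 is current (= 0x82370fd8)
 No Mdl Thread 81810560: Irp stack trace.
     cmd flg cl Device File Completion-Context
>[ e, 0] 5 0 8172daa8 81821cb0 00000000-00000000
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
               \Driver\Myfault
                        Args: 00000000 00000000 83360020 00000000
"""

if __name__ == "__main__":
    print(current_driver(IRP_SAMPLE))
```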

The reason the Notmyfault bug does not cause logoff or shutdown hangs is that the system doesn’t care whether user applications actually terminate during either of those activities. As long as the TerminateProcess API returns success, which it does for such zombie processes, the system is happy. However, if Explorer or one of the core system processes gets into a zombie state, the system will be effectively hung.

posted by Mark Russinovich @ 3:52 PM

How useful is this? :)

!process 0 7 cmd.exe yields a _massive_ output; several threads, each with a couple of IRPs.

Checking an IRP at random:
0: kd> !irp 870e67c8
Irp is active with 7 stacks 7 is current (= 0x870e6910)
No Mdl Thread 8870b720: Irp stack trace.
cmd flg cl Device File Completion-Context
>[ c, 2] 1 1 88f89760 887055a8 00000000-00000000 pending
Args: 00000020 00000003 00000000 00000000

Basically I have a server (with an Adaptec SATA RAID controller) which freezes (no BSOD :( ) whenever SQL Server Agent runs certain jobs (unless I break my RAID-1 mirror). So I read your blog entry and set out to see what my system looks like when I believe it is sane, but already I spot a pending IRP stuck in \FileSystem\Ntfs... Is this normal? (my guess: Ntfs perhaps handles standard input as well, so it is waiting for me to type something)

A follow-up blog entry on how to separate the good from the bad (and the ugly) would be much appreciated. :)

(BTW: I had to create a blog just to submit this comment...)

Using the older NT4 syntax (!process 870b2160 7) yields a much more comprehensible output... :) (Win2k Server SP4)

Strange, but at least now I can see if I can get closer to the problem at hand. Thx.

The syntax I specified is best if you are targeting a unique process. You don't have to dump all the processes (!process 0 0) looking for the one you're interested in to pass to a "!process [process object address] 7" command.

Are there any third-party applications that can effectively kill an "unkillable" process?
Rune, use the !irpfind extension to display all the allocated IRPs in the system. Then you can use !irp to display more detail. The steps Mark used were to display IRPs for a certain process in order to demonstrate a specific event. With !irpfind you can take a step back and see the whole picture.
Thanks Dan and Mark.

Turns out that my efforts were moot (so far). I used livekd to launch the debugger on my server, and executed .server tcp:port=7979 so that I could hook up to it using tcp.

But when it froze, both Remote Desktop and my remote debugging session died (both sockets are still connected though, and Remote Desktop managed to paint three lines from Task Manager before it finally threw in the towel).

I think I'll try your troubleshooting forum. :)

The article says that user-mode code cannot directly crash the system. This is not true. Under Windows XP SP2, this may happen if wrong parameters and/or a wrong parameter mask are supplied to NtRaiseHardError(). This may be done by ANY application and ANY user, regardless of user account privileges. Windows XP SP2 is not as secure as Microsoft claims.

Anton Bassov
Do you have sample code that demonstrates that?
Hey Mark. Not related per se, but is it possible to add something to Process Explorer so you can track the "Desktop Heap"? I had a problem recently with a machine with 2 GB of memory running out of this resource, and I had a hard time tracking it down.

Mark, as a newbie I was very interested in the topic, so I loaded NotMyFault, ran the Hang IRP bug, and followed the discussion. Now I can't get rid of the process. Is there a way ... please bail my sorry ... out. Thx.

Very interesting problem. At first, when I read this, I thought it was bizarre and almost impossible to replicate without the proper tool (Notmyfault).

Not only did all those comments scare me, but I also found an unkillable process myself today.

On a Win2K server, IE froze while downloading a 23 KB file...

I found myself completely unable to stop it using kill.exe, taskmgr.exe, procexp.exe, pskill.exe, or logout (restart).

Not even kd was able to show me what was going on.

I finally gave up and did a cold reset.

Fortunately, Windows came back.

I am trying to reproduce the problem now to further study and understand what went wrong (I have captured the output from pslist and psservices), but with no success yet.

It might be something other than what Notmyfault does, but it smells awful.

Good luck to us all. I'll come back when I have news.
Is there any way to fix this problem, now that you've recognized that's the case? How would you go about identifying the buggy driver?
I've encountered a case that leads to two processes being stuck in the NT I/O manager. One was writing a file sequentially; the second was reading the same file, retrying whenever no data was available. The reader had the file opened in unbuffered mode. After a while, maybe a few hundred MB, they deadlocked. There was no way to kill either process.
As far as user code causing a BSOD, that can be done as follows: allocate two blocks of memory using VirtualAlloc such that the two blocks form one contiguous range of memory. Do unbuffered file I/O to or from this memory and you get a BSOD.
It would be awesome if you all were to add to or create entries for Windows and windows programs in Bugzilla and thereby people can track, debug, and issue "unofficial" hotfixes to prevent these things.

I personally avoid 99% of problems because I removed IE from WinXP using "nLite". Well, about 95% of IE is removed, leaving only the core components, but doing so has broken a few sloppily written applications like GameSpy Arcade (stand-alone version). You can also remove Outlook, MSN, Windows Messenger, and a lot of other useless Windows junk like the "tour" file that's 30 MB alone, or the useless MSOOBE directory that sits there eating up space after you've already activated Windows XP.

Someone contact Black Viper.
A user-mode application can't crash the OS?

Kernel Complete Dump File: Full address space is available

Symbol search path is: ....
Executable search path is: ...
Windows Server 2003 Kernel Version 3790 MP (2 procs) Free x86 compatible
Product: LanManNt, suite: Enterprise TerminalServer
Built by: 3790.srv03_rtm.030324-2048
Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
Debug session time: Tue Mar 14 17:01:26.312 2006 (GMT+3)
System Uptime: 0 days 0:26:21.031
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
* *
* Bugcheck Analysis *
* *

Use !analyze -v to get detailed debugging information.

BugCheck 50, {86000000, 0, 805d54fa, 0}

Probably caused by : win32k.sys ( win32k!Win32HeapFree+10 )

Followup: MachineOwner

1: kd> !analyze -v
* *
* Bugcheck Analysis *
* *

Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arg1: 86000000, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: 805d54fa, If non-zero, the instruction address which referenced the bad memory
Arg4: 00000000, (reserved)

Debugging Details:

READ_ADDRESS: 86000000

805d54fa 0b4e10 or ecx,[esi+0x10]





LAST_CONTROL_TRANSFER: from 8052f640 to 805435b9

f5f3ba78 8052f640 00000050 86000000 00000000 nt!KeBugCheckEx+0x19
f5f3bac8 804e2dfc 00000000 86000000 00000000 nt!MmAccessFault+0x796
f5f3bac8 805d54fa 00000000 86000000 00000000 nt!KiTrap0E+0xc8
f5f3bbc4 bf90225c 85fffff0 00000000 be24e868 nt!RtlFreeHeap+0x2d
f5f3bbd4 bf8ce56a 85fffff0 be24e868 bf8ce657 win32k!Win32HeapFree+0x10
f5f3bbe0 bf8ce657 82e503d0 be24e868 bc215584 win32k!ClassFree+0x1b
f5f3bbfc bf903048 bc215584 00000001 bc1e0498 win32k!DestroyClass+0xa7
f5f3bc0c bf901def bc215500 81d0e020 00000000 win32k!DestroyProcessesClasses+0x14
f5f3bc54 bf8ebce3 00000001 bf8ebd2b 81d0e020 win32k!xxxDestroyThreadInfo+0x252
f5f3bc5c bf8ebd2b 81d0e020 00000001 00000000 win32k!UserThreadCallout+0x48
f5f3bc74 805922d0 81d0e020 00000001 81d0e020 win32k!W32pThreadCallout+0x37
f5f3bd0c 8059a73e 00000000 00000000 81c25020 nt!PspExitThread+0x3a2
f5f3bd24 805cf3c8 81d0e020 00000000 00000001 nt!PspTerminateThreadByPointer+0x49
f5f3bd54 804dfd24 00000000 00000000 00000000 nt!NtTerminateProcess+0x136
f5f3bd54 7ffe0304 00000000 00000000 00000000 nt!KiSystemService+0xd0
0012fdd4 77f43617 77e4f257 ffffffff 00000000 SharedUserData!SystemCallStub+0x4
0012fdd8 77e4f257 ffffffff 00000000 00000000 ntdll!NtTerminateProcess+0xc
0012fec8 77e4f1f6 00000000 77e8f3b0 ffffffff kernel32!_ExitProcess+0x57
0012fedc 7c348d03 00000000 7c3476c9 00000000 kernel32!TerminateProcess
0012fee4 7c3476c8 00000000 00000000 00000000 MSVCR71!__crtExitProcess+0x2e [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 463]
0012ff14 7c348d11 00000000 00000000 00000000 MSVCR71!doexit+0xab [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 414]
0012ff24 004b15be 00000000 00000000 00000000 MSVCR71!exit+0xd [f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 303]
0012ffc0 77e4f38c 00000000 00000000 7ffdf000 MyApplication!wWinMainCRTStartup+0x1b1 [f:\vs70builds\3077\vc\crtbld\crt\src\crtexe.c @ 406]
0012fff0 00000000 004b140d 00000000 78746341 kernel32!BaseProcessStart+0x23


bf90225c 0fb6c0 movzx eax,al




SYMBOL_NAME: win32k!Win32HeapFree+10


IMAGE_NAME: win32k.sys


FAILURE_BUCKET_ID: 0x50_win32k!Win32HeapFree+10

BUCKET_ID: 0x50_win32k!Win32HeapFree+10

Followup: MachineOwner

Of course, I've checked the memory. There isn't a problem.
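The summary line at the top of this dump packs the four bug check parameters into braces, and the !analyze -v text then spells out what each one means for bug check 0x50. A small Python sketch that decodes the summary line using only those meanings; decode_bugcheck_0x50 and ARG_NAMES are hypothetical names, not a debugger feature, and operation codes other than the documented 0 and 1 are reported as unknown.

```python
import re

# Parameter order for bug check 0x50, as printed by !analyze -v in this dump.
ARG_NAMES = ("memory_referenced", "operation", "instruction_address", "reserved")


def decode_bugcheck_0x50(summary_line):
    """Decode a line such as 'BugCheck 50, {86000000, 0, 805d54fa, 0}'."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", summary_line.strip())
    if match is None or int(match.group(1), 16) != 0x50:
        raise ValueError("not a bug check 0x50 summary line")
    values = [int(arg, 16) for arg in match.group(2).split(",")]
    decoded = dict(zip(ARG_NAMES, values))
    # Per the !analyze text: 0 = read operation, 1 = write operation.
    code = decoded["operation"]
    decoded["operation"] = {0: "read", 1: "write"}.get(code, f"unknown ({code})")
    return decoded


if __name__ == "__main__":
    info = decode_bugcheck_0x50("BugCheck 50, {86000000, 0, 805d54fa, 0}")
    print(f"{info['operation']} of {info['memory_referenced']:#010x} "
          f"at {info['instruction_address']:#010x}")
```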