Debugging a crash using WinDbg

I've finally managed to replicate an OI crash on demand with a very specific sequence of steps! (Usually these crashes appear to be quite random.)
- The crash occurs at the same point with SRPeditTable.ocx v3.0.7 or v3.0.8, or SRPcontrols.ocx v4.0.0, as the Faulting Module. It doesn't occur with SRPeditTable v3.0.6.
- The crash occurs on closing an MDI child window. The OEprofile log shows this occurring during a Utility() function called from End_Window().
- I've replicated this on a Windows 10 and a Windows 8.1 machine. On a Windows 7 machine, a crash can still occur, but not until the MDI frame is closed, in an 'unknown' faulting module, and not consistently.
- All OLE events are qualified explicitly (i.e. no ALL_OLES) and asynchronous.

So I've installed WinDbg, as Jared suggested the other day on the Revsoft forum, and I'm trying to understand how to use it. I'm posting here as it seems relevant and this is getting quite technical ;).

Here's a portion of the List Loaded Module report (lm command):
start             end               module name
00000000`54b10000 00000000`550c0000 mscorwks
00000000`550c0000 00000000`561ec000 srpcontrols
00000000`59d20000 00000000`59d9f000 vbscript
Does it matter that the start address of srpcontrols is the same as the end address of the previous module? This seems rare in the complete module list. Is it suspicious, or too close for comfort?
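
To put a number on the adjacency question, here is a minimal Python sketch that parses the three lm rows quoted above and reports the gap between neighbouring modules. It assumes the "end" column is the address just past the module's last byte (my understanding of how lm reports it), so a gap of zero would mean the modules touch without overlapping. The helper names `parse_lm` and `gaps` are made up for this illustration.

```python
import re

# Rows copied from the lm listing above.
LM_OUTPUT = """\
00000000`54b10000 00000000`550c0000 mscorwks
00000000`550c0000 00000000`561ec000 srpcontrols
00000000`59d20000 00000000`59d9f000 vbscript
"""

# start address, end address, module name
ROW = re.compile(r"^([0-9a-f`]+)\s+([0-9a-f`]+)\s+(\S+)", re.IGNORECASE)


def parse_lm(text):
    """Return sorted (start, end, name) tuples from lm-style rows."""
    modules = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            start, end, name = match.groups()
            # WinDbg separates the high and low 32 bits with a backtick.
            modules.append((int(start.replace("`", ""), 16),
                            int(end.replace("`", ""), 16),
                            name))
    return sorted(modules)


def gaps(modules):
    """Return (lower, upper, gap) per neighbouring pair; a negative gap is an overlap."""
    return [(a[2], b[2], b[0] - a[1]) for a, b in zip(modules, modules[1:])]


modules = parse_lm(LM_OUTPUT)
for lower, upper, gap in gaps(modules):
    status = "overlap" if gap < 0 else "adjacent" if gap == 0 else "gap"
    print(f"{lower} -> {upper}: {status} ({gap:#x} bytes)")
```

Under that assumption, only a negative gap would indicate two modules claiming the same addresses. Pasting the complete lm listing into `LM_OUTPUT` would check every pair.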

And here is the Stack Backtrace (k command):
# Child-SP          RetAddr           Call Site
00 00000000`000966e0 00000000`6399b9b9 wow64!Wow64NotifyDebugger+0x1d
01 00000000`00096710 00000000`6399c873 wow64!HandleRaiseException+0x13d
02 00000000`00096c00 00000000`6399cb54 wow64!Wow64NtRaiseException+0x9b
03 00000000`00096c90 00000000`63996245 wow64!whNtRaiseException+0x14
04 00000000`00096cc0 00000000`63a61c87 wow64!Wow64SystemServiceEx+0x155

05 00000000`00097580 00000000`6399bdb2 wow64cpu!ServiceNoTurbo+0xb
06 00000000`00097630 00000000`639a8c36 wow64!RunCpuSimulation+0x22
07 00000000`00097660 00000000`639e2e0d wow64!Wow64KiUserCallbackDispatcher+0x426
08 00000000`00097a70 00007ff8`eff98b94 wow64win!whcbfnNCDESTROY+0xbd
09 00000000`00098450 00000000`63a621bc ntdll!KiUserCallbackDispatcherContinue

0a 00000000`000984d8 00000000`63a6217f wow64cpu!CpupSyscallStub+0xc
0b 00000000`000984e0 00000000`6399bdb2 wow64cpu!Thunk0Arg+0x5
0c 00000000`00098590 00000000`639a8c36 wow64!RunCpuSimulation+0x22
0d 00000000`000985c0 00000000`639e440a wow64!Wow64KiUserCallbackDispatcher+0x426
0e 00000000`000989d0 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
0f 00000000`000993d0 00000000`63a621bc ntdll!KiUserCallbackDispatcherContinue

10 00000000`00099458 00000000`63a6217f wow64cpu!CpupSyscallStub+0xc
11 00000000`00099460 00000000`6399bdb2 wow64cpu!Thunk0Arg+0x5
12 00000000`00099510 00000000`639a8c36 wow64!RunCpuSimulation+0x22
13 00000000`00099540 00000000`639e440a wow64!Wow64KiUserCallbackDispatcher+0x426
14 00000000`00099950 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
15 00000000`0009a350 00000000`63a621bc ntdll!KiUserCallbackDispatcherContinue

16 00000000`0009a3d8 00000000`63a6217f wow64cpu!CpupSyscallStub+0xc
17 00000000`0009a3e0 00000000`6399bdb2 wow64cpu!Thunk0Arg+0x5
18 00000000`0009a490 00000000`639a8c36 wow64!RunCpuSimulation+0x22
19 00000000`0009a4c0 00000000`639e440a wow64!Wow64KiUserCallbackDispatcher+0x426
1a 00000000`0009a8d0 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
1b 00000000`0009b2d0 00000000`639f3844 ntdll!KiUserCallbackDispatcherContinue

1c 00000000`0009b358 00000000`639e5aa2 wow64win!NtUserMessageCall+0x14
1d 00000000`0009b360 00000000`639e5ef8 wow64win!whNT32NtUserMessageCallCB+0x32
1e 00000000`0009b3b0 00000000`63996245 wow64win!whNtUserMessageCall+0x128
1f 00000000`0009b470 00000000`63a61c87 wow64!Wow64SystemServiceEx+0x155
20 00000000`0009bd30 00000000`6399bdb2 wow64cpu!ServiceNoTurbo+0xb
21 00000000`0009bde0 00000000`639a8c36 wow64!RunCpuSimulation+0x22
22 00000000`0009be10 00000000`639e440a wow64!Wow64KiUserCallbackDispatcher+0x426
23 00000000`0009c220 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
24 00000000`0009cc20 00000000`639f3844 ntdll!KiUserCallbackDispatcherContinue

25 00000000`0009cca8 00000000`639e5aa2 wow64win!NtUserMessageCall+0x14
26 00000000`0009ccb0 00000000`639e5ef8 wow64win!whNT32NtUserMessageCallCB+0x32
27 00000000`0009cd00 00000000`63996245 wow64win!whNtUserMessageCall+0x128
28 00000000`0009cdc0 00000000`63a61c87 wow64!Wow64SystemServiceEx+0x155
29 00000000`0009d680 00000000`6399bdb2 wow64cpu!ServiceNoTurbo+0xb
2a 00000000`0009d730 00000000`639a8c36 wow64!RunCpuSimulation+0x22
2b 00000000`0009d760 00000000`639e440a wow64!Wow64KiUserCallbackDispatcher+0x426
2c 00000000`0009db70 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
2d 00000000`0009e570 00000000`639f3844 ntdll!KiUserCallbackDispatcherContinue

2e 00000000`0009e5f8 00000000`639e5aa2 wow64win!NtUserMessageCall+0x14
2f 00000000`0009e600 00000000`639e5ef8 wow64win!whNT32NtUserMessageCallCB+0x32
30 00000000`0009e650 00000000`63996245 wow64win!whNtUserMessageCall+0x128
31 00000000`0009e710 00000000`63a61c87 wow64!Wow64SystemServiceEx+0x155
32 00000000`0009efd0 00000000`6399bdb2 wow64cpu!ServiceNoTurbo+0xb
33 00000000`0009f080 00000000`6399bcc0 wow64!RunCpuSimulation+0x22
34 00000000`0009f0b0 00007ff8`eff7fca1 wow64!Wow64LdrpInitialize+0x120
35 00000000`0009f360 00007ff8`effb166d ntdll!LdrpInitializeProcess+0x176d
36 00000000`0009f750 00007ff8`eff66d5e ntdll!_LdrpInitialize+0x4a8b9
37 00000000`0009f7d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe
There's no mention of SRPcontrols at all! But ntdll seems to be recursing until it finally crashes.
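
A quick way to see what a trace like this contains is to tally the frames by module. The Python sketch below does that for a handful of frames copied from the trace above; it is an excerpt, not the whole stack, and `summarize` is a name made up for this illustration. Every frame here belongs to wow64, wow64win, wow64cpu or ntdll, which suggests this is the 64-bit side of the WOW64 thread. That may be why the 32-bit srpcontrols frames do not show up (the later trace in this thread was taken at a `0:000:x86>` prompt).

```python
import re
from collections import Counter

# A few frames copied verbatim from the k output above (excerpt only).
K_OUTPUT = """\
00 00000000`000966e0 00000000`6399b9b9 wow64!Wow64NotifyDebugger+0x1d
01 00000000`00096710 00000000`6399c873 wow64!HandleRaiseException+0x13d
08 00000000`00097a70 00007ff8`eff98b94 wow64win!whcbfnNCDESTROY+0xbd
09 00000000`00098450 00000000`63a621bc ntdll!KiUserCallbackDispatcherContinue
0e 00000000`000989d0 00007ff8`eff98b94 wow64win!whcbfnDWORD+0x21a
0f 00000000`000993d0 00000000`63a621bc ntdll!KiUserCallbackDispatcherContinue
37 00000000`0009f7d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe
"""

# frame number, Child-SP, RetAddr, then module!symbol with an optional +0xNN offset
FRAME = re.compile(
    r"^[0-9a-f]{2,}\s+\S+\s+\S+\s+(\w+)!(\S+?)(?:\+0x[0-9a-f]+)?$",
    re.IGNORECASE,
)


def summarize(text):
    """Count frames per module and per symbol in k-style output."""
    modules, symbols = Counter(), Counter()
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            modules[match.group(1)] += 1
            symbols[match.group(2)] += 1
    return modules, symbols


modules, symbols = summarize(K_OUTPUT)
print("frames per module:", dict(modules))
print("srpcontrols present:", "srpcontrols" in modules)
print("nested user callbacks:", symbols["KiUserCallbackDispatcherContinue"])
```

Each `KiUserCallbackDispatcherContinue` frame marks one more nested callback into user mode, so counting them over the full trace gives the nesting depth rather than evidence of an endless loop.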

Are there any clues here on what I can look for in our application, or in WinDbg?

Cheers, M@

Comments

  • Matt,

    My compliments and gratitude for engaging in such extensive efforts to dig deep into this issue. I am going to have to let Kevin review and comment on the technical bits in your post. However, if you are able to replicate this consistently, can you provide us with the steps or an RDK that replicates the experience? There is probably no better way for us to troubleshoot than to duplicate the crash in-house and work on a solution.
  • edited March 2016
    The callstack dump, unfortunately, doesn't tell us much. It might not even be a fatal loop since those low level methods are often called when transitioning between threads or when OI is calling an OLE method. The problem is that WINDBG doesn't have access to any debug information inside our controls. I'd love it if you'd be willing to give something a try as I haven't had a chance to test this myself. (I'm perpetually in a different environment here since I always have Visual Studio installed.)

    Download this debug file for our controls along with the latest pre-release of SRPControls.ocx (4.0.1 RC1). Place them into the same directory and re-register the OCX. Then do the same steps you did before, with WINDBG, and see if we get more information in the dump. If not, then we'll have to fall back to Don's suggestion of an RDK and/or specific steps to reproduce the problem.
  • Yes, I'm quite willing to do what I can to help get a better understanding :).

    Thanks for the downloads, Kevin. I can still replicate the crash and get a dump file. I can't see any extra info in the lm or k commands though - but I'm not sure if I'm using this correctly. Is WinDbg supposed to be running alongside OI in realtime? - if so, I'm not sure how.
    I've uploaded the dump file in case there's anything you might be able to glean from it.

    I've been trying for several months now to reduce the issue to its core so that I could send you a simple RDK, but I don't think that's going to work. I could send you the whole application with instructions/video on how to replicate, but I suspect the operating environment is also a factor. Our current application release uses SRPeditTable v3.0.8 but we have some customers reporting crashes and others not - and it doesn't appear to be as simple as Windows version.
    Alternatively, now that I have WinDbg installed and the .pdb file, it may be easier if we have a shared remote session where I can replicate and you can use WinDbg?? Let me know either way.

    Cheers, M@
  • I was under the impression you were running WinDbg alongside OI already. It's not a path worth spending too much time on. The main issue is that I need to recreate it in my environment so I can see the offending code first hand.

    Are you only using SRPEditTable.ocx? Or do you also use SRPControls.ocx? We don't support using both together: one always overrides the other, which could lead to unexpected behavior.

    I am sympathetic to the fact that you cannot create an RDK. If you have a way for me to download a copy of the system, I'm happy to walk through the steps on my end. With inconsistent crashes like these, I feel like it would be the shortest path to isolating the problem and getting a fix.

    Let me know what you think.
  • I've tried both SRPEditTable.ocx and SRPControls.ocx separately. That is, when I test SRPEditTable I don't have SRPControls registered at all, and vice versa. Only SRPEditTable is out in the wild at this stage, and I'm just using SRPControls in house.

    Ok, I'll prepare a system you can download and do a video on how to crash it.

    Cheers, M@
  • edited March 2016
    Ok, I think I've worked out how to run WinDbg alongside OI. The following is the WinDbg output and call stack at the crash point:
    ModLoad: 00000000`0af40000 00000000`0af5b000   C:\TEMP\SRPcontrols_Crash.dmo\V119.dll
    HEAP[OINSIGHT.exe]: HEAP: Free Heap block 0C34CE10 modified at 0C34E410 after it was freed
    (190.1c98): WOW64 breakpoint - code 4000001f (first chance)
    First chance exceptions are reported before any exception handling.
    This exception may be expected and handled.
    ntdll_77830000!RtlpBreakPointHeap+0x19:
    7790ee8a cc int 3

    0:000:x86> k
    # ChildEBP RetAddr
    00 0019d0c0 778c5bef ntdll_77830000!RtlpBreakPointHeap+0x19
    01 0019d1c0 7786b4c8 ntdll_77830000!RtlpFreeHeap+0x59bef
    02 0019d1ec 7790e1fb ntdll_77830000!RtlFreeHeap+0x268
    03 0019d24c 7786cc3e ntdll_77830000!RtlDebugFreeHeap+0x212
    04 0019d350 7786b4c8 ntdll_77830000!RtlpFreeHeap+0xc3e
    *** WARNING: Unable to verify checksum for C:\Program Files (x86)\Vernon Systems\VCMSclient\srpcontrols.ocx
    05 0019d37c 570205e5 ntdll_77830000!RtlFreeHeap+0x268
    06 0019d390 56efdf74 srpcontrols!free+0x1a [f:\dd\vctools\crt\crtw32\heap\free.c @ 51]
    07 0019d39c 56b2c22a srpcontrols!CMemFile::Free+0xb [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\filemem.cpp @ 118]
    08 0019d3c4 56d2d1c0 srpcontrols!CColorFill::~CColorFill+0x13a [w:\common files\colorfill.cpp @ 77]
    09 0019d3ec 56c0e458 srpcontrols!CSRPOleBase::~CSRPOleBase+0x180 [w:\srp projects\srp activex controls\srpcontrols\srpolebase.cpp @ 114]
    0a 0019d41c 56c0db1b srpcontrols!CSRPEditTableCtrl::~CSRPEditTableCtrl+0x918 [w:\srp projects\srp activex controls\srpedittable\srpedittablectrl.cpp @ 547]
    0b 0019d428 56eef212 srpcontrols!CSRPEditTableCtrl::`scalar deleting destructor'+0xb
    0c 0019d434 56eebdaf srpcontrols!CCmdTarget::OnFinalRelease+0x2c [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\cmdtarg.cpp @ 549]
    0d 0019d45c 56f2ef05 srpcontrols!CCmdTarget::InternalRelease+0x3a [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\oleunk.cpp @ 178]
    *** WARNING: Unable to verify checksum for image00000000`00400000
    *** ERROR: Module load completed but symbols could not be loaded for image00000000`00400000
    0e 0019d464 004427bf srpcontrols!COleControl::XQuickActivate::Release+0x11 [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\ctlquick.cpp @ 68]
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0f 00000000 00000000 image00000000_00400000+0x427bf
    Does this shed any light?? Is there anything more I can look at?

    Cheers, M@
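
The HEAP line in the output above reads as a write into memory that had already been freed, which the heap checks noticed later during the free() call made from the CColorFill destructor. The Python sketch below only does the address arithmetic on that one line, to show how far into the freed block the modified byte sits. `freed_block_write` is a name made up for this illustration, and the interpretation is an assumption, not a diagnosis.

```python
import re

# The heap diagnostic line copied from the WinDbg output above.
HEAP_LINE = ("HEAP[OINSIGHT.exe]: HEAP: Free Heap block 0C34CE10 "
             "modified at 0C34E410 after it was freed")

PATTERN = re.compile(
    r"Free Heap block ([0-9A-F]+) modified at ([0-9A-F]+)",
    re.IGNORECASE,
)


def freed_block_write(line):
    """Return (block, modified, offset) parsed from the line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    block = int(match.group(1), 16)
    modified = int(match.group(2), 16)
    return block, modified, modified - block


block, modified, offset = freed_block_write(HEAP_LINE)
print(f"block {block:#010x}, modified at {modified:#010x}, offset {offset:#x} ({offset} bytes)")
```

The offset works out to 0x1600 (5632 bytes), so the stray write landed well inside a fairly large freed block. The stack at that moment shows where the damage was detected, not necessarily the code that did the writing.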
  • Thank you so much for all your effort. The WinDBG output didn't help in this particular case, but that doesn't mean it was a wasted effort. It just so happened that while the crash was happening in the place identified by WinDBG, it was due to problems initiated elsewhere. In other words, your ability to employ WinDBG could be extremely helpful in the future.

    Your copy of the system along with steps to repeat were the key. It was a tough fix with a lot of trial and error, but I think I nailed it. Try 4.0.1 RC3 and let me know if things are improved on your end.
  • Brilliant! That's done the trick on my machine. We'll do a wider range of testing on other machines over time, and hopefully you've solved the general case.

    Yes, WinDbg can be useful, particularly for determining which control within SRPcontrols is not happy. That then gives us something to target in our code for resolution.

    Thanks very much for working on this! This one has been a small straw in a haystack.

    Cheers, M@