Closed Bug 1222800 Opened 9 years ago Closed 6 years ago

startup crash in mozilla::a11y::ia2AccessibleText::UpdateTextChangeData in Firefox 43

Categories

(Core :: Disability Access APIs, defect)

43 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox43 + wontfix
firefox44 + wontfix

People

(Reporter: philipp, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: a11y:crash-win)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-5229e186-967e-42f6-8183-9c2f22151105.
=============================================================
Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::a11y::ia2AccessibleText::UpdateTextChangeData(mozilla::a11y::HyperTextAccessibleWrap*, bool, nsString const&, int, unsigned int) 	accessible/windows/ia2/ia2AccessibleText.cpp
1 	kernel32.dll 	BaseThreadInitThunk

looking at crash stats data this is a new signature and startup crash in firefox 43 which is surfacing now that the version has reached beta.

it seems likely that the change in bug 1192330 is at the root of this.
hrm, something is really screwed up here.  the stack claims UpdateTextChangeData was called directly from the thread initialization functions, which clearly makes no sense.  It also appears UpdateTextChangeData is being called off the main thread, which makes no sense either.  I guess somebody with msvc needs to poke at this :(
[Tracking Requested - why for this release]:
nominating this for tracking as well, since the signature is showing up in the top 10 of the crash score board in beta (#3 in 43.0b6, #8 in 43.0b7 at the moment) and it's a recent regression.
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(surkov.alexander)
Who can take this one?
see comment 1
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to David Bolter [:davidb] from comment #3)
> Who can take this one?

I can install Windows on my machine, but even with everything set up I'm not sure how to approach this bug. It'd be good to have help from someone experienced with broken/weird stacks on Windows.
Flags: needinfo?(surkov.alexander)
I don't see this in the top crashes for beta 8. Philipp, do you think it's still a bad problem in beta?
Flags: needinfo?(madperson)
Tracking since it was definitely high up in beta 7 (#22 topcrash).  But I think we should probably end up wontfixing this for 43 unless there is a simple, testable fix.
in the crash score board for beta 7 it is #3 at the moment - as it seems to affect only a small fraction of users, but those repeatedly & on startup: https://crash-analysis.mozilla.com/rkaiser/crash-report-tools/score/?version=43.0b7&limit=30

with such persistent startup crashes it is difficult to assess their impact just by numbers, as affected users will stop using the product soon and be no longer included in subsequent data. in beta 8 there are only 11 of those crashes so far.

also i don't know how well a11y users are represented in the beta population, so the impact might be worse once it hits release.
Flags: needinfo?(madperson)
[Tracking Requested - why for this release]:  44 is affected, but as philipp points out it's a bit hard to tell how badly since once people hit a repeated startup crash they may stop trying to open Firefox. 

Too late to fix in 43. I'll email the crashdebug list about this to find help investigating.
Tracked for now, but if there isn't a fix available soon, given the low volume on this crash, I might end up wontfixing this.
Given that we don't have a fix ready and we are into RC mode, it's too late and this is now a wontfix for Fx44.
I suspect third-party code in this case.

Looking at https://crash-stats.mozilla.com/report/index/0b2b52c3-4082-4a37-9724-9885a2160222, it is the only nightly crash that I could see. FPO is turned off, so we should see a better stack, right?

WRONG! The frame pointer still shows BaseThreadInitThunk as the caller!

That's still not right though -- the signature for BaseThreadInitThunk is:
VOID BaseThreadInitThunk(IN DWORD LdrReserved, IN LPTHREAD_START_ROUTINE lpStartAddress, IN LPVOID lpParameter);

So if we look at the second parameter to BaseThreadInitThunk, we should know the start address of the thread. But in that crash dump, the start address does not point to code that is contained within any loaded executable images -- it's executable VM that has been dynamically allocated by something else!
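
To make that check concrete, here is a minimal sketch of the kind of lookup involved: given the module list from a dump, test whether an address falls inside any module's mapped range. The `ModuleRange` struct and `FindOwningModule` function are hypothetical names for illustration only; they are not part of WinDBG, Socorro, or any Mozilla tooling, and real dumps carry more per-module data than this.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one entry in a crash dump's module list.
struct ModuleRange {
  std::string name;
  std::uint64_t base;  // load address of the image
  std::uint64_t size;  // mapped size in bytes
};

// Returns the name of the module whose [base, base + size) range contains
// addr, or std::nullopt when no loaded module covers that address.
std::optional<std::string> FindOwningModule(
    const std::vector<ModuleRange>& modules, std::uint64_t addr) {
  for (const ModuleRange& m : modules) {
    // Compare the offset from base rather than computing base + size,
    // so the range test cannot overflow.
    if (addr >= m.base && addr - m.base < m.size) {
      return m.name;
    }
  }
  return std::nullopt;
}
```

A thread start address for which this kind of lookup comes back empty is the situation described above: the code being run lives in memory that no loaded image accounts for.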

Unfortunately our crash reports do not include that memory, so there is no way for me to gather clues as to what it's doing or who might have allocated it.

Our best bet is probably to examine module correlations, but since this signature isn't heavy-enough volume to be considered a topcrasher, the correlation scripts aren't being run on it.
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #12)
> Our best bet is probably to examine module correlations, but since this
> signature isn't heavy-enough volume to be considered a topcrasher, the
> correlation scripts aren't being run on it.

KaiRo: Do you know if there is a way to request correlations for a specific signature?
Flags: needinfo?(kairo)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #13)
> (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #12)
> > Our best bet is probably to examine module correlations, but since this
> > signature isn't heavy-enough volume to be considered a topcrasher, the
> > correlation scripts aren't being run on it.
> 
> KaiRo: Do you know if there is a way to request correlations for a specific
> signature?

Not for signatures that the script doesn't analyze by itself - that would be a rather hard undertaking. That said, we have 5 crashes with that signature on Nightly over a week, all from the same Firefox installation, so there is nothing to really create correlations for anyhow. And even the 26 crashes on 44.0.2 are from only a few installations, so they can probably be looked at manually.
Flags: needinfo?(kairo)
Here's an okay stack [1], though it could be a different issue from the original one (which is no longer seen, btw). However, the stack is still quite confusing: we crash when assigning a static variable [2]. Any ideas?

[1] https://crash-stats.mozilla.com/report/index/ea4ba6b3-f6ac-4662-b103-e7bd10180112
[2] https://hg.mozilla.org/releases/mozilla-beta/annotate/f155e109bb41/accessible/windows/ia2/ia2AccessibleText.cpp#l515
Jamie, if you have a minute, I'd love to hear your thinking on this one too.
Flags: needinfo?(jteh)
I get one additional inline frame when I look at this in WinDBG:

0:000> kp
 # ChildEBP RetAddr  
00 (Inline) -------- xul!operator new+0x2 [z:\build\build\src\obj-firefox\dist\include\mozilla\mozalloc.h @ 206] 
01 0042ec60 050cd1b7 xul!mozilla::a11y::ia2AccessibleText::UpdateTextChangeData(class mozilla::a11y::HyperTextAccessibleWrap * aAcc = 0x17e96760, bool aInsert = false, class nsTString<char16_t> * aStr = 0x2029cd20, int aStart = 0n0, unsigned int aLen = 7)+0x14 [z:\build\build\src\accessible\windows\ia2\ia2accessibletext.cpp @ 515] 
02 0042ec84 050e64d0 xul!mozilla::a11y::HyperTextAccessibleWrap::HandleAccEvent(class mozilla::a11y::AccEvent * aEvent = 0x2029cd00)+0x60 [z:\build\build\src\accessible\windows\msaa\hypertextaccessiblewrap.cpp @ 61] 

mozalloc.h at line 206 says:

    return moz_xmalloc(size);

That's not particularly enlightening. If it were an out of memory situation, the reason would be out of memory, not access violation. Also, we don't see the frame for moz_xmalloc, although that could just be debugging weirdness. Sadly, I don't have any ideas.
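
For readers unfamiliar with the reasoning about out of memory versus access violation, the following is a rough sketch of how an infallible allocator behaves in general. `InfallibleMallocSketch` is a hypothetical stand-in written for this comment, not the real moz_xmalloc, whose actual failure handling differs in its details. The point is only that an allocation failure ends in a deliberate abort at a known place rather than a stray bad memory access.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an infallible allocator (not the real
// moz_xmalloc). It never returns null for a non-zero request: on failure
// it reports the condition and aborts deliberately.
void* InfallibleMallocSketch(std::size_t size) {
  void* ptr = std::malloc(size);
  if (!ptr && size != 0) {
    // A failure here would show up as an explicit abort with an
    // out-of-memory message, not as an access violation.
    std::fputs("out of memory\n", stderr);
    std::abort();
  }
  return ptr;
}
```

Under that assumption, an access violation reported at the allocation site suggests something other than plain memory exhaustion.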
Flags: needinfo?(jteh)
Whiteboard: a11y:crash-win
Jamie, should we take this anywhere? Philipp, is this crash still showing up?
Flags: needinfo?(madperson)
Flags: needinfo?(jteh)
only one crash was recorded in the past 6 months - i think we should not put any more effort into debugging this at this point.
Flags: needinfo?(madperson)
This remaining single Firefox 58 crash from months ago (bp-c6d43486-58c0-4c46-a4f3-ace630180226) also has a very different stack to the one described in comment 0. Furthermore, there have been significant changes to Windows a11y since Firefox 58. I'm closing this as worksforme.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(jteh)
Resolution: --- → WORKSFORME