Closed Bug 934509 Opened 11 years ago Closed 10 years ago

crash in nsGlobalWindow::RunTimeout(nsTimeout*)

Categories

(Core :: DOM: Core & HTML, defect)

27 Branch
x86
All
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla30
Tracking Status
firefox26 --- unaffected
firefox27 + verified
firefox28 + verified
firefox29 + verified
firefox30 + verified
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: tracy, Assigned: bzbarsky)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-3a4c2063-91c7-48f5-aeae-8419f2131103.
=============================================================

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsGlobalWindow::RunTimeout(nsTimeout *) 	dom/base/nsGlobalWindow.cpp
1 	xul.dll 	nsGlobalWindow::TimerCallback(nsITimer *,void *) 	dom/base/nsGlobalWindow.cpp
2 	xul.dll 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
3 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/nsTimerImpl.cpp
4 	xul.dll 	nsThread::ProcessNextEvent(bool,bool *) 	xpcom/threads/nsThread.cpp
5 	xul.dll 	nsRefPtr<XPCWrappedNative>::~nsRefPtr<XPCWrappedNative>() 	obj-firefox/dist/include/nsAutoPtr.h
6 	nss3.dll 	PR_Unlock 	nsprpub/pr/src/threads/combined/prulock.c
7 	xul.dll 	NS_ProcessNextEvent(nsIThread *,bool) 	xpcom/glue/nsThreadUtils.cpp
8 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) 	ipc/glue/MessagePump.cpp
9 	xul.dll 	xul.dll@0x127cd58 	
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
12 	xul.dll 	nsBaseAppShell::Run() 	widget/xpwidgets/nsBaseAppShell.cpp
13 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
14 	nss3.dll 	nss3.dll@0x79a0 	
15 	xul.dll 	XREMain::XRE_main(int,char * * const,nsXREAppData const *) 	toolkit/xre/nsAppRunner.cpp
16 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
17 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
18 	firefox.exe 	NS_internal_main(int,char * *) 	browser/app/nsBrowserApp.cpp
19 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
20 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt_bld/self_x86/crt/src/crtexe.c
21 	kernel32.dll 	kernel32.dll@0x1336a 	
22 	ntdll.dll 	ntdll.dll@0x39f72 	
23 	ntdll.dll 	ntdll.dll@0x39f45

started occurring 10/28 on Aurora http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=52f24889dccc&tochange=7090e4028a66
and 
10/29 on Nightly http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4646259ab62d&tochange=518f5bff0ae4
Flags: needinfo?(benjamin)
https://crash-analysis.mozilla.com/bsmedberg/bug934509-nightly.csv
  regression appeared on buildid 20131028
https://crash-analysis.mozilla.com/bsmedberg/bug934509-aurora.csv
  3 hits on 20131029 but didn't appear "for real" until 20131101

I would look in the nightly regression range from the 27th-28th
Flags: needinfo?(benjamin)
I'm plagued by this crash right now in Win7 and Linux too, which happens randomly (even on chrome pages), if you need any info.
This is topcrash #11 on Aurora right now as seems to clearly be a 27 regression.
I use Nightly 28.0a1.
When I visit Google Calendar and other Google sites(except GMail and Google search),Firefox crash!
It can be reproduced.
https://crash-stats.mozilla.com/report/index/071b1574-d005-4ee4-bcbe-15dc02131121
Dapeng Wei, can you share when it crashes, is it upon login to gmail properties or some other action?

Please provide the steps you use to reproduce, thanks.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #5)
> Dapeng Wei, can you share when it crashes, is it upon login to gmail
> properties or some other action?
> 
> Please provide the steps you use to reproduce, thanks.
First,I login my Google account in Gmail.
Second,I open Google calender or Google accounts website,the Firefox crash.
It also crashes on my macbook.Thanks.
https://crash-stats.mozilla.com/report/index/d5b6feb6-b45e-44d7-bf41-7e36d2131121
Steps:
Install Greasemonkey, 
Install YouTube Lyrics by Rob W(http://userscripts.org/scripts/show/112070)
Open Grooveshark
Maybe it's Greasemonkey related? I can't really provide any steps as my crashes are just happening randomly.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #5)
> Dapeng Wei, can you share when it crashes, is it upon login to gmail
> properties or some other action?
> 
> Please provide the steps you use to reproduce, thanks.

I reproduced this crash on Mac and windows.
It related on Greasemonkey.
First,Firefox Nightly 28.0a1 install Greasemonkey 1.12 and this script(http://userscripts.org/scripts/show/159750)
Second,I open Google calender or Google accounts website,the Firefox crash.
Thanks.
Hi,
The crash happens on this specific piece of JavaScript:
clearInterval(v); // where "v" is null

However, it doesn't crash in every case, for example if you simply run "clearInterval(null);" in the Console (F12) on a webpage, then nothing will happen. Probably other conditions must be met.
Also, bypassing the crash is simple as that:
if (v) clearInterval(v);

Here is an extension that I develop, and which suffered from this crash:
https://files.myopera.com/Deathamns/extension/imagus/archive/firefox/imagus-0.9.7.xpi

To reproduce the crash (after the extension is installed and enabled):
- Open a new tab and load an image in it, for example: https://i.imgur.com/CV1QTXz.jpg
- At first time nothing should happen, the image normally loads, and the extension features should work, because at first "v" has a value returned by setInterval() (it runs a function that checks if the image is available, testing if the width property of the image isn't 0).
- Hit F5, which now should read the image from cache (if the image was already in your cache, maybe you had a crash already), and because of that the width is immediately available for the script, and it won't invoke the setInterval() for checking, but this leaves the "v" with its default null value (that I set earlier), and that will be passed to the clearInterval(), and the browser will crash.

My first report was made from (probably the build where the crash was introduced):
Product Firefox
Version 27.0a1
Build ID 20131028030205
7 Day Ranking:
> Firefox 26: N/A
> Firefox 27: #7 @ 0.17% (down 6 positions)
> Firefox 28: #23 @ 0.66% (down 2 positions)
> Firefox 29: #21 @ 0.55% (down 1 position)

Looks like this is still a top-crash (at least on Beta) but it appears to be dropping.
Right now (7-day):

Firefox 26: N/A
Firefox 27: #39 @ 0.26%
Firefox 28: #20 @ 0.70%
Firefox 29: #14 @ 0.78%

Also, 26.0 isn't completely unaffected, but at very low rate - the signature summary at https://crash-stats.mozilla.com/report/list?signature=nsGlobalWindow%3A%3ARunTimeout%28nsTimeout%2A%29 says that it exists in a lot of versions.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> Right now (7-day):
> 
> Firefox 26: N/A
> Firefox 27: #39 @ 0.26%
> Firefox 28: #20 @ 0.70%
> Firefox 29: #14 @ 0.78%

This no longer qualifies as a top-crash.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #13)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> > Right now (7-day):
> > 
> > Firefox 26: N/A
> > Firefox 27: #39 @ 0.26%
> > Firefox 28: #20 @ 0.70%
> > Firefox 29: #14 @ 0.78%
> 
> This no longer qualifies as a top-crash.

Given this, removing from release tracking here.
This is the #1 topcrash in the starting days of 27 release now.

Correlations:

64% (140/218) vs.   2% (773/47794) {e4a8a97b-f2ed-450b-b12d-ee082ba24781} (Greasemonkey, https://addons.mozilla.org/addon/748)
24% (52/218) vs.   0% (95/47794) scriptish@erikvold.com
23% (50/218) vs.   8% (3653/47794) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
11% (25/218) vs.   0% (234/47794) helper@savefrom.net
9% (20/218) vs.   1% (419/47794) feca4b87-3be4-43da-a1b1-137c24220968@jetpack
8% (18/218) vs.   1% (292/47794) elemhidehelper@adblockplus.org (Adblock Plus: Element Hiding Helper, https://addons.mozilla.org/addon/4364)
6% (14/218) vs.   0% (29/47794) memoryrestart@teamextension.com
7% (15/218) vs.   1% (311/47794) {46551EC9-40F0-4e47-8E18-8E5CF550CFB8} (Stylish, https://addons.mozilla.org/addon/2108)
6% (13/218) vs.   0% (59/47794) multifox@hultmann
9% (20/218) vs.   3% (1658/47794) {b9db16a4-6edc-47ec-a1f4-b86292ed211d} (Video DownloadHelper, https://addons.mozilla.org/addon/3006)

So, as earlier comments suspected, Greasemonkey makes this more likely, but there's other add-ons that help trigger this as well - anything noteworthy those have in common?
Keywords: topcrash-win
The correct regression range from comment 1 was:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a80dce1126db&tochange=4646259ab62d

It seems pretty clear to me that this is a regression from https://hg.mozilla.org/mozilla-central/rev/508288a2b62c bug 918345 which is quickstubbing "window" and landed right before the 27 train left.

The crash is here: http://hg.mozilla.org/releases/mozilla-release/annotate/b8896fee530d/dom/base/nsGlobalWindow.cpp#l11813 at crash address 0x0

So it seems that previously the timeout linked list would always hit the dummy timeout before it hit null, but now that's not true. peterv/bz do you know what might be going on here? Nested event loops resetting the dummy timer position, maybe?
Flags: needinfo?(peterv)
Flags: needinfo?(bzbarsky)
The steps in comment 10 don't reproduce the bug for me on trunk or when I build from rev 508288a2b62c, but the steps in comment 9 do.

clearInterval(null) is a no-op both before and after the changes in bug 918345, or at least should be.

I tried disabling the quickstubs entirely on trunk, and I still get the crash, but it's possible that the C++ changes involved broke something.  Looking into what that might be...
Oh, and I've confirmed via local builds that the crash I'm seeing with the comment 9 steps is in fact caused by bug 918345.
Blocks: 918345
Flags: needinfo?(bzbarsky)
> or at least should be.

Emphasis on _should_!  Deathamns, you were spot on, this is a clearInterval(0) problem.

Specifically, the nsTimeout constructor sets mPublicId(0).  Then nsGlobalWindow::SetTimeoutOrInterval sets mPublicId to the next sequence number before returning.

nsGlobalWindow::ClearTimeoutOrInterval walks the list of timeouts looking for the one whose mPublicId matches the argument and if it's not running removes it from the timeout list.

The callers of ClearTimeoutOrInterval are supposed to make sure that 0 doesn't get passed to it, but the XPCOM version of ClearInterval lost that check in bug 918345.  So now if you're in the middle of running timeouts on a content window and chrome calls clearInterval that content window (so you get Xrays, which are still calling the XPCOM ClearInterval for now) and passes 0 or something that coerces to 0 (null, undefined, "0", etc), we will helpfully remove dummy_timeout from the list.  After that, you get the observed crash.
Flags: needinfo?(peterv)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
That patch applies cleanly to 27, 28, 29 as well.

Deathamns, Dapeng Wei, Theodoor, thank you all for steps to reproduce for this.  Having a way to reproduce made it much simpler to sort this out.

And Benjamin, thank you for pointing to the right place to look!
Whiteboard: [need review]
Comment on attachment 8372043 [details] [diff] [review]
This fixes the comment 9 steps to reproduce for me on trunk

[Approval Request Comment]
Regression caused by (bug #): Bug 918345 
User impact if declined: The crashes will continue until morale improves.
Testing completed (on m-c, etc.): Locally tested to fix the crash I could
   reproduce.
Risk to taking this patch (and alternatives if risky):  Very very low risk.
String or IDL/UUID changes made by this patch:  None

Do I need to request approval-mozilla-b2g28 on this too?
Attachment #8372043 - Flags: approval-mozilla-release?
Attachment #8372043 - Flags: approval-mozilla-beta?
Attachment #8372043 - Flags: approval-mozilla-aurora?
Keywords: verifyme
Comment on attachment 8372043 [details] [diff] [review]
This fixes the comment 9 steps to reproduce for me on trunk

Review of attachment 8372043 [details] [diff] [review]:
-----------------------------------------------------------------

Grmbl.
Attachment #8372043 - Flags: review?(peterv) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf640fe76500
Whiteboard: [need review]
Target Milestone: --- → mozilla30
https://hg.mozilla.org/mozilla-central/rev/bf640fe76500
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 969923
Comment on attachment 8372043 [details] [diff] [review]
This fixes the comment 9 steps to reproduce for me on trunk

Uplift accepted for aurora + beta.
Attachment #8372043 - Flags: approval-mozilla-beta?
Attachment #8372043 - Flags: approval-mozilla-beta+
Attachment #8372043 - Flags: approval-mozilla-aurora?
Attachment #8372043 - Flags: approval-mozilla-aurora+
No longer blocks: 969923
Multiple Greasemonkey users have reported crashes.  Sorry I don't understand Mozilla's terminology and buganizer fields better but:

Can you explain in plain English when exactly this bug will be fixed in Firefox?  Should I try to rush a patch into Greasemonkey to get it to work around the bug?
I don't think that release management has decided whether to do a dot-release of Firefox 27 for this issue. Setting NI? to clarify.
Flags: needinfo?(release-mgmt)
Anthony, this will definitely be fixed in Firefox 28.  As comment 30 says, whether it'll be fixed before that is unclear.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #30)
> I don't think that release management has decided whether to do a
> dot-release of Firefox 27 for this issue. Setting NI? to clarify.

At this point we still do not have a driver for a dot release or a chemspill. As mentioned, this is fixed in FF28 which will release in 5 weeks so if there is a workaround that would help users impacted by this over that time period, that would be ideal.
Flags: needinfo?(release-mgmt)
Reproduced in Nightly 2013-11-26 using the STR in comment 9.
Verified fixed FF 30.0a1(2014-02-10), Win 7 x64.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (X11; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0

Reproduced the initial crash on Nightly 2014-02-04 using STR from comment 9. Verified as fixed on Firefox 28 beta 1 and latest Aurora 29.0a2 (buildID: 20140211004037).
Status: RESOLVED → VERIFIED
Keywords: verifyme
Can we somehow test this?
Flags: in-testsuite?
See bug 969923 comemnt 7.
Anthony, do you have a possible workaround that you can land in GM until FF28 is released?
Flags: needinfo?(arantius)
I believe we can inject a clearInterval method into the sandbox, which wraps a private reference to the real clearInterval, and passes the call along only when the argument is non-falsy.
Flags: needinfo?(arantius)
(In reply to Anthony Lieuallen from comment #39)
> I believe we can inject a clearInterval method into the sandbox, which wraps
> a private reference to the real clearInterval, and passes the call along
> only when the argument is non-falsy.

We can certainly QA this if you have a patched version available.
(In reply to Lukas Blakk [:lsblakk] from comment #40)
> (In reply to Anthony Lieuallen from comment #39)
> > I believe we can inject a clearInterval method into the sandbox, which wraps
> > a private reference to the real clearInterval, and passes the call along
> > only when the argument is non-falsy.
> 
> We can certainly QA this if you have a patched version available.

Can we? I'm not sure how we could test this without a PoC and if we had one it'd be easier to just check that into a testsuite.
Anthony, comment 9 has steps to reproduce with greasemonkey.

If you just want a PoC not involving greasemonkey, run this in a browser scratchpad in a build without the fix:

  content.setTimeout(function() { content.clearInterval(0); }, 0)

but that won't tell you anything about whether Greasemonkey has shipped a workaround for this, of course.
Workaround patch:
https://github.com/greasemonkey/greasemonkey/commit/24f9860628536f897fb00b5a9eea5ea032c36982

Reproduce script:
https://gist.github.com/arantius/8943384

Install this script into anything up to Greasemonkey 1.15beta1 in Firefox 27 (click on the "raw" link) and confirm the "Crash?" dialog, and Firefox will crash.  Just built 1.15beta2:
https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/versions/#version-1.15beta2
Which contains the workaround patch.

If you can confirm the fix (works for me), I can push a non-beta build.
Using Firefox 27.0 and Greasemonkey 1.15beta1 I get the Crash? alert and a subsequent crash after loading accounts.google.com. After updating to Greasemonkey 1.15beta2 I do not get the Crash? alert on accounts.google.com.
Odd.  It (the prompt) should show up at all pages, and definitely does for me at https://accounts.google.com/ServiceLogin .  What do you observe at other pages?  I.e. http://www.example.com/ ?
(In reply to Anthony Lieuallen from comment #45)

> https://accounts.google.com/ServiceLogin
With GM 1.15beta2 nothing shows.

> http://www.example.com/
With GM 1.15beta2 nothing shows.
I just set up a new fresh Firefox profile, installed 1.15beta2 into Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0, added the linked script, and I totally get the "Crash?" prompt at both pages.

Do you get any errors logged to any console that look related?
Of the crashes with this signature in the past 7 days, 3967 of them have Greasemonkey installed and 1747 of them don't. This means to me that we're seeing this with web content also and whether or not Greasemonkey takes a workaround we should also spin a fixed build for users.
This works now on a clean profile following these steps:
1. Start Firefox 27.0 with a new profile
2. Install https://addons.mozilla.org/firefox/downloads/file/241259/greasemonkey-1.15beta1-fx.xpi?src=version-history
3. Restart Firefox
4. Load https://gist.github.com/arantius/8943384/raw/9db2349ffe1cdc172ba237fdbbe7635086f1571f/gm1869.user.js
5. Click Install on the installation dialog and OK on the doorhanger
6. Load https://accounts.google.com/ServiceLogin
> Crash? alert appears
7. Click OK
> "The page at https://accounts.youtube.com says: Crash?" alert appears
8. Click OK
> Crash (expected)
9. Restart Firefox
> Session restored with Crash? dialog
10. Click cancel 
> "The page at https://accounts.youtube.com says: Crash?" alert appears
11. Click cancel
> No crash
12. Install https://addons.mozilla.org/firefox/downloads/file/243206/greasemonkey-1.15beta2-fx.xpi?src=version-history
13. Restart Firefox
> Crash? alert appears
14. Click OK
> "The page at https://accounts.youtube.com says: Crash?" alert appears
15. Click OK
> No crash


I'm not sure why it isn't failing for me anymore as indicated in comment 46 but it works every time I try this now.
Ok I've pushed 1.15 final out.  I'd appreciate a quick review.
Comment on attachment 8372043 [details] [diff] [review]
This fixes the comment 9 steps to reproduce for me on trunk

We're going to take this on for a 27.0.1 to be released later this week when we unthrottle uptake for FF27 so please go ahead and land on m-r branch.
Attachment #8372043 - Flags: approval-mozilla-release? → approval-mozilla-release+
I reproduce the crash using the first 6 steps from comment 49 on Mac OS X 10.8.5 and Win 7 x64
Using the FF 27 release.

The bug was verified using the following environment:

FF 27.0.1 
Build Id : 20140212131424
OS: Win 7 x64, Mac Os x 10.8.5 and Ubuntu 12.04 x32
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: