Closed
Bug 1366812
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::dom::TimeoutManager::MaybeStartThrottleTrackingTimout
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr45 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
People
(Reporter: marcia, Assigned: ehsan.akhgari)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files)
1.04 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
1.15 KB,
patch
|
farre
:
review+
cpearce
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-07eb7eb6-fdff-4f47-935b-fbcd80170522. ============================================================= Seen while looking at crash stats - these crashes started using 20170522030207: http://bit.ly/2qNcbVX Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5b74bbf20e803e299790d266fc6ebf5d53b7a1b7&tochange=9851fcb0bf4d855c36729d7de19f0fa5c9f69776 Maybe Bug 1355311? ni on :farre
Flags: needinfo?(afarre)
Comment 1•7 years ago
|
||
It is. I'm not at a computer at the moment, so I can't back this out myself. Hello appreciated.
Flags: needinfo?(afarre)
Comment 3•7 years ago
|
||
[Tracking Requested - why for this release]: Regression ni per bug #1366817 comment #0
status-firefox53:
--- → unaffected
status-firefox-esr45:
--- → unaffected
status-firefox-esr52:
--- → unaffected
tracking-firefox55:
--- → ?
Flags: needinfo?(bkelly)
OS: Windows 10 → All
Hardware: x86 → All
Comment 4•7 years ago
|
||
Andreas has asked for a backout of bug 1355311.
Depends on: 1355311
Flags: needinfo?(bkelly)
Comment 5•7 years ago
|
||
Fixed by the backout of bug 1355311.
Updated•7 years ago
|
Crash Signature: [@ mozilla::dom::TimeoutManager::MaybeStartThrottleTrackingTimout] → [@ mozilla::dom::TimeoutManager::MaybeStartThrottleTrackingTimout]
[@ libxul.so@0x11f1cbf | libxul.so@0x2d9f446 | libxul.so@0x2d9f11e | libxul.so@0x2d9df67 | libxul.so@0x369a924 | libxul.so@0x373a44c | libxul.so@0x303eb71 | libxul.so@0x2c21b49 | libxul.s…
Updated•7 years ago
|
Crash Signature: libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...] → libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...]
[@ libxul.so@0x11f1cbf | nsGlobalWindow::PostHandleEvent ]
Comment 6•7 years ago
|
||
I got the crash again with 0524 nightly(with reland of bug 1355311). report: bp-8fe4ef9c-a2e2-4d7a-8acc-7ce7d0170524
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(ehsan)
Assignee | ||
Comment 7•7 years ago
|
||
I think I know what causes this. This isn't a real bug, just a bad diagnostic assert misfiring. TimeoutManager's constructor runs from nsPIDOMWindow's constructor here: <https://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/dom/base/nsGlobalWindow.cpp#1015> which runs before nsGlobalWindow's constructor. mThrottleTrackingTimeouts is initialized here <https://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/dom/base/TimeoutManager.cpp#309>. For the first window object, gTrackingTimeoutThrottlingDelay is zero initialized, and then initialized in TimeoutManager::Initialize() here <https://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/dom/base/TimeoutManager.cpp#346>. But that only runs from nsGlobalWindows ctor <https://searchfox.org/mozilla-central/rev/2933592c4a01b634ab53315ce2d0e43fccb82181/dom/base/nsGlobalWindow.cpp#1648> which is after our mThrottleTrackingTimeouts has been initialized. This way, by the time we get to the diagnostic assert that is firing here, mThrottleTrackingTimeouts will be true and we will crash.
Assignee: nobody → ehsan
Flags: needinfo?(ehsan)
Assignee | ||
Comment 8•7 years ago
|
||
This way we won't rely on code which may have potentially not run yet.
Attachment #8870917 -
Flags: review?(bugs)
Assignee | ||
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•7 years ago
|
Attachment #8870917 -
Flags: review?(bugs) → review+
Pushed by eakhgari@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8547e166d6d0 Initialize TimeoutManager::mThrottleTrackingTimeouts to false; r=smaug
Comment 10•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8547e166d6d0
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 11•7 years ago
|
||
Hmm... While I think my fix may have fixed some crashes, I *just* reproduced this crash bp-b6a6c7c0-f27f-494e-9a1e-212bf0170525 which happened with the following rough STR: 1. Go to https://blog.mozilla.org/javascript/feed/ (which loads about:feeds) 2. Click on one of the links to go to a different page. 3. Wait for a long time... Switch to other tabs, do stuff. 4. Switch back to this tab, and then press back. Boom! Crash happens. And then as I was typing these comments: 5. (I had restored the tab after the crash so I am looking at the feed page right now) 6. Press forward. Crash happens again! bp-e57c549b-6d54-4863-ba7a-1095a0170525 I also reproduced by creating a long chain of history and going back and forward in it. I think what needs to happen is for the page to get evicted from bf-cache and get reloaded when we navigate to it again... And perhaps about:feeds is special somehow! Andreas, do you mind seeing if you can reproduce with this STR please?
Flags: needinfo?(afarre)
Assignee | ||
Updated•7 years ago
|
Assignee: ehsan → nobody
Comment 12•7 years ago
|
||
There are 38 crashes with signature "libxul.so@0x11f7c69 | nsGlobalWindow::PostHandleEvent", all in nightly 55 with buildid 20170524100215. The moz crash reason is always "MOZ_RELEASE_ASSERT(!mThrottleTrackingTimeouts)".
Status: RESOLVED → REOPENED
Crash Signature: libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...]
[@ libxul.so@0x11f1cbf | nsGlobalWindow::PostHandleEvent ] → libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...]
[@ libxul.so@0x11f1cbf | nsGlobalWindow::PostHandleEvent ]
[@ libxul.so@0x11f7c69 | nsGlobalWindow::PostHandleEvent ]
Resolution: FIXED → ---
Updated•7 years ago
|
Updated•7 years ago
|
Crash Signature: libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...]
[@ libxul.so@0x11f1cbf | nsGlobalWindow::PostHandleEvent ]
[@ libxul.so@0x11f7c69 | nsGlobalWindow::PostHandleEvent ] → libxul.so@0x2c1fea2 | libxul.so@0x2c20f24 | libxul.so@0x2b68b7f | libxul.so@...]
[@ libxul.so@0x11f1cbf | nsGlobalWindow::PostHandleEvent ]
[@ libxul.so@0x11f7c69 | nsGlobalWindow::PostHandleEvent ]
[@ libxul.so@0xc1d36f | nsGlobalWindow::PostHandleEv…
Updated•7 years ago
|
Crash Signature: nsGlobalWindow::PostHandleEvent ] → nsGlobalWindow::PostHandleEvent ]
[@ libxul.so@0x11d6ee9 | nsGlobalWindow::PostHandleEvent ]
Comment 13•7 years ago
|
||
The signature "mozilla::dom::TimeoutManager::MaybeStartThrottleTrackingTimout" is #2 in content topcrashes and #15 in browser topcrashes.
Keywords: topcrash
Assignee | ||
Comment 14•7 years ago
|
||
OK, this is pretty easy to reproduce actually, and I have it recorded under rr and starting to debug it now. I suspect that this is a different bug biting us because of this: [Child 29989] WARNING: NS_ENSURE_TRUE(mDocViewer) failed: file /home/ehsan/moz/src.1347035/layout/base/nsDocumentViewer.cpp, line 3807 [Child 29989] ###!!! ASSERTION: Bad readystate: 'mDocument->IsXULDocument() || mDocument->GetReadyStateEnum() == nsIDocument::READYSTATE_INTERACTIVE || (mDocument->GetReadyStateEnum() == nsIDocument::READYSTATE_UNINITIALIZED && NS_IsAboutBlank(mDocument->GetDocumentURI()))', file /home/ehsan/moz/src.1347035/layout/base/nsDocumentViewer.cpp, line 1045 Assertion failure: !mThrottleTrackingTimeouts, at /home/ehsan/moz/src.1347035/dom/base/TimeoutManager.cpp:1586 It seems that by the time we get to the TimeoutManager code the state of things is severely messed up, but the problem is elsewhere.
Assignee: nobody → ehsan
Flags: needinfo?(afarre)
Assignee | ||
Comment 15•7 years ago
|
||
The issue is quite simple here. In order to explain, let me first explain a simple STR: 1. Go to https://blog.mozilla.org/javascript/feed/. 2. Click on the first link. 3. Wait for a few minutes. 4. Click Back. 5. Click Forward. The assertion fires. What happens is we first run nsGlobalWindow::Resume(), and mThrottleTrackingTimeouts is true, and due to this check <https://searchfox.org/mozilla-central/rev/a14524a72de6f4ff738a5e784970f0730cea03d8/dom/base/TimeoutManager.cpp#1405> MaybeStartThrottleTrackingTimout() isn't called and things are good. But the load event will still fire even though mIsDocumentLoaded is true, and the TimeoutManager code isn't accounting for this possibility. So we end up calling MaybeStartThrottleTrackingTimout() when we shouldn't have, and we assert fatally as a result. :-( (And after this happens rr gets into a super confused state when nothing including the assertion stack even look sane any more which wasted a few hours for me when investigating this otherwise simple bug!)
Assignee | ||
Comment 16•7 years ago
|
||
Requesting review from cpearce in order to try to get this into tomorrow's Nightly with a post-landing review from Andreas with the pledge of addressing his review comments after I land this fix.
Attachment #8871587 -
Flags: review?(cpearce)
Attachment #8871587 -
Flags: review?(afarre)
Comment 17•7 years ago
|
||
Comment on attachment 8871587 [details] [diff] [review] Account for the fact that the load event may fire even though mIsDocumentLoaded is true in TimeoutManager Review of attachment 8871587 [details] [diff] [review]: ----------------------------------------------------------------- Uh, yeah, that'll stop the diagnostic assert... I trust you'll follow up and fix the underlying issue!
Attachment #8871587 -
Flags: review?(cpearce) → review+
Assignee | ||
Comment 18•7 years ago
|
||
I think this is actually all there is to it as far as the fix goes, FWIW! This was just all a diagnostic assert misfiring all along. We should be really careful adding these asserts. I think we have started to use them way too liberally and have forgotten the possibility of causing top crashers like this in case of simple benign mistakes, etc.
Assignee | ||
Comment 19•7 years ago
|
||
I couldn't find an active sheriff on IRC, but this fix is trivial and I think it doesn't make sense to land this on inbound and potentially miss tomorrow's Nightly. I'm going to land this directly on central.
Comment 20•7 years ago
|
||
Pushed by eakhgari@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/5d6fe59a9a5d Account for the fact that the load event may fire even though mIsDocumentLoaded is true in TimeoutManager; r=cpearce a=topcrasher
Assignee | ||
Updated•7 years ago
|
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment 21•7 years ago
|
||
Ehsan, is the load event firing twice here? Or because we are suspended/resumed before the load event happened in the first place? I don't think we are using diagnostic asserts too liberally. Clearly there was a misunderstanding of the invariants involved. This helped us understand the code better and so we can avoid other bugs hitting release.
Flags: needinfo?(ehsan)
Assignee | ||
Comment 22•7 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #21) > Ehsan, is the load event firing twice here? Or because we are > suspended/resumed before the load event happened in the first place? The latter. > I don't think we are using diagnostic asserts too liberally. Clearly there > was a misunderstanding of the invariants involved. This helped us > understand the code better and so we can avoid other bugs hitting release. The point is, there was no real bug here, besides a bogus assertion crashing the user's browser. Had we not had this diagnostic assert, the code would successfully schedule a new nsITimer, which would reset the mThrottleTrackingTimeouts member to true. That's needless work, for sure, but it's not a bug, and has no adverse side effects beyond wasting some CPU cycles. I should have resisted when you made me make this a diagnostic assert in bug 1338691 comment 2. In the future I'm going to ask for a clear explanation as to why a debug only assert firing would be severe enough to justify crashing the user's browser, causing top crash issues, and affecting the Nightly quality especially as we're moving to the model where Nightly stability is being treated as more of a goal by release management.
Flags: needinfo?(ehsan)
Comment 23•7 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #22) > The point is, there was no real bug here, besides a bogus assertion crashing > the user's browser. Had we not had this diagnostic assert, the code would > successfully schedule a new nsITimer, which would reset the > mThrottleTrackingTimeouts member to true. That's needless work, for sure, > but it's not a bug, and has no adverse side effects beyond wasting some CPU > cycles. > > I should have resisted when you made me make this a diagnostic assert in bug > 1338691 comment 2. In the future I'm going to ask for a clear explanation > as to why a debug only assert firing would be severe enough to justify > crashing the user's browser, causing top crash issues, and affecting the > Nightly quality especially as we're moving to the model where Nightly > stability is being treated as more of a goal by release management. But the MOZ_ASSERT() still would have been *wrong*. Our understanding was *wrong*. Without the diagnostic assert we would not have discovered this. This is important because someone reading the code infers information from the assertion. And IMO nightly is exactly the place to discover these things. We did not ship this to beta or release. If you want to argue for depending less on invariants like this and just being more flexible about state, that is fine. But to say we should just hear-no-evil-see-no-evil about assertions doesn't make sense to me.
Comment 24•7 years ago
|
||
Comment on attachment 8871587 [details] [diff] [review] Account for the fact that the load event may fire even though mIsDocumentLoaded is true in TimeoutManager r+ing this even tough it has already landed.
Attachment #8871587 -
Flags: review?(afarre) → review+
Assignee | ||
Comment 25•7 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #23) > But the MOZ_ASSERT() still would have been *wrong*. Our understanding was > *wrong*. Without the diagnostic assert we would not have discovered this. > > This is important because someone reading the code infers information from > the assertion. > > And IMO nightly is exactly the place to discover these things. We did not > ship this to beta or release. > > If you want to argue for depending less on invariants like this and just > being more flexible about state, that is fine. But to say we should just > hear-no-evil-see-no-evil about assertions doesn't make sense to me. Just to clarify, it is the former thing I was arguing for, certainly not the latter! IOW, if we have an assertion about a state, promoting it to be a diagnostic assert only makes sense if we think that the state that it is checking is so vital that its check not passing is worth crashing the user's browser and ruining all of the other testing that we would like to achieve out of the user's Nightly session, since each one of these assertions isn't the only testing experiment that lives in Nightly in isolation, we should balance the potential risk of one of them fatally terminating the chances of running the rest against the reward of such a fatal termination uncovering a bug. In this case, the state being asserted wasn't really all that important to begin with... (Also, to be fair, I am talking now with hindsight on my side, these things are sometimes not super obvious during code development/review. But we should learn from experience. I've seen this in a few diagnostic asserts that I have put in or r+ed at least!)
Comment 26•7 years ago
|
||
If you're not confident an assertion is true, you probably shouldn't put it in at all. You should make it a dynamic check instead. If you think an assertion should be true, then you should make it a diagnostic so that it actually gets coverage. People just don't run with DEBUG builds. Anything else and we're just fooling ourselves and confusing people who read the code later. Soft assertions are a net negative IMO. And you act like we shipped this on release. We did not. If we can't take some risks on nightly then we are substantially limited in our development. Anyway, I suspect we won't agree here.
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•