Closed
Bug 1382768
Opened 7 years ago
Closed 7 years ago
7.46 - 7.7% Heap Unclassified (windows10-64, windows7-32) regression on push b7d81fea0b3324261236e91aafa0c70c8e08395b (Thu Jul 20 2017)
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 1159003
People
(Reporter: jmaher, Assigned: baku)
References
Details
(Keywords: perf, regression, Whiteboard: [MemShrink:P1])
Attachments
(1 file, 3 obsolete files)
6.04 KB,
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
We have detected an awsy regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b7d81fea0b3324261236e91aafa0c70c8e08395b As author of one of the patches included in that push, we need your help to address this regression. Regressions: 8% Heap Unclassified summary windows7-32 opt 49,546,501.64 -> 53,360,030.26 7% Heap Unclassified summary windows10-64 opt 56,459,421.60 -> 60,669,380.32 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8116 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format. To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY
Updated•7 years ago
|
Component: Untriaged → DOM
Product: Firefox → Core
Reporter | ||
Comment 1•7 years ago
|
||
:baku, I see your name as the patch author for the code from bug 1159003. That seems to be the root cause, for this regression in memory usage from awsy on win7/win10. Is this something that you were expecting or something that you could look into for a fix?
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(amarchesini)
Comment 2•7 years ago
|
||
Well, that patch removed a cap on the number of user timing entries we store. If one of the awsy tests stores a bunch of user timing entries, then that could do it. That said, if that's what's going on we should have a memory reporter here so it doesn't end up in the heap-unclassified bucket. And of course it's possible things are just leaking.
Assignee | ||
Comment 3•7 years ago
|
||
I can add a memory reporter to Performance so that we can see what has been allocated. I take this bug.
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 4•7 years ago
|
||
Assignee: nobody → amarchesini
Attachment #8888652 -
Flags: review?(n.nethercote)
Comment 5•7 years ago
|
||
Comment on attachment 8888652 [details] [diff] [review] measure.patch Review of attachment 8888652 [details] [diff] [review]: ----------------------------------------------------------------- Looks pretty good, but it doesn't quite follow the usual form for measuring objects that involve inheritance, so I'd like to see it again after revisions. Details below. ::: dom/performance/PerformanceEntry.cpp @@ +42,5 @@ > return mozilla::dom::PerformanceEntryBinding::Wrap(aCx, this, aGivenProto); > } > + > +size_t > +PerformanceEntry::CommonSizeOf(mozilla::MallocSizeOf aMallocSizeOf) const https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Memory_reporting#An_Example_Involving_Inheritance has details on how to do memory reporting when inheritance is involved. According to it, this method should be called SizeOfExcludingThis(). @@ +49,5 @@ > + mEntryType.SizeOfExcludingThisIfUnshared(aMallocSizeOf); > +} > + > +size_t > +PerformanceEntry::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const This should be called SizeOfIncludingThis(). ::: dom/performance/PerformanceMark.cpp @@ +30,5 @@ > return PerformanceMarkBinding::Wrap(aCx, this, aGivenProto); > } > + > +size_t > +PerformanceMark::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const This should be called SizeOfIncludingThis(). ::: dom/performance/PerformanceMeasure.cpp @@ +32,5 @@ > return PerformanceMeasureBinding::Wrap(aCx, this, aGivenProto); > } > + > +size_t > +PerformanceMeasure::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const This should be called SizeOfIncludingThis(). ::: dom/performance/PerformanceResourceTiming.cpp @@ +52,5 @@ > return PerformanceResourceTimingBinding::Wrap(aCx, this, aGivenProto); > } > + > +size_t > +PerformanceResourceTiming::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const This one should be split into two: SizeOfIncludingThis() which has the usual form, and SizeOfExcludingThis() which (a) calls PerformanceEntry::SizeOfExcludingThis(), and (b) measures mInitiatorType and mNextHopProtocol.
Attachment #8888652 -
Flags: review?(n.nethercote) → feedback+
Assignee | ||
Comment 6•7 years ago
|
||
Attachment #8888652 -
Attachment is obsolete: true
Attachment #8888733 -
Flags: review?(n.nethercote)
Updated•7 years ago
|
Attachment #8888733 -
Flags: review?(n.nethercote) → review+
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/abd1b9db4905 Performance API must be a memory reporter, r=njn
Comment 9•7 years ago
|
||
So... 1) The memory reporter bit should have gone in a different bug. It doesn't fix the memory regression! It's just diagnostics for trying to determine whether the memory regression is actual user timings or something else (e.g. leaked Performance instances). Please either file a new bug for the actual regression here and morph this one to being about adding a memory reporter, or mark this [leave open] and deal with the mess that is an open bug with landed patches... :( 2) Is there a reason the new reporter doesn't put the memory under the relevant window in about:memory, instead of sticking it at toplevel? Putting it under the relevant window would allow using about:memory to see which page is actually creating a ton of user entries... 3) The patch as landed uses "explicit/dom/performance/userEntries" as the path for both MOZ_COLLECT_REPORT invocations. I expect this is wrong.
Flags: needinfo?(amarchesini)
Comment 10•7 years ago
|
||
Backed out for leaks in mochitests, e.g. browser_bug822367.js: https://hg.mozilla.org/integration/mozilla-inbound/rev/fafcbde047540092ab629013acedb177026660e5 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=1b2ab38ba0b83e2c5dd137b6c8959bfa575c4d7d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable&filter-resultStatus=usercancel Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=116431755&repo=mozilla-inbound > [task 2017-07-21T15:46:38.234737Z] 15:46:38 ERROR - TEST-UNEXPECTED-FAIL | browser/base/content/test/siteIdentity/browser_bug822367.js | leaked 3 window(s) until shutdown [url = https://test1.example.com/browser/browser/base/content/test/siteIdentity/file_bug822367_6.html]
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(amarchesini)
Keywords: leave-open
Assignee | ||
Comment 11•7 years ago
|
||
Attachment #8888733 -
Attachment is obsolete: true
Attachment #8888949 -
Flags: review?(bzbarsky)
Comment 12•7 years ago
|
||
Comment on attachment 8888949 [details] [diff] [review] performance.patch Aha! This is more like it, in terms of fixing the memory increase issue. That said, it's odd that we can end up in Unlink before we CleanUp(). And what's particularly worrying is that if the window were actually in a cycle with the Performance object, then how would this get cleaned up if CleanUp() does not get called? The observer service holds a strong ref to the Performance the entire cycle is alive, it would never go away... In what circumstances do we hit this code?
Flags: needinfo?(amarchesini)
Comment 13•7 years ago
|
||
Comment on attachment 8888733 [details] [diff] [review] measure.patch Review of attachment 8888733 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/performance/Performance.cpp @@ +600,5 @@ > + resourceEntries += entry->SizeOfIncludingThis(MallocSizeOf); > + } > + > + MOZ_COLLECT_REPORT( > + "explicit/dom/performance/userEntries", KIND_HEAP, UNITS_BYTES, Oh, as bz pointed out, this is the same path as above. And about:memory paths avoid camelCase, so it should be "explicit/dom/performance/resource-entries". (Or put it under the actual window like bz suggests.) Thanks bz for spotting these; I should have caught them myself.
Assignee | ||
Comment 15•7 years ago
|
||
> In what circumstances do we hit this code?
What I suspect it happens here is that Window and Performance create a cycle of references. When we call UNLINK, this cycle is broken. CleanUp will not find mPerformance, and the observer is not unregistered. I don't see anything to present UNLINK to be called after CleanUp in the 3 calls we have: the DTOR (UNLINK was already executed here), DetachFromDocShell() and ReallyCloseWindow().
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 16•7 years ago
|
||
Hold on. I can reproduce it. I'm working on it.
Comment 17•7 years ago
|
||
Well, DetachFromDocShell and ReallyCloseWindow had both better happen before Unlink, I'd think.
As for the destructor, that just comes back to "why did we reach the destructor without reaching DetachFromDocShell or ReallyCloseWindow?"
> What I suspect it happens here is that Window and Performance create a cycle of references.
But my point was that _if_ this happens then that cycle as a whole is live, because the Performance object has a strong ref from the observer service. So this really can't be what's happening, and if it _did_ happen we would leak in any situation where that patch would use unlink to unregister the Performance object from the observer service.
Reporter | ||
Comment 18•7 years ago
|
||
and the backout shows an improvement: == Change summary for alert #8179 (as of July 21 2017 18:19 UTC) == Improvements: 67% Images summary windows7-32 opt 19,401,553.86 -> 6,313,444.94 33% Explicit Memory summary windows7-32 opt 441,396,677.39 -> 294,803,182.86 27% JS summary windows7-32 opt 151,862,067.64 -> 110,230,039.86 25% Resident Memory summary windows7-32 opt 516,290,958.03 -> 388,241,880.20 16% Heap Unclassified summary windows7-32 opt 63,566,273.37 -> 53,452,794.17 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8179
Assignee | ||
Comment 19•7 years ago
|
||
> Well, DetachFromDocShell and ReallyCloseWindow had both better happen before > Unlink, I'd think. https://treeherder.mozilla.org/#/jobs?repo=try&revision=91ab683af4d68b20cc7586586b32f71eeb4f107a&selectedJob=117327072 Clearly this is not true. DetachFromDocShell is executed on outer windows, not inner windows. For each inner window, we call FreeInnerObjects(), and btw, there is a lot of duplicated code between FreeInnerObjects and CleanUp. CleanUp() is then executed in the DTOR, but of course, at that point we already leaked. It's also not true that FreeInnerObjects() is always executed before the UNLINK because also here I have seen crashes using a different patch. > > What I suspect it happens here is that Window and Performance create a cycle of references. > > But my point was that _if_ this happens then that cycle as a whole is live, > because the Performance object has a strong ref from the observer service. > So this really can't be what's happening, and if it _did_ happen we would > leak in any situation where that patch would use unlink to unregister the > Performance object from the observer service. No, because the cycle is broken by the UNLINK. mPerformance is unlinked and nullified. Performance UNLINK nullifies the window. the cycle is broken, but the performance object is still registered to the observer service. In this new version of the patch I add a weak observer. The removeObserver() is called in the DTOR of PerformanceMainThread.
Attachment #8888949 -
Attachment is obsolete: true
Attachment #8888949 -
Flags: review?(bzbarsky)
Attachment #8889828 -
Flags: review?(bzbarsky)
Comment 20•7 years ago
|
||
> DetachFromDocShell is executed on outer windows, not inner windows. Ah, right. OK. And same for ReallyCloseWindow. Which means that for _inner_ windows CleanUp only happens from the dtor at the moment. > It's also not true that FreeInnerObjects() is always executed before the UNLINK Hmm... ok. This stuff is such a mess. :( > No, because the cycle is broken by the UNLINK. mPerformance is unlinked and nullified. How could it possibly be unlinked if it has a strong reference to it from the observer service? It's not a dead cycle; it won't be unlinked. > In this new version of the patch I add a weak observer. OK, _that_ I buy as the right fix for this bug. ;)
Comment 21•7 years ago
|
||
[Tracking Requested - why for this release]: bad memory regression
Comment 22•7 years ago
|
||
Comment on attachment 8889828 [details] [diff] [review] performance.patch So it's not clear to me whether RemoveObserver in a destructor is actually ok. It will end up calling QI and refcounting and so forth... Nathan, can you tell whether it's OK? If it _is_ OK, this patch looks fine, assuming you add comments explaining how we don't actually have a "window is shutting down" thing that we call reliably so you can't remove us from the observer service in a sane non-destructor way. :(
Attachment #8889828 -
Flags: review?(nfroyd)
Attachment #8889828 -
Flags: review?(bzbarsky)
Attachment #8889828 -
Flags: review+
Updated•7 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 23•7 years ago
|
||
Comment on attachment 8889828 [details] [diff] [review] performance.patch Review of attachment 8889828 [details] [diff] [review]: ----------------------------------------------------------------- We don't have a lot of instances of RemoveObserver being called from destructors in the codebase. But I think it *should* be safe, assuming you're using the refcounting macros from XPCOM, because the final release will stabilize the refcount at 1 before calling the object's destructor. Thus QIs and such shouldn't wind up deleting the object again. That being said, I'd feel better, and I think it would be safer, to RemoveObserver outside the destructor and thus I'm curious if... ::: dom/base/nsGlobalWindow.cpp @@ +2004,5 @@ > mExternal = nullptr; > > mMozSelfSupport = nullptr; > > + mPerformance = nullptr; ...couldn't we just continue calling Shutdown here... @@ +2198,5 @@ > mTabChild->BeforeUnloadRemoved(); > } > } > + > + mPerformance = nullptr; ...and here, rather than doing it in the destructor? ::: dom/performance/PerformanceMainThread.cpp @@ +58,5 @@ > if (NS_WARN_IF(!obs)) { > return nullptr; > } > > + nsresult rv = obs->AddObserver(performance, "memory-pressure", true); If we always explicitly Shutdown() the PerformanceMainThreadObject, we could keep this being a strong reference. We have to remove the observer service reference in a timely fashion, otherwise the Performance object holds on to a whole raft of stuff, right? So we can't rely on the observer service dropping our reference at some unspecified point down the road, correct? Having the destructor RemoveObserver() seems to negate the whole point of using weak references in the first place; weak references suggest to me "somebody else will take care of this", but we are very much not adopting that position with this patch. But I also confess that I don't understand the whole setup of Performance objects as they relate to nsGlobalWindow, so there might be some other reason that weak references would be preferred here.
Attachment #8889828 -
Flags: review?(nfroyd)
Assignee | ||
Comment 24•7 years ago
|
||
Nathan, thanks for your comment. I'll apply some of your ideas in bug 1159003.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Comment 25•7 years ago
|
||
I believe Andrea was saying that he was seeing us end up in ~nsGlobalWindow without having called either CleanUp() or FreeInnerObjects(), and hence that if we had the strong ref thing going on we'd fail to shut things down, because the CleanUp/FreeInnerObjects call never happened...
Comment 26•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #25) > I believe Andrea was saying that he was seeing us end up in ~nsGlobalWindow > without having called either CleanUp() or FreeInnerObjects(), and hence that > if we had the strong ref thing going on we'd fail to shut things down, > because the CleanUp/FreeInnerObjects call never happened... Hm, OK. That's...pretty weird, right? If that's the case, I guess it boils down to whether we think duplicating the code to shutdown and null out mPerformance in three (!) different places is better than simply RemoveObserver'ing in the destructor. In such a case, I think I have a slight preference for putting things in the destructor, but I don't think we have to use weak references. Maybe add a MOZ_RELEASE_ASSERT(mRefCnt == 1) after we make the RemoveObserver() call, to make sure that nobody else is trying to hold references to an about-to-be-deleted object?
Comment 27•7 years ago
|
||
I think the current plan is to reuse an already-existing observer, actually, but thank you for the suggestions!
Comment 28•7 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b17eb8514a10 Performance API must be a memory reporter, r=bz
Comment 29•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b17eb8514a10
Comment 30•7 years ago
|
||
Mark 56 fixed as bug 1159003 was fixed in 56.
Comment 31•6 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•