Closed Bug 1382768 Opened 7 years ago Closed 7 years ago

7.46 - 7.7% Heap Unclassified (windows10-64, windows7-32) regression on push b7d81fea0b3324261236e91aafa0c70c8e08395b (Thu Jul 20 2017)

Categories

(Core :: DOM: Core & HTML, defect)

53 Branch
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1159003
Tracking Status
firefox56 - fixed

People

(Reporter: jmaher, Assigned: baku)

References

Details

(Keywords: perf, regression, Whiteboard: [MemShrink:P1])

Attachments

(1 file, 3 obsolete files)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=b7d81fea0b3324261236e91aafa0c70c8e08395b

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  Heap Unclassified summary windows7-32 opt      49,546,501.64 -> 53,360,030.26
  7%  Heap Unclassified summary windows10-64 opt     56,459,421.60 -> 60,669,380.32


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8116

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY
Component: Untriaged → DOM
Product: Firefox → Core
:baku, I see your name as the patch author for the code from bug 1159003.  That seems to be the root cause, for this regression in memory usage from awsy on win7/win10.  Is this something that you were expecting or something that you could look into for a fix?
Flags: needinfo?(amarchesini)
Well, that patch removed a cap on the number of user timing entries we store.  If one of the awsy tests stores a bunch of user timing entries, then that could do it.

That said, if that's what's going on we should have a memory reporter here so it doesn't end up in the heap-unclassified bucket.

And of course it's possible things are just leaking.
I can add a memory reporter to Performance so that we can see what has been allocated. I take this bug.
Flags: needinfo?(amarchesini)
Attached patch measure.patch (obsolete) — Splinter Review
Assignee: nobody → amarchesini
Attachment #8888652 - Flags: review?(n.nethercote)
Comment on attachment 8888652 [details] [diff] [review]
measure.patch

Review of attachment 8888652 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good, but it doesn't quite follow the usual form for measuring objects that involve inheritance, so I'd like to see it again after revisions. Details below.

::: dom/performance/PerformanceEntry.cpp
@@ +42,5 @@
>    return mozilla::dom::PerformanceEntryBinding::Wrap(aCx, this, aGivenProto);
>  }
> +
> +size_t
> +PerformanceEntry::CommonSizeOf(mozilla::MallocSizeOf aMallocSizeOf) const

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Memory_reporting#An_Example_Involving_Inheritance has details on how to do memory reporting when inheritance is involved.

According to it, this method should be called SizeOfExcludingThis().

@@ +49,5 @@
> +         mEntryType.SizeOfExcludingThisIfUnshared(aMallocSizeOf);
> +}
> +
> +size_t
> +PerformanceEntry::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const

This should be called SizeOfIncludingThis().

::: dom/performance/PerformanceMark.cpp
@@ +30,5 @@
>    return PerformanceMarkBinding::Wrap(aCx, this, aGivenProto);
>  }
> +
> +size_t
> +PerformanceMark::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const

This should be called SizeOfIncludingThis().

::: dom/performance/PerformanceMeasure.cpp
@@ +32,5 @@
>    return PerformanceMeasureBinding::Wrap(aCx, this, aGivenProto);
>  }
> +
> +size_t
> +PerformanceMeasure::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const

This should be called SizeOfIncludingThis().

::: dom/performance/PerformanceResourceTiming.cpp
@@ +52,5 @@
>    return PerformanceResourceTimingBinding::Wrap(aCx, this, aGivenProto);
>  }
> +
> +size_t
> +PerformanceResourceTiming::SizeOf(mozilla::MallocSizeOf aMallocSizeOf) const

This one should be split into two: SizeOfIncludingThis() which has the usual form, and SizeOfExcludingThis() which (a) calls PerformanceEntry::SizeOfExcludingThis(), and (b) measures mInitiatorType and mNextHopProtocol.
Attachment #8888652 - Flags: review?(n.nethercote) → feedback+
Attached patch measure.patch (obsolete) — Splinter Review
Attachment #8888652 - Attachment is obsolete: true
Attachment #8888733 - Flags: review?(n.nethercote)
Attachment #8888733 - Flags: review?(n.nethercote) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/abd1b9db4905
Performance API must be a memory reporter, r=njn
So...

1)  The memory reporter bit should have gone in a different bug.  It doesn't fix the memory regression!  It's just diagnostics for trying to determine whether the memory regression is actual user timings or something else (e.g. leaked Performance instances).  Please either file a new bug for the actual regression here and morph this one to being about adding a memory reporter, or mark this [leave open] and deal with the mess that is an open bug with landed patches... :(

2)  Is there a reason the new reporter doesn't put the memory under the relevant window in about:memory, instead of sticking it at toplevel?  Putting it under the relevant window would allow using about:memory to see which page is actually creating a ton of user entries...

3)  The patch as landed uses "explicit/dom/performance/userEntries" as the path for both MOZ_COLLECT_REPORT invocations.  I expect this is wrong.
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Keywords: leave-open
Attached patch performance.patch (obsolete) — Splinter Review
Attachment #8888733 - Attachment is obsolete: true
Attachment #8888949 - Flags: review?(bzbarsky)
Comment on attachment 8888949 [details] [diff] [review]
performance.patch

Aha!  This is more like it, in terms of fixing the memory increase issue.

That said, it's odd that we can end up in Unlink before we CleanUp().  And what's particularly worrying is that if the window were actually in a cycle with the Performance object, then how would this get cleaned up if CleanUp() does not get called?  The observer service holds a strong ref to the Performance the entire cycle is alive, it would never go away...

In what circumstances do we hit this code?
Flags: needinfo?(amarchesini)
Comment on attachment 8888733 [details] [diff] [review]
measure.patch

Review of attachment 8888733 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/performance/Performance.cpp
@@ +600,5 @@
> +    resourceEntries += entry->SizeOfIncludingThis(MallocSizeOf);
> +  }
> +
> +  MOZ_COLLECT_REPORT(
> +    "explicit/dom/performance/userEntries", KIND_HEAP, UNITS_BYTES,

Oh, as bz pointed out, this is the same path as above. And about:memory paths avoid camelCase, so it should be "explicit/dom/performance/resource-entries". (Or put it under the actual window like bz suggests.)

Thanks bz for spotting these; I should have caught them myself.
Blocks: 1383691
> In what circumstances do we hit this code?

What I suspect it happens here is that Window and Performance create a cycle of references. When we call UNLINK, this cycle is broken. CleanUp will not find mPerformance, and the observer is not unregistered. I don't see anything to present UNLINK to be called after CleanUp in the 3 calls we have: the DTOR (UNLINK was already executed here), DetachFromDocShell() and ReallyCloseWindow().
Flags: needinfo?(amarchesini)
Hold on. I can reproduce it. I'm working on it.
Well, DetachFromDocShell and ReallyCloseWindow had both better happen before Unlink, I'd think.

As for the destructor, that just comes back to "why did we reach the destructor without reaching DetachFromDocShell or ReallyCloseWindow?"

> What I suspect it happens here is that Window and Performance create a cycle of references.

But my point was that _if_ this happens then that cycle as a whole is live, because the Performance object has a strong ref from the observer service.  So this really can't be what's happening, and if it _did_ happen we would leak in any situation where that patch would use unlink to unregister the Performance object from the observer service.
and the backout shows an improvement:
== Change summary for alert #8179 (as of July 21 2017 18:19 UTC) ==

Improvements:

 67%  Images summary windows7-32 opt      19,401,553.86 -> 6,313,444.94
 33%  Explicit Memory summary windows7-32 opt 441,396,677.39 -> 294,803,182.86
 27%  JS summary windows7-32 opt          151,862,067.64 -> 110,230,039.86
 25%  Resident Memory summary windows7-32 opt 516,290,958.03 -> 388,241,880.20
 16%  Heap Unclassified summary windows7-32 opt 63,566,273.37 -> 53,452,794.17

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8179
> Well, DetachFromDocShell and ReallyCloseWindow had both better happen before
> Unlink, I'd think.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=91ab683af4d68b20cc7586586b32f71eeb4f107a&selectedJob=117327072

Clearly this is not true. DetachFromDocShell is executed on outer windows, not inner windows. For each inner window, we call FreeInnerObjects(), and btw, there is a lot of duplicated code between FreeInnerObjects and CleanUp.
CleanUp() is then executed in the DTOR, but of course, at that point we already leaked.

It's also not true that FreeInnerObjects() is always executed before the UNLINK because also here I have seen crashes using a different patch.

> > What I suspect it happens here is that Window and Performance create a cycle of references.
> 
> But my point was that _if_ this happens then that cycle as a whole is live,
> because the Performance object has a strong ref from the observer service. 
> So this really can't be what's happening, and if it _did_ happen we would
> leak in any situation where that patch would use unlink to unregister the
> Performance object from the observer service.

No, because the cycle is broken by the UNLINK. mPerformance is unlinked and nullified. Performance UNLINK nullifies the window. the cycle is broken, but the performance object is still registered to the observer service.

In this new version of the patch I add a weak observer. The removeObserver() is called in the DTOR of PerformanceMainThread.
Attachment #8888949 - Attachment is obsolete: true
Attachment #8888949 - Flags: review?(bzbarsky)
Attachment #8889828 - Flags: review?(bzbarsky)
> DetachFromDocShell is executed on outer windows, not inner windows.

Ah, right.  OK.  And same for ReallyCloseWindow.

Which means that for _inner_ windows CleanUp only happens from the dtor at the moment.

> It's also not true that FreeInnerObjects() is always executed before the UNLINK

Hmm... ok.  This stuff is such a mess.  :(

> No, because the cycle is broken by the UNLINK. mPerformance is unlinked and nullified.

How could it possibly be unlinked if it has a strong reference to it from the observer service?  It's not a dead cycle; it won't be unlinked.

> In this new version of the patch I add a weak observer.

OK, _that_ I buy as the right fix for this bug.  ;)
[Tracking Requested - why for this release]: bad memory regression
Whiteboard: [MemShrink]
Comment on attachment 8889828 [details] [diff] [review]
performance.patch

So it's not clear to me whether RemoveObserver in a destructor is actually ok.  It will end up calling QI and refcounting and so forth...

Nathan, can you tell whether it's OK?

If it _is_ OK, this patch looks fine, assuming you add comments explaining how we don't actually have a "window is shutting down" thing that we call reliably so you can't remove us from the observer service in a sane non-destructor way.  :(
Attachment #8889828 - Flags: review?(nfroyd)
Attachment #8889828 - Flags: review?(bzbarsky)
Attachment #8889828 - Flags: review+
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment on attachment 8889828 [details] [diff] [review]
performance.patch

Review of attachment 8889828 [details] [diff] [review]:
-----------------------------------------------------------------

We don't have a lot of instances of RemoveObserver being called from destructors in the codebase.  But I think it *should* be safe, assuming you're using the refcounting macros from XPCOM, because the final release will stabilize the refcount at 1 before calling the object's destructor.  Thus QIs and such shouldn't wind up deleting the object again.

That being said, I'd feel better, and I think it would be safer, to RemoveObserver outside the destructor and thus I'm curious if...

::: dom/base/nsGlobalWindow.cpp
@@ +2004,5 @@
>    mExternal = nullptr;
>  
>    mMozSelfSupport = nullptr;
>  
> +  mPerformance = nullptr;

...couldn't we just continue calling Shutdown here...

@@ +2198,5 @@
>        mTabChild->BeforeUnloadRemoved();
>      }
>    }
> +
> +  mPerformance = nullptr;

...and here, rather than doing it in the destructor?

::: dom/performance/PerformanceMainThread.cpp
@@ +58,5 @@
>    if (NS_WARN_IF(!obs)) {
>      return nullptr;
>    }
>  
> +  nsresult rv = obs->AddObserver(performance, "memory-pressure", true);

If we always explicitly Shutdown() the PerformanceMainThreadObject, we could keep this being a strong reference.  We have to remove the observer service reference in a timely fashion, otherwise the Performance object holds on to a whole raft of stuff, right?  So we can't rely on the observer service dropping our reference at some unspecified point down the road, correct?

Having the destructor RemoveObserver() seems to negate the whole point of using weak references in the first place; weak references suggest to me "somebody else will take care of this", but we are very much not adopting that position with this patch.

But I also confess that I don't understand the whole setup of Performance objects as they relate to nsGlobalWindow, so there might be some other reason that weak references would be preferred here.
Attachment #8889828 - Flags: review?(nfroyd)
Nathan, thanks for your comment. I'll apply some of your ideas in bug 1159003.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
I believe Andrea was saying that he was seeing us end up in ~nsGlobalWindow without having called either CleanUp() or FreeInnerObjects(), and hence that if we had the strong ref thing going on we'd fail to shut things down, because the CleanUp/FreeInnerObjects call never happened...
(In reply to Boris Zbarsky [:bz] from comment #25)
> I believe Andrea was saying that he was seeing us end up in ~nsGlobalWindow
> without having called either CleanUp() or FreeInnerObjects(), and hence that
> if we had the strong ref thing going on we'd fail to shut things down,
> because the CleanUp/FreeInnerObjects call never happened...

Hm, OK.  That's...pretty weird, right?

If that's the case, I guess it boils down to whether we think duplicating the code to shutdown and null out mPerformance in three (!) different places is better than simply RemoveObserver'ing in the destructor.  In such a case, I think I have a slight preference for putting things in the destructor, but I don't think we have to use weak references.  Maybe add a MOZ_RELEASE_ASSERT(mRefCnt == 1) after we make the RemoveObserver() call, to make sure that nobody else is trying to hold references to an about-to-be-deleted object?
I think the current plan is to reuse an already-existing observer, actually, but thank you for the suggestions!
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b17eb8514a10
Performance API must be a memory reporter, r=bz
Mark 56 fixed as bug 1159003 was fixed in 56.
Depends on: 1389184
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: