Closed Bug 1376105 Opened 7 years ago Closed 7 years ago

2.57 - 4.76% Explicit Memory / JS / Resident Memory (linux64, windows10-64-vm, windows7-32-vm) regression on push 6d121d4c2033db893f0c21951f761822d87df960 (Fri Jun 23 2017)

Categories

(Core :: JavaScript Engine, defect)

Version: 56 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Tracking Status:
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- disabled
firefox57 --- fixed

People

(Reporter: jmaher, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [MemShrink:P1])

Attachments

(2 files)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=6d121d4c2033db893f0c21951f761822d87df960

Since you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  Resident Memory summary windows7-32-vm opt      282,495,998.08 -> 295,931,616.19
  3%  Explicit Memory summary windows7-32-vm opt      232,764,868.48 -> 239,646,739.28
  3%  JS summary windows7-32-vm opt                   87,628,384.54 -> 90,087,117.77
  3%  JS summary linux64 opt                          113,609,766.50 -> 116,690,511.14
  3%  Explicit Memory summary windows10-64-vm opt     291,391,961.41 -> 299,105,137.82
  3%  JS summary windows10-64-vm opt                  118,091,362.05 -> 121,167,784.88
  3%  Explicit Memory summary linux64 opt             305,508,620.83 -> 313,347,012.88


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7514

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY
There are two bugs that landed together; luckily both are by the same author/reviewer :)  We don't have specific rules to back out AWSY regressions or expectations to address them quickly. With that said, it would be nice to know if this is expected, and if not, whether there is something we can do to reduce the memory hit.

:florian, do you have any ideas why this might be happening?
Flags: needinfo?(florian)
Component: Untriaged → Async Tooling
Product: Firefox → Toolkit
(In reply to Joel Maher ( :jmaher) from comment #1)
> There are two bugs that landed together; luckily both are by the same
> author/reviewer :)  We don't have specific rules to back out AWSY
> regressions or expectations to address them quickly. With that said, it
> would be nice to know if this is expected, and if not, whether there is
> something we can do to reduce the memory hit.
> 
> :florian, do you have any ideas why this might be happening?

This patchset switches all the Firefox front-end code from using Promise.jsm (Promise-backend.js) promises to native DOM promises.

So there are two possible reasons for this regression: either there's a bug somewhere in one of the patches that causes a leak (not very likely, but possible), or native DOM promises need to be optimized to use less memory.
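
For context, the change is roughly of this shape across the front-end (a minimal sketch; `doAsyncThing` is a made-up placeholder and the real call sites vary per patch):

  // Before: Promise.jsm (Promise-backend.js) promises, all created inside
  // the Promise.jsm module's compartment.
  Components.utils.import("resource://gre/modules/Promise.jsm");
  function waitForThing() {
    let deferred = Promise.defer();
    doAsyncThing(result => deferred.resolve(result));
    return deferred.promise;
  }

  // After: native DOM promises, allocated in whichever compartment calls
  // `new Promise`.
  function waitForThing() {
    return new Promise(resolve => doAsyncThing(resolve));
  }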

Is there a way to see a diff of about:memory before/after? Is this the memory of the parent process, or the content processes, or the sum of them? How are these tests triggered (or why was this invisible on try)?
Flags: needinfo?(florian)
:erahm, :florian has a question about before/after memory for different processes
Flags: needinfo?(erahm)
Whiteboard: [memshrink]
(In reply to Joel Maher ( :jmaher) (afk until july 12) from comment #3)
> :erahm, :florian has a question about before/after memory for different
> processes

In Treeherder you can select the build, choose the AWSY test, go to the artifacts pane, and download a memory report (choose whichever subtest is regressing the most). Do this for two builds and diff them in about:memory. about:memory will separate the main process, the content processes, and the GPU process if present.
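
If you want to do the comparison outside the browser, here's a rough sketch of the kind of diff about:memory computes, assuming the downloaded memory reports are (optionally gzipped) JSON files with a top-level `reports` array of entries carrying `process`, `path`, and `amount` fields; the script name and output format here are just illustrative:

  // diff-memory-reports.js -- crude approximation of about:memory's diff view.
  // It treats every amount as bytes, so unit-count reporters will look odd.
  const fs = require("fs");
  const zlib = require("zlib");

  function load(file) {
    const buf = fs.readFileSync(file);
    const text = file.endsWith(".gz") ? zlib.gunzipSync(buf).toString()
                                      : buf.toString();
    const totals = new Map();
    for (const r of JSON.parse(text).reports) {
      const key = r.process + " :: " + r.path;
      totals.set(key, (totals.get(key) || 0) + r.amount);
    }
    return totals;
  }

  const before = load(process.argv[2]);
  const after = load(process.argv[3]);
  const keys = new Set([...before.keys(), ...after.keys()]);
  [...keys]
    .map(k => [k, (after.get(k) || 0) - (before.get(k) || 0)])
    .filter(([, d]) => d !== 0)
    .sort((a, b) => Math.abs(b[1]) - Math.abs(a[1]))
    .slice(0, 40)
    .forEach(([k, d]) =>
      console.log((d / 1024 / 1024).toFixed(2).padStart(8) + " MB  " + k));

Usage would be something like `node diff-memory-reports.js before.json.gz after.json.gz`; the diff support built into about:memory does the same thing with proper unit handling.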
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #4)

> In treeherder you can select the build, choose the awsy test, go to the
> artifacts pane and download a memory report (choose whichever subtest is
> regressing the most).

Where can I see which subtest is regressing the most?

I tried to pick 3 at random, but couldn't see anything really relevant. The biggest change I observed was a 1-2MB increase in heap unclassified.

Base revision: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4da7ff4c92bf47c1ceb8383da4cc459e84a95e08&filter-searchStr=awsy&selectedJob=109504681

Revision with the patches suspected of regressing memory usage: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=6d121d4c2033db893f0c21951f761822d87df960&filter-searchStr=awsy&selectedJob=109639568
Flags: needinfo?(jmaher)
:florian, what is the status of this bug? Have you had time to look into it?
Flags: needinfo?(florian)
With less than a week until merge day, I am not clear that we have confirmed this regression is expected. Can we get a resolution on this bug in the next day or two?
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
> :florian, what is the status of this bug?

Still no idea of where the additional memory use comes from. Unless we have tools that can clearly tell us what's going on, this is heading in a frustrating wontfix direction.

> Have you had time to look into it?

No, and I'm off for the next 2 weeks.
Flags: needinfo?(florian)
:erahm, do you have any ideas for building tools that would make it easier to look at memory differences between two AWSY runs?  Having a bug on file with details of what needs to be done would be all I am asking.
Flags: needinfo?(erahm)
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #10)
> :erahm, do you have any ideas for building tools that would make it easier
> to look at memory differences between two AWSY runs?  Having a bug on file
> with details of what needs to be done would be all I am asking.

See comment 4; about:memory has diff support built in. Something directly integrated with Perfherder would be nice, but that's more of a question for :wlach.
Flags: needinfo?(erahm)
Filed bug 1383901 for this.
I looked at a diff of linux64 tabs open (that's the largest regression); here's a snippet:

> 24.83 MB (100.0%) -- explicit
> ├──18.97 MB (76.41%) -- js-non-window
> │  ├───9.33 MB (37.59%) -- runtime
> │  │   ├──8.10 MB (32.62%) -- gc
> │  │   │  ├──6.00 MB (24.16%) ── nursery-committed
> │  │   │  ├──2.12 MB (08.55%) ── nursery-malloced-buffers
> │  │   │  └──-0.02 MB (-0.09%) ++ store-buffer
> │  │   ├──1.31 MB (05.29%) -- code
> │  │   │  ├──1.06 MB (04.28%) ── baseline
> │  │   │  └──0.25 MB (01.01%) ++ (4 tiny)
> │  │   └──-0.08 MB (-0.32%) ++ (3 tiny)
> │  ├───8.65 MB (34.85%) -- zones/zone(0xNNN)
> │  │   ├──5.08 MB (20.44%) ++ (190 tiny)
> │  │   ├──1.66 MB (06.70%) ++ strings
> │  │   ├──-0.54 MB (-2.19%) ── unused-gc-things [3]
> │  │   ├──0.42 MB (01.70%) ++ compartment([System Principal], resource:///modules/sessionstore/SessionStore.jsm)
> │  │   ├──0.35 MB (01.41%) -- compartment([System Principal], resource://gre/modules/RemoteController.jsm)
> │  │   │  ├──0.33 MB (01.34%) ── cross-compartment-wrapper-table
> │  │   │  └──0.02 MB (00.07%) ++ (3 tiny)
> │  │   ├──0.30 MB (01.23%) ── baseline/optimized-stubs
> │  │   ├──0.29 MB (01.19%) ++ compartment([System Principal], resource:///modules/sessionstore/TabState.jsm)
> │  │   ├──0.28 MB (01.15%) ++ shapes
> │  │   ├──0.28 MB (01.13%) ++ compartment([System Principal], resource://gre/modules/RemoteWebProgress.jsm)
> │  │   ├──0.27 MB (01.09%) ++ compartment([System Principal], chrome://marionette/content/assert.js)
> │  │   └──0.25 MB (01.01%) ++ compartment([System Principal], resource://gre/modules/PromiseWorker.jsm)
> │  └───0.98 MB (03.96%) ++ gc-heap
> ├───4.11 MB (16.57%) -- window-objects
> │   ├──3.99 MB (16.05%) -- top(chrome://browser/content/browser.xul, id=NNN)
> │   │  ├──2.61 MB (10.52%) -- active
> │   │  │  ├──2.61 MB (10.51%) -- window(chrome://browser/content/browser.xul)
> │   │  │  │  ├──2.55 MB (10.28%) -- js-compartment([System Principal], about:blank)
> │   │  │  │  │  ├──1.51 MB (06.08%) -- baseline
> │   │  │  │  │  │  ├──1.25 MB (05.05%) ── fallback-stubs
> │   │  │  │  │  │  └──0.26 MB (01.03%) ── data
> │   │  │  │  │  ├──0.66 MB (02.66%) -- classes
> │   │  │  │  │  │  ├──0.36 MB (01.44%) -- (12 tiny)
> │   │  │  │  │  │  │  ├──0.09 MB (00.35%) ── class(SavedFrame)/objects/gc-heap
> │   │  │  │  │  │  │  ├──0.07 MB (00.30%) ++ class(XPCWrappedNative_NoHelper)/objects
> │   │  │  │  │  │  │  ├──0.06 MB (00.25%) ── class(Map Iterator)/objects/gc-heap
> │   │  │  │  │  │  │  ├──0.04 MB (00.17%) ++ class(XPC_WN_NoMods_Proto_JSClass)/objects
> │   │  │  │  │  │  │  ├──0.04 MB (00.14%) ── class(Call)/objects/gc-heap
> │   │  │  │  │  │  │  ├──0.04 MB (00.14%) ++ class(Object)/objects
> │   │  │  │  │  │  │  ├──0.01 MB (00.04%) ++ class(Array)/objects
> │   │  │  │  │  │  │  ├──-0.01 MB (-0.04%) ── class(With)/objects/gc-heap
> │   │  │  │  │  │  │  ├──0.01 MB (00.03%) ++ class(<non-notable classes>)/objects
> │   │  │  │  │  │  │  ├──0.01 MB (00.03%) ── class(Proxy)/objects/gc-heap
> │   │  │  │  │  │  │  ├──0.01 MB (00.02%) ── class(LexicalEnvironment)/objects/gc-heap
> │   │  │  │  │  │  │  └──0.00 MB (00.00%) ++ class(XULElement)/objects
> │   │  │  │  │  │  └──0.30 MB (01.21%) -- class(Function)/objects
> │   │  │  │  │  │     ├──0.30 MB (01.21%) ── gc-heap
> │   │  │  │  │  │     └──-0.00 MB (-0.00%) ── malloc-heap/slots
> │   │  │  │  │  └──0.38 MB (01.55%) ++ (9 tiny)
> │   │  │  │  └──0.05 MB (00.22%) ++ (3 tiny)
> │   │  │  └──0.00 MB (00.02%) ++ window(about:blank)
> │   │  └──1.37 MB (05.53%) ++ js-zone(0xNNN)
> │   └──0.13 MB (00.52%) ++ (3 tiny)

JS non-window is your biggest increase.
Attached file TabsOpen - before
Attached file TabsOpen - after
It seems to be spread a little bit everywhere. Is it possible that switching from Promise.jsm, where lots of promises were allocated in the same compartment, to native promises (allocated in each calling compartment, I would guess) makes us GC promises (and everything they reference) less aggressively?
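
To spell the hypothesis out (a sketch of the idea, not of the actual allocation sites):

  // With Promise.jsm, every consumer's promises were created inside
  // Promise-backend.js, so they piled up in that one module compartment.
  Components.utils.import("resource://gre/modules/Promise.jsm");
  let p1 = Promise.defer().promise;  // lives in Promise.jsm's compartment

  // With native promises, `new Promise` allocates in the caller's compartment,
  // so SessionStore.jsm, TabState.jsm, browser.xul, etc. each grow their own
  // zone a little, which matches the "little bit everywhere" shape of the diff.
  let p2 = new Promise(() => {});    // lives in the calling compartment

If the per-zone growth stays below the zone GC trigger thresholds, the garbage could indeed stick around longer than it did when it was concentrated in one place.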
Whiteboard: [memshrink] → [MemShrink:P1]
Component: Async Tooling → JavaScript Engine
Product: Toolkit → Core
Jon, do you have any ideas about what we could do to improve this regression? Looking at the diff in comment 13, it seems odd to me that the nursery would grow by 8MB.

The baseline fallback stubs number also seems really huge, but I don't know what that is.
Flags: needinfo?(jcoppeard)
Jan, I see 2.8MB of regressions related to baseline memory. Is that expected from using promises more? 1.25MB of fallback stubs in a single compartment seems particularly bad.
Flags: needinfo?(jdemooij)
> it seems odd to me that the nursery would grow by 8MB

That seems odd to me too.  I'll have to think about why that would be.  I would not have expected native promises to use more memory.  

Till, any idea what might be going on?
Flags: needinfo?(till)
Were these measurements taken with javascript.options.asyncstack set to true? If yes, that'd certainly cause increased memory usage. It wouldn't explain the fallback stubs - I have no idea whatsoever what's going on with those.
Flags: needinfo?(till)
(In reply to Till Schneidereit [:till] from comment #20)
> Were these measurements taken with javascript.options.asyncstack set to
> true? If yes, that'd certainly cause increased memory usage. It wouldn't
> explain the fallback stubs - I have no idea whatsoever what's going on with
> those.

The regression was noted on m-i for desktop builds so it looks like `javascript.options.asyncstack` should be true [1].

[1] http://searchfox.org/mozilla-central/rev/bbc1c59e460a27b20929b56489e2e55438de81fa/modules/libpref/init/all.js#1433
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #21)
> (In reply to Till Schneidereit [:till] from comment #20)
> > Were these measurements taken with javascript.options.asyncstack set to
> > true? If yes, that'd certainly cause increased memory usage. It wouldn't
> > explain the fallback stubs - I have no idea whatsoever what's going on with
> > those.
> 
> The regression was noted on m-i for desktop builds so it looks like
> `javascript.options.asyncstack` should be true [1].

In that case it'd be worth remeasuring with that setting disabled. Generally, it should be disabled for memory measurements. See also bug 1280819.
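
For anyone remeasuring, the pref can be flipped from chrome-privileged code (for example the Browser Console) before taking the about:memory snapshots; a minimal sketch:

  Components.utils.import("resource://gre/modules/Services.jsm");
  Services.prefs.setBoolPref("javascript.options.asyncstack", false);
  // ... capture the before/after memory reports ...
  // (not verified here whether the pref takes effect without a restart)
  Services.prefs.clearUserPref("javascript.options.asyncstack");

or equivalently put `user_pref("javascript.options.asyncstack", false);` in the test profile's user.js.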
Depends on: 1342050, 1280819
Depends on: 1386407
Flags: needinfo?(jcoppeard)
(In reply to Andrew McCreight [:mccr8] from comment #18)
> Jan, I see 2.8MB of regressions related to baseline memory. Is that expected
> from using promises more? 1.25MB of fallback stubs in a single compartment
> seems particularly bad.

It's hard to say. JIT code is usually discarded on GC, so it could be just a difference in GC timing or something.

If there are more scripts to jit there will be more fallback stubs, but I doubt that's the case here.
Flags: needinfo?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #23)
> If there are more scripts to jit there will be more fallback stubs, but I
> doubt that's the case here.

The opposite, in fact: the JS Promise implementation had scripts that might be JIT-compiled, whereas the native implementation doesn't.

I don't know how likely that is, but a potential explanation would be that we're using *less* memory, delaying a GC that would've otherwise discarded the JIT code before the memory measurements.
Version: 53 Branch → 56 Branch
Eric, can we take this any further?

Also, is this a MemShrink:P1 based on comment 22?
Flags: needinfo?(erahm)
(In reply to Jim Mathies [:jimm] from comment #25)
> Eric, can we take this any further?

With the fix in bug 1386407 we saw a ~2% improvement, which is in line with this regression. Given that, and Till's work on bug 1342050 and bug 1280819, we're probably covered.

> Also, is this a MemShrink:P1 based on comment 22?

It's P1 because it's a large memory regression.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(erahm)
Resolution: --- → FIXED
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #26)
> With the fix in bug 1386407 we saw ~2% improvement which is in line with
> this regression. Given that and Till's work on bug 1342050 and bug 1280819
> we're probably covered.

FWIW, I just landed a patch for bug 1342050, but I'm afraid I don't have the cycles for working on bug 1280819 right now.
From talking with Eric, it sounds like the flag causing this issue (in bug 1386407) is disabled in beta.