Closed Bug 1224577 Opened 9 years ago Closed 9 years ago

Leaking chrome windows with basic SDK addon

Categories

(Add-on SDK Graveyard :: General, defect)

defect
Not set
normal

Tracking

(firefox42-, firefox43+ fixed, firefox44+ fixed, firefox45+ fixed)

VERIFIED FIXED
mozilla45
Tracking Status
firefox42 - ---
firefox43 + fixed
firefox44 + fixed
firefox45 + fixed

People

(Reporter: bzbarsky, Assigned: mossop)

References

Details

(Whiteboard: [MemShrink])

Attachments

(2 files)

[Tracking Requested - why for this release]: Because this makes the browser really suck to use.

SYMPTOMS: I get 2s or so GC pauses after running the browser for a bit, which obviously makes it a huge pain to use.  I haven't figured out yet what causes this.

DETAILS: 

1) E10S is disabled.  Since that's what we're shipping anyway, I figure that's fair game.  I don't know whether this happens with e10s enabled.

2) I don't know whether 44 is affected; this started happening in the last several weeks, I think, but I don't know exactly when.

3) GC log says:

GC(T+685172.8) Summary - Max Pause: 2106.738ms; MMU 20ms: 0.0%; MMU 50ms: 0.0%; Total: 3434.846ms; Zones: 577 of 577; Compartments: 3322 of 3322; HeapSize: 1734.738 MiB; HeapChange (abs): +4 (4); 

That particular GC had 107 slices.  The max pause is this slice:

GC Slice 77 - Pause: 2076.859ms of 10ms budget (@ 2333.459ms); Reason: REFRESH_FRAME; Reset: no; Times: Sweep: 2055.293ms, Other: 0.966ms, Mark During Sweeping: 1965.940ms, Mark Weak: 1865.910ms, Mark Incoming Gray Pointers: 0.284ms, Mark Gray: 86.467ms, Mark Gray and Weak: 13.098ms, Finalize Start Callbacks: 5.840ms, Per-Slice Weak Callback: 2.607ms, Per-Compartment Weak Callback: 3.227ms, Sweep Compartments: 73.772ms, Sweep Discard Code: 44.667ms, Sweep Inner Views: 0.177ms, Sweep Cross Compartment Wrappers: 41.703ms, Sweep Base Shapes: 5.276ms, Sweep Initial Shapes: 20.571ms, Sweep Type Objects: 44.497ms, Sweep Breakpoints: 19.368ms, Sweep Regexps: 0.846ms, Sweep Miscellaneous: 21.691ms, Sweep type information: 0.204ms, Other: 0.133ms, Sweep Object: 8.434ms, Sweep Shape: 0.131ms, Minor GCs to Evict Nursery: 1.888ms, Other: 0.427ms, Mark Roots: 1.461ms, Other: 1.253ms, Mark Rooters: 0.208ms

Note the "Mark weak" number.

4) Whenever this happens, about:memory says that I have a single (or at least not annotated as multiple in about:memory afaict) "window(chrome://browser/content/browser.xul)" under "explicit/window-objects/top(none)/detached" that takes up about a gig of memory.  Right now it's taking up 1115MB.  most of that (841MB) is a "js-compartment([System Principal], about:blank)" (why about:blank??) and most of the stuff in there is either "scripts" or "classes/class(Function)".  Also a fair amount of Block, XBL prototype JSClass, Window (20MB??), CSS2Properties, Proxy, Call, With, nsXPCComponents_Interfaces, etc.
The fact that I'm claimed to have 6.19 MB of "classes/class(nsXPCComponents_Interfaces)"  under my detached browser.xul makes me doubt that this is in fact a single window we're talking about here....  Especially since if I open things up and look at "classes/class(nsXPCComponents_Interfaces)/shapes/malloc-heap/tree-tables" (which is where most of the memory is) I see a "[323]" after it.  Nicholas, is it possible that we only show the numbers in brackets on leaves in about:memory and I'm looking at multiple windows leaked here?
Flags: needinfo?(n.nethercote)
terrence, what does the "Mark Weak" bit in the GC log mean?  Is this WeakMap or something else?  We need to figure out which code is keeping all this junk alive, so I'm trying to figure out what code we might want to start auditing, and this seems like at least a lead.
Flags: needinfo?(terrence)
Oh, and I can provide GC/CC logs from about:memory on request.  Olli and Steve already have them...
Oh, and currently I'm using a 2015-11-05 nightly.
One other note.  Consider this bit from about:memory:

5,827.34 MB (100.0%) -- explicit
├──2,104.61 MB (36.12%) -- window-objects
│  ├──1,141.97 MB (19.60%) -- top(none)
│  │  ├──1,135.52 MB (19.49%) -- detached
│  │  │  ├──1,122.66 MB (19.27%) -- window(chrome://browser/content/browser.xul)
│  │  │  │  ├────847.20 MB (14.54%) -- js-compartment([System Principal], about:blank)
│  │  │  │  │    ├──502.71 MB (08.63%) -- classes
│  │  │  │  │    │  ├──218.02 MB (03.74%) -- class(Function)
│  │  │  │  │    │  │  ├──204.94 MB (03.52%) -- objects
│  │  │  │  │    │  │  │  ├──201.17 MB (03.45%) ── gc-heap [357]

Note the [357].  If I open and close a browser window and click "Measure" it becomes [358] (not surprising).  If I now click "Minimize memory usage" and then click "Measure" it's still [358].  This part _is_ surprising, and indicates that we're just leaking every single window I open and close.  Which certainly explains the suck; I open/close windows a _lot_.
Could you attach an about:memory report, please? You can use an anonymized one if you prefer.
Whiteboard: [MemShrink]
bug 1223445 certainly can lead to massive runtime leaks when one has background tabs. One way to test if that is possibly the cause is to iterate through all the tabs by making each of them being foreground for a paint (refresh driver tick + paint is enough to release the leak on each tab).
The leak is such that it doesn't show up in cc or gc logs, since cc doesn't end up seeing where the unknown edges are coming from.

Though I think this is actually different issue, but I couldn't reproduce when I tried earlier.
(I'll look at the gc log today.)
If you email me some GC logs I can try to analyze what exactly is in the heap, or I can get a copy from Olli later today. Looking through those can let you see exactly what scripts are alive.

One big change to how we GC XPConnect stuff that landed recently is bug 1214961.
> bug 1223445 certainly can lead to massive runtime leak

I don't see how bug 1223445 would cause me to leak a window that I just open and close, right?  But that's what I see happening in comment 5.

> One big change to how we GC XPConnect stuff that landed recently is bug 1214961.

Hmm.  That's plausibly the right timeframe...
> If you email me some GC logs

Very funny.  They're 300MB _after_ compression.  :) Ping me on IRC and I'll temporarily put them somewhere where you can download them.
FWIW, there is at least the leak where
0x1e41e4e00 [rc=2] FragmentOrElement (xhtml) input class='newtab-control newtab-control-block' (orphan) about:newtab
0x1e41e4c00 [rc=2] FragmentOrElement (xhtml) input class='newtab-control newtab-control-pin' (orphan) about:newtab
tend to be the roots.
https://dxr.mozilla.org/mozilla-central/source/js/src/jsgc.cpp#4121

It could be weakmaps, watchpoints, debugger, or jitcode global tables (whatever those are). My money is on weakmap. If you could reproduce in rr, it shouldn't be hard to break here and figure out exactly which table it is, then narrow it down to the exact script allocation.
Flags: needinfo?(terrence)
(In reply to Andrew McCreight [:mccr8] from comment #8)
> If you email me some GC logs I can try to analyze what exactly is in the
> heap, or I can get a copy from Olli later today. Looking through those can
> let you see exactly what scripts are alive.
> 
> One big change to how we GC XPConnect stuff that landed recently is bug
> 1214961.

I'd expect to see a long FinalizeStartCallback phase in that case, rather than a long weak marking phase.
> If you could reproduce in rr,

Not trivial given that this is something that just randomly happens during my normal browsing and I'm on Mac.  :(
(In reply to Boris Zbarsky [:bz] from comment #1)
> is it possible that we only show the numbers in brackets on leaves in
> about:memory and I'm looking at multiple windows leaked here?

Yeah, it certainly sounds like we are leaking chrome windows.
Flags: needinfo?(n.nethercote)
Based on comment 12, some change to the new tab code may be involved. Bug 1204361 is the only change that I could see in the last month to the new tab code. The New Tab Override addon was also updated 10/28. I'll see if I can reproduce.
I can reproduce this locally.

Install the New Tab Override addon. Disable e10s. Restart. Open and close some windows, maybe loading stuff maybe not. Now you have some top(none)/detached/window(chrome://browser/content/browser.xul).
Summary: Some sort of memory leak in nightly keeps leading to huge GC pauses → Leaking chrome windows with New Tab Override addon
The fun part is that afaict this addon just calls some addon sdk stuff and one of our JSMs in totally innocuous ways.  So chances are, this is a leak in the addon sdk.
I backed out bug 1204361 locally and I can still reproduce a leak so that's not the issue.
Boris has reduced this to a very tiny Addon SDK addon.
Product: Core → Add-on SDK
Version: Trunk → unspecified
I'm going to try bisecting it to see if it is really a regression.
OK, so I tried changing the addon in question so index.js just contains this:

  exports.main = () => {};
  exports.onUnload = () => {};

and I still see leaks.  I didn't try changign its bootstrap.js because all that seems like total boilerplate.

With the addon thus modified and installed, the steps to reproduce the leak are pretty simple:

1)  Start browser.
2)  Open about:memory.
3)  Open a new window.
4)  Close that new window.
5)  Repeat steps 3 and 4 five more times.
6)  Click "Minimize memory".
7)  Click "Measure".

Then expand "explicit", expand "window-objects", expand "top(none)/detached", expand "js-compartment([System Principal], about:blank)", expand "classes", expand "class(Function)", expand "objects".  Note the [6] after both entries under "objects", indicating that there are 6 detached windows still alive.  These are the windows we opened and closed.

Dave, do you know who might be able to look at this on the SDK end?
Flags: needinfo?(dtownsend)
Summary: Leaking chrome windows with New Tab Override addon → Leaking chrome windows with basic SDK addon
I can still reproduce this same issue with a May 3 Nightly build of Firefox, using the New Tab Override addon. Maybe a change in the addon caused the problem?
It's also possible that the problem has been there all along and I just finally noticed it.  I would not have started using this addon before July sometime (because the pref worked before then) and I was out for most of August...
> I open things up and look at
> "classes/class(nsXPCComponents_Interfaces)/shapes/malloc-heap/tree-tables"
> (which is where most of the memory is) I see a "[323]" after it.  Nicholas,
> is it possible that we only show the numbers in brackets on leaves in
> about:memory and I'm looking at multiple windows leaked here?

Yes. The "[323]" annotations indicate duplicate reports and they only show up on leaves and you're almost certainly seeing evidence of multiple windows.
Attached patch patchSplinter Review
Well this was dumb...
Assignee: nobody → dtownsend
Flags: needinfo?(dtownsend)
Attachment #8688623 - Flags: review?(bzbarsky)
Comment on attachment 8688623 [details] [diff] [review]
patch

r=me, though I'm hardly an expert on this code.  The old thing is clearly wrong, though. ;)

This code looks like it was part of the bug 1139189 uplift, right?  So are we leaking like this in release builds right now?  :(
Flags: needinfo?(dtownsend)
Attachment #8688623 - Flags: review?(bzbarsky) → review+
Or put another way, which branches do we need to backport this to?
I did confirm this fixes the leak per the steps in comment 23.
(In reply to Boris Zbarsky [:bz] from comment #29)
> Or put another way, which branches do we need to backport this to?

We should uplift this to every branch we can :(
Flags: needinfo?(dtownsend)
Well, we don't need this on ESR38, right?

[Tracking Requested - why for this release]: Leaks are bad.  Very very bad.
Is this the same as bug 1142597?
Seems likely, yes.
Blocks: 1142597
https://hg.mozilla.org/mozilla-central/rev/924aca2113a4
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Dave, are you planning to request branch approvals?
Flags: needinfo?(dtownsend)
Comment on attachment 8688623 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1130529
[User impact if declined]: Any users with SDK based add-ons will leak every opened browser window until shutdown or the add-ons get disabled.
[Describe test coverage new/current, TreeHerder]: On m-c, we don't have any test coverage for this specific kind of leak though.
[Risks and why]: Very low risk, this fixes what is essentially a typo.
[String/UUID change made/needed]: None.
Flags: needinfo?(dtownsend)
Attachment #8688623 - Flags: approval-mozilla-release?
Attachment #8688623 - Flags: approval-mozilla-beta?
Attachment #8688623 - Flags: approval-mozilla-aurora?
Comment on attachment 8688623 [details] [diff] [review]
patch

Fix for memory leak, please uplift to aurora and beta.
Attachment #8688623 - Flags: approval-mozilla-beta?
Attachment #8688623 - Flags: approval-mozilla-beta+
Attachment #8688623 - Flags: approval-mozilla-aurora?
Attachment #8688623 - Flags: approval-mozilla-aurora+
Tracked bcuz it's a memory leak issue, good to see that it's fixed already.
Bz, can you please verify that this issue is fixed as expected on a latest Nightly build? Thanks! :)
Flags: needinfo?(bzbarsky)
I'm not seeing the leaked detached windows, at least.  We'll see how the gc pauses do.
Flags: needinfo?(bzbarsky)
Verified as per comment 45.
Status: RESOLVED → VERIFIED
Tracking belatedly in case this reopens.
Comment on attachment 8688623 [details] [diff] [review]
patch

This uplift to m-r was requested a couple of weeks ago, i.e. for 43, but m-r is now 43. It is too late to take this in 42.
Attachment #8688623 - Flags: approval-mozilla-release? → approval-mozilla-release-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: