Closed
Bug 1264161
Opened 8 years ago
Closed 8 years ago
Leaking many MB of heap-textures each time you open and close a Firefox window with e10s enabled
Categories
(Core :: Graphics, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: bzbarsky, Assigned: kats)
References
Details
(Whiteboard: [gfx-noted][MemShrink:P1])
Attachments
(1 file, 1 obsolete file)
5.52 KB,
patch
|
botond
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Over the last few days I've commonly been getting the "our of application memory" dialog on Mac. I just tried looking at about:memory for my nightly, and it's showing about 2GB in the main process used by explicit/gfx/heap-textures. Is this expected? If this is a cache of some sort, can I somehow flush it to see what the effect is on memory usage of the nightly process and the kernel?
Flags: needinfo?(milan)
Reporter | ||
Comment 1•8 years ago
|
||
Oh, and I have 36 windows containing 132 tabs between them, though 82 of those tabs are not actually loaded.
Boris, I imagine you need a long enough workflow that we can't do a mozregression on it? Anthony, can you see if you can reproduce this as a regression? Looking at the last few days, there were a few bugs where we tried to improve the memory situation, it's possible one of those introduced a regression. * Bug 1259785 - we fixed a canvas memory leak * Bug 1262328 - handling memory pressure * Bug 1262846 - we updated a font-related library
Flags: needinfo?(milan) → needinfo?(anthony.s.hughes)
OS: Unspecified → Mac OS X
Reporter | ||
Comment 3•8 years ago
|
||
Yes, I don't know offhand how to get into this state. :( Also of note: 1) This is a 2016-04-07 nightly. The previous nightly I was using was probably another week or two older than that. 2) I just re-measured and now I'm at 2.8GB of gfx/heap-textures, according to about:memory....
(In reply to Milan Sreckovic [:milan] from comment #2) > Boris, I imagine you need a long enough workflow that we can't do a > mozregression on it? Anthony, can you see if you can reproduce this as a > regression? I'm not seeing this on my profile. I'm sitting at 16.72MB in gfx/heap-textures on my profile.
Flags: needinfo?(anthony.s.hughes)
Reporter | ||
Comment 5•8 years ago
|
||
OK, so I just tried restarting the browser (so now I'm on the 2016-04-08 nightly). With my session restored but without any browsing so far other than the things I already head loaded, I'm at ~590MB of gfx/heap-textures. Every time I open/close a window gfx/heap-textures goes up by 1MB or so. Minimizing memory usage via about:memory sometimes seems to reclaim this memory (or at least other gfx/heap-textures memory?), and sometimes not....
(In reply to Boris Zbarsky [:bz] from comment #5) > > Every time I open/close a window gfx/heap-textures goes up by 1MB or so. > Minimizing memory usage via about:memory sometimes seems to reclaim this > memory (or at least other gfx/heap-textures memory?), and sometimes not.... I think I see this on nightly, but Markus does not. Anthony, what about you?
Flags: needinfo?(anthony.s.hughes)
Updated•8 years ago
|
Whiteboard: [gfx-noted]
No I don't see this at all. I'm using Firefox Nightly 48.0a1 20160413030239 on Mac OS 10.11.5 Beta (15F18b).
Flags: needinfo?(anthony.s.hughes)
Reporter | ||
Comment 8•8 years ago
|
||
And now in that same session I mentioned in comment 5, I'm at 1.35GB for gfx/heap-textures. I have been browsing around with it in the meantime, obviously, opening/closing windows, etc. If I minimize memory usage, that number changes to .... 1.416GB (!). And now I minimized again and it's at 1.37GB.
Reporter | ||
Comment 9•8 years ago
|
||
One other thing. Under "Other Measurements", there are these two entries: 12,829.90 MB ── gfx-textures 15,854.78 MB ── gfx-textures-peak I'm not sure what those mean. explicit/gfx/heap-textures is currently around 1.6GB; do we also have non-heap textures?
Reporter | ||
Comment 10•8 years ago
|
||
Nicolas, mccr8 suggested asking you to take a look.
Flags: needinfo?(nical.bugzilla)
Comment 11•8 years ago
|
||
I can also reproduce the heap-textures measurement going up and staying up when you open and close windows on OSX.
Whiteboard: [gfx-noted] → [gfx-noted][MemShrink]
Let's start with Jeff first.
Flags: needinfo?(nical.bugzilla) → needinfo?(jmuizelaar)
Reporter | ||
Comment 13•8 years ago
|
||
I just got the "Your system has run out of application memory" popup again. At this time, I have, in my parent process: 3,923.12 MB (100.0%) -- explicit ├──2,773.27 MB (70.69%) -- gfx │ ├──2,773.07 MB (70.69%) ── heap-textures .... 17,718.42 MB ── gfx-textures 18,957.02 MB ── gfx-textures-peak Note that those last two numbers are now larger than my available RAM (16GB), if that means anything...
Updated•8 years ago
|
Assignee: nobody → bgirard
Flags: needinfo?(jmuizelaar)
Comment 14•8 years ago
|
||
Any chance this is related to switching to skia for content rendering on OS X in bug 1207332? bz, can you try setting the preference "gfx.content.azure.backends" to "cg", restarting Firefox and seeing if things clear up?
Updated•8 years ago
|
Flags: needinfo?(bzbarsky)
Comment 15•8 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #14) > Any chance this is related to switching to skia for content rendering on OS > X in bug 1207332? > > bz, can you try setting the preference "gfx.content.azure.backends" to "cg", > restarting Firefox and seeing if things clear up? Sorry for the noise, I was able to repro locally. Going back to cg didn't fix anything. This appears to be an e10s only bug, I can't repro with e10s disabled.
tracking-e10s:
--- → ?
Flags: needinfo?(bzbarsky)
Updated•8 years ago
|
Whiteboard: [gfx-noted][MemShrink] → [gfx-noted][MemShrink:P1]
Comment 16•8 years ago
|
||
I ended up getting sucked into this thinking it might be related to bug 1262088. We're leaking nsBaseWidget, and along with it the layer manager, compositor and any number of textures. The problem is that nsBaseWidget::ConfigureAPZCTreeManager() takes a reference (inside the ChromeProcessController), and then SetControllerForLayerTree stores it in sIndirectLayerTrees. Nothing calls DeallocateLayerTreeId for root layer tree id, so this entry exists as long as the process does. Kats, want to fix this? The threading model on this code confuses me a fair bit. It looks like the only use of mWidget within ChromeProcessController is to access views and pres shell, which would only be safe on the main thread. SetControllerForLayerTree explicitly posts the messages to the compositor thread though, and sIndirectLayerTrees is a structure usually used by the compositor for associating child process content with the chrome process window.
Updated•8 years ago
|
Flags: needinfo?(bugmail.mozilla)
Comment 17•8 years ago
|
||
Some users open and close a lot of windows, so this could be contributing to memory problems. about:memory reports from OOM crashes in the main process could be checked to confirm that.
Blocks: e10s-oom
Updated•8 years ago
|
Summary: Fairly large amount of memory used for textures → Leaking 1MB of heap-textures when opening and closing a Firefox window
Updated•8 years ago
|
Summary: Leaking 1MB of heap-textures when opening and closing a Firefox window → Leaking 1MB of heap-textures each time you open and close a Firefox window
Comment 18•8 years ago
|
||
My experience was we're leaking a lot more than 1MiB.
Summary: Leaking 1MB of heap-textures each time you open and close a Firefox window → Leaking many MB of heap-textures each time you open and close a Firefox window with e10s enabled
Assignee | ||
Comment 19•8 years ago
|
||
I can look into this, sure. I'm wondering how none of our automated tests are picking this up...
Assignee: bgirard → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
Comment 20•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19) > I'm wondering how none of our automated tests are picking this up... Our main leak checking runs late in shutdown. I'd guess that this memory gets cleaned up when you close the main window. AreWeSlimYet opens pages in new tabs, not new windows, so it wouldn't detect this.
Updated•8 years ago
|
status-firefox47:
--- → affected
Comment 21•8 years ago
|
||
I'm going to guess that 47 is also affected.
Assignee | ||
Comment 22•8 years ago
|
||
So it looks like in the gtk and windows widgets, the nsWindow::Destroy override triggers destruction of the compositor by calling DestroyLayerManager (windows) or DestroyCompositor (gtk). nsChildView.mm on OS X, and Android's nsWindow don't appear to do this. This leaves sIndirectLayerTrees with a dangling entry for the root layer id, which holds the ChromeProcessController which holds the widget, as well as a bunch of other stuff. I added this bit of code to nsBaseWidget::OnDestroy(): + if (mCompositorBridgeParent) { + const CompositorBridgeParent::LayerTreeState* state = CompositorBridgeParent::GetIndirectShadowTree(mCompositorBridgeParent->RootLayerTreeId()); + if (state && state->mController) { + state->mController->Destroy(); + } + } which fixed the problem by clearing the root layer id entry from sIndirectLayerTrees on widget Destroy(). However I think a better fix is to just call DestroyCompositor() or DestroyLayerManager(), either in the nsBaseWidget::OnDestroy function or in the OS X and Android subclasses. I'm unsure if the ordering of stuff here matters, the nsChildView::Destroy function has a comment and a lock [1] that seems to imply we can't be destroying the compositor there. Markus, do you know what that's about? [1] http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm?rev=327c61df0bae#624
Flags: needinfo?(mstange)
Assignee | ||
Updated•8 years ago
|
Blocks: apz-desktop-blockers
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #20) > Our main leak checking runs late in shutdown. I'd guess that this memory > gets cleaned up when you close the main window. AreWeSlimYet opens pages in > new tabs, not new windows, so it wouldn't detect this. Yes, you're right. During shutdown CompositorBridgeParent::ShutDown is called, which clears the entire sIndirectLayerTrees table, and frees up everything.
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22) > I'm > unsure if the ordering of stuff here matters, the nsChildView::Destroy > function has a comment and a lock [1] that seems to imply we can't be > destroying the compositor there. Markus, do you know what that's about? Actually looking at it closer, it should be safe to destroy the compositor. The lock is to prevent the composition interleaving with shutdown.
Flags: needinfo?(mstange)
Assignee | ||
Comment 25•8 years ago
|
||
For future reference, the shutdown procedure goes like this: 1. nsWindow::Destroy is called 2. Somewhere during that call, DestroyCompositor/DestroyLayerManager is called 3. This calls CompositorBridgeChild::Destroy, which sends a WillClose to CompositorBridgeParent and then schedules a DeferredDestroy 4. The RecvWillClose on the parent side clears out a bunch of layer managers 5. The DeferredDestroy closes the IPC channel, which triggers ActorDestroy on the parent side 6. CompositorBridgeParent::ActorDestroy clears the mRootLayerID entry from the sIndirectLayerTrees map, which clears the widget pointer 7. Stuff is destroyed. On OS X, step 2 wasn't happening. After I fixed that I noticed that step 6 wasn't happening either because mLayerManager was getting nulled out in step 4. And that seems to affect Linux as well, and probably windows too? So maybe this bug affects all platforms. Anyway, I fixed up the code to follow the shutdown procedure above and it seems to work. Try push and patches coming shortly.
Assignee | ||
Comment 26•8 years ago
|
||
Try push including this patch at https://treeherder.mozilla.org/#/jobs?repo=try&revision=05e96ebb3278
Attachment #8743931 -
Flags: review?(matt.woodrow)
Updated•8 years ago
|
Priority: -- → P1
Assignee | ||
Comment 27•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #16) > The threading model on this code confuses me a fair bit. It looks like the > only use of mWidget within ChromeProcessController is to access views and > pres shell, which would only be safe on the main thread. > SetControllerForLayerTree explicitly posts the messages to the compositor > thread though, and sIndirectLayerTrees is a structure usually used by the > compositor for associating child process content with the chrome process > window. Also for completeness I looked over this code and I think it's fine. You're right that mWidget can only be used on the main thread, and I think all the functions in ChromeProcessController that touch the mWidget do so from the main thread. The ChromeProcessController itself does get passed around and touched on the compositor thread as well though.
Updated•8 years ago
|
Attachment #8743931 -
Flags: review?(matt.woodrow) → review+
Comment 28•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27) > Also for completeness I looked over this code and I think it's fine. You're > right that mWidget can only be used on the main thread, and I think all the > functions in ChromeProcessController that touch the mWidget do so from the > main thread. The ChromeProcessController itself does get passed around and > touched on the compositor thread as well though. That's good to know. It would be nice if this were structured in a way that this was obvious, as it seems like we could easily add a bug in the future here. That can go on my wish list though :)
Assignee | ||
Comment 29•8 years ago
|
||
Looks like on OS X the patch causes an assertion failure in some tests. I can repro locally, investigating.
Assignee | ||
Comment 30•8 years ago
|
||
I'm not entirely sure why the assertion failure is happening, and I don't know that code too well. As best I can tell the early shutdown of the compositor/layermanager on the parent side triggers a premature shutdown on the client side as well? At this point the ShadowLayerForwarder hasn't finished emptying out its shmems and so we get the Assertion failure: mUsedShmems.empty() in ShadowLayers.cpp. I'm not sure why this doesn't already happen on Linux/Windows, maybe this is some Mac-specific gfx code. I'm going to try to sidestep this entirely and go with a simpler fix that is similar to comment 22 which carries less risk. Try push with that at https://treeherder.mozilla.org/#/jobs?repo=try&revision=09c77d899151
Assignee | ||
Comment 31•8 years ago
|
||
Attachment #8743931 -
Attachment is obsolete: true
Attachment #8744367 -
Flags: review?(botond)
Comment 32•8 years ago
|
||
Comment on attachment 8744367 [details] [diff] [review] Patch v2 Review of attachment 8744367 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/nsBaseWidget.h @@ +523,5 @@ > RefPtr<CompositorBridgeChild> mCompositorBridgeChild; > RefPtr<CompositorBridgeParent> mCompositorBridgeParent; > RefPtr<mozilla::CompositorVsyncDispatcher> mCompositorVsyncDispatcher; > RefPtr<APZCTreeManager> mAPZC; > + RefPtr<GeckoContentController> mGeckoContentController; Can we call this mRootGeckoContentController, to reflect the fact that this is the GeckoContentController for the root layer tree ID?
Attachment #8744367 -
Flags: review?(botond) → review+
Assignee | ||
Comment 33•8 years ago
|
||
Based on IRC discussion I renamed it to mRootContentController. The try push also revealed a missing null check which I added (again, got r+ over IRC). Latest try push is at https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c60a5753876&group_state=expanded&exclusion_profile=false - still going, but looking green on all the tests that were blowing up before. I'm going to go ahead and land this.
Comment 35•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3d22a2e2a594
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment 36•8 years ago
|
||
Is this something you feel we can uplift to 47 at some point?
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 37•8 years ago
|
||
Yeah. I'd like to have somebody (bz?) verify it first before requesting uplift. It's in today's nightly (Apr 25).
Flags: needinfo?(bugmail.mozilla) → needinfo?(bzbarsky)
Comment 38•8 years ago
|
||
This appears to have angered OSX debug on Ash as well. https://treeherder.mozilla.org/#/jobs?repo=ash&filter-searchStr=osx%20debug
Assignee | ||
Comment 39•8 years ago
|
||
That's the same failure mode I was seeing in the try push in comment 26; the new patch didn't seem to have that problem. Weird. I don't know what to do about that - maybe file a new bug, and Nical can look into it? It's in code that I'm not really familiar with and might be a pre-existing issue exposed by this fix.
Assignee | ||
Comment 40•8 years ago
|
||
(Mid-aired, thanks for filing the bug)
Reporter | ||
Comment 41•8 years ago
|
||
Will check once I update to a build with the fix. I thought I had, but hadn't realized the lag between inbound and m-c in this case. :( Sadly, updating is a 10-20 minute process that interrupts the world, so I don't do it every day.
Comment 42•8 years ago
|
||
Hello, would it be possible to uplift this fix to 47.0beta? I think, that this fix could solve this Bug 1190170. I tested the last 46.0beta and I ended up with 1 GB per main process after 1 day of usage and this fix could solve the problem or at least mitigate it. Thank you for feedback.
Assignee | ||
Comment 43•8 years ago
|
||
Yeah I might as well request uplift while we wait for verification.
Assignee | ||
Comment 44•8 years ago
|
||
Comment on attachment 8744367 [details] [diff] [review] Patch v2 Approval Request Comment [Feature/regressing bug #]: APZ [User impact if declined]: opening a new window and closing it leaks some of the structures from that window. opening and closing many windows will accumulate these memory leaks, resulting in high memory usage counts. [Describe test coverage new/current, TreeHerder]: tested locally only, waiting for verification from somebody. We don't appear to have leak-checking in automation that can be run on a per-window basis; it just runs late during shutdown after the leaked objects have been released, so it didn't catch this problem. [Risks and why]: medium-ish risk. One risk factor is that it might expose other pre-existing bugs (e.g. bug 1268169, bug 1267246) that were masked by this problem. Another risk factor is that the code is a little fragile and not super well-understood. I did the best I could in making the patch low-risk, but more bake time will be better. Still, it's a high-reward patch for trains where APZ is enabled. [String/UUID change made/needed]: none
Attachment #8744367 -
Flags: approval-mozilla-beta?
Comment 45•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #43) > Yeah I might as well request uplift while we wait for verification. I tested Nightly 2016-04-25 and results were good. Uptime > 2 days and ~400MB per main process consumed.
Comment on attachment 8744367 [details] [diff] [review] Patch v2 Verified on Nightly, though this is a bit risky, the memleak is severe enough to uplift quicky (20% of beta users have e10s on by default), Beta47+ If we find regressions, we can consider backing it out and leaving things as is.
Attachment #8744367 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 47•8 years ago
|
||
This doesn't apply as cleanly to beta as I would like. The main code dependency that is bug 1215265, if we can get that uplifted it would make me happier. There's also bug 1263200 but that's a one-liner that's trivial to uplift. If we can't uplift bug 1215265 I'll probably need to retest the patch on a local beta build to make sure it's doing what it's supposed to.
Assignee | ||
Comment 48•8 years ago
|
||
Nical is reluctant to uplift bug 1215265 so I rebased around it and verified locally that the rebased patch fixes the leak. I also did a try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=5260a53ebba0 which looks ok. I'll land the rebased version shortly.
Assignee | ||
Comment 49•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/c70884434329
Comment 50•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #49) > https://hg.mozilla.org/releases/mozilla-beta/rev/c70884434329 Does it mean that the fix is part of 47.0beta? If yes, I'm looking forward to test it if it helps or not. Actually I have 1.3 GB per main process, which is really tough :-)
Comment 51•8 years ago
|
||
Should be in 47b2, yes.
Assignee | ||
Comment 52•8 years ago
|
||
Yeah, the next beta build should go out today or tomorrow I believe. Please do test it and let us know if it helps. Thanks!
Reporter | ||
Comment 53•8 years ago
|
||
I've been browsing with a nightly with this patch for some time now. heap-textures is holding steady at about 600MB (instead of ballooning to multiple GB), gfx-textures holding steady at about 9GB (instead of ballooning to 16-20GB). Haven't had the "run out of application memory" problem. Looks good. ;)
Flags: needinfo?(bzbarsky)
Comment 54•8 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #53) > I've been browsing with a nightly with this patch for some time now. > heap-textures is holding steady at about 600MB (instead of ballooning to > multiple GB), gfx-textures holding steady at about 9GB (instead of > ballooning to 16-20GB). Haven't had the "run out of application memory" > problem. Looks good. ;) Am I the only one who thinks 9GB of gfx-textures is still way too high? Or is this a caching thing and since we have the memory we just go for it?
You're not the only one :) Right now, some caching is being done per tab. We have work in progress to switch that to per window.
Comment 56•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #52) > Yeah, the next beta build should go out today or tomorrow I believe. Please > do test it and let us know if it helps. Thanks! Finally I'm happy with 47.0b2. This uplift of your fix really, really helped. I have uptime more than 3 days and spent only ~325 MB per main process. It is really GREAT. Thank you.
Assignee | ||
Comment 58•8 years ago
|
||
Filed https://github.com/mozilla/areweslimyet/issues/125 for what I think is a way to prevent this from happening again.
You need to log in
before you can comment on or make changes to this bug.
Description
•