Closed Bug 1283264 Opened 8 years ago Closed 7 years ago

browser.xul leaked when tab is dragged to new window, but is free'd in an AllTraces CC

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1336467

People

(Reporter: bkelly, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P1])

I see some strange behavior here.  Its like the leak in bug 1277358, but the window is cleaned up after exporting verbose CC logs.  STR:

1) Open fresh nightly browser.  I am using 50.0a1 (2016-06-29)
2) Open an about:memory tab
3) Open a content tab (I used mozilla.org)
4) Drag content tab out to new window and then back into the original window
5) In about:memory minimize/measure
6) Observe the detached browser.xul window
7) Repeat steps 5 and 6 a few times
8) Save verbose CC logs
9) Minimze/measure in about:memory
10) Observe that the detached browser.xul window is gone

In IRC Andrew suggested this might mean we have a cycle collector optimization that is too aggressive and missing something.
Olli, Andrew, can one of you investigate this?
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
Saving concise logs does not allow the window to be cleaned up.  Only verbose logs.
See Also: → 1283262
Also note, find_roots.py could not find a path for the window.
Yeah, I'm looking at it.

I tried getting a concise log before step 8, and the leaking nsGlobalWindow does not show up in it. So much for my hope that this would be easy to investigate.
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
Assignee: nobody → continuation
(In reply to Ben Kelly [:bkelly] from comment #3)
> Also note, find_roots.py could not find a path for the window.

This is because the verbose CC frees the window. It shows up as garbage.
If I had to guess, this might be a regression from bug 741760, which is the only CC optimization that I'm aware of that has landed recently.
Reverting bug 741760 does not fix this leak, so I apologize for blaming it. :)
I went through the CC and deleted code so that it basically treated every call to CanSkipInCC and CanSkipThis as returning false. This did not fix the leak, but it at least made it so that the concise CC log shows the leaking browser.xul in the graph. The window is leaking due to missing references to the browser.xul nsGlobalWindow. I compared the references from the concise and verbose logs to find out which references were missing.

The three references are:

1. The JS reflector for the Window. This has been marked black. It is possible that there is some "black bit propagation" that is causing this to be marked black incorrectly.

2. An AnimationTimeline. The sole reference to this in the verbose log is from a nsDocument for browser.xul. The nsDocument is present in the concise log, but does not have any of its fields traced. I would guess that it returned NS_SUCCESS_INTERRUPTED_TRAVERSE.

3. An nsDOMMutationObserver. It looks like this is being held alive by its reflector, which in turn is held alive by the window reflector (through some TabsInTitlebar property), so I think this is really a variant of 1.
Disabling the nsINode::Traverse() optimization removes (2) as expected.
I removed the unmark gray stuff from TraceBlackJS, and made ForgetSkippable not do anything, but somehow the JS reflector for the window is still being marked black. I feel like I've disabled every CC optimization at this point, but maybe I've missed something.
I'm out of ideas for the moment. Maybe I'll think of something next week.
Summary: browser.xul leaked when tab is dragged to new window, but is free'd after CC logs are saved → browser.xul leaked when tab is dragged to new window, but is free'd in an AllTraces CC
Blocks: 1286008
I believe that people checked that these steps didn't leak after bug 1277358 landed, so the regression would have to be in between the landing of that bug and when this bug was filed.
This may not actually be a regression. I tried a 6-4 build in MozRegression, a day after bug 1277358 landed, and still hit this issue.
No longer blocks: 1286008
P1 for now, lets see how bad this actually is.
Whiteboard: [MemShrink] → [MemShrink:P1]
Can we manually bisect this applying the patch from bug 1277358 each step?  Seems like we should try to avoid letting this merge to aurora.
Flags: needinfo?(continuation)
(In reply to Ben Kelly [:bkelly] from comment #15)
> Can we manually bisect this applying the patch from bug 1277358 each step? 

Why do you think this is a regression? Are these the same steps to reproduce as bug 1277358? I didn't follow all of the details of these two bugs.
Flags: needinfo?(continuation)
Because we uplifted bug 1277358 to Aurora and this all traces thing does not reproduce there last I checked.  The window just doesn't show as attached.  So something must have changed in 50.
Flags: needinfo?(continuation)
I'm going to start the manual bisect.  See if I can make any headway.  I'll post where I end up here.
Flags: needinfo?(continuation)
Ok.  I was wrong.  I just proved to myself this reproduces on every branch where bug 1277358 was uplifted.
I put a breakpoint in nsGlobalWindow::Release conditional on an inner window without any outer window.  Most of the Release calls were from CC directly, but I did see Releases from two other pieces:

>	xul.dll!nsGlobalWindow::Release() Line 1796	C++
 	xul.dll!nsCOMPtr_base::assign_with_AddRef(nsISupports * aRawPtr) Line 53	C++
 	xul.dll!nsDOMMutationObserver::cycleCollection::Unlink(void * p) Line 493	C++
 	xul.dll!nsCycleCollector::CollectWhite() Line 3339	C++
 	xul.dll!nsCycleCollector::Collect(ccType aCCType, js::SliceBudget & aBudget, nsICycleCollectorListener * aManualListener, bool aPreferShorterSlices) Line 3689	C++
 	xul.dll!nsCycleCollector_collect(nsICycleCollectorListener * aManualListener) Line 4161	C++
 	xul.dll!nsJSContext::CycleCollectNow(nsICycleCollectorListener * aListener, int aExtraForgetSkippableCalls) Line 1437	C++
 	xul.dll!nsMemoryInfoDumper::DumpGCAndCCLogsToFile(const nsAString_internal & aIdentifier, bool aDumpAllTraces, bool aDumpChildProcesses, nsIDumpGCAndCCLogsCallback * aCallback) Line 401	

And:

>	xul.dll!nsGlobalWindow::Release() Line 1796	C++
 	xul.dll!nsCOMPtr_base::assign_with_AddRef(nsISupports * aRawPtr) Line 53	C++
 	xul.dll!mozilla::dom::AnimationTimeline::cycleCollection::Unlink(void * p) Line 18	C++
 	xul.dll!mozilla::dom::DocumentTimeline::cycleCollection::Unlink(void * p) Line 21	C++
 	xul.dll!nsCycleCollector::CollectWhite() Line 3339	C++
 	xul.dll!nsCycleCollector::Collect(ccType aCCType, js::SliceBudget & aBudget, nsICycleCollectorListener * aManualListener, bool aPreferShorterSlices) Line 3689	C++
 	xul.dll!nsCycleCollector_collect(nsICycleCollectorListener * aManualListener) Line 4161	C++
 	xul.dll!nsJSContext::CycleCollectNow(nsICycleCollectorListener * aListener, int aExtraForgetSkippableCalls) Line 1437	C++
 	xul.dll!nsMemoryInfoDumper::DumpGCAndCCLogsToFile(const nsAString_internal & aIdentifier, bool aDumpAllTraces, bool aDumpChildProcesses, nsIDumpGCAndCCLogsCallback * aCallback) Line 401	C++
The problem isn't in refcounting, it is in GC marking. The question is, what marks the window's JS reflector black.

Terrence, is there any way to figure out what is marking a particular JS object black? (This is going to happen over and over again, so rr shouldn't be necessary.)
Flags: needinfo?(terrence)
You should be able to set a breakpoint in markIfUnmarked in js/src/gc/Heap.h, either in ChunkBitmap or TenuredCell, conditional on the address and color. This will most likely be under processMarkStackTop, which will likely be from drainMarkStack, so it may not be immediately clear what the proximate cause is. For that you need to walk backwards using rr in processMarkStackTop. It's pretty easy to unravel the object chain at that point to see what the liveness looks like. Of course this is pretty labor intensive.

A possibly better alternative (note: I have not yet had the opportunity to try this) is to use ubi::ShortestPath. I think you should be able to make it conditional on an address stored somewhere and poke the address in via gdb once you know what you're looking for.
Flags: needinfo?(terrence)
I'm out of ideas here, unfortunately.
Assignee: continuation → nobody
Hi Olli, could you please take a look here?
Flags: needinfo?(bugs)
Priority: -- → P3
I have a new approach to looking at these, so I'll see if I can figure something out.
Assignee: nobody → continuation
Flags: needinfo?(bugs)
Using sfink's patches from bug 1337072, I managed to get a stack from what was causing the mark bits on the leaking window reflector to turn black, and it was happening under CycleCollectedJSContext::FixWeakMappingGrayBits. It turns out that jonco has been looking at this code recently in bug 1336467, and in fact his patch fixes the leak.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Blocks: GhostWindows
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.