Bug 1360280 (Open): Opened 7 years ago, updated 2 years ago

Ghost windows in current nightly with profiler enabled

Categories

(Core :: Gecko Profiler, defect, P3)

Tracking

Status: REOPENED
Performance Impact: none
Tracking Status: firefox55 --- affected

People

(Reporter: mossop, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, perf, Whiteboard: [MemShrink:P3])

Attachments

(1 file)

Attached file memory-report.json.gz
I installed today's nightly this morning and since then I've accumulated some 100 ghost windows across my content processes. I do have 1Password installed, so I will try running with that disabled and see if they still accumulate.
Flags: needinfo?(continuation)
Blocks: GhostWindows
Flags: needinfo?(continuation)
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink][qf]
mossop says he has a touch screen on this machine.

dbaron also saw some ghost windows on an updated build, but he said they went away eventually.
Still seeing this without 1Password enabled.
That touch-handling-related leak was really a one-off case, so I would be surprised if anything related to that shows up. That fix should be in the 04-25 build, I think.
Seems like I can easily reproduce this by just visiting YouTube.
(In reply to Olli Pettay [:smaug] from comment #3)
> That touch-handling-related leak was really a one-off case, so I would be
> surprised if anything related to that shows up. That fix should be in the
> 04-25 build, I think.

I'm on the Windows 64-bit build from 04-27.
This is at least partially caused by the Gecko Profiler add-on. Without it enabled, YouTube does not ghost for me. With it enabled, it ghosts every time. I say partially because even with the profiler add-on disabled I am still seeing one other site ghost almost immediately after startup.
Markus, maybe you could look at the profiler issues? It could be related to bug 1355566.
Flags: needinfo?(mstange)
Component: DOM → Gecko Profiler
Summary: Ghost windows in current nightly → Ghost windows in current nightly with profiler enabled
See Also: → 1360310
Running "minimize memory usage" in about:memory seems to free up the ghost window for me, at least with YouTube. In fact, clicking on the CC button a single time seems to be enough to eventually free the window. One time it went away on its own. I didn't get anything useful out of a CC log.
Markus said in bug 1355566 that the profiler holds onto JS functions. That would certainly cause leaks like this, if those are strong references. I don't understand enough about how JIT stuff works to know what the profiler is doing.
See Also: → 1355566
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Running "minimize memory usage" in about:memory seems to free up the ghost
> window for me, at least with YouTube. In fact, clicking on the CC button a
> single time seems to be enough to eventually free the window. One time it
> went away on its own. I didn't get anything useful out of a CC log.

I was running GC, CC and Minimize memory usage multiple times and still seeing the ghosts.
One way to fix this might be to store any JS references separated out by compartment, then make WindowDestroyedEvent::Run() call into the profiler with a compartment to tell it to dump its references to that compartment.
But if you then capture a profile, we won't be able to look up the function names for any JS functions of that compartment, and your stacks in the profiler might be incomplete.
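
Roughly, the shape might look like this sketch (type and function names here are hypothetical stand-ins, not the actual profiler API):

```cpp
// Hypothetical sketch of the compartment-keyed idea above; none of these
// names are the real Gecko/profiler API.
#include <unordered_map>
#include <vector>

struct JSCompartment {};   // stand-in for the engine's compartment type
struct JSFunctionRef {};   // stand-in for whatever the profiler retains

class ProfilerJSRefTable {
 public:
  // Record a reference under the compartment it belongs to.
  void Add(JSCompartment* comp, JSFunctionRef* ref) {
    mRefs[comp].push_back(ref);
  }

  // Called from WindowDestroyedEvent::Run(): drop every reference into
  // the dying compartment at once, so the GC can collect the window.
  void NotifyCompartmentDestroyed(JSCompartment* comp) {
    mRefs.erase(comp);
  }

 private:
  std::unordered_map<JSCompartment*, std::vector<JSFunctionRef*>> mRefs;
};
```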
Flags: needinfo?(mstange)
Sure. Maybe you could detect when there's been a window-destroyed event for a compartment, and also no JS has run in that compartment within the current profiler window, and then purge the references? (I don't know exactly how the profiler works.) We should not be running JS in a window that has been closed, under usual circumstances, so it would give the function a bounded lifetime.
We already release the references once the buffer has wrapped around.

I'm not actually sure if the profiler references keep the compartment alive, or only certain data about the JS functions. We might be dealing with a separate leak here that isn't related to the JS function references at all.
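
For illustration, that wrap-around release behaves conceptually like this (a simplified sketch, not the real buffer implementation):

```cpp
// Simplified sketch of a sample buffer that drops its JS references as
// entries are overwritten; names are illustrative only.
#include <cstddef>
#include <memory>
#include <vector>

struct JSFunctionRef {};  // stand-in for the retained per-function data

struct Sample {
  std::shared_ptr<JSFunctionRef> jsRef;  // strong ref while buffered
};

class CircularSampleBuffer {
 public:
  explicit CircularSampleBuffer(size_t capacity) : mEntries(capacity) {}

  void Push(Sample s) {
    // Overwriting the oldest entry releases its reference, so a function
    // (and anything it keeps alive) is retained at most one buffer-length.
    mEntries[mNext] = std::move(s);
    mNext = (mNext + 1) % mEntries.size();
  }

 private:
  std::vector<Sample> mEntries;
  size_t mNext = 0;
};
```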
Maybe Kannan has ideas.
Flags: needinfo?(kvijayan)
The way it's supposed to work: the JitcodeGlobalTable stores jitcode and associated script info. It holds strong refs to jitcode while it may be needed, and then weakens the refs when it knows it doesn't need them anymore. This is done with a read barrier[1] that updates a generation number on anything that gets sampled, and then during sweeping, everything that didn't get sampled has its generation reset to 'invalid'.

When marking, jitcode from recent generations gets marked strongly[2], and older stuff gets marked weakly[3] (and then discarded during sweeping if the jitcode was otherwise dead.)

[1] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#443
[2] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#830
[3] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#821
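
In rough pseudo-C++, the generation scheme looks something like this (a simplified sketch of the mechanism described above; the real logic is in JitcodeMap.cpp and differs in detail):

```cpp
// Simplified model of the JitcodeGlobalTable generation scheme; data
// layout and the "recent" threshold are illustrative, not the real code.
#include <cstdint>
#include <vector>

static const uint32_t kInvalidGeneration = UINT32_MAX;

struct JitcodeEntry {
  uint32_t sampledGeneration = kInvalidGeneration;
};

class JitcodeTable {
 public:
  // Read barrier: stamp anything the sampler touches with the current
  // generation, which keeps it strongly alive for a while.
  void Sampled(JitcodeEntry& e) { e.sampledGeneration = mGeneration; }

  // Marking: recent generations are marked strongly; everything else is
  // only marked weakly and can be discarded if otherwise dead.
  bool MarksStrongly(const JitcodeEntry& e) const {
    return e.sampledGeneration != kInvalidGeneration &&
           mGeneration - e.sampledGeneration <= 1;  // "recent enough"
  }

  // Sweeping: anything that wasn't sampled this cycle has its generation
  // reset to invalid; its jitcode dies unless something else holds it.
  void Sweep() {
    for (JitcodeEntry* e : mEntries) {
      if (e->sampledGeneration != mGeneration) {
        e->sampledGeneration = kInvalidGeneration;
      }
    }
    ++mGeneration;
  }

 private:
  uint32_t mGeneration = 0;
  std::vector<JitcodeEntry*> mEntries;
};
```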
Oh, but in answer to the original question: jitcode keeps its compartment alive, and thus everything reachable from the compartment's global. (Er, I guess compartment => global is not strictly necessary, but if jitcode doesn't keep the global alive, I'm sure it can hang onto something else that does.) So yes, if we were leaking JitcodeGlobalTable entries, I could see it producing ghost windows.

If you have a large sampling window, I could definitely imagine that the JitcodeGlobalTable would appear to keep dead things alive, since you'd have to wait for things to age out. I'm not sure how long (in time) the sampler window typically is.

I'm not sure how to tighten this up. When a window dies, we might have *just* sampled jitcode from it. So we have to leave its generation as-is; we know that it won't get sampled in a later generation, but that knowledge doesn't help expire it any earlier since we're already doing it based on actual last sample times.

But I'm wondering if this is more a matter of cluttering up the testing procedure? With the profiler running, things would *appear* to leak windows if you were checking for them immediately after. I could imagine:

1. doing something to try to reproduce the issue
2. quickly going to about:memory
3. forcing a GC. Or two. Or three.
4. measuring and seeing ghost windows.

and maybe even repeating number 3, but still before the sampling buffer runs out.

(Not that I'm confident there isn't a bug here.)
Some interesting things I'm seeing here.

The ghost window doesn't show up immediately. After closing YouTube I have to wait a while before it appears.

Stopping the profiler and discarding the profile and then GC/CC/Minimizing makes the ghost go away.

I set the profiler to gather the minimum amount of data (90 KB), visited YouTube to get the ghost to appear, then waited around five minutes, and the ghost went away after a CC/GC/Minimize.

So I think we're only seeing it ghost while it is in the profile buffer, which makes this probably not much of a problem. Ideally, though, we wouldn't show these as ghosts, since that makes it really difficult to see whether there are any real ghost windows in memory.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink][qf] → [MemShrink:P3][qf]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
There were a few questions that came up in the memshrink meeting:

#1 - Is it possible this is affecting CC/GC perf (either positively or negatively) by holding windows alive? That might give us misleading profiles.

#2 - Is there some way we can have the ghost window reporter ask the profiler which windows it's holding open? Then we'd be able to at least not scare about:memory users.
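
For #2, such an interface might look roughly like this (entirely hypothetical names; no claim that the profiler exposes anything like it):

```cpp
// Hypothetical query from the ghost-window reporter to the profiler;
// all names here are invented for illustration.
#include <cstddef>
#include <cstdint>
#include <vector>

using WindowID = uint64_t;

// Stub: the profiler would return the windows its sample buffer is
// currently keeping alive. (Hypothetical; empty in this sketch.)
std::vector<WindowID> ProfilerGetRetainedWindows() { return {}; }

// The ghost-window reporter subtracts profiler-held windows, so that
// about:memory only reports windows leaked for some other reason.
size_t CountRealGhostWindows(const std::vector<WindowID>& ghosts) {
  std::vector<WindowID> held = ProfilerGetRetainedWindows();
  size_t real = 0;
  for (WindowID g : ghosts) {
    bool heldByProfiler = false;
    for (WindowID h : held) {
      if (h == g) { heldByProfiler = true; break; }
    }
    if (!heldByProfiler) {
      ++real;
    }
  }
  return real;
}
```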
#1 most probably causes at least higher GC times (whenever we're doing a full GC), and probably also affects CC, so yes, it could lead to slightly misleading profiles.
Out of curiosity, why was this bug reopened?
Flags: needinfo?(continuation)
(In reply to Mike Conley (:mconley) from comment #20)
> Out of curiosity, why was this bug reopened?

erahm accidentally closed it.

(In reply to Olli Pettay [:smaug] from comment #19)
> #1 most probably causes at least higher GC times (whenever we're doing a
> full GC), and probably also affects CC, so yes, it could lead to slightly
> misleading profiles.

The fun thing here is that it could actually cause BETTER CC times, because I think it roots the scripts, so more stuff could get marked black.
Flags: needinfo?(continuation)
Ehsan said it's unclear this is caused by our code so qf-.
Flags: needinfo?(kvijayan)
Whiteboard: [MemShrink:P3][qf] → [MemShrink:P3][qf-]
(In reply to Andrew Overholt [:overholt] from comment #22)
> Ehsan said it's unclear this is caused by our code so qf-.

This is almost certainly caused by the profiler leaking all pages within its circular buffer. I don't know that it is a problem per se.
(In reply to Andrew McCreight [:mccr8] from comment #23)
> (In reply to Andrew Overholt [:overholt] from comment #22)
> > Ehsan said it's unclear this is caused by our code so qf-.
> 
> This is almost certainly caused by the profiler leaking all pages within its
> circular buffer. I don't know that it is a problem per se.

It's certainly not a leak. It's just unhelpful if you're trying to see whether there are ghost windows around and you happen to be running the profiler.
Keywords: perf
Performance Impact: --- → -
Whiteboard: [MemShrink:P3][qf-] → [MemShrink:P3]
Severity: normal → S4
Priority: -- → P3