Bug 1360280 (Open): Opened 7 years ago, updated 2 years ago

Ghost windows in current nightly with profiler enabled

Categories

(Core :: Gecko Profiler, defect, P3)

Tracking

Status: REOPENED
Performance Impact: none
Tracking Status: firefox55 --- affected

People

(Reporter: mossop, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, perf, Whiteboard: [MemShrink:P3])

Attachments

(1 file)

Attached file memory-report.json.gz
I installed today's nightly this morning and since then I've accumulated some 100 ghost windows across my content processes. I do have 1Password installed, so I will try running with that disabled and see if they still accumulate.
Flags: needinfo?(continuation)
Blocks: GhostWindows
Flags: needinfo?(continuation)
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink][qf]
mossop says he has a touch screen on this machine.

dbaron also saw some ghost windows on an updated build, but he said they went away eventually.
Still seeing this without 1Password enabled.
That touch-handling-related leak was really a one-off case, so I would be surprised if anything related to that shows up. That fix should be in the 04-25 build, I think.
Seems like I can easily reproduce this by just visiting YouTube.
(In reply to Olli Pettay [:smaug] from comment #3)
> That touch-handling-related leak was really a one-off case, so I would be
> surprised if anything related to that shows up. That fix should be in the
> 04-25 build, I think.

I'm on the Windows 64-bit build from 04-27.
This is at least partially caused by the Gecko Profiler add-on. Without it enabled, YouTube does not ghost for me. With it enabled, it ghosts every time. I say partially because even with the profiler add-on disabled I am still seeing one other site ghost almost immediately after startup.
Markus, maybe you could look at the profiler issues? It could be related to bug 1355566.
Flags: needinfo?(mstange)
Component: DOM → Gecko Profiler
Summary: Ghost windows in current nightly → Ghost windows in current nightly with profiler enabled
See Also: → 1360310
Running "minimize memory usage" in about:memory seems to free up the ghost window for me, at least with YouTube. In fact, clicking on the CC button a single time seems to be enough to eventually free the window. One time it went away on its own. I didn't get anything useful out of a CC log.
Markus said in bug 1355566 that the profiler holds onto JS functions. That would certainly cause leaks like this, if those are strong references. I don't understand enough about how JIT stuff works to know what the profiler is doing.
See Also: → 1355566
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Running "minimize memory usage" in about:memory seems to free up the ghost
> window for me, at least with YouTube. In fact, clicking on the CC button a
> single time seems to be enough to eventually free the window. One time it
> went away on its own. I didn't get anything useful out of a CC log.

I was running GC, CC and Minimize memory usage multiple times and still seeing the ghosts.
One way to fix this might be to store any JS references separated out by compartment, then make WindowDestroyedEvent::Run() call into the profiler with a compartment to tell it to dump its references to that compartment.
But if you then capture a profile, we won't be able to look up the function names for any JS functions of that compartment, and your stacks in the profiler might be incomplete.
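
Roughly, the shape might look like this sketch (type and function names here are hypothetical stand-ins, not the actual profiler API):

```cpp
// Hypothetical sketch of the compartment-keyed idea above; none of these
// names are the real Gecko/profiler API.
#include <unordered_map>
#include <vector>

struct JSCompartment {};   // stand-in for the engine's compartment type
struct JSFunctionRef {};   // stand-in for whatever the profiler retains

class ProfilerJSRefTable {
 public:
  // Record a reference under the compartment it belongs to.
  void Add(JSCompartment* comp, JSFunctionRef* ref) {
    mRefs[comp].push_back(ref);
  }

  // Called from WindowDestroyedEvent::Run(): drop every reference into
  // the dying compartment at once, so the GC can collect the window.
  void NotifyCompartmentDestroyed(JSCompartment* comp) {
    mRefs.erase(comp);
  }

 private:
  std::unordered_map<JSCompartment*, std::vector<JSFunctionRef*>> mRefs;
};
```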
Flags: needinfo?(mstange)
Sure. Maybe you could detect when there's been a window-destroyed event for a compartment, and also no JS has run in that compartment within the current profiler window, and then purge the references? (I don't know exactly how the profiler works.) We should not be running JS in a window that has been closed, under usual circumstances, so it would give the function a bounded lifetime.
We already release the references once the buffer has wrapped around.

I'm not actually sure if the profiler references keep the compartment alive, or only certain data about the JS functions. We might be dealing with a separate leak here that isn't related to the JS function references at all.
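
For illustration, that wrap-around release behaves conceptually like this (a simplified sketch, not the real buffer implementation):

```cpp
// Simplified sketch of a sample buffer that drops its JS references as
// entries are overwritten; names are illustrative only.
#include <cstddef>
#include <memory>
#include <vector>

struct JSFunctionRef {};  // stand-in for the retained per-function data

struct Sample {
  std::shared_ptr<JSFunctionRef> jsRef;  // strong ref while buffered
};

class CircularSampleBuffer {
 public:
  explicit CircularSampleBuffer(size_t capacity) : mEntries(capacity) {}

  void Push(Sample s) {
    // Overwriting the oldest entry releases its reference, so a function
    // (and anything it keeps alive) is retained at most one buffer-length.
    mEntries[mNext] = std::move(s);
    mNext = (mNext + 1) % mEntries.size();
  }

 private:
  std::vector<Sample> mEntries;
  size_t mNext = 0;
};
```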
Maybe Kannan has ideas.
Flags: needinfo?(kvijayan)
The way it's supposed to work: the JitcodeGlobalTable stores jitcode and associated script info. It holds strong refs to jitcode while it may be needed, and then weakens the refs when it knows it doesn't need them anymore. This is done with a read barrier[1] that updates a generation number on anything that gets sampled, and then during sweeping, everything that didn't get sampled has its generation reset to 'invalid'.

When marking, jitcode from recent generations gets marked strongly[2], and older stuff gets marked weakly[3] (and then discarded during sweeping if the jitcode was otherwise dead.)

[1] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#443
[2] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#830
[3] http://searchfox.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#821
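
In rough pseudo-C++, the generation scheme looks something like this (a simplified sketch of the mechanism described above; the real logic is in JitcodeMap.cpp and differs in detail):

```cpp
// Simplified model of the JitcodeGlobalTable generation scheme; data
// layout and the "recent" threshold are illustrative, not the real code.
#include <cstdint>
#include <vector>

static const uint32_t kInvalidGeneration = UINT32_MAX;

struct JitcodeEntry {
  uint32_t sampledGeneration = kInvalidGeneration;
};

class JitcodeTable {
 public:
  // Read barrier: stamp anything the sampler touches with the current
  // generation, which keeps it strongly alive for a while.
  void Sampled(JitcodeEntry& e) { e.sampledGeneration = mGeneration; }

  // Marking: recent generations are marked strongly; everything else is
  // only marked weakly and can be discarded if otherwise dead.
  bool MarksStrongly(const JitcodeEntry& e) const {
    return e.sampledGeneration != kInvalidGeneration &&
           mGeneration - e.sampledGeneration <= 1;  // "recent enough"
  }

  // Sweeping: anything that wasn't sampled this cycle has its generation
  // reset to invalid; its jitcode dies unless something else holds it.
  void Sweep() {
    for (JitcodeEntry* e : mEntries) {
      if (e->sampledGeneration != mGeneration) {
        e->sampledGeneration = kInvalidGeneration;
      }
    }
    ++mGeneration;
  }

 private:
  uint32_t mGeneration = 0;
  std::vector<JitcodeEntry*> mEntries;
};
```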
Oh, but in answer to the original question: jitcode keeps its compartment alive, and thus everything reachable from the compartment's global. (Er, I guess compartment => global is not strictly necessary, but if jitcode doesn't keep the global alive, I'm sure it can hang onto something else that does.) So yes, if we were leaking JitcodeGlobalTable entries, I could see it producing ghost windows.

If you have a large sampling window, I could definitely imagine that the JitcodeGlobalTable would appear to keep dead things alive, since you'd have to wait for things to age out. I'm not sure how long (in time) the sampler window typically is.

I'm not sure how to tighten this up. When a window dies, we might have *just* sampled jitcode from it. So we have to leave its generation as-is; we know that it won't get sampled in a later generation, but that knowledge doesn't help expire it any earlier since we're already doing it based on actual last sample times.

But I'm wondering if this is more a matter of cluttering up the testing procedure? With the profiler running, things would *appear* to leak windows if you were checking for them immediately after. I could imagine:

1. doing something to try to reproduce the issue
2. quickly going to about:memory
3. forcing a GC. Or two. Or three.
4. measuring and seeing ghost windows.

and maybe even repeating number 3, but still before the sampling buffer runs out.

(Not that I'm confident there isn't a bug here.)
Some interesting things I'm seeing here.

The ghost window doesn't show up immediately. After closing YouTube I have to wait a while before it appears.

Stopping the profiler and discarding the profile and then GC/CC/Minimizing makes the ghost go away.

I set the profiler to gather the minimum amount of data (90 KB), visited YouTube to get the ghost to appear, then waited around five minutes, and the ghost went away after a CC/GC/Minimize.

So I think we're only seeing it ghost while it is in the profile buffer, which makes this probably not much of a problem. Ideally, though, we wouldn't show these as ghosts, since that makes it really difficult to see whether there are any real ghost windows in memory.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink][qf] → [MemShrink:P3][qf]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
There were a few questions that came up in the memshrink meeting:

#1 - Is it possible this is affecting CC/GC perf (either positively or negatively) by holding windows alive? That might give us misleading profiles.

#2 - Is there some way we can have the ghost window reporter ask the profiler which windows it's holding open? Then we'd be able to at least not scare about:memory users.
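
For #2, such an interface might look roughly like this (entirely hypothetical names; no claim that the profiler exposes anything like it):

```cpp
// Hypothetical query from the ghost-window reporter to the profiler;
// all names here are invented for illustration.
#include <cstddef>
#include <cstdint>
#include <vector>

using WindowID = uint64_t;

// Stub: the profiler would return the windows its sample buffer is
// currently keeping alive. (Hypothetical; empty in this sketch.)
std::vector<WindowID> ProfilerGetRetainedWindows() { return {}; }

// The ghost-window reporter subtracts profiler-held windows, so that
// about:memory only reports windows leaked for some other reason.
size_t CountRealGhostWindows(const std::vector<WindowID>& ghosts) {
  std::vector<WindowID> held = ProfilerGetRetainedWindows();
  size_t real = 0;
  for (WindowID g : ghosts) {
    bool heldByProfiler = false;
    for (WindowID h : held) {
      if (h == g) { heldByProfiler = true; break; }
    }
    if (!heldByProfiler) {
      ++real;
    }
  }
  return real;
}
```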
#1 most probably causes at least higher GC times (whenever we're doing a full GC), and probably also affects CC, so yes, it could lead to slightly misleading profiles.
Out of curiosity, why was this bug reopened?
Flags: needinfo?(continuation)
(In reply to Mike Conley (:mconley) from comment #20)
> Out of curiosity, why was this bug reopened?

erahm accidentally closed it.

(In reply to Olli Pettay [:smaug] from comment #19)
> #1 most probably causes at least higher GC times (whenever we're doing a
> full GC), and probably also affects CC, so yes, it could lead to slightly
> misleading profiles.

The fun thing here is that it could actually cause BETTER CC times, because I think it roots the scripts, so more stuff could get marked black.
Flags: needinfo?(continuation)
Ehsan said it's unclear this is caused by our code so qf-.
Flags: needinfo?(kvijayan)
Whiteboard: [MemShrink:P3][qf] → [MemShrink:P3][qf-]
(In reply to Andrew Overholt [:overholt] from comment #22)
> Ehsan said it's unclear this is caused by our code so qf-.

This is almost certainly caused by the profiler leaking all pages within its circular buffer. I don't know that it is a problem per se.
(In reply to Andrew McCreight [:mccr8] from comment #23)
> (In reply to Andrew Overholt [:overholt] from comment #22)
> > Ehsan said it's unclear this is caused by our code so qf-.
> 
> This is almost certainly caused by the profiler leaking all pages within its
> circular buffer. I don't know that it is a problem per se.

It's certainly not a leak. It's just unhelpful if you're trying to see whether there are ghost windows around and you happen to be running the profiler.
Keywords: perf
Performance Impact: --- → -
Whiteboard: [MemShrink:P3][qf-] → [MemShrink:P3]
Severity: normal → S4
Priority: -- → P3