Closed Bug 1266830 Opened 8 years ago Closed 6 years ago

ContentParent leaks when the content process crashes

Categories

(Core :: DOM: Content Processes, defect, P3)

Tracking

RESOLVED WORKSFORME
Tracking Status
e10s + ---
firefox48 --- affected

People

(Reporter: froydnj, Unassigned)

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2] btpp-fixlater)

In Nightly, after the content process has crashed several times, about:memory reports, among other things:

0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b57fff828, refcnt=1)
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b6adb3828, refcnt=4)
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b7b882828, refcnt=1)
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b87075828, refcnt=2)
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b943eb828, refcnt=1)
├──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b96b9a828, refcnt=3)
└──0 (100.0%) ── content-parent(???, pid=-1, closed channel, 0x7f0b9dbf9828, refcnt=2)

The description string for this:

http://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#518

suggests that we are leaking ContentParent instances.  I think this might be similar to bug 1188056, but this is with e10s switched on, and this is definitely not about thumbnailing processes--at least, I don't think it is.
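
To pick such entries out of a longer about:memory dump, a small parser can help. The following is a minimal Python sketch, assuming the description strings have exactly the shape quoted above. The names `ContentParentEntry`, `parse_entries`, and `suspected_leaks` are hypothetical helpers and not part of Firefox. An entry with `pid=-1` and a closed channel is treated only as a leak candidate; it may simply not have been destroyed yet.

```python
import re
from typing import List, NamedTuple

# Matches description strings of the form quoted in this bug, e.g.
#   content-parent(???, pid=-1, closed channel, 0x7f0b57fff828, refcnt=1)
# The exact format is an assumption based on the about:memory output above.
ENTRY_RE = re.compile(
    r"content-parent\((?P<name>[^,]*), pid=(?P<pid>-?\d+), "
    r"(?P<state>open|closed) channel, (?P<addr>0x[0-9a-fA-F]+), "
    r"refcnt=(?P<refcnt>\d+)\)"
)


class ContentParentEntry(NamedTuple):
    name: str
    pid: int
    channel_open: bool
    address: str
    refcnt: int


def parse_entries(text: str) -> List[ContentParentEntry]:
    """Extract every content-parent entry found in an about:memory text dump."""
    return [
        ContentParentEntry(
            name=m.group("name"),
            pid=int(m.group("pid")),
            channel_open=(m.group("state") == "open"),
            address=m.group("addr"),
            refcnt=int(m.group("refcnt")),
        )
        for m in ENTRY_RE.finditer(text)
    ]


def suspected_leaks(entries: List[ContentParentEntry]) -> List[ContentParentEntry]:
    """Return entries whose process is gone (pid=-1) and whose channel is closed."""
    return [e for e in entries if e.pid == -1 and not e.channel_open]
```

Running `suspected_leaks(parse_entries(dump))` on the text above would list all seven entries, since each has `pid=-1` and a closed channel.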

Some of these might be related to killing the content process with SIGTERM (which I had to do to get my computer out of swapdeath).
I don't know how to categorize the urgency of this bug: super high, within a couple of months, backlog?
Flags: needinfo?(nfroyd)
Whiteboard: [MemShrink] → [MemShrink] btpp-followup-2016-04-29
This isn't too surprising. B2G had a ton of parent process leaks when the children got killed. I think it is less important for desktop, because killing content processes is not part of standard operation. How important this is depends on how often a user might experience a content process crash.
(In reply to Andrew Overholt [:overholt] from comment #1)
> I don't know how to categorize the urgency of this bug: super high, within a
> couple of months, backlog?

My sense is that this is somewhere between "couple of months" and "backlog", unless we have reason to think people are keeping sessions open but encountering multiple content process crashes per session, even after we work on addressing content process crashes.
Flags: needinfo?(nfroyd)
This might also become more urgent when we start doing multiple content processes, since that would (likely) increase the frequency of content process crashes (both from our bugs and from memory pressure).
Thanks.
Whiteboard: [MemShrink] btpp-followup-2016-04-29 → [MemShrink] btpp-fixlater
Blocks: e10s-multi
Priority: -- → P2
Whiteboard: [MemShrink] btpp-fixlater → [MemShrink:P2] btpp-fixlater
I have e10s-multi enabled (3 child processes), and not long after startup I'm seeing at least one leaked channel without any noticeable child process crash. So this might also happen during normal process lifecycles, not just after crashes.

0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent(Browser, pid=-1, closed channel, 0x1aa2c98f6f8, refcnt=1)
├──0 (100.0%) ── content-parent(Browser, pid=125732, open channel, 0x1a994cdf6f8, refcnt=149)
├──0 (100.0%) ── content-parent(Browser, pid=126856, open channel, 0x1a9a8fbb6f8, refcnt=123)
└──0 (100.0%) ── content-parent(Browser, pid=87424, open channel, 0x1a9e30e46f8, refcnt=149)
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
I tried reproducing this on Linux with `kill` (with both SIGTERM and SIGSEGV), and I can't reproduce it. I did sometimes see channels in queued-ipc-messages with pid=-1 and refcnt=1 immediately after killing content processes, but they didn't survive a CC (and they don't show up in a verbose CC log, either).

I also have dom.ipc.processCount set to 64 in this profile, so there should have been a few normally-terminated content processes from closing tabs as well.

If someone else can still reproduce this, a CC log for the parent process would be a good place to start.
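
For anyone retrying this, the "did it survive a CC" check can be done by comparing two about:memory text dumps, one taken right after killing the content process and one taken after a cycle collection. The following is a rough Python sketch, assuming the entry format quoted earlier in this bug. `closed_channel_addresses` and `surviving_after_cc` are hypothetical helper names, and matching by address is only a heuristic, because an address can be reused by a later allocation.

```python
import re
from typing import Set

# Captures the address of entries with pid=-1 and a closed channel.
# The format is an assumption taken from the about:memory output in this bug.
CLOSED_RE = re.compile(
    r"content-parent\([^,]*, pid=-1, closed channel, "
    r"(0x[0-9a-fA-F]+), refcnt=\d+\)"
)


def closed_channel_addresses(report_text: str) -> Set[str]:
    """Addresses of all closed-channel, pid=-1 entries in one dump."""
    return set(CLOSED_RE.findall(report_text))


def surviving_after_cc(before_cc: str, after_cc: str) -> Set[str]:
    """Addresses that appear as closed-channel entries in both dumps."""
    return closed_channel_addresses(before_cc) & closed_channel_addresses(after_cc)
```

An empty result matches the behavior described above, where the entries went away after a CC. A non-empty result would point at objects worth looking up in a CC log for the parent process.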
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME