Open Bug 1414938 Opened 7 years ago Updated 2 years ago

Spike in memory use when navigating away from a page that uses a lot of memory

Categories

(Core :: XPCOM, defect, P3)

55 Branch
defect

Tracking


Performance Impact low
Tracking Status
firefox-esr52 --- ?
firefox57 --- ?
firefox58 --- affected
firefox59 --- affected

People

(Reporter: jesup, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [MemShrink:P3])

Unsure if this is a regression; could be tested for

If you navigate away from a page that leaks memory (memory likely still referenced by something on the page; a bug in the page), you can see that process memory use climbs by O(n) before eventually (a minute or two later) freeing the n bytes of memory the page had been using.

Seen on a number of sites using nightly 58 and beta 57.  Not tested on any other version.

Quite repeatable (at the moment) on Scientific American (almost any article page).  The leaks there appear to be in ads (you have to scroll down and expose them to start the leak).
Whiteboard: [MemShrink]
Randell, as a preliminary: what platform is this, and what are you using to measure process memory?
Flags: needinfo?(rjesup)
Windows 10.  64-bit (beta, most of the tests); 32-bit (Nightly).  Watching memory use with Process Explorer, double-clicked on the Content process so I could watch the memory use graph.  Also, when it leaks/spikes like this it can bog the entire machine down (Desktop AMD Phenom II X4 965 3.4GHz (4 core), 12GB ram, NVidia GTX 950).
Flags: needinfo?(rjesup)
I've seen the same happen on Linux, fwiw.
I haven't been able to reproduce this and I'm unsure how much memory use counts as a spike, however I think this is unlikely to be the GC itself.  The GC takes pains not to allocate memory while it is collecting, for obvious reasons.

I think this is more likely to be the graph building phase of CC.

Andrew, do you think this is plausible?  I can't think of a way to tell one way or another off the top of my head.
Flags: needinfo?(continuation)
Priority: -- → P1
I'm still confused about what type of memory we're talking about here. Is this committed, resident, or what? Because the most natural reason for memory usage to climb during a collection is because we walk through a bunch of it, so if you're looking at the resident size it makes sense for it to page things back in.

Do you have VMMap from SysInternals? It seems to break memory usage down in excruciating detail. But maybe process explorer is good enough; I don't have a way to try it out right now to see what it shows.
This is Windows - Process Explorer v16.02 from sysinternals.com

The measurement I'm watching is "Private Bytes" (in the main UI, and also in graph form by double-clicking on a Content process), so it's not resident size.
Whiteboard: [MemShrink] → [MemShrink:P3]
I don't see why the CC is necessarily the cause of this. When you navigate to a new page, it is expected that memory usage will go up, then go down. We trigger a GC explicitly 5 seconds after page load starts for this reason.
Flags: needinfo?(continuation)
You don't need to actually navigate away for this to happen. I've had it happen when closing tabs.
(In reply to Andrew McCreight [:mccr8] from comment #7)
Agreed.  I am reading this bug as a report that there is memory increase triggered by navigating away from a page and not attributable to the page itself allocating memory.  I may be wrong though, and I haven't been able to reproduce (on MacOS).

I'll try and reproduce on Windows when I'm back from PTO next week.
Flags: needinfo?(jcoppeard)
Right, anything that causes the page's memory to be GC'd/CC'd apparently.  Very clear pattern watching Process Explorer.  And the rise is proportional (at least roughly) to the size of the leak.  I've frequently seen closing a page or reloading it cause memory to increase by 1GB, drop back down to the approx orig value for 10 seconds, then suddenly drop another GB or so (below starting point), and sometimes a second large drop.  I've also seen a pattern with ramp up O(n/2), back to start, ramp up O(n/2), back to start, delay, drop O(n)
This generated a slowish leak: 400MB in a few hours (ok, not *that* slow):
https://www.washingtonpost.com/news/inspired-life/wp/2017/11/22/at-yale-we-conducted-an-experiment-to-turn-conservatives-into-liberals-the-results-say-a-lot-about-our-political-divisions/

I took a profile after closing the tab, and caught a strong spike in memory usage around 600MB over ~6 seconds, dropped back to original briefly (<1s), then dropped 440MB lower. (770MB -> 1.3GB -> ~770 -> ~330)

Here's the profile (on Firefox 58 beta 6): https://perfht.ml/2nkssll

The ramp-up is the 6.6s event-processing delay in the middle.

CPU used was in TraversalTracer::onChild() and PLDHashTable::Add() (called from TraversalTracer) - together in direct CPU hits (inverted call stack) they amounted to 43% of the CPU use.  A lot of the time was in PLDHashTable::ChangeTable() calling jemalloc code.  But a lot is simply directly in Add... I suspect the table simply has So Many Entries that the Add() code spends all its time reallocating and moving entries.  Still, it's pretty amazing that there are so many apparently-small entries (since the memory used by the CC (basically the table) is on the same order as the memory freed).

The process-size increase appeared ~linear; likely due to the limiting factor being moving hash entries to new tables on each realloc (and perhaps(?) pages allocated not getting shown in Process Explorer until they're touched).  It took 6s/~600MB, but with larger leaks it takes much more time, and some pages will leak 1-5GB if left sitting.

At the end there's 600ms of nsCycleCollector::Collect(), mostly ScanRoots and CollectWhite (73ms directly in CollectWhite).
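As an aside for readers following the profile: the bursty growth described above matches how an open-addressed table behaves when it doubles capacity and re-inserts everything. A toy sketch (illustrative only; the real PLDHashTable's growth policy, hashing, and entry storage all differ):

```cpp
#include <cstddef>
#include <vector>

// Toy open-addressed table that doubles and re-inserts on growth.
// Illustrative only, not the real PLDHashTable.
struct ToyTable {
  std::vector<void*> slots;
  std::size_t count = 0;
  std::size_t movedDuringGrowth = 0;  // entries re-inserted by rehashes

  ToyTable() : slots(16, nullptr) {}

  // Linear probing; keys are non-null pointers.
  std::size_t probe(void* key, const std::vector<void*>& s) const {
    std::size_t i = (reinterpret_cast<std::size_t>(key) >> 3) % s.size();
    while (s[i] && s[i] != key) i = (i + 1) % s.size();
    return i;
  }

  void add(void* key) {
    if ((count + 1) * 4 > slots.size() * 3) {  // keep load factor <= 75%
      std::vector<void*> bigger(slots.size() * 2, nullptr);
      for (void* e : slots) {
        if (e) {
          bigger[probe(e, bigger)] = e;
          ++movedDuringGrowth;
        }
      }
      slots.swap(bigger);
    }
    slots[probe(key, slots)] = key;
    ++count;
  }
};
```

Inserting n distinct keys re-inserts under 2n entries in total across all growths (amortized O(1) per add), but each individual growth is a burst proportional to the current table size, which plausibly corresponds to the ramp seen in the profile.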

This is interesting, though reading it, it seems like google::dense_hash is pretty darn good overall, though not the fastest at lookup: https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/  Also interesting: https://tessil.github.io//other/hash_table_benchmark.html
Flags: needinfo?(jcoppeard) → needinfo?(mh+mozilla)
Whiteboard: [MemShrink:P3] → [MemShrink:P3][qf]
Flags: needinfo?(mh+mozilla)
Needinfo Andrew as this seems to be happening in the cycle collector based on the profile in comment 11.
Flags: needinfo?(continuation)
Yeah, unfortunately we don't handle this situation well. I'm not sure how common it is, or what we can do to mitigate it.
Component: JavaScript: GC → XPCOM
Flags: needinfo?(continuation)
Is there anything happening here aside from memory usage spiking after tab-events (navigation, new-tab, close-tab), due to the overhead of CC being opportunistically invoked at those times?

This sounds like a peak memory issue, not a leak issue.  If it is a peak memory issue, do we have info on whether it's causing OOMs, or thrashing on low-memory machines, and if so how much?  Because otherwise it doesn't seem like a huge problem.
Flags: needinfo?(continuation)
Sounds like many different things are discussed here. (1) Closing a tab and (2) leaky web pages.
Let's focus on (1) here.

The hashtable the cycle collector uses for the graph is performance sensitive, but perhaps we should try to figure out something less memory hungry.
Whiteboard: [MemShrink:P3][qf] → [MemShrink:P3][qf:f63][qf:p1]
Whiteboard: [MemShrink:P3][qf:f63][qf:p1] → [MemShrink:P3][qf:f63][qf:p3]
(In reply to Olli Pettay [:smaug] from comment #16)
> Sounds like many different things are discussed here. (1) Closing a tab and
> (2) leaky web pages.
> Let's focus on (1) here.
> 
> The hashtable cycle collector uses for the graph is performance sensitive,
> but perhaps we should try to figure out something less memory hungry.

libstdc++'s <unordered_map> includes code to decide whether to cache the hash for stored keys in the hashtable, or to recompute it every time.  I don't know whether this is the right tradeoff; libc++ appears to always cache the hash and I haven't looked at MSVC.  But in the context of the cycle collector's hashtables, maybe this would be the right tradeoff?  Re-computing the hash would be a small number of instructions on every probe into the hashtable, but not caching the hash would shrink the memory requirements of the cycle collector's hashtables by half.

This change would require a decent amount of re-engineering on PLDHashtable and friends, but maybe for such a core piece of infra, it would be worth it?
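To make the size argument concrete, here is a sketch of the two hypothetical entry layouts (the names are illustrative, not the actual PLDHashTable fields):

```cpp
#include <cstdint>

// Hypothetical entry layouts for the hash-caching tradeoff; these are
// illustrative names, not the actual PLDHashTable entry structure.
struct CachedHashEntry {
  std::uint32_t mKeyHash;  // cached hash, checked before comparing keys
  void* mKey;
};

struct RecomputedHashEntry {
  void* mKey;  // hash recomputed from mKey on every probe instead
};

// On a typical LP64 target, alignment padding after mKeyHash means the
// cached hash doubles the entry size, which is the "shrink by half".
static_assert(sizeof(RecomputedHashEntry) == sizeof(void*),
              "key-only entry is pointer-sized");
```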
The CC creates a hash table containing everything that is potentially garbage. That's very large when there's a lot of garbage, like when you close a page or navigate, which apparently can cause performance problems. The hash table is freed when the CC is done, so it is peak memory usage. We don't have any data about how often it happens to people, aside from indirect information like telemetry about how long CC takes.
Flags: needinfo?(continuation)
(In reply to Nathan Froyd [:froydnj] from comment #17)
> libstdc++'s <unordered_map> includes code to decide whether to cache the
> hash for stored keys in the hashtable, or to recompute it every time.  I
> don't know whether this is the right tradeoff; libc++ appears to always
> cache the hash and I haven't looked at MSVC.  But in the context of the
> cycle collector's hashtables, maybe this would be the right tradeoff? 

You were looking at switching to robinhood hashing, I believe. Wouldn't you have to re-hash every cell while scanning for an insertion position?

Or on the flip side, if we switch to robinhood hashing (while still storing hashes), we could bump up the load factor to offset some of this cost.
(In reply to Steve Fink [:sfink] [:s:] from comment #19)
> (In reply to Nathan Froyd [:froydnj] from comment #17)
> > libstdc++'s <unordered_map> includes code to decide whether to cache the
> > hash for stored keys in the hashtable, or to recompute it every time.  I
> > don't know whether this is the right tradeoff; libc++ appears to always
> > cache the hash and I haven't looked at MSVC.  But in the context of the
> > cycle collector's hashtables, maybe this would be the right tradeoff? 
> 
> You were looking at switching to robinhood hashing, I believe. Wouldn't you
> have to re-hash every cell while scanning for an insertion position?

You would have to rehash every cell while scanning; this would apply to the robinhood hashtable in development or PLDHashTable as it now stands.  This sounds not so great, but maybe the extra instructions wouldn't "really" cost you anything; I'm not sure.

> Or on the flip side, if we switch to robinhood hashing (while still storing
> hashes), we could bump up the load factor to offset some of this cost.

The last time I tried bumping the load factor (increasing it to ~90% rather than whatever PLDHashTable's default is) with robinhood hashing, it did horrible things to the benchmarks I was running.  But it's entirely possible I was doing dumb things or the benchmarks were just non-representative; I didn't look very closely at what was going on.
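For readers unfamiliar with the technique being weighed here, a minimal robin hood insertion sketch (illustrative only; not the in-development Mozilla table). Note that computing an incumbent's probe distance needs its hash, which is exactly where dropping the cached hash adds per-probe work:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal robin hood set over non-zero integer keys, with no stored
// hashes. Illustrative sketch only.
class RobinHoodSet {
  static constexpr std::uintptr_t kEmpty = 0;  // 0 reserved as "empty"
  std::vector<std::uintptr_t> slots;

  std::size_t hash(std::uintptr_t key) const {
    key ^= key >> 16;  // cheap mix so low bits vary
    return (key * 2654435761u) % slots.size();
  }

  // Distance of a key from its "home" slot. Because hashes aren't
  // stored, every call re-hashes the incumbent: this is the per-probe
  // cost being weighed against halving the table's memory.
  std::size_t distance(std::uintptr_t key, std::size_t slot) const {
    return (slot + slots.size() - hash(key)) % slots.size();
  }

 public:
  explicit RobinHoodSet(std::size_t capacity) : slots(capacity, kEmpty) {}

  void insert(std::uintptr_t key) {
    std::size_t i = hash(key);
    std::size_t dist = 0;
    while (slots[i] != kEmpty) {
      std::size_t incumbentDist = distance(slots[i], i);
      if (incumbentDist < dist) {  // "steal from the rich": swap in
        std::swap(key, slots[i]);
        dist = incumbentDist;
      }
      i = (i + 1) % slots.size();
      ++dist;
    }
    slots[i] = key;
  }

  bool contains(std::uintptr_t key) const {
    std::size_t i = hash(key);
    std::size_t dist = 0;
    // The robin hood invariant lets lookup stop once it passes any
    // entry closer to its home than we are to ours.
    while (slots[i] != kEmpty && distance(slots[i], i) >= dist) {
      if (slots[i] == key) return true;
      i = (i + 1) % slots.size();
      ++dist;
    }
    return false;
  }
};
```

Because the bounded probe distances keep lookups cheap even at high load, robin hood tables are often run at higher load factors than plain linear probing, which is the offset being suggested above; the benchmark results mentioned suggest the interaction is not straightforward.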
Depends on: 1435292
Moving to P3 because there has been no activity for at least 24 weeks.
Priority: P1 → P3
Performance Impact: --- → P3
Whiteboard: [MemShrink:P3][qf:f63][qf:p3] → [MemShrink:P3]
Severity: normal → S3