Closed Bug 1300173 Opened 8 years ago Closed 2 years ago

facebook causes excessive fragmentation in jemalloc heap

Categories

(Core :: Memory Allocator, defect, P3)


Tracking


RESOLVED INCOMPLETE
Tracking Status
platform-rel --- -

People

(Reporter: bkelly, Unassigned)

References

Details

(Whiteboard: [MemShrink:P3][platform-rel-Facebook])

I've been running with dom.ipc.processCount set to a high value so each tab gets its own process.  This has shown that some sites tend to fragment the heap a lot more than others.

In particular I see this for a facebook.com tab that has mostly been sitting idle in the background for 6 hours:

181.39 MB (100.0%) -- explicit
├───59.18 MB (32.63%) -- heap-overhead
│   ├──55.09 MB (30.37%) ── bin-unused
│   ├───3.43 MB (01.89%) ── bookkeeping
│   └───0.66 MB (00.36%) ── page-cache
├───58.02 MB (31.98%) ++ window-objects/top(https://www.facebook.com/, id=17179869185)
├───29.78 MB (16.42%) ++ js-non-window
├───26.68 MB (14.71%) ── heap-unclassified
├────5.87 MB (03.24%) ++ (15 tiny)
└────1.86 MB (01.03%) ++ images

The bin-unused is almost as large as the window itself!

In comparison, a fairly actively used gmail tab has about half the bin-unused:

188.34 MB (100.0%) -- explicit
├──105.92 MB (56.24%) ++ window-objects/top(https://mail.google.com/mail/u/0/#label/dev-platform, id=4294967297)
├───34.54 MB (18.34%) ++ js-non-window
├───27.75 MB (14.73%) -- heap-overhead
│   ├──24.39 MB (12.95%) ── bin-unused
│   ├───2.62 MB (01.39%) ── bookkeeping
│   └───0.74 MB (00.39%) ── page-cache
├───14.47 MB (07.68%) ── heap-unclassified
└────5.66 MB (03.00%) ++ (16 tiny)

A site more like bugzilla has only about 9MB of bin-unused:

80.79 MB (100.0%) -- explicit
├──34.72 MB (42.97%) ++ window-objects/top(https://bugzilla.mozilla.org/page.cgi?id=mydashboard.html, id=12884901889)
├──21.43 MB (26.52%) ++ js-non-window
├──11.29 MB (13.97%) -- heap-overhead
│  ├───9.33 MB (11.55%) ── bin-unused
│  ├───1.33 MB (01.65%) ── bookkeeping
│  └───0.62 MB (00.77%) ── page-cache
├───7.41 MB (09.17%) ── heap-unclassified
├───1.84 MB (02.28%) ++ (12 tiny)
├───1.21 MB (01.50%) ++ gfx
├───1.01 MB (01.26%) ++ atom-tables
├───1.01 MB (01.25%) ++ xpconnect
└───0.87 MB (01.07%) ── xpti-working-set
Facebook must be doing something periodically that spikes its memory and then drops back down to a quiescent state.  If it fragments a bit on each one of these cycles then bin-unused will keep growing.  This is not great for a site that tends to be left open for a long period of time.
I let facebook run overnight.  With zero interaction, that child process is now at:

222.91 MB (100.0%) -- explicit
├──120.79 MB (54.19%) -- heap-overhead
│  ├──114.96 MB (51.57%) ── bin-unused
│  ├────5.31 MB (02.38%) ── bookkeeping
│  └────0.52 MB (00.23%) ── page-cache
├───41.07 MB (18.42%) -- window-objects/top(https://www.facebook.com/, id=17179869185)

Heap-overhead bin-unused is almost 3x the window size itself.
Twitter is also starting to fragment:

166.41 MB (100.0%) -- explicit
├───51.18 MB (30.75%) -- heap-overhead
│   ├──45.93 MB (27.60%) ── bin-unused
│   ├───4.64 MB (02.79%) ── bookkeeping
│   └───0.61 MB (00.37%) ── page-cache
├───42.57 MB (25.58%) -- window-objects
│   ├──42.29 MB (25.41%) ++ top(https://twitter.com/, id=21474836483)

Seems like it might be a more general problem.

I wonder if these sites are receiving large XHR chunks when they update after waking from sleep, and those chunks are being kept around in heap-overhead.  Perhaps we need to be more aggressive about freeing stuff like that when we have more child processes.
Refreshing facebook brought me back down to here:

130.50 MB (100.0%) -- explicit
├───45.91 MB (35.18%) -- heap-overhead
│   ├──41.56 MB (31.85%) ── bin-unused
│   ├───3.83 MB (02.94%) ── bookkeeping
│   └───0.52 MB (00.40%) ── page-cache
├───36.64 MB (28.08%) -- window-objects/top(https://www.facebook.com/, id=17179869185)

So bin-unused went from ~115MB to ~42MB.  Window went from ~41MB to ~37MB.
IIRC somebody was talking to Facebook to help them improve their memory usage with Firefox. I'm not sure who it was and if they found anything actionable.
This doesn't feel like facebook's fault to me.  Their explicit memory usage is reasonable in all the cases I've seen here (although I have other cases where they get out of control).  It's just that whatever activity they are doing is causing a lot of fragmentation in the browser.  Ideally, sites should be able to use our APIs without triggering fragmentation.
Ben, any chance you can run with DMD's cumulative mode [1] enabled? This might give us an idea where the fragmentation is coming from.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Cumulative_mode_output
Flags: needinfo?(bkelly)
Generally speaking, there's not much that can be done at the allocator level. Even if tweaking the allocator strategy, or radically changing it, improved anything at all for this particular case, it would undoubtedly cause the opposite effect in other cases. IOW, there is no silver bullet for memory fragmentation.

Some things that /may/ help are per-thread arenas (which has other memory implications, such as increasing the general memory footprint), or per-site dedicated arenas (which, at this point, is only a theoretical option: it requires switching to jemalloc4 first, then doing the work to actually use dedicated arenas).

But neither would likely help for this specific case, since it only involves one site.

Something that would definitely help is heap compaction: realloc objects (based on cycle collection traversal, I guess) in a way that fills the gaps. The malloc() interface is not really helpful for this kind of thing, though.

None of the above is a short-term solution.
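The compaction idea above can be sketched with a toy model. This is purely illustrative, not jemalloc's API or Gecko's actual design; a real compactor would also have to fix up every pointer to a moved object, which is why it would lean on cycle collection traversal:

```python
# Toy heap compaction: live objects slide to the front so the free
# space becomes one contiguous run instead of scattered holes.
# Hypothetical model only -- not jemalloc's actual interface.

def compact(heap):
    """Move live objects (non-None slots) to the front of the heap."""
    live = [obj for obj in heap if obj is not None]
    return live + [None] * (len(heap) - len(live))

# Four live objects interleaved with four holes:
fragmented = ["A", None, "B", None, None, "C", None, "D"]
compacted = compact(fragmented)
# After compaction, the four free slots are contiguous and could be
# returned to the OS as a single block.
```

The catch, as noted above, is that `compact()` here can move objects freely; through the malloc() interface the allocator cannot relocate an allocation behind the caller's back.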
(In reply to Mike Hommey [:glandium] from comment #8)
> Something that would definitely help is heap compaction: realloc objects
> (based on cycle collection traversal, I guess) in a way that fills the gaps.
> The malloc() interface is not really helpful to do these kind of things,
> though.

Blink recently started doing this with their Oilpan C++ GC implementation:

https://www.opera.com/blogs/desktop/2016/07/memory-usage-opera-heap-compaction/

The article does not explicitly mention which heap is being compacted, but Rick Byers told me in person in July that it was the C++ heap.
Flags: needinfo?(bkelly)
Resurrecting ni for comment 7.
Flags: needinfo?(bkelly)
Ben, what happens to the results if you run a memory minimization? If bin-unused shrinks significantly it means we're keeping around something that's not really needed and that's also pathological for the allocator (i.e. allocations that eat a little over half a page leaving the rest unused and unusable).
All of these memory reports are after minimizing memory.  I guess it's possible that there is a bug with memory minimization and multiple child processes, but I think that works.
(In reply to Ben Kelly [:bkelly] from comment #12)
> All of these memory reports are after minimizing memory.  I guess its
> possible that there is a bug with memory minimization and multiple child
> processes, but I think that works.

It used to work; we used it extensively in Firefox OS, so unless it broke recently it should still work correctly.
Whiteboard: [MemShrink] → [MemShrink:P3]
Ben, would you mind getting full allocation logs for the scenarios from comment 0? This will require mozilla-inbound as of writing (with bug 1300948 and bug 1300974 fixed), and will produce *very* large output. See instructions in memory/replace/logalloc/README. Note that the gzip trick in the README can be very useful. You might want to use zstd instead of gzip, though, it's faster and should compress better. Without the compression trick, be prepared for gigabytes of logs. Then I guess put the (compressed) logs on people.m.o or somewhere else where I can download them. If you could also note what process ids the content processes for the various sites have, that would be awesome. Thanks.
Sorry I haven't had time to get DMD or allocation logs.  FWIW, I see non-trivial fragmentation with a single content process config as well.  My nightly has been open for about 24 hours and the child process memory is 20% bin-unused:

895.80 MB (100.0%) -- explicit
├──412.86 MB (46.09%) ++ window-objects
├──220.42 MB (24.61%) -- heap-overhead
│  ├──202.17 MB (22.57%) ── bin-unused
│  ├───17.52 MB (01.96%) ── bookkeeping
│  └────0.72 MB (00.08%) ── page-cache
├──119.08 MB (13.29%) ++ js-non-window
├──104.66 MB (11.68%) ── heap-unclassified
├───24.73 MB (02.76%) ++ (16 tiny)
└───14.04 MB (01.57%) ++ workers

This is after closing a bunch of stuff and minimizing memory.
Interestingly, I closed every tab except for an example.com tab (to keep child process alive).  This did *not* release most of the fragmented memory:

253.02 MB (100.0%) -- explicit
├──156.41 MB (61.82%) -- heap-overhead
│  ├──143.15 MB (56.58%) ── bin-unused
│  ├───12.51 MB (04.95%) ── bookkeeping
│  └────0.75 MB (00.29%) ── page-cache
├───50.67 MB (20.03%) ── heap-unclassified
├───36.87 MB (14.57%) ++ js-non-window
├────5.92 MB (02.34%) -- (16 tiny)
│    ├──0.87 MB (00.34%) ── xpti-working-set
│    ├──0.79 MB (00.31%) ++ xpconnect
│    ├──0.70 MB (00.28%) ++ atom-tables
│    ├──0.65 MB (00.26%) ++ layout
│    ├──0.51 MB (00.20%) ++ images
│    ├──0.50 MB (00.20%) ── preferences
│    ├──0.45 MB (00.18%) ── icu
│    ├──0.42 MB (00.17%) ── telemetry
│    ├──0.36 MB (00.14%) ++ window-objects/top(http://example.com/, id=2147487503)
│    ├──0.36 MB (00.14%) ++ gfx
│    ├──0.28 MB (00.11%) ++ xpcom
│    ├──0.04 MB (00.01%) ── cycle-collector/collector-object
│    ├──0.01 MB (00.00%) ── script-namespace-manager
│    ├──0.00 MB (00.00%) ── history-links-hashtable
│    ├──0.00 MB (00.00%) ── network/effective-TLD-service
│    └──0.00 MB (00.00%) ── media/libogg
└────3.14 MB (01.24%) ++ dom

This suggests to me that something in our chrome script or C++ code is contributing to this fragmentation.
Sorry, I don't expect to have time to investigate this any time soon.  It's pretty easy to reproduce if anyone else wants to look, though.
Flags: needinfo?(bkelly)
(In reply to Ben Kelly [:bkelly] from comment #17)
> Sorry, I don't expect to have time to investigate this any time soon.  Its
> pretty easy to reproduce if anyone else wants to look, though.

It's not. I created a facebook account just for this bug, and have failed to reproduce. But then, there's not much displayed on www.facebook.com for me. So it would be very helpful if you could do comment 14.
Flags: needinfo?(bkelly)
platform-rel: --- → ?
Whiteboard: [MemShrink:P3] → [MemShrink:P3][platform-rel-Facebook]
platform-rel: ? → -
Not sure if this is related, but for a couple of months now Firefox has been crashing every night for me.  I leave Facebook open in one tab as well as multiple other tabs/windows.  I tested closing the Facebook tab and Firefox did not crash.

Here is a link to one my crash reports with Signature: OOM | small:
https://crash-stats.mozilla.com/report/index/a8f41535-3662-414f-8304-9e3682161215

Recently I started getting this one with Signature: StatsCompartmentCallback:
https://crash-stats.mozilla.com/report/index/35f8d31f-878a-49bc-97b8-9885b2170105
Here's another crash from last night, when I left Facebook open in one tab (Signature: OOM | small):
https://crash-stats.mozilla.com/report/index/42cec66c-77df-40a9-927c-540812170110

This behavior (crashing when Facebook is left open) only started a couple of months ago.
Bill, you may be hitting virtual address fragmentation instead of real memory fragmentation.  This is a known cause of "OOM | small" crashes.  You can probably mitigate these crashes by installing the 64-bit version of Firefox:

https://download.mozilla.org/?product=firefox-50.1.0-SSL&os=win64&lang=en-US

Does that help you at all?
Flags: needinfo?(WPWoodJr+Bugzilla)
The 64 bit version won't run on our work PC.  Something to do with Symantec AV.  :(

However my Firefox has 2 processes and is running in less than 2gb in each process.  Currently 1.7gb and 1.3gb.  So I don't think virtual addresses are the issue?

Facebook seems to be stressing Firefox in a way that doesn't seem to affect Chrome, where I leave Facebook open for months on my Mac.
Flags: needinfo?(WPWoodJr+Bugzilla)
(In reply to Bill Wood from comment #22)
> However my Firefox has 2 processes and is running in less than 2gb in each
> process.  Currently 1.7gb and 1.3gb.  So I don't think virtual addresses are
> the issue?

Well, that pretty much describes the virtual address fragmentation issue.  The actual amount of memory used is not that large, but the address space where memory can be mapped has been fragmented.
 
> Facebook seems to be stressing Firefox in a way that doesn't seem to affect
> Chrome, where I leave Facebook open for months on my Mac.

Mac runs 64-bit which would explain why you don't see it there.
Flags: needinfo?(bkelly)
(In reply to Bill Wood from comment #22)
> The 64 bit version won't run on our work PC.  Something to do with Symantec
> AV.  :(

Bill, is Symantec AV reporting a problem with 64-bit Firefox? Can you share any error messages?

While unrelated to this jemalloc memory problem, this sounds bad. We plan to promote 64-bit Firefox to more users soon, so it would be good to know if there are compatibility problems with Symantec AV.
I would have to try 64 bit again.  Last time I tried in the summer, Firefox would not bring up a user interface.  This was in single process mode, and the Firefox process hung at about 10mb memory usage and did nothing. My theory was that Symantec was blocking a DLL from loading.

WRT Facebook, isn't this bug tracking virtual memory fragmentation?  Or should I report somewhere else?
Restoring the needinfo that was removed along comment 23.
Flags: needinfo?(bkelly)
(In reply to Bill Wood from comment #25)
> WRT Facebook, isn't this bug tracking virtual memory fragmentation?  Or
> should I report somewhere else?

No, this bug is about real memory fragmentation.  It's related, but a bit different.  You would see this show up as real memory usage in your operating system's process monitor.

The top level bug for the virtual address fragmentation issue is here:

https://bugzilla.mozilla.org/show_bug.cgi?id=965936

Note, however, the best solution we have and are likely to get is to just use 64-bit builds.
Flags: needinfo?(bkelly)
Putting this back, but I don't expect to look at this any time soon.
Flags: needinfo?(bkelly)
Thanks for the info.

Sorry if I'm being thick, but how is real memory fragmentation an issue?  If virtual pages are mapped to real memory with a table, it would seem that real memory fragmentation is irrelevant since the table mapping makes the memory appear contiguous to Firefox?
(In reply to Bill Wood from comment #29)
> Sorry if I'm being thick, but how is real memory fragmentation an issue?  If
> virtual pages are mapped to real memory with a table, it would seem that
> real memory fragmentation is irrelevant since the table mapping makes the
> memory appear contiguous to Firefox?

Memory allocators like malloc typically get real memory from the operating system in pages of a fixed size, something like 4KB.  The allocator then must carve up that 4KB into smaller blocks of memory to accommodate things like malloc(32).  It can't free the page back to the OS until all of the allocations handed out from that page are themselves freed.  So it's possible to get pages that have a few allocations left, but not enough free memory to use for other things.  This fragmentation leaves wasted space within the pages.

Or something like that.  Sorry if I'm not explaining well.
Flags: needinfo?(bkelly)
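The pattern described above can be made concrete with a toy model. The numbers below are illustrative only, not jemalloc's actual bin layout:

```python
# Toy model of bin-unused: 4 KB pages carved into 32-byte slots.
# Hypothetical sizes chosen for illustration.
PAGE_SIZE = 4096
ALLOC_SIZE = 32
SLOTS = PAGE_SIZE // ALLOC_SIZE  # 128 slots per page

# Fill 100 pages completely, then free every slot except one per page.
pages = [[True] * SLOTS for _ in range(100)]
for page in pages:
    for i in range(1, SLOTS):
        page[i] = False

# A page can only go back to the OS once every slot on it is free.
returnable_pages = sum(1 for p in pages if not any(p))
live_bytes = sum(sum(p) for p in pages) * ALLOC_SIZE
committed_bytes = len(pages) * PAGE_SIZE
bin_unused = committed_bytes - live_bytes

# 3,200 live bytes pin down 409,600 committed bytes: zero pages are
# returnable, and ~406 KB of the difference shows up as bin-unused.
```

One surviving 32-byte allocation per page is enough to keep all 100 pages committed, which is the same shape as a site that briefly spikes its heap and then frees most, but not all, of what it allocated.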
(In reply to Ben Kelly [:bkelly] from comment #30)
> Memory allocators like malloc typically get real memory from the operating
> system in pages which are fixed at something like 4KB, etc.  The allocator
> then must carve up that 4KB into smaller blocks of memory to accomodate
> things like malloc(32).  It can't free the page back to the OS until all of
> the allocations handed out from that page are themselves free'd.  So its
> possible to get pages that have a few allocations left, but not enough free
> memory to use for other things.  This fragmentation leaves wasted space
> within the pages.
> 
> Or something like that.  Sorry if I'm not explaining well.

Thanks.  Still, what you describe is fragmentation happening in virtual memory, which if bad enough would require the allocator to request more real memory to satisfy requests (if it cannot consolidate fragmented memory).  Then if you run out of real memory, or you go over the virtual memory limit imposed by a 32 bit process, you are in trouble. 

I don't think I am running out of real memory as I have 12gb; and my virtual memory usage seems reasonable at under 2gb in each process.  Maybe there is a limit to the allocator's memory heap size that isn't constrained by these two limits?
(In reply to Bill Wood from comment #31)
> (In reply to Ben Kelly [:bkelly] from comment #30)
> > Memory allocators like malloc typically get real memory from the operating
> > system in pages of a fixed size, something like 4KB.  The allocator
> > then must carve up that 4KB into smaller blocks of memory to accommodate
> > things like malloc(32).  It can't free the page back to the OS until all of
> > the allocations handed out from that page are themselves freed.  So it's
> > possible to get pages that have a few allocations left, but not enough free
> > memory to use for other things.  This fragmentation leaves wasted space
> > within the pages.
> > 
> > Or something like that.  Sorry if I'm not explaining well.
> 
> Thanks.  Still, what you describe is fragmentation happening in virtual
> memory, which if bad enough would require the allocator to request more real
> memory to satisfy requests (if it cannot consolidate fragmented memory). 
> Then if you run out of real memory, or you go over the virtual memory limit
> imposed by a 32 bit process, you are in trouble. 
> 
> I don't think I am running out of real memory as I have 12gb; and my virtual
> memory usage seems reasonable at under 2gb in each process.  Maybe there is
> a limit to the allocator's memory heap size that isn't constrained by these
> two limits?

Your crash reports indicate that your virtual memory is highly fragmented.  For instance:

https://crash-stats.mozilla.com/report/index/a8f41535-3662-414f-8304-9e3682161215

indicates that there are ~400MB of virtual memory available, but the largest available contiguous virtual memory block is under 2MB (see the "raw dump" tab and search for "largest_free_vm_block").

This similar report:

https://crash-stats.mozilla.com/report/index/35f8d31f-878a-49bc-97b8-9885b2170105 

shows much the same thing, only your available virtual memory is much smaller (~200MB).  The largest available contiguous virtual memory block is under 2MB here as well.

The underlying allocators generally try to obtain 1MB or 2MB blocks from the OS, then carve those up into smaller chunks.  So if you don't have 2MB of contiguous virtual memory space, even small allocations will fail despite seemingly having enough available virtual memory.
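That failure mode can be modeled in a few lines. The 400 MB and 1 MB figures echo the crash report above; the model itself is hypothetical:

```python
# Toy model of virtual address fragmentation: plenty of free address
# space in total, but no single hole large enough for a 2 MB chunk.
MB = 1024 * 1024
CHUNK_SIZE = 2 * MB  # size the allocator requests from the OS

# 400 scattered 1 MB holes, roughly matching the crash report.
free_regions = [1 * MB] * 400

total_free = sum(free_regions)           # ~400 MB "available"
largest_free = max(free_regions)         # but the largest hole is 1 MB
chunk_fits = largest_free >= CHUNK_SIZE  # so the 2 MB request fails
```

So an allocation can fail with hundreds of megabytes nominally free, which is exactly why these crashes report as "OOM | small".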
It's also worth noting that 32-bit programs on Windows are generally limited to 2GB of virtual address space:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).aspx

so you can't use the entire 4GB virtual address space.
(In reply to Nathan Froyd [:froydnj] from comment #33)
> It's also worth noting that 32-bit programs on Windows are generally limited
> to 2GB of virtual address space:
> 
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).
> aspx
> 
> so you can't use the entire 4GB virtual address space.

I believe that article is only for 32 bit versions of Windows.  Mine is a 64 bit version.

To test this I started opening all the tabs in my 26 Firefox windows, over 160 tabs.  Before I could finish, Firefox stopped working properly (black areas in the window, etc.), but this was not until I reached 2,990,336 KB working set and 2,331,296 KB private working set.  This was in the Firefox contentproc.

So I have plenty of virtual memory headroom, given that normally I run Firefox contentproc at about 1,800,000 KB working set and 900,000 KB private working set or less.  

Why, then, am I running out of virtual memory for allocations?  It would seem there is either some other limit being hit before the actual virtual address space is exhausted, or Facebook is allocating vast quantities of virtual memory (at night while I am happily asleep); but then the question is what can be done about the fragmentation?
From Bill's crash report, Total Virtual Memory 4,294,836,224 bytes (4 GB). So it looks like he has 4 GB (usually it's either 3 or 4 GB on 64 bit Windows).

BTW, how many tabs do you have open when this happens? And which? Are they all Facebook tabs?
It would be helpful if we could take Bill's crash to a separate bug.  I don't think it's related to comment 0 here.
(In reply to Marco Castelluccio [:marco] from comment #35)
> From Bill's crash report, Total Virtual Memory 4,294,836,224 bytes (4 GB).
> So it looks like he has 4 GB (usually it's either 3 or 4 GB on 64 bit
> Windows).
> 
> BTW, how many tabs do you have open when this happens? And which? Are they
> all Facebook tabs?

Usually around 25 Windows, with one Facebook tab and 25 to 50 other tabs open.
(In reply to Nathan Froyd [:froydnj] from comment #33)
> It's also worth noting that 32-bit programs on Windows are generally limited
> to 2GB of virtual address space:
> 
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).
> aspx

Following some links from there, I ended up at this even more useful page and thought I should share it, if only so I can find it again in my bugmail:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx

In particular, it indicates that because we pass "/LARGEADDRESSAWARE" (http://searchfox.org/mozilla-central/source/old-configure.in#1113) that on x64 our 32-bit Firefox should have 4GB of address space (in agreement with comment 35).  And on "4GT" "4-gigabyte tuning" enabled x86, 3GB.  (Although it sounds like only Windows XP and Windows Server releases have that?  Confusing.)
Here's the latest, like clockwork every night Firefox crashes unless I close the Facebook tab:
https://crash-stats.mozilla.com/report/index/55bfd47a-2d1c-49f7-99a9-9c6d92170113
(In reply to Bill Wood from comment #39)
> Here's the latest, like clockwork every night Firefox crashes unless I close
> the Facebook tab:
> https://crash-stats.mozilla.com/report/index/55bfd47a-2d1c-49f7-99a9-
> 9c6d92170113

Bill, could you open a new bug for your crash?
We've discussed it a bit here, but it is actually kind of off-topic.
You can set the "See Also" field of the bug you're going to open to point at this bug.
Are you sure you guys want a new bug?  As far as I can tell I'm reporting a very similar issue to the OP, i.e. fragmented memory with Facebook open; except in my case it is crashing Firefox, but I have many more tabs open which could explain that.
Priority: -- → P3

If somebody is still experiencing fragmentation, it would probably be better to file a new bug.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE