Closed Bug 1219914 Opened 9 years ago Closed 7 years ago

25MiB AWSY regression when re-enabling jemalloc 4

Categories

(Core :: Memory Allocator, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX
Tracking Status: firefox45 --- affected

People

(Reporter: erahm, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 file, 1 obsolete file)

When jemalloc 4 was re-enabled on 9/23 we saw a 25MiB jump on AWSY. At the time glandium thought it must have been due to changes to jemalloc 4 between disabling it on 9/17 and re-enabling it on 9/23.
Mike, did we ever track down what changed between disabling and re-enabling jemalloc 4 last time?
Flags: needinfo?(mh+mozilla)
I am waiting on this to be sorted out before enabling jemalloc4 again on trunk.
Whiteboard: [MemShrink] → [MemShrink:P1]
The memory report diff between these two revisions

  387.06MiB https://hg.mozilla.org/integration/mozilla-inbound/rev/e892727a373a
  420.81MiB https://hg.mozilla.org/integration/mozilla-inbound/rev/81fca7e4e6ef

on AWSY ("RSS: After TP5, tabs closed"):

  33.21 MB (100.0%) -- explicit
  ├──29.90 MB (90.03%) -- heap-overhead
  │  ├──28.55 MB (85.99%) ── page-cache
  │  ├───4.63 MB (13.94%) ── bookkeeping
  │  └──-3.29 MB (-9.90%) ── bin-unused
  ├───6.73 MB (20.28%) ── heap-unclassified
  ├──-2.08 MB (-6.27%) -- js-non-window
According to https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#1351, page-cache is not something we need to worry about. Then the problem would be heap-unclassified (which I missed in comment 3):

33.21 MB (100.0%) -- explicit
├──29.90 MB (90.03%) -- heap-overhead
│  ├──28.55 MB (85.99%) ── page-cache
│  ├───4.63 MB (13.94%) ── bookkeeping
│  └──-3.29 MB (-9.90%) ── bin-unused
├───6.73 MB (20.28%) ── heap-unclassified
├──-2.08 MB (-6.27%) -- js-non-window
├──-1.08 MB (-3.27%) -- images
├──-0.63 MB (-1.89%) ++ (14 tiny)
└───0.37 MB (01.12%) -- gfx
I tried to test locally to see if I could find out what's in heap-unclassified, but I got a smaller heap-unclassified with 81fca7e4e6ef:

1. open 10 tabs
2. let tab 1 load the first website in tp5n, tab 2 the second, and so on, going back to tab 1 for the eleventh...
3. open a new tab for about:memory
4. minimize memory usage, and get a memory report
5. save the DMD report

Will try again with the benchtester in areweslimyet.
(In reply to Ting-Yu Chou [:ting] from comment #5)
> Will try again with the benchtester in areweslimyet.

Couldn't find a way to launch the Firefox binary with DMD via run_slimtest.py...
(In reply to Ting-Yu Chou [:ting] from comment #6)
> Couldn't find a way to launch the Firefox binary with DMD via run_slimtest.py...

OK, the following enables it:

  export LD_PRELOAD=$OBJDIR/dist/bin/libdmd.so
  export LD_LIBRARY_PATH=$OBJDIR/dist/bin/
  export DMD=1
Another problem: I don't know how to keep the browser open after the test finishes, so I have time to save the DMD report...
(In reply to Ting-Yu Chou [:ting] from comment #8)
> Another problem: I don't know how to keep the browser open after the test
> finishes, so I have time to save the DMD report...

I added "os.system("killall -34 firefox")" to test_open_tabs().
Note that since this bug was filed, jemalloc 4 was upgraded. So we should get new AWSY numbers.
Flags: needinfo?(mh+mozilla)
(In reply to Ting-Yu Chou [:ting] from comment #11)
> I will enqueue the build to AWSY later.

Enqueued. https://areweslimyet.com/?series=bug1219914
Ting-Yu, thank you for picking this back up. Let me know if you need any more help with AWSY.

(In reply to Ting-Yu Chou [:ting] from comment #4)
> According to
> https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsMemoryReporterManager.cpp#1351,
> page-cache is not something we need to worry about. Then the problem would be
> heap-unclassified (which I missed in comment 3):

Page cache is definitely a problem. njn just landed patches reducing it in mozjemalloc in bug 1258257.
Got it, thanks Eric! I am taking sick leave today, will resume the work tomorrow.
(In reply to Eric Rahm [:erahm] from comment #13)
> Page cache is definitely a problem. njn just landed patches reducing it in
> mozjemalloc in bug 1258257.

Eric, could you explain a bit why page cache is a problem if it's just kept around as an optimization? Thanks.
Flags: needinfo?(erahm)
(In reply to Ting-Yu Chou [:ting] from comment #15)
> (In reply to Eric Rahm [:erahm] from comment #13)
> > Page cache is definitely a problem. njn just landed patches reducing it in
> > mozjemalloc in bug 1258257.
> 
> Eric, could you explain a bit why page cache is a problem if it's just
> kept around as an optimization? Thanks.

It holds onto a significant portion of unused memory that could otherwise be returned to the system. This is supposed to be a speed optimization, but after shrinking mozjemalloc's page cache we did not see an impact on our performance tests.

As we start enabling more content processes this overhead becomes much more significant (25MiB per process).
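To make "returned to the system" concrete: page-cache is memory the allocator has already freed internally but is still holding as dirty pages, so it keeps counting against RSS until it is purged. On Linux the purge boils down to an madvise call, roughly like this (an illustrative sketch, not Firefox code):

  #include <sys/mman.h>
  #include <stddef.h>

  /* Purge a range of dirty (freed-but-retained) pages: the mapping stays
   * valid, but the kernel may reclaim the physical pages, so they stop
   * counting against the process's RSS. */
  static int purge_dirty_pages(void *addr, size_t length)
  {
      return madvise(addr, length, MADV_DONTNEED);
  }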
Flags: needinfo?(erahm)
This is the diff (TabsOpenForceGC) between not enabling and enabling jemalloc4 [1]:

43.15 MB (100.0%) -- explicit
├──35.67 MB (82.65%) -- heap-overhead
│  ├──33.91 MB (78.57%) ── page-cache
│  ├───3.64 MB (08.45%) ── bookkeeping
│  └──-1.88 MB (-4.36%) ── bin-unused
├───7.14 MB (16.56%) ── heap-unclassified
├───1.07 MB (02.48%) ++ js-non-window
├──-1.29 MB (-2.98%) ++ images
├───1.03 MB (02.38%) ++ gfx
└──-0.46 MB (-1.08%) ++ (16 tiny)

[1] https://areweslimyet.com/?series=bug1219914
For jemalloc4:

  https://dxr.mozilla.org/mozilla-central/source/memory/jemalloc/src/src/arena.c#1407-1418

It seems page-cache is controlled by opt_lg_dirty_mult, which defaults to 3, and arena_maybe_purge_ratio() makes sure there are no more than max(arena->nactive >> 3, chunk_npages) dirty pages. So by default each arena is allowed to keep at least 2MB of dirty pages. If we set opt_lg_dirty_mult large enough, we end up with at most 2MB of dirty pages per arena (the chunk_npages floor). That still doesn't match bug 1258257, which caps it at 1MB.
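For reference, here is a minimal sketch of that ratio-based purge decision (simplified; the names and the 2MiB-chunk/4KiB-page assumption are mine, not the exact upstream code):

  #include <stddef.h>
  #include <stdbool.h>

  #define CHUNK_NPAGES 512  /* assumed: 2MiB chunks / 4KiB pages */

  /* Ratio mode: an arena may keep up to
   * max(nactive >> lg_dirty_mult, CHUNK_NPAGES) dirty pages before purging. */
  static bool arena_should_purge_ratio(size_t nactive, size_t ndirty,
                                       long lg_dirty_mult)
  {
      if (lg_dirty_mult < 0)
          return false;              /* ratio purging disabled */
      size_t threshold = nactive >> lg_dirty_mult;
      if (threshold < CHUNK_NPAGES)
          threshold = CHUNK_NPAGES;  /* floor: one chunk (~2MiB) of dirty pages */
      return ndirty > threshold;
  }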
(In reply to Ting-Yu Chou [:ting] from comment #17)
> ├───7.14 MB (16.56%) ── heap-unclassified

I tried to get a DMD report from a local AWSY test, but didn't see a similar diff in heap-unclassified...
Attached file poc
With this POC, which limits the page cache to 1MB, the explicit memory of TabsOpen / TabsOpenSettled with jemalloc4 enabled is smaller than with mozjemalloc, but TabsOpenForceGC and TabsClosed are larger. Note the page-cache numbers are all similar, as is heap-unclassified except in TabsClosed (a sketch of the hard-cap idea follows the reports below):

TabsOpen
  -4.58 MB (100.0%) -- explicit
  ├──-3.51 MB (76.56%) ++ js-non-window
  ├───3.23 MB (-70.44%) ++ heap-overhead
  ├──-3.28 MB (71.70%) ++ images
  ├──-1.32 MB (28.78%) ── heap-unclassified
  ├──-0.38 MB (08.27%) ++ window-objects
  ├───1.01 MB (-21.95%) ++ gfx
  ├──-0.33 MB (07.12%) ++ xpconnect
  ├───0.14 MB (-2.97%) ++ storage/sqlite
  ├───0.12 MB (-2.73%) ── preferences
  ├──-0.02 MB (00.43%) ++ add-ons
  ├──-0.09 MB (01.90%) ++ network
  ├──-0.08 MB (01.79%) ++ workers/workers(chrome)
  ├──-0.04 MB (00.97%) ++ layout
  └──-0.03 MB (00.58%) ++ (8 tiny)

TabsOpenSettled
  -1.62 MB (100.0%) -- explicit
  ├──-4.13 MB (254.16%) ++ js-non-window
  ├───3.32 MB (-204.14%) ++ heap-overhead
  ├──-1.02 MB (62.96%) ++ images
  ├───1.07 MB (-65.56%) ++ gfx
  ├──-0.46 MB (28.62%) ── heap-unclassified
  ├──-0.35 MB (21.74%) ++ xpconnect
  ├───0.14 MB (-8.37%) ++ storage/sqlite
  ├───0.13 MB (-7.69%) ── preferences
  ├──-0.06 MB (03.90%) ++ window-objects
  ├──-0.02 MB (01.21%) ++ add-ons
  ├──-0.09 MB (05.35%) ++ network
  ├──-0.08 MB (05.04%) ++ workers/workers(chrome)
  ├──-0.04 MB (02.73%) ++ layout
  └──-0.00 MB (00.06%) ++ (8 tiny)

TabsOpenForceGC
  7.04 MB (100.0%) -- explicit
  ├──6.96 MB (98.85%) -- heap-overhead
  │  ├──3.55 MB (50.41%) ── bookkeeping
  │  ├──3.24 MB (46.06%) ── bin-unused
  │  └──0.17 MB (02.38%) ── page-cache
  ├──1.26 MB (17.85%) ++ js-non-window
  ├──-1.28 MB (-18.22%) ++ images
  ├──1.03 MB (14.56%) ++ gfx
  ├──-0.45 MB (-6.33%) ── heap-unclassified
  ├──-0.29 MB (-4.08%) ++ xpconnect
  ├──-0.22 MB (-3.10%) ++ window-objects
  ├──0.13 MB (01.77%) ── preferences
  ├──-0.02 MB (-0.28%) ++ add-ons
  ├──-0.09 MB (-1.21%) ++ workers/workers(chrome)
  └──0.01 MB (00.18%) ++ (11 tiny)

TabsClosed
  17.83 MB (100.0%) -- explicit
  ├──34.26 MB (192.14%) ++ js-non-window
  ├──-23.74 MB (-133.15%) ── heap-unclassified
  ├───4.77 MB (26.77%) -- heap-overhead
  │   ├──3.59 MB (20.13%) ── bookkeeping
  │   ├──1.33 MB (07.45%) ── bin-unused
  │   └──-0.14 MB (-0.81%) ── page-cache
  ├───3.40 MB (19.04%) ++ window-objects
  ├──-1.13 MB (-6.32%) ++ images
  ├───0.41 MB (02.30%) ++ xpconnect
  └──-0.14 MB (-0.77%) ++ (15 tiny)
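For illustration, here is roughly what the hard cap amounts to, contrasted with the ratio mode sketched above (a minimal sketch only; the 1MiB constant matches the POC, but the names and page size are assumptions, not the attached patch):

  #include <stddef.h>
  #include <stdbool.h>

  #define PAGE_BYTES      4096                /* assumed page size */
  #define DIRTY_CAP_BYTES (1UL * 1024 * 1024) /* hard 1MiB page-cache limit */

  /* Hard-cap variant: purge whenever the dirty pages exceed a fixed byte
   * budget, independent of how much live (active) memory the arena holds. */
  static bool arena_should_purge_hard_cap(size_t ndirty_pages)
  {
      return ndirty_pages * PAGE_BYTES > DIRTY_CAP_BYTES;
  }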
Based on comment 20, I think it's worth making a PR to jemalloc so we can have a hard-limit mode for purging dirty pages. Then we can move on to enabling jemalloc 4.

What do you guys think?
Summary: 25MiB AWSY regression when re-enabling jemalloc 4 on 9/23 → 25MiB AWSY regression when re-enabling jemalloc 4
Here's the summary:

a. The 25MiB AWSY regression comes from page-cache; setting a hard 1MiB limit on dirty pages instead of ratio purging removes the regressed page-cache and heap-unclassified, see comment 20. [1]
b. Enabling jemalloc4 causes many regressions on Talos. [2]
c. Somehow enabling jemalloc4 broke the Windows XP and Windows 8 builds on Try. [3][4]

I guess jemalloc4 won't be enabled soon because of b); in that case we should land bug 1005844 in mozjemalloc first. We can file a follow-up to turn bug 1005844 into a hook later, when jemalloc4 gets enabled.

What do you think?

[1] https://areweslimyet.com/?series=bug1219914
[2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=093575f5e79c&newProject=try&newRevision=efa504ec15ae&framework=1&showOnlyImportant=0
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=efa504ec15ae
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=23e6b507ab8d
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(erahm)
This bug has been open for 6 months and it looks deadlocked right now.
It may make more sense to evaluate the status of jemalloc4 first.

The question is whether jemalloc4 (bug 762449) offers sufficient benefits to move forward:
> if no, does that mean we stick with mozjemalloc from here on?
> if yes, investigate whether those Talos regressions are actionable
  > if yes, do we want to wait for those improvements?
  > if no, is the trade-off worth it?
(In reply to Ting-Yu Chou [:ting] from comment #23)
> c. Somehow enabling jemalloc4 broke the Windows XP and Windows 8 builds on Try. [3][4]

This may be due to bug 1261226. FWIW, I've been building w/ jemalloc4 enabled on Windows for some time now and things generally Just Work. Anyway, you can attempt another Try push with a newer upstream rev and see if it works better.
What do you think about comment 23?
Flags: needinfo?(continuation)
I don't really know anything about jemalloc, sorry. Hopefully Glandium will be able to reply.
Flags: needinfo?(continuation)
(In reply to Ting-Yu Chou [:ting] from comment #23)
> Here's the summary:
> 
> a. The 25MiB AWSY regression comes from page-cache; setting a hard 1MiB
> limit on dirty pages instead of ratio purging removes the regressed
> page-cache and heap-unclassified, see comment 20. [1]
> b. Enabling jemalloc4 causes many regressions on Talos. [2]
> c. Somehow enabling jemalloc4 broke the Windows XP and Windows 8 builds on Try. [3][4]
> 
> I guess jemalloc4 won't be enabled soon because of b); in that case we
> should land bug 1005844 in mozjemalloc first. We can file a follow-up to
> turn bug 1005844 into a hook later, when jemalloc4 gets enabled.
> 
> What do you think?
> 
> [1] https://areweslimyet.com/?series=bug1219914
> [2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=093575f5e79c&newProject=try&newRevision=efa504ec15ae&framework=1&showOnlyImportant=0
> [3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=efa504ec15ae
> [4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=23e6b507ab8d

FWIW I think trying to land bug 1005844 is a good idea. It might make sense to have jmaher look at those talos results to weigh in on whether or not they're acceptable. It's also possible glandium can work his jemalloc-regression-hunting magic to find issues in jemalloc itself.
Flags: needinfo?(erahm) → needinfo?(jmaher)
it would be nice to get updated try pushes if possible - that push is 5+ weeks old and the code base changes often.

I don't like so many regressions - while they are all <5%, a lot of small regressions pop up, and they hit e10s more than non-e10s. I would want to coordinate with the e10s team, as they have worked hard to get e10s <= non-e10s.

let me know if we can get new try pushes, that would be nice to have.
Flags: needinfo?(jmaher)
Not sure how to enable jemalloc4 now...
ac_add_options --enable-jemalloc=4
(In reply to Ryan VanderMeulen [:RyanVM] from comment #34)
> ac_add_options --enable-jemalloc=4

Tried, but didn't see MOZ_JEMALLOC4 in obj-x86_64-pc-linux-gnu/config.status, is this normal?
hrm, it's definitely there on my local builds.
(In reply to Ting-Yu Chou [:ting] from comment #35)
> Tried, but didn't see MOZ_JEMALLOC4 in
> obj-x86_64-pc-linux-gnu/config.status, is this normal?

Hmm... it does not have MOZ_JEMALLOC4 if I put the option in

  build/mozconfig.common.override, or
  build/mozconfig.common

but it does if I put it in .mozconfig. How should I do this for a Try push?
Add "ac_add_options --enable-jemalloc=4" to build/mozconfig.common.override will have jemalloc4 enabled on Try but not local build. Anyway, here's the updated comparison:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=85592c49e70b&newProject=try&newRevision=c96f8c0e8343&framework=1&showOnlyImportant=0

Note the tests are still running as I post this, and sorry for spamming Try pushes.
Flags: needinfo?(jmaher)
given the compare link above, we have a lot of useful data to work with (linux64, osx10.10, win7).  There are a lot of regressions and a few improvements.

improvements:
2-3% linux64: glterrain, kraken, kraken-e10s, main_rss, tp5o private bytes, tps
203% osx: tp5o

regressions - too many to list, just look at the compare link above. It seems win7 is the primary platform for regressions; many are in the 2-5% range. Here are the largest regressions seen:
>7% win7: a11y, damp, cart, sessionrestore_no_auto_restore, tart, tp5 responsiveness, tp5o scroll
>7% linux64: tabpaint, tp5o scroll
>7% osx10: tps!!!

I recall a large list of regressions when this was landed originally. I assume there is nothing we can do to reduce these regressions, but it would be good to understand them in more detail before moving forward (e.g. is this a relic of the hardware we use, build flags, etc.?).
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #29)
> a lot of small regressions pop up, and they hit e10s more than non-e10s. I would want to
> coordinate with the e10s team, as they have worked hard to get e10s <= non-e10s.

Would you coordinate with the team based on the updated comparison?
Flags: needinfo?(jmaher)
Jemalloc 4.2.1 is out and, as preliminary testing seems to indicate, is much better than what we currently have in tree, enough that I'm seriously considering enabling it and seeing what happens after bug 1277704 lands.
Depends on: 1277704
Flags: needinfo?(mh+mozilla)
I have let this needinfo go way too long. I did chat with the e10s team and they don't have release criteria for anything after version 48, so in this case let's look at 4.2.1 and see if we can make something stick.
Flags: needinfo?(jmaher)
In case anybody is wondering, here's the current state of Talos with and without jemalloc4 (now 4.4.0) enabled. Not sure how easily we can run a test on AWSY.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a2697532c50&newProject=try&newRevision=3d8ac051ac1e&framework=1&showOnlyImportant=0
Testing locally, I've found that setting |lg_dirty_mult:6| (using |JE_MALLOC_CONF|) has approximately the same effect as :ting found with explicitly setting |threshold|: page-cache goes from ~20MB down to 1-2MB.

This seems like a good compromise, since for very large processes it'll still allow gradually keeping more pages around, and it can be solved with configuration instead of needing to mess with the source. Once bug #1353752 is solved, I'll work on a patch to configure our default lg_dirty_mult to be 6, and then run through AWSY and confirm my local measurements.
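As a quick sanity check that the MALLOC_CONF/JE_MALLOC_CONF value was picked up, the effective option can be read back through mallctl, roughly like this (a sketch only; the je_ prefix and header path are assumptions about how the in-tree jemalloc 4 is built):

  #include <stdio.h>
  #include <sys/types.h>
  #include <jemalloc/jemalloc.h>

  int main(void)
  {
      ssize_t lg_dirty_mult;
      size_t len = sizeof(lg_dirty_mult);

      /* "opt.lg_dirty_mult" reports the value jemalloc parsed at startup
       * (from MALLOC_CONF / the malloc_conf string). */
      if (je_mallctl("opt.lg_dirty_mult", &lg_dirty_mult, &len, NULL, 0) == 0)
          printf("lg_dirty_mult = %zd\n", lg_dirty_mult);
      else
          printf("opt.lg_dirty_mult not available in this build\n");
      return 0;
  }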
Attachment #8855021 - Flags: review?(mh+mozilla)
Alex, can you run this against AWSY? It would be nice to get several retriggers so we can compare in Perfherder. The try syntax for all supported platforms is:

> try: -b o -p linux32,linux64,win32,win64 -u awsy-e10s -t none

Before pushing I think you'll need to update the in-tree configs to enable jemalloc 4.
Flags: needinfo?(agaynor)
Eric, yes I will, but first it's blocked on bug #1353752 (I ended up working out of order with what I said after I realized Mike was on Tokyo time :D). Once that's landed I'll kick the try bots. (Leaving ni on since the next action here is still on me)
(In reply to Alex Gaynor from comment #49)
> Eric, yes I will, but first it's blocked on bug #1353752 (I ended up working
> out of order with what I said after I realized Mike was on Tokyo time :D).
> Once that's landed I'll kick the try bots. (Leaving ni on since the next
> action here is still on me)

Oh right, you're using review board and autoland. I can push to try for you, I'll add a note here when I do.
Thanks!
Flags: needinfo?(agaynor)
Comment on attachment 8855021 [details]
Bug 1219914 - increase jemalloc's 'lg_dirty_mult' parameter from the default of 3 to 6

https://reviewboard.mozilla.org/r/126942/#review129700

Using jemalloc's default was done on purpose in bug 1203840, to avoid purges happening at relatively high frequency on free() and instead trigger them manually after cycle collection. Somehow, AWSY wasn't satisfied with this strategy, which is either a problem with when AWSY does its measurements, or a problem with when the purges are triggered. IIRC from when I looked, it tended to be the former. It would be good for fresh eyes to look into it again.
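For context, the "purge manually after cycle collection" strategy amounts to something like the following hook (a hypothetical sketch, not the actual Gecko code; the hook name is made up, and jemalloc_free_dirty_pages() is the mozjemalloc-style purge entry point):

  #include "mozmemory.h"  /* declares jemalloc_free_dirty_pages() */

  /* Hypothetical hook: once a cycle collection finishes, hand the allocator's
   * dirty pages back to the OS instead of relying on ratio purging on free(). */
  static void OnCycleCollectionDone(void)
  {
      jemalloc_free_dirty_pages();
  }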
Attachment #8855021 - Flags: review?(mh+mozilla)
AWSY does measurements immediately after testing (TabsOpen), 30 seconds later (TabsOpenSettled), and after forcing a GC (TabsOpenForceGC).
So for Linux we see an average of a 10MB regression in Explicit Memory After tabs open [+30s, forced GC] [1]. Windows seems happy (this is a new test, so I have lower confidence in it).

Interesting parts of the diff of memory reports [2,3]:

> Web Content (pid NNN)
> 
> ├──11.60 MB (278.77%) -- heap-overhead
> │  ├───9.12 MB (219.15%) ── bookkeeping [4]
> │  ├───1.72 MB (41.32%) ── bin-unused [4]
> │  └───0.76 MB (18.31%) ── page-cache [4]
> 
> Main Process (pid NNN)
> 
> ├──2.34 MB (93.88%) -- heap-overhead
> │  ├──1.64 MB (65.89%) ── bookkeeping
> │  ├──1.28 MB (51.43%) ── page-cache
> │  └──-0.58 MB (-23.44%) ── bin-unused

It looks like the bigger loss was in bookkeeping, so maybe jemalloc4 just has higher overhead in general?

[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=60d670e34950d5414175b06081e861c800df7e96&newProject=try&newRevision=91515528d453f08cd9fd28f552d1dd1b70645c2b&originalSignature=a630e0d4f7000610ca57d0f8da52b55d117632a9&newSignature=a630e0d4f7000610ca57d0f8da52b55d117632a9&framework=4
[2] https://queue.taskcluster.net/v1/task/ZpIEZEohSeagTdOVyw24dw/runs/0/artifacts/public/test_info//memory-report-TabsOpenForceGC-2.json.gz
[3] https://queue.taskcluster.net/v1/task/bbzJ2DiMSGCCkGqwAfPXPg/runs/0/artifacts/public/test_info//memory-report-TabsOpenForceGC-2.json.gz
Also note the original regression was single process, now we're using e10s-multi (1 chrome, 4 content), so relatively speaking this is *much* better.
I've gone ahead and pushed a build enabling jemalloc4 but without this patch:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a840ec89de36478d526d4b1aaeac161143fdbd64
From the build with jemalloc4, but not this patch [0], I'm seeing that _one_ Web Content process has a high page-cache:

├───29.99 MB (22.80%) -- heap-overhead
│   ├──17.69 MB (13.45%) ── bin-unused
│   ├───8.13 MB (06.18%) ── page-cache
│   └───4.17 MB (03.17%) ── bookkeeping

however, the other processes (1 main, 3 web content) all have page-cache in the 0-2MB range. I'm still digging into how the AWSY tests work; is there any reason we'd expect such a significant deviation between the web content processes?

[0] https://queue.taskcluster.net/v1/task/PFTs4UwrTDWqFuE8--Gz4Q/runs/0/artifacts/public/test_info//memory-report-TabsOpenForceGC-2.json.gz
A few perfherder links:

base vs jemalloc4 (no patch): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=22e2081b0e0d1e757e4e980c6ee6b29d21488c53&newProject=try&newRevision=a840ec89de36478d526d4b1aaeac161143fdbd64&framework=4&showOnlyImportant=0

jemalloc 4 (no patch) vs jemalloc 4 (with patch): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a840ec89de36478d526d4b1aaeac161143fdbd64&newProject=try&newRevision=91515528d453f08cd9fd28f552d1dd1b70645c2b&framework=4&showOnlyImportant=0

base vs jemalloc 4 (with patch): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=22e2081b0e0d1e757e4e980c6ee6b29d21488c53&newProject=try&newRevision=91515528d453f08cd9fd28f552d1dd1b70645c2b&framework=4&showOnlyImportant=0


Summary:

base v jemalloc 4 (no patch): A handful of significant regressions with jemalloc4
jemalloc 4 (no patch) vs jemalloc 4 (with patch): One significant improvement with the patch
base v jemalloc 4 (with patch): All in the noise

(Really hope I did this correctly, my first time using perfherder!)

Before I opine on possible future directions, would love if someone can confirm that my attempts to use the tools and summarize what they say is correct!
(In reply to Alex Gaynor from comment #60)
> Summary:
> 
> base v jemalloc 4 (with patch): All in the noise
> (Really hope I did this correctly, my first time using perfherder!)

For this metric it's best to expand the subtests (hover over the platform and the click 'subtests'). There you can see the 10MB regression after gc. 

> Before I opine on possible future directions, would love if someone can
> confirm that my attempts to use the tools and summarize what they say is
> correct!

Yes, you're using them correctly; sometimes you have to dig into the subtests though. Perfherder uses a geometric mean of the subtests for the higher-level comparison, which finds rather large regressions but might miss a regression in just one of the checkpoints (in this case "tabs loaded, after GC" is probably the most important).
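For reference, the high-level comparison number is roughly a geometric mean over the subtest values, something like this (a minimal sketch of the idea, not Perfherder's actual code):

  #include <math.h>
  #include <stddef.h>

  /* Geometric mean of n positive subtest values: exp(mean of logs).
   * A large change in a single subtest gets dampened in the summary number. */
  static double geometric_mean(const double *values, size_t n)
  {
      double log_sum = 0.0;
      for (size_t i = 0; i < n; i++)
          log_sum += log(values[i]);
      return exp(log_sum / (double)n);
  }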
Sorry I missed this.

(In reply to Alex Gaynor from comment #59)
> From the build with jemalloc4, but not this patch [0], I'm seeing that in
> _one_ Web Content process has a high page-cache:
> 
> ├───29.99 MB (22.80%) -- heap-overhead
> │   ├──17.69 MB (13.45%) ── bin-unused
> │   ├───8.13 MB (06.18%) ── page-cache
> │   └───4.17 MB (03.17%) ── bookkeeping
> 
> however, all the other processes (1 main, 3 web content) all have page-cache
> in the 0-2MB range. I'm still digging into how the AWSY tests work; is there
> any reason we'd expect such a significant deviation between the web content
> processes?

The test opens 100 pages in 30 tabs (round robin), closes all but one, navigates to about:blank, and repeats for a total of 3 times. So a couple of possibilities:
  #1 - That process just loaded heavier pages
  #2 - That process didn't get killed between tests (perhaps an e10s-multi regression)
  #3 - Our GC helper is returning before GC runs in the child process

I'll look into 2 & 3.
(In reply to Eric Rahm [:erahm] from comment #62)
> The test opens 100 pages in 30 tabs (round robin), closes all but one,
> navigates to about:blank, and repeats for a total of 3 times. So a couple
> of possibilities:
>   #1 - That process just loaded heavier pages
>   #2 - That process didn't get killed between tests (perhaps an e10s-multi
> regression)
>   #3 - Our GC helper is returning before GC runs in the child process
> 
> I'll look into 2 & 3.

It looks like #2 is probably the explanation. Basically the first tab is always in the first content process; it never gets closed, so it accumulates more cruft between iterations.
And the more live allocated data, the more dirty pages jemalloc4 allows to be kept around. So that makes sense from the perspective of jemalloc4's automatic reclamation logic.

Would we have expected the cycle collector's purge logic to have been triggered here?
(In reply to Alex Gaynor from comment #64)
> And the more live allocated data, the more dirty pages jemalloc4 allows to
> be kept around. So that makes sense from the perspective of jemalloc4's
> automatic reclamation logic.
> 
> Would we have expected the cycle collector's purge logic to have been
> triggered here?

I think the disconnect is that we force a full GC, not a CC. The assumption is that with the 30 second pause CC should have kicked in; metrics indicate this is *usually* the case [1], but I guess it's not guaranteed.

[1] https://mzl.la/2o7xlwg
Interesting, that's a very handy metric!

The question I see now is: are we comfortable with the memory usage post-CC -- which seems to be 0-2MB or so (per process).

Assuming the answer is yes, there's a few options:

a) Run the CC more frequently
b) Trigger the purge on occasions besides the CC -- maybe on a full GC?
c) Make AWSY trigger the CC

(c) only makes sense if you assume that AWSY and "normal" usage diverge in their CC patterns. From the data you linked, it looks like 85% of cycle collections are less than 30 seconds apart.

Does that make sense? (Please let me know if I'm way off base, I'm just diving into Firefox memory management, and I'm layering what I'm learning on top of my experience with other VMs' GCs)
Comment on attachment 8855021 [details]
Bug 1219914 - increase jemalloc's 'lg_dirty_mult' parameter from the default of 3 to 6

https://reviewboard.mozilla.org/r/126942/#review131712

I guess you didn't mean to send this to review. For one, the build/mozconfig.common change is unwarranted; it's something for a try build. And the discussion since comment 54 doesn't seem to have concluded that this is necessary. I'd really rather avoid madvise as much as possible during "normal" operations, and have it more likely triggered by GC/CC/whatever.
Attachment #8855021 - Flags: review?(mh+mozilla)
Per bug 1363992, jemalloc 4 related bugs are now irrelevant.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Attachment #8855021 - Attachment is obsolete: true