Closed Bug 619558 (GenerationalGC) Opened 14 years ago Closed 9 years ago

[meta] Implement generational garbage collection

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla31
blocking-kilimanjaro: -

Tracking Status
firefox31 --- disabled
firefox32 --- fixed
relnote-firefox --- 32+

People

(Reporter: paul.biggar, Assigned: terrence)

References

(Blocks 4 open bugs)

Details

(Keywords: feature, perf, Whiteboard: [games:p2] [js:p1:fx31][talos_regression][qa-])

Attachments

(8 files, 2 obsolete files)

Depends on: 619392
Depends on: 619561
Alias: GenerationalGC
Summary: META: implement generational garbage collection → [meta] Implement generational garbage collection
Blocks: 625615
Depends on: 658676
Whiteboard: [MemShrink:P1]
No longer blocks: 636077
No longer blocks: 617569
Depends on: 673492
Blocks: 505308
Depends on: 701448
No longer depends on: 701448
Depends on: 641027
Depends on: 590379
Blocks: 590379
No longer depends on: 590379
Depends on: 706885
Depends on: 707049
Depends on: 714647
Depends on: 720522
A random test collector I whipped up this weekend to play with potential performance tradeoffs and to see if tying a new allocator into JaegerMonkey is as easy as it looks.
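For context on what such a test collector exercises: the hot path of a nursery allocator is just a bounds check and a pointer bump, which is why wiring it into the JIT's inline allocation path looks attractive. The sketch below is purely illustrative; the class and method names are invented, not SpiderMonkey's.

// Hypothetical sketch of the core of a nursery ("new generation") allocator:
// bump-pointer allocation out of a contiguous buffer, with a fallback to a
// minor GC or tenured allocation when the nursery is full.
#include <cstddef>
#include <cstdint>
#include <vector>

class Nursery {
    std::vector<uint8_t> buffer_;
    size_t position_ = 0;

  public:
    explicit Nursery(size_t bytes) : buffer_(bytes) {}

    // Returns nullptr when the nursery is exhausted; the caller is then
    // expected to trigger a minor collection or allocate in the tenured heap.
    void* allocate(size_t bytes) {
        bytes = (bytes + 7) & ~size_t(7);          // keep allocations 8-byte aligned
        if (position_ + bytes > buffer_.size())
            return nullptr;
        void* cell = buffer_.data() + position_;
        position_ += bytes;
        return cell;
    }

    void reset() { position_ = 0; }                // after a minor GC, reuse the space
};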
Attachment #603033 - Flags: feedback?(wmccloskey)
Will there be any differences in how the Generational GC works with Ion versus Jaeger? If so, would it make more sense to prototype GGC with Ion?
Attachment #603033 - Flags: feedback?(wmccloskey)
Depends on: 745742
blocking-kilimanjaro: --- → ?
Depends on: 750733
Too big for the k9o initial time frame.
blocking-kilimanjaro: ? → -
Depends on: 752093
Depends on: ExactRooting
No longer depends on: 750733
Depends on: 750733
Whiteboard: [MemShrink:P1] → [MemShrink:P1][games:p3]
Depends on: 764876
Depends on: 773686
Depends on: 782646
Blocks: 605426
No longer depends on: 773686
Depends on: 831506
Depends on: 831507
Whiteboard: [MemShrink:P1][games:p3] → [MemShrink:P1][games:p2]
Depends on: 831886
Depends on: 832489
No longer blocks: MatchStartupMem
Depends on: 838810
Depends on: 851181
Depends on: 851342
Assignee: wmccloskey → terrence
Depends on: 862115
No longer depends on: 862115
Depends on: 863521
Depends on: 863524
Blocks: 874174
Depends on: 875863
No longer depends on: 831507
No longer depends on: 650161
Depends on: ggcfuzz
Depends on: 877907
Depends on: 882482
Depends on: 883234
Depends on: 883466
Depends on: 883498
No longer depends on: 883498
Blocks: 619392
No longer depends on: 619392
No longer depends on: 831886
No longer depends on: 673492
Depends on: 885550
Depends on: 883498
Depends on: 885955
Depends on: 886575
Depends on: 893486
Whiteboard: [MemShrink:P1][games:p2] → [MemShrink:P1][games:p2][u=js c=ggc p=1]
No longer depends on: 720522
No longer depends on: 706885, 745742
Status: NEW → ASSIGNED
Keywords: perf
Depends on: 916829
Depends on: 906091
Depends on: 919536
Depends on: 921178
Depends on: 923179
Depends on: 925397
Depends on: 925817
Depends on: 925863
Keywords: feature
Whiteboard: [MemShrink:P1][games:p2][u=js c=ggc p=1] → [MemShrink:P1][games:p2]
Blocks: 928894
Blocks: 874580
Blocks: 933313
Depends on: 941779
Depends on: 956434
Depends on: 967720
Depends on: 969410
Attachment #603033 - Attachment is obsolete: true
Attached patch enable_ggc_by_default_on_desktop-v0.diff (obsolete) — Splinter Review
This is identical to the patch Ted reviewed for exact rooting and has extensive testing on Maple, so shouldn't need too skeptical a review.
Attachment #8377864 - Flags: review?(sphink)
Comment on attachment 8377864 [details] [diff] [review]
enable_ggc_by_default_on_desktop-v0.diff

Review of attachment 8377864 [details] [diff] [review]:
-----------------------------------------------------------------

I am happy to ride on the coattails of ted's review.
Attachment #8377864 - Flags: review?(sphink) → review+
Comment on attachment 8377864 [details] [diff] [review]
enable_ggc_by_default_on_desktop-v0.diff

>+JSGC_GENERATIONAL=1
>+MOZ_ARG_DISABLE_BOOL(gcgenerational,
>+[  --enable-gcgenerational Disable generational GC],
>+    JSGC_GENERATIONAL= ,
>+    JSGC_GENERATIONAL=1 )
> if test -n "$JSGC_GENERATIONAL"; then
>     AC_DEFINE(JSGC_GENERATIONAL)
> fi

Correct me if I'm wrong, but shouldn't that be "--disable-gcgenerational" there?
(In reply to Dave Garrett from comment #8)
> Correct me if I'm wrong, but shouldn't that be "--disable-gcgenerational"
> there?

Great catch! It does indeed look like I've missed a string.
Attachment #8377864 - Attachment is obsolete: true
Attachment #8377934 - Flags: review+
Whiteboard: [MemShrink:P1][games:p2] → [MemShrink:P1] [games:p2] [js:p1:fx31]
Depends on: 978387
No longer depends on: 978387
Depends on: 984684
Depends on: 985562
Depends on: 986147
Depends on: 987160
Keywords: leave-open
Attachment #8377934 - Flags: superreview?(nihsanullah)
Depends on: 988053
Awesome!
Comment on attachment 8377934 [details] [diff] [review]
enable_ggc_by_default_on_desktop-v1.diff

sr+ from Naveed, over-the-shoulder.
Attachment #8377934 - Flags: superreview?(nihsanullah) → superreview+
Let's watch AWSY closely.
Sorry, I had to back out this changeset; it seems to be causing frequent test failures on OS X 10.8, like https://tbpl.mozilla.org/php/getParsedLog.php?id=36790332&tree=Mozilla-Inbound
The AWSY results will be available later today. In the meantime, I did some local benchmarking using http://gregor-wagner.com/tmp/mem50. I dumped memory reports at three points:
- immediately after start-up
- after running the benchmark (with all 50 tabs open)
- after closing all the tabs and hitting "minimize memory usage" three times

I tested against revisions b0dbbb5d770a (just before landing) and 52f43e3f552f (just after landing). This is on a Mac.

The summary is that 'resident' memory usage is up by roughly 8%.
- At start-up, it is up by 14 MiB, from 179 to 193
- With 50 tabs open, it is up by 117 MiB, from 1270 to 1387
- After closing the tabs, it is up by 35 MiB, from 409 to 444

I will attach the relevant memory report files shortly.

This is disappointing. I'd been hoping that by filtering out many of the short-lived objects, the tenured heap would grow more slowly and that this would actually reduce memory usage.

At start-up:
- 7 MiB is from nurseries; 4 MiB in the main runtime, and 2 and 1 in the two workers
- 4 MiB is additional heap-unclassified, which is surprising
- 6? MiB is just more GC stuff being held onto, which is surprising

With 50 tabs open:
- 17.5 MiB is additional heap-overhead, which is surprising. Are we doing more heap allocations, maybe due to store buffers? (See the store-buffer sketch after this comment.)
- 16 MiB is additional heap-unclassified

With 50 tabs closed:
- 15 MiB is additional heap-overhead

So the clear signals are coming from nurseries, heap-overhead, and heap-unclassified. And then there's just more general JS stuff being held onto as well.
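For context on the store-buffer question above: in a generational GC, every store of a nursery pointer into a tenured object goes through a post-write barrier that records the edge so the next minor GC can find it, and those edge buffers live on the malloc heap. The sketch below is illustrative only; the names are invented, not SpiderMonkey's StoreBuffer API, but it shows where the extra heap traffic can come from.

// Illustrative sketch of a generational-GC store buffer fed by a post-write barrier.
#include <cstddef>
#include <vector>

struct Cell { bool inNursery; };

struct Edge { Cell** slot; };          // location in a tenured object that now points into the nursery

class StoreBuffer {
    std::vector<Edge> edges_;          // grows via malloc/realloc as the mutator runs

  public:
    void putEdge(Cell** slot) { edges_.push_back(Edge{slot}); }

    // At the start of a minor GC these edges act as extra roots into the
    // nursery; afterwards the buffer is cleared but its capacity may linger.
    const std::vector<Edge>& edges() const { return edges_; }
    void clear() { edges_.clear(); }
};

// Post-write barrier: called after storing `value` into `*slot` of a tenured object.
inline void postWriteBarrier(StoreBuffer& sb, Cell** slot, Cell* value) {
    if (value && value->inNursery)
        sb.putEdge(slot);
}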
Attached file noggc-startup —
Attached file ggc-startup —
Depends on: 988821
Attached file ggc-mem50-full —
Attachment #8397785 - Attachment description: ggc-mem50-full.json.gz → ggc-mem50-full
Attachment #8397787 - Attachment filename: ggc-mem50-after.json.gz → ggc-mem50-after
It's worth noting that a typical worker (e.g. osfile_async_worker.js, SessionWorker.js) is currently 2--3 MiB, so an extra 1 MiB of nursery is a big increase, and goes against bug 864927's goals.
Thinking some more...

- The nursery space isn't surprising. For workers, yesterday we discussed the idea of possibly decommitting part of it and having some kind of guard page, in order to effectively reduce the chunk size. (See the sketch after this list.)

- The increased tenured heap size is surprising. It seems like lots of GC things are being held onto for longer?

- The heap-overhead increase is surprising. I don't understand exactly what that measures, but it's some sort of malloc heap fragmentation measure. Does GGC cause more malloc heap churn?

- The heap-unclassified increase is also surprising. I thought we had coverage for all the new structures. Are there some we are missing?
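Regarding the decommit/guard-page idea in the first bullet above, a minimal POSIX sketch might look like the following. It is an assumption-laden illustration, not what SpiderMonkey does: the full chunk stays reserved, the unused tail is decommitted, and the first decommitted page is made inaccessible so an overflowing allocation faults instead of silently touching memory again.

// Sketch: reserve a full nursery chunk but only commit the first usableBytes.
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

void* reserveShrunkenNursery(size_t chunkBytes, size_t usableBytes) {
    void* base = mmap(nullptr, chunkBytes, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return nullptr;

    const size_t page = size_t(sysconf(_SC_PAGESIZE));
    usableBytes = (usableBytes + page - 1) & ~(page - 1);
    if (usableBytes >= chunkBytes)
        return base;                                // nothing to decommit

    char* tail = static_cast<char*>(base) + usableBytes;
    size_t tailBytes = chunkBytes - usableBytes;

    madvise(tail, tailBytes, MADV_DONTNEED);        // give the physical pages back
    mprotect(tail, page, PROT_NONE);                // guard page: overflow faults loudly
    return base;
}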
> - The heap-unclassified increase is also surprising. I thought we had
> coverage for all the new structures. Are there some we are missing?

mccr8 just worked this out -- we're not measuring all the malloc'd stuff hanging off the GC things in the nursery. And it sounds like this isn't even possible, because the nursery isn't designed to be iterable. Luke suggested that forcing a nursery collection before running the memory reporters would avoid this problem, though it could result in misleading measurements.
(In reply to Nicholas Nethercote [:njn] from comment #26)
> mccr8 just worked this out -- we're not measuring all the malloc'd stuff
> hanging off the GC things in the nursery. And it sounds like this isn't even
> possible, because the nursery isn't designed to be iteratable.

The nursery does have a list of external malloc'ed slots (because it has to free them eventually), so it seems you could measure this somehow.
> The nursery does have a list of external malloc'ed slots (because it has to
> free them eventually), so it seems you could measure this somehow.

Ah yes. This is Nursery::hugeSlots, which stores malloc'd slots. But there are other malloc'd things that can hang off objects: elements, and things in the private slot (e.g. for PropertyIteratorObject and RegExpStaticsObject), and things in fixed slots (e.g. for ArgumentsObject), and reserved slots (e.g. for AsmJSModuleObject). How are these tracked?

Also, how is the number of committed chunks in the Nursery controlled? When does it increase and decrease?
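A minimal sketch of the measurement being discussed, assuming a stand-in shape for the hugeSlots list (invented types; the real Nursery::hugeSlots and the memory-reporter plumbing are more involved): walk the nursery's list of out-of-line buffers and ask the allocator for the usable size of each, the same way the tenured-heap reporters do.

// Sketch: sum the malloc'd slot/element buffers owned by nursery objects.
#include <cstddef>
#include <vector>

using MallocSizeOf = size_t (*)(const void* ptr);   // e.g. a wrapper around malloc_usable_size

struct NurseryLike {
    std::vector<void*> hugeSlots;    // malloc'd buffers hanging off nursery objects

    size_t sizeOfMallocedBuffers(MallocSizeOf mallocSizeOf) const {
        size_t total = 0;
        for (void* buffer : hugeSlots)
            total += mallocSizeOf(buffer);
        return total;
    }
};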
Depends on: 989035
(In reply to Nicholas Nethercote [:njn] from comment #28)
> > The nursery does have a list of external malloc'ed slots (because it has to
> > free them eventually), so it seems you could measure this somehow.
> 
> Ah yes. This is Nursery::hugeSlots, which stores malloc'd slots. But there
> are other malloc'd things that can hang of objects: elements, and things in
> the private slot (e.g. for PropertyIteratorObject and RegExpStaticsObject),
> and things in fixed slots (e.g. for ArgumentsObject), and reserved slots
> (e.g. for AsmJSModuleObject). How are these tracked?

|hugeSlots| stores both slots and elements. The nursery only stores background-finalizable objects, and the background finalizer only knows how to free slots and elements, so if nursery objects had other malloc'd data hanging off them we would already be leaking it.

> Also, how is the number of committed chunks in the Nursery controlled? When
> does it increase and decrease?

It is based on the promotion rate. Search for growAllocableSpace and shrinkAllocableSpace in Nursery.cpp.
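The control loop is roughly of the following shape. The thresholds and field names below are invented for illustration; only the general idea (grow when the promotion rate is high, shrink when it is low) follows the growAllocableSpace/shrinkAllocableSpace description above.

// Illustrative sketch of promotion-rate-driven nursery sizing.
#include <cstddef>

struct NurserySizer {
    size_t numChunks = 1;
    size_t maxChunks = 16;

    void adjustAfterMinorGC(size_t bytesPromoted, size_t bytesAllocated) {
        double promotionRate = bytesAllocated
            ? double(bytesPromoted) / double(bytesAllocated)
            : 0.0;
        if (promotionRate > 0.05 && numChunks < maxChunks)
            ++numChunks;        // many survivors: grow the allocable space
        else if (promotionRate < 0.01 && numChunks > 1)
            --numChunks;        // almost everything died young: shrink and decommit
    }
};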
No longer depends on: 989035
(In reply to Terrence Cole [:terrence] from comment #29)
> (In reply to Nicholas Nethercote [:njn] from comment #28)
> > > The nursery does have a list of external malloc'ed slots (because it has to
> > > free them eventually), so it seems you could measure this somehow.
> > 
> > Ah yes. This is Nursery::hugeSlots, which stores malloc'd slots. But there
> > are other malloc'd things that can hang of objects: elements, and things in
> > the private slot (e.g. for PropertyIteratorObject and RegExpStaticsObject),
> > and things in fixed slots (e.g. for ArgumentsObject), and reserved slots
> > (e.g. for AsmJSModuleObject). How are these tracked?
> 
> |hugeSlots| stores both slots and elements. The nursery only stores
> background finalizable objects. The background finalizer only knows how to
> free slots and elements, so we would be leaking regardless if that were the
> case.

njn, the classes with malloc'd data in their private slots (e.g. PropertyIteratorObject) will have finalizers, and we do not nursery-allocate anything with a finalizer. (You can be background finalizable with or without a custom finalizer hook.)

The examples you list all have custom finalizers: PropertyIteratorObject, RegExpStaticsObject, ArgumentsObject, and AsmJSModuleObject.
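That rule amounts to a one-line check at allocation time. The sketch below uses invented stand-in types; only the rule itself (anything whose class has a finalizer hook is tenured from the start, so the nursery only ever needs to free slots and elements) comes from the comment above.

// Sketch of the allocation-site rule described above.
#include <cstddef>

struct JSClassLike {
    void (*finalize)(void* obj);     // null when the class has no finalizer
};

enum class Heap { Nursery, Tenured };

inline Heap chooseHeap(const JSClassLike& clasp, bool nurseryEnabled) {
    if (!nurseryEnabled || clasp.finalize != nullptr)
        return Heap::Tenured;        // finalizable things skip the nursery entirely
    return Heap::Nursery;
}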
So the AWSY results are in and they actually look good -- I was expecting an obvious jump but there wasn't one. All the changes were within the noise, except perhaps for the 'explicit' start-up measurements, which were up by about 6 MiB (those measurements don't vary much, so this is beyond the noise).

Hopefully this is accurate, and somehow my measurements from this morning are bogus in some way...
Attached file Results of latest landing —
No further comment.
> So the AWSY results are in and they actually look good

Subsequent measurements were similar, which is good.

I did some more measurements by hand. Startup 'resident' is still about 5% worse on my Mac, but I also tried opening every article on the front page of TechCrunch and I got about a 30 MiB (2.5%) reduction.
This definitely needs to get into the release notes. So nominating.
relnote-firefox: --- → ?
Either this or bug 988821 caused 5-7% dromaeo-dom regressions across the board.  Was that expected?

In particular, dom_attr got consistently slower....
Flags: needinfo?(terrence)
(In reply to Boris Zbarsky [:bz] from comment #36)
> Either this or bug 988821 caused 5-7% dromaeo-dom regressions across the
> board.  Was that expected?
> 
> In particular, dom_attr got consistently slower....

Is this on a Mac? There is one bug (bug 987047) still open that makes GGC a regression on AWFY on Mac instead of an improvement. So if this is Mac-only, I would bet it is that problem.
Depends on: 989414
I will benchmark as soon as I get home from the work-week.
No longer depends on: 969410
As mentioned in comment 36, this shows a regression on talos (linux, linux64, osx 10.6, osx 10.8, win7, win8) for dromaeo_dom:
http://graphs.mozilla.org/graph.html#tests=[[73,131,31],[73,131,25],[73,131,35],[73,131,33],[73,63,24],[73,63,21]]&sel=1395660120282,1396264920282&displayrange=7&datatype=running

on datazilla you can see on 10.6 traverse.html is regressed:
https://datazilla.mozilla.org/?start=1395659415&stop=1396264215&product=Firefox&repository=Mozilla-Inbound&os=mac&os_version=OS%20X%2010.6.8&test=dromaeo_dom&graph_search=786057e01ef3&error_bars=false&project=talos

and for win7, traverse.html is regressed as well:
https://datazilla.mozilla.org/?start=1395659415&stop=1396264215&product=Firefox&repository=Mozilla-Inbound-Non-PGO&os=win&os_version=6.1.7601&test=dromaeo_dom&graph_search=786057e01ef3&error_bars=false&project=talos


In addition to dromaeo(dom) regression, kraken has regressed:
http://graphs.mozilla.org/graph.html#tests=[[232,63,24],[232,63,21],[232,131,25],[232,131,33],[232,131,35],[232,131,31]]&sel=1395668004184,1396272804184&displayrange=7&datatype=running

on all platforms except for linux 32 where it improved!

here is a view of datazilla for win7 kraken:
https://datazilla.mozilla.org/?start=1395659415&stop=1396264215&product=Firefox&repository=Mozilla-Inbound-Non-PGO&os=win&os_version=6.1.7601&test=kraken&graph_search=786057e01ef3&error_bars=false&project=talos

we regressed on:
json-parse-financial
audio-oscillator
audio-fft
audio-beat-detection

but we improved on stanford-crypto-aes!


Now for even more regressions, ts paint:
http://graphs.mozilla.org/graph.html#tests=[[83,131,33],[83,131,25],[83,131,35]]&sel=1395929540852.4817,1396010084966.6313,0,1080&displayrange=7&datatype=running

This is showing regressions on win7, win8, linux 32, linux64


the last regression I see from here is the tpaint (opening windows) regression:
http://graphs.mozilla.org/graph.html#tests=[[82,131,25],[82,131,37],[82,131,31],[82,131,33],[82,63,21]]&sel=none&displayrange=7&datatype=running

this is seen on winxp, win7, linux32, and osx 10.6


I have done a lot of retriggers on tests before and after this landing:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&fromchange=7cad4bd4bfe5&tochange=d9e6a6c40a57&jobname=mozilla-inbound%20talos%20other
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&fromchange=7cad4bd4bfe5&tochange=d9e6a6c40a57&jobname=mozilla-inbound%20talos%20dromaeojs

If there are no plans to back this out or fix this, I would like to see a post outlining that we are fine regressing these 4 performance tests on the majority of our platforms.
Whiteboard: [MemShrink:P1] [games:p2] [js:p1:fx31] → [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression]
(In reply to Joel Maher (:jmaher) from comment #40)
> 
> If there are no plans to back this out or fix this, I would like to see a
> post outlining that we are fine regressing these 4 performance tests on the
> majority of our platforms.

We expected performance to vary up and down across the board. We still have 4 weeks to address the cases where the variance is more down than up, so let's hold off on the backout for a bit.
I think I outlined all the regressions in my previous comment; with a little work they should be minimized.
One other regression is on the TART test; I see it on Windows 8 (PGO):
http://graphs.mozilla.org/graph.html#tests=[[293,63,31]]&sel=1393767907190,1396359907190&displayrange=30&datatype=running

I did some retriggers, and I can see that the change prior to this one has a 4.4-4.6 range on TART, while this change has 4.5-5.0:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=WINNT%206.2%20mozilla-inbound%20pgo%20talos%20svgr&fromchange=092a737e9451&tochange=a4c9a284e014

This took a while to generate an alert, as PGO builds run less frequently.
Target Milestone: --- → mozilla31
With Bug 990336 landed, we've recovered our kraken performance and then some. Verified on AWFY and talos.
Flags: needinfo?(terrence)
Whoops, didn't mean to un-needinfo myself; still more to be done for dromaeo.
Flags: needinfo?(terrence)
QA Contact: andrei.vaida
Added to the release notes for 31. If that changes (i.e. we ship in 32), please reset the relnotes flag to "?".
Depends on: 994450
Depends on: 994589
Whiteboard: [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression] → [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression][qa+]
Depends on: 995442
We still have a few other regressions and less than 2 weeks to go. I am glad to see the kraken regression fixed; is it realistic to get tspaint, tart, and dromaeo fixed as well?
Whiteboard: [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression][qa+] → [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression][qa-]
I've removed the MemShrink:P1 tag because this is basically done and it turned out to not have much effect on memory usage. I've added a MemShrink tag to bug 650161 (compacting GC) which should have a bigger effect.
Whiteboard: [MemShrink:P1] [games:p2] [js:p1:fx31][talos_regression][qa-] → [games:p2] [js:p1:fx31][talos_regression][qa-]
No longer depends on: 875863
No longer depends on: 989414
Depends on: 1020751
Depends on: 991845
Depends on: 1009675
Resetting the relnotes flag to make sure we have it in the 32 beta release notes.
Target Milestone: mozilla31 → mozilla32
This still landed on trunk during the Fx31 cycle, even though it was later reverted on mozilla-beta.
Target Milestone: mozilla32 → mozilla31
Moved to the 32 beta release notes
Depends on: 1015618
Depends on: 1068399
I believe this has been released now, but we appear to have not addressed all the talos performance regressions from it. Specifically, looking back over the last 6 months, Windows 7 tpaint is worse:
http://graphs.mozilla.org/graph.html#tests=%5B%5B82,131,25%5D,%5B82,53,25%5D%5D&sel=1380637141812,1412173141812&displayrange=365&datatype=running

Are we done with GGC? Are there next steps?
No, we still plan to address this; it isn't worth holding up GGC over it, however.
Flags: needinfo?(terrence)
Depends on: 1127246
Depends on: 1149526
No longer depends on: 885550
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
No longer depends on: 994589
Depends on: 1264575
No longer depends on: 1264575
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open