Open Bug 1470339 Opened 6 years ago Updated 2 months ago

Reduce the overhead of telemetry in content processes

Categories

(Toolkit :: Telemetry, enhancement, P3)


People

(Reporter: erahm, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [overhead:100k])

Telemetry is reporting 156K of overhead for its C++ components under about:memory for each content process.

Additionally, the numbers in bug 1463569 indicate the JS portions are using a fair amount as well:
 - TelemetryController.jsm (24712 bytes)
 - TelemetrySession.jsm (23792 bytes)
 - TelemetryStopwatch.jsm (11280 bytes)
 - content-HybridContentTelemetry.js (5736 bytes)
 - GCTelemetry.jsm (5352 bytes)

That's about 70k.
Next steps: break down where the memory lives to see if there are quick wins (and to familiarize ourselves with the tooling).
Priority: -- → P2
Kris, it sounded like you had some thoughts on easy wins for the JS portion of this memory.
Flags: needinfo?(kmaglione+bmo)
I'm not sure about quick wins, but I have some thoughts.

So, basically, there are two parts to the telemetry overhead in the content process, as Eric notes:

- The JS framework, which is (conservatively) around 70K.
- The C++ framework, which reports about 150K.

As far as the C++ side goes, I'm not super familiar with the details, but I think it should be mostly a stub that doesn't keep track of much data, and just offloads as much to the parent process as possible. That may mean we have to restrict parts of the API in content processes so that they can't query the current state of the telemetry DB, but that's actually a good thing: we definitely don't want compromised content processes to see any more user data than necessary.


As far as the JS overhead goes, I think we should probably migrate the current mix of XPIDL, JSAPI, and JS interfaces to a pure WebIDL interface.

That would actually massively simplify the C++ service, which currently defaults to XPIDL and then falls back to hand-crafted JSAPI objects for things that XPIDL doesn't do well. WebIDL makes all of these things easy.

For the JS side of things, it wouldn't simplify things much, but it also wouldn't make them much more complicated either, and would probably give us a significant performance win. And it would be nice to have everything in one unified interface rather than spread across a native service and a bunch of helper JSMs.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] from comment #3)
> I'm not sure about quick wins, but I have some thoughts.
> 
> So, basically, there are two parts to the telemetry overhead in the content
> process, as Eric notes:
> 
> - The JS framework, which is (conservatively) around 70K.
> - The C++ framework, which reports about 150K.
> 
> As far as the C++ side goes, I'm not super familiar with the details, but I
> think it should be mostly a stub that doesn't keep track of much data, and
> just offloads as much to the parent process as possible. That may mean we have to
> restrict parts of the API in content processes so that they can't query the
> current state of the telemetry DB, but that's actually a good thing: we
> definitely don't want compromised content processes to see any more user
> data than necessary.

For the C++ part, we should first look into what parts are using up that memory.
Which tools should we use to investigate?

> As far as the JS overhead goes, I think we should probably migrate the
> current mix of XPIDL, JSAPI, and JS interfaces to a pure WebIDL interface.

Are you suggesting to move logic from the JSMs into the Telemetry C++ code?
Or how else would we achieve having a single interface?

Overall, having a single interface to the Telemetry functionality would be great.
Some consideration would need to be given to any legacy Mozilla addons using the current interfaces (Shield, any system search addon, ...?).
Flags: needinfo?(kmaglione+bmo)
Priority: P2 → --
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> For the C++ part, we should first look into what parts are using up that
> memory.
> Which tools should we use to investigate?

For the memory that's currently being reported, your best bet is probably to update the memory reporter to report the individual parts of the total usage separately rather than as a single chunk:

https://searchfox.org/mozilla-central/rev/1193ef6a61cb6e350460eb2e8468184d3cb0321d/toolkit/components/telemetry/Telemetry.cpp#223-226
https://searchfox.org/mozilla-central/rev/1193ef6a61cb6e350460eb2e8468184d3cb0321d/toolkit/components/telemetry/Telemetry.cpp#1745-1766
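As a rough illustration of that reporter split, here is a minimal sketch. The names (`Report`, `TelemetrySizes`, `CollectTelemetryReports`) and paths are hypothetical stand-ins for the Gecko memory-reporter machinery, not the real Telemetry.cpp API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a single about:memory entry.
struct Report {
  std::string path;
  size_t amount;
};

// Per-component sizes that the reporter would gather (illustrative fields).
struct TelemetrySizes {
  size_t histograms = 0;
  size_t scalars = 0;
  size_t events = 0;
};

// Instead of one "explicit/telemetry" lump sum, emit one entry per component
// so about:memory shows where the bytes actually live.
std::vector<Report> CollectTelemetryReports(const TelemetrySizes& s) {
  return {
      {"explicit/telemetry/histograms", s.histograms},
      {"explicit/telemetry/scalars", s.scalars},
      {"explicit/telemetry/events", s.events},
  };
}
```

The total stays the same; only the granularity of the report changes.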

You can also use DMD, but that requires a special build:

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD

> > As far as the JS overhead goes, I think we should probably migrate the
> > current mix of XPIDL, JSAPI, and JS interfaces to a pure WebIDL interface.
> 
> Are you suggesting to move logic from the JSMs into the Telemetry C++ code?

That's what I'm suggesting, yes.

> Some consideration would need to be given to any legacy Mozilla addons using
> the current interfaces (Shield, any system search system addon, ...?).

Sure. But I imagine they would be fairly easy to update.
Flags: needinfo?(kmaglione+bmo)
Given what we know about the required timeline here, we're planning to:
- do an investigation and planning on what we need to do in Q3
- possibly address easy wins in Q3
- take more involved work in early Q4
Priority: -- → P3
(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> Given what we know about the required timeline here, we're planning to:
> - do an investigation and planning on what we need to do in Q3
> - possibly address easy wins in Q3

Georg, did any of the planning end up happening? This is still one of our top bugs for fission memshrink.
Flags: needinfo?(gfritzsche)
(In reply to Eric Rahm [:erahm] from comment #7)
> This is still one of our top bugs for fission memshrink.

And, notably, bug 1465251 caused a significant content JS memory regression because it added code to TelemetrySession.jsm, and all of TelemetrySession.jsm gets loaded into every content process, even though the majority of it is not needed.
We addressed the dynamic memory usage of the C++ parts in content processes in bug 1477213, which should take away a big chunk of memory.
I looked into memory savings for the individual JavaScript parts, which is now all tracked under bug 1475556.

We plan to prioritize and schedule this individual work for the coming quarter, I will drive this forward.
Part of that is actually avoiding loading TelemetrySession, making the Prio changes a non-issue.
Flags: needinfo?(gfritzsche)
(In reply to Jan-Erik Rediger [:janerik] from comment #9)
> We addressed the dynamic memory usage of the C++ parts in content processes
> in bug 1477213, which should take away a big chunk of memory.
> I looked into memory savings for the individual JavaScript parts, which is
> now all tracked under bug 1475556.
> 
> We plan to prioritize and schedule this individual work for the coming
> quarter, I will drive this forward.
> Part of that is actually avoiding loading TelemetrySession, making the Prio
> changes a non-issue.

That's great progress! The C++ side is down to ~29KB:

> ├──────28,624 B (00.21%) -- telemetry
> │      ├──20,032 B (00.14%) ── event/data
> │      ├───8,192 B (00.06%) ++ scalar
> │      ├─────400 B (00.00%) ── impl

The regression Kris noted was 8KB, I'll update the overhead to reflect how things are now. We might just want to close this out in favor of your new meta bug.
Whiteboard: [overhead:220k] → [overhead:100k]
Depends on: 1505522

I spent a little time investigating whether expired metrics are a memory burden. They shouldn't be, conceptually, since all operations on them are ignored... but there are a lot of them.

There are over 300 expired Histograms, about 30 expired scalars, and 0 expired events. So let's look deeper into the cost of maintaining expired Histograms.

To error on unknown histograms but no-op on expired histograms, we need to keep IDs and strings for expired histograms around. So no savings there unless we're willing to change semantics (for instance, if we didn't error on any unknown or expired histogram, or only errored on unknown histograms in debug mode, we could drop a bunch of strings from the table). But we do have some data structures proportional in size to the number of histogram IDs there are, expired and not.

We have

  • gHistogramStorage and gKeyedHistogramStorage which are pointer arrays for speedy access. Both are of size HistogramCount * ProcessTypeCount and are allocated on the heap. They're only built in the parent process, though.
  • gHistogramInfos and gHistogramBucketLowerBoundIndex are const arrays that could have expired info removed. gHistogramInfos is an array of POD structs HistogramCount in length and about... 344b apiece? gHistogramBucketLowerBoundIndex is an int16_t[HistogramCount].
  • gHistogramRecordingDisabled is a bool[HistogramCount] statically allocated in each process.
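A back-of-envelope cost model for the structures above can be sketched like this. The counts are illustrative assumptions (~1500 total histograms, 4 process types, ~344 bytes per HistogramInfo entry), not the real generated values:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative assumptions, not generated constants.
constexpr size_t kHistogramCount = 1500;
constexpr size_t kProcessTypeCount = 4;
constexpr size_t kHistogramInfoSize = 344;  // rough per-entry size

// gHistogramStorage + gKeyedHistogramStorage: pointer arrays on the
// parent-process heap only.
constexpr size_t kParentHeapBytes =
    2 * kHistogramCount * kProcessTypeCount * sizeof(void*);

// gHistogramInfos + gHistogramBucketLowerBoundIndex: read-only const arrays.
constexpr size_t kReadOnlyBytes =
    kHistogramCount * kHistogramInfoSize + kHistogramCount * sizeof(int16_t);

// gHistogramRecordingDisabled: one bool per histogram, in every process.
constexpr size_t kPerProcessBytes = kHistogramCount * sizeof(bool);
```

Everything here scales linearly with HistogramCount, which is why the ~300 expired entries show up in every one of these totals.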

Summing it all up, the extra 300 expired histograms are costing us 230kbit in the parent process heap, 110kbit readonly shared, and 2kbit per process. In a 10-process world that's 360kbit. In a 100-process world, 540kbit. In a 1000-process world, 2.3mbit.

To me this doesn't appear to be worth the effort (Too much squeeze, not enough juice), but I'm documenting it here in case it does turn out to be worth it.

(( The plan was to sort the histograms by expiry and not allocate anything from before the current version. Expiry checks become aId >= UnexpiredHistogramCount instead of strcmp, and all the storage mentioned above would be of length UnexpiredHistogramCount. ))
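That plan could look roughly like the following sketch. `kUnexpiredHistogramCount` is an assumed constant that gen_histogram_data would emit after sorting; it is not a real generated name:

```cpp
#include <cstdint>

// Assumed generated constant: histograms are sorted so every unexpired ID
// comes first, and expired IDs (if kept at all) come after this boundary.
constexpr uint32_t kUnexpiredHistogramCount = 1200;

// Expiry becomes a single integer comparison instead of a version-string
// strcmp, and the per-histogram storage arrays only need
// kUnexpiredHistogramCount entries.
constexpr bool IsExpired(uint32_t aId) {
  return aId >= kUnexpiredHistogramCount;
}
```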

(In reply to Chris H-C :chutten from comment #11)
> Summing it all up, the extra 300 expired histograms are costing us 230kbit in the parent process heap, 110kbit readonly shared, and 2kbit per process. In a 10-process world that's 360kbit. In a 100-process world, 540kbit. In a 1000-process world, 2.3mbit.
>
> To me this doesn't appear to be worth the effort (Too much squeeze, not enough juice), but I'm documenting it here in case it does turn out to be worth it.

How much work is it to make this overhead go away? Saving 200kb of memory (even if only in the parent process) and reducing read-only tables by 110kb seems totally worth it.

Flags: needinfo?(chutten)

One thing I just noticed is that, if we have to preserve semantics, we can't actually remove the expired histograms' HistogramInfo from gHistogramInfos because we still need to know that they're expired, not just unknown. If we're willing to treat both expired and unknown histograms referenced by name as though they're expired (no error, just no-ops) then we can increase our savings and actually make this worth our while.

Because then not only do we get the gHistogramInfos savings I erroneously mentioned earlier, we also save their names from gHistogramStringTable, so that's sum(len(expired histogram names)) * sizeof(char) from read-only tables. Let's say ~25 chars per name, 300 names: 60kbit

And a bit of other savings I missed before is that HistogramInfo would no longer need expiration_offset (a uint32_t) so that's 32bit * 1500 (so 48kbit). We also don't need the version strings in the string table, so that's another sum(len(unique version strings)) * sizeof(char). (both from the read-only tables). Let's say 35 unique version strings of about 2 characters plus \0: 840bit

Totalled up that adds another ~110kbit in the read-only tables we could save.
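A quick sanity check of that tally, using the assumed counts above (300 expired names at ~25 chars, 1500 histograms, 35 version strings of ~3 bytes including the terminator):

```cpp
// Expired names dropped from gHistogramStringTable.
constexpr unsigned kNameBits = 300 * 25 * 8;   // 60,000 bits
// expiration_offset (uint32_t) dropped from every HistogramInfo.
constexpr unsigned kOffsetBits = 1500 * 32;    // 48,000 bits
// Version strings dropped from the string table.
constexpr unsigned kVersionBits = 35 * 3 * 8;  // 840 bits

constexpr unsigned kTotalBits = kNameBits + kOffsetBits + kVersionBits;
```

kTotalBits comes out to 108,840 bits, which matches the ~110kbit figure.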

As for the work required, well...

It requires making gen_histogram_data perform the expiry checks and omit expired data, which is the easy part.

But then it gets tricky. We need to dive deep into TelemetryHistogram to change how some things are handled. We currently assume that having a HistogramID is good enough to get you an index into certain storages, and that will no longer be universally true.

I'm not sure how tricky that'd be.

I think it might be worth going into it if we changed our semantics so that unknown histograms referenced by their string name are handled the same way as expired histograms.

ni?gfritzsche for his opinions on changing the semantics as described.

Flags: needinfo?(chutten) → needinfo?(gfritzsche)

Is this question still current?

Flags: needinfo?(gfritzsche) → needinfo?(chutten)

Yes. Do you have opinions on changing getHistogramById(id) to always return some sentinel Histogram instance, even if the id is utter nonsense?
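A minimal sketch of what that sentinel pattern could look like. All names here are illustrative, not the real Telemetry API; the point is just that every caller gets a usable object and nonsense IDs silently no-op:

```cpp
#include <string>

// Base class doubles as the shared no-op sentinel: Add() ignores the value.
struct Histogram {
  virtual void Add(int) {}
  virtual ~Histogram() = default;
};

// A "known" histogram that actually accumulates (stand-in for real storage).
struct RealHistogram final : Histogram {
  int sum = 0;
  void Add(int aValue) override { sum += aValue; }
};

// Always returns a usable instance; unknown or expired IDs share one sentinel.
Histogram* GetHistogramById(const std::string& aId) {
  static RealHistogram sKnown;   // stand-in for a real lookup table
  static Histogram sSentinel;    // shared no-op for nonsense IDs
  return aId == "SOME_KNOWN_HISTOGRAM" ? &sKnown : &sSentinel;
}
```

The trade-off Georg raises below is exactly the weak spot of this design: a typoed ID accumulates into the sentinel and is silently lost instead of producing an error.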

Flags: needinfo?(chutten) → needinfo?(gfritzsche)

I don't think I have good context for this right now, but I'm wondering how this works out for error handling when specifying an invalid histogram id.
Getting feedback from Firefox JS engineers on how a proposed solution would affect them seems important.

Flags: needinfo?(gfritzsche)

That's exactly the sort of wisdom I was hoping for.

Alrighty, this is prioritized and questions are answered. To the backlog with you, bug!

Depends on: 1645862
Depends on: 1648178
Depends on: 1482080
Severity: normal → S3
No longer depends on: 1482080