Closed Bug 1350969 Opened 7 years ago Closed 5 years ago

[meta] [Performance] New netmonitor can be slow when a lot of requests are happening in the same time

Categories

(DevTools :: Netmonitor, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1487906

People

(Reporter: julienw, Unassigned)

References

(Depends on 11 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [netmonitor])

Attachments

(1 file)

See profile here: https://perfht.ml/2oqj0Z0

Especially look at what happens in the main thread.
I could very easily make my Firefox hangs for several seconds.

I'll try to do a testcase.
Stable chokes a little bit on the same page too, but on nightly this is huge.
Version: unspecified → Trunk
Whiteboard: [netmonitor][triage]
Flags: qe-verify?
Flags: qe-verify? → qe-verify-
Priority: -- → P3
Summary: New netmonitor can be slow when a lot of requests are happening in the same time → [Performance] New netmonitor can be slow when a lot of requests are happening in the same time
Whiteboard: [netmonitor][triage] → [netmonitor]
Priority: P3 → P2
Assignee: nobody → gasolin
Status: NEW → ASSIGNED
Iteration: --- → 55.3 - Apr 17
Priority: P2 → P1
julien, 

Did you set the flag `ac_add_options --enable-debug-js-modules` in mozconfig while build nightly? It will use debugging version of react, which is pretty slow when there are lots of components.
Flags: needinfo?(felash)
No I didn't.

This happened to me on a page with a lot of images being requested in the same time.
Flags: needinfo?(felash)
Attached file testcase-images.html
STR:
1. Open the file.
2. Open the network monitor.
3. Reload the page
4. Click the button, several times, to look for the effect of having a lot of lines.

=> notice this is slower as there are more lines. Firefox interface is having small freezes while the netmonitor renders.

Stable doesn't have the issue and is quite happy to render as many lines as you wish.
Thanks julien for more details and the cute test page!


I found the request-list in 52 is still build with XUL, and the react version of request-list is landed in 53 (after Hawaii All Hands).

It means we need improve the overall request-list component performance now.
request-list-content.js do the most heavy work, componentWillUpdate in request-list-content.js takes 50%(156ms/306ms) time in CPU heavy peak
http://searchfox.org/mozilla-central/source/devtools/client/netmonitor/src/components/request-list-content.js#121

isScrolledToBottom function might take too much time for doing getBoundingClientReact twice. (that will trigger the browser to synchronously calculate the style and layout)
https://gist.github.com/paulirish/5d52fb081b3570c81e3a

Should find better way to detect if the list is scrolled to the bottom.


Instead of set node.scrollTop in componentDidUpdate, we can also replace it with scrollIntoView API
https://developer.mozilla.org/en/docs/Web/API/Element/scrollIntoView
(In reply to Fred Lin [:gasolin] from comment #6)
> request-list-content.js do the most heavy work, componentWillUpdate in
> request-list-content.js takes 50%(156ms/306ms) time in CPU heavy peak
> http://searchfox.org/mozilla-central/source/devtools/client/netmonitor/src/
> components/request-list-content.js#121
> 
> isScrolledToBottom function might take too much time for doing
> getBoundingClientReact twice. (that will trigger the browser to
> synchronously calculate the style and layout)
> https://gist.github.com/paulirish/5d52fb081b3570c81e3a
> 
> Should find better way to detect if the list is scrolled to the bottom.
> 
> 
> Instead of set node.scrollTop in componentDidUpdate, we can also replace it
> with scrollIntoView API
> https://developer.mozilla.org/en/docs/Web/API/Element/scrollIntoView
This might help and we should try it. But, I think that the real problem
here is that layout recalculation takes so much time. I believe that we
need to use table layout (or perhaps grid layout) instead of flex-box.
This will speed up layout calc and also the isScrolledToBottom() function.

Honza
(In reply to Fred Lin [:gasolin] from comment #6)
> isScrolledToBottom function might take too much time for doing
> getBoundingClientReact twice. (that will trigger the browser to
> synchronously calculate the style and layout)
> https://gist.github.com/paulirish/5d52fb081b3570c81e3a
> 
> Should find better way to detect if the list is scrolled to the bottom.
I wonder if this would be faster:

let bottom = contentEl.getBoxQuads({relativeTo: lastChildEl})[0].bounds.bottom

if we know the height of one request in the list (22px), then if bottom is < height we are not scrolled at the bottom.
This might trigger only one sync reflow instead of 2. Although I'm not sure how much faster this would make this since the second reflow might already be optimized.
There are other potential perf wins to be made in the requests list. At each update, we also restyle the whole graph because we change the values of 2 custom css properties: --timings-scale and --timings-rev-scale.
This is used to make sure the waterfall graph always fits nicely into the viewport while not having to calculate anything in JS. Indeed, if an item lasts for 127ms we draw it to be 127px, and we just then use CSS transforms to scale the whole graph so it fits (we also reverse-scale the text so it doesn't appear squished).
I think that doing this calculation in JS instead and having the graph use % CSS sizes instead would avoid this and might make the refresh of the list faster.
(In reply to Patrick Brosset <:pbro> from comment #9)
> There are other potential perf wins to be made in the requests list. At each
> update, we also restyle the whole graph because we change the values of 2
> custom css properties: --timings-scale and --timings-rev-scale.
> This is used to make sure the waterfall graph always fits nicely into the
> viewport while not having to calculate anything in JS. Indeed, if an item
> lasts for 127ms we draw it to be 127px, and we just then use CSS transforms
> to scale the whole graph so it fits (we also reverse-scale the text so it
> doesn't appear squished).
> I think that doing this calculation in JS instead and having the graph use %
> CSS sizes instead would avoid this and might make the refresh of the list
> faster.

Agree, the scaling must be removed.

I would really like to use similar (if not exactly the same) approach as in
Firebug since Firebug's UI is visibly faster. No JS involved in resizing and
no complex layout (=> fast reflow calc).

1) Define bars as simple DIVs
https://github.com/firebug/firebug/blob/master/extension/content/firebug/net/netReps.js#L460-L471

2) Use position: absolute 
https://github.com/firebug/firebug/blob/master/extension/skin/classic/net.css#L327-L338

3) Calculate left position and width in JS once
https://github.com/firebug/firebug/blob/master/extension/content/firebug/net/netPanel.js#L982-L987

Note that we have bug 1308695 for these things

Honza
Depends on: 1350227
Depends on: 1356957
Summary: [Performance] New netmonitor can be slow when a lot of requests are happening in the same time → [meta][Performance] New netmonitor can be slow when a lot of requests are happening in the same time
Assignee: gasolin → nobody
Status: ASSIGNED → NEW
Iteration: 55.3 - Apr 17 → ---
Keywords: meta
Priority: P1 → --
Summary: [meta][Performance] New netmonitor can be slow when a lot of requests are happening in the same time → [meta] [Performance] New netmonitor can be slow when a lot of requests are happening in the same time
Mark as meta bug for tracking performance relevant issues in network monitor.
Depends on: 1349198, 1349173, 1308695
Some other performance related reports (done yet before we started converting it to HTML):

Bug 1176050 - Firefox liveload with lots of connections slow when netmonitor is open
Bug 976823 - NCIX.com sale flyer page slows Firefox to a crawl with netmonitor open
Bug 956452 - Network monitor is janky on sites with a ton of requests

Honza
Depends on: 1176050, 976823, 956452
Depends on: 1358054
While testing the sample page, I saw 200+ error log in content console: 

"cant convert undfined to object spread" while doing updateRequest. Will fix at bug 1356957
See also this comment about what could be wrong on the Network monitor backend:
https://bugzilla.mozilla.org/show_bug.cgi?id=1389864#c8

Honza
Blocks: 1389864
Depends on: 1403884
Depends on: 1404117
Depends on: 1404118
Depends on: 1404130
A quick summary about performances of netmonitor seen from DAMP over the last year:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=mozilla-central,1470318,1,1

A significant drop (-27%) occurred in late April on this changelog:
  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e17cbb839dd225a2da7e5d5bec43cf94e11749d8&tochange=62b649c6b314f756f21cb95f2b0d491e2664e944
  Most likely bug 1349173.

Then an increase (+34%) in early May on this changelog:
  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9348b76977e833f108cf77dff75b0fab887a2fc1&tochange=e7bf9443be2c4a5187c37440e35f3526148d7fa8
  Most likely bug 1344160.

Followed by another drop (-14%) in mid May:
  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=baf05f61bc14fdf45511bc1165ce76daa08c5c0f&tochange=2fa6931995b23f1f39752385b6689dc6b8d94c1b
  Most likely bug 1365635.

And finally, a more recent significant regression (+44%) late August on this changelog:
  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3529b653ede26f990eb7320649015294ad0f8e76&tochange=1b4c59eef820b46eb0037aca68f83a15088db45f
  Unfortunately, I see no bug that could explain that. There is bug 1341305, bug 1333759 and bug 1378856 in it.
  On the platform side, there is bug 863246 which look suspicious.
  I tried to re-run DAMP for bug 863246 and it doesn't seem to report a regression (but I have limited trust in this as I don't know exactly if my comparison is correct)
  https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=autoland&originalRevision=40bc1d4ce54b34b3851d52576e1b16f68af3bd48&newProject=autoland&newRevision=bd9c5474a5e4e6391a3ae74574885d1361599184&originalSignature=9663a154caf9b713fb951c8db24518c7e0ceddce&newSignature=9663a154caf9b713fb951c8db24518c7e0ceddce&framework=1

Honza, any idea about September regression?
Otherwise I think we should learn from this. It tells us that changes like bug 1349173 and bug 1365635 have a significant impact on performances. So if you have similar ideas, it should be prioritize accordingly.
Flags: needinfo?(odvarko)
Flags: needinfo?(odvarko)
Flags: needinfo?(odvarko)
Depends on: 1404913
Depends on: 1404917
Flags: needinfo?(odvarko)
(In reply to Alexandre Poirot [:ochameau] from comment #15)
> Honza, any idea about September regression?
What September regression do you mean, it isn't in the list

> Otherwise I think we should learn from this. It tells us that changes like
> bug 1349173 and bug 1365635 have a significant impact on performances. So if
> you have similar ideas, it should be prioritize accordingly.
Absolutely agree. We'll go over this with Ricky and Fred.
This is great analysis Alex, thanks!

Honza
(In reply to Jan Honza Odvarko [:Honza] from comment #16)
> (In reply to Alexandre Poirot [:ochameau] from comment #15)
> > Honza, any idea about September regression?
> What September regression do you mean, it isn't in the list

Sorry, I meant late August one. (On perherder graph it seems to be in september)

The one with this changelog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3529b653ede26f990eb7320649015294ad0f8e76&tochange=1b4c59eef820b46eb0037aca68f83a15088db45f
Depends on: 1356536
Depends on: 1408085
Depends on: 1408182
Depends on: 1408737
Depends on: 1411852
Depends on: 1419350
Stop using ImmutableJS in NetMonitor:

* Bug 1419366 - NetMonitor: Stop using ImmutableJS in filters reducer
* Bug 1419367 - NetMonitor: Stop using ImmutableJS in sort reducer
* Bug 1419368 - NetMonitor: Stop using ImmutableJS in timing-markers reducer
* Bug 1419369 - NetMonitor: Stop using ImmutableJS in ui reducer

As soon as ImmutableJS is removed from NetPanel we can stop
loading the library entirely:
* Bug 1419370 - Do not load ImmutableJS library int the Net panel iframe

Honza
Depends on: 1419415
Depends on: 1420289
Depends on: 1420291
Depends on: 1421926
Another relevant profile reported on discourse [1]: https://perfht.ml/2BvUu0i

The users reports page reload taking ~12 seconds when netmonitor is opened, 3 seconds if using another panel (e.g. console). There are 281 requests during the page reload.

Looking at the profile, top devtools offenders are:
- componentDidUpdate
- parseCertificateInfo
- isDocumentRTL (which was inlined in recent versions of the file, as https://searchfox.org/mozilla-central/rev/b1e0ae2573e88391a7ed92407568be77994c9317/devtools/client/netmonitor/src/widgets/WaterfallBackground.js#86-87 , but the perf hit should be identical)
- isScrolledToBottom

isDocumentRTL could probably be memoized? parseCertificateInfo as well?

User also tested on Nightly which seemed to slightly improve the performance (9s instead of 12s) but still not usable. 

[1] https://discourse.mozilla.org/t/devtools-slow-and-occasionally-unresponsive-on-some-sites/19936/37
(In reply to Julian Descottes [:jdescottes][:julian] from comment #19)
> Another relevant profile reported on discourse [1]: https://perfht.ml/2BvUu0i
> 
> The users reports page reload taking ~12 seconds when netmonitor is opened,
> 3 seconds if using another panel (e.g. console). There are 281 requests
> during the page reload.
> 
> Looking at the profile, top devtools offenders are:
> - componentDidUpdate

This is slow because we are updating most react components too frequently (at least twice per request for top level components) *and* netmonitor.html reflow is slow.

> - parseCertificateInfo

This is actor code. I'm starting a write diagram of it to help refatoring all the actor codebase to be lazy. parseCertificateInfo/_getSecurityInfo is one example of many things that can/should be computed lazily now that client fetched that lazily.
Looking at raw profiler numbers, and BHR reports, actor codebase has a limited impact compared to frontend codebase. => optimizing frontend has more impact than optimizing the backend.

> - isDocumentRTL (which was inlined in recent versions of the file, as
> https://searchfox.org/mozilla-central/rev/
> b1e0ae2573e88391a7ed92407568be77994c9317/devtools/client/netmonitor/src/
> widgets/WaterfallBackground.js#86-87 , but the perf hit should be identical)

I already identified that in various profiles.
I tried fixing that, but DAMP reports no win:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=a11f5bd42b466ce7f64a815f6e9cd0ddab92a575&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=netmo&framework=1&selectedTimeRange=172800
isDocumentRTL also appears in DAMP profiles. I'm wondering if that's the profiler adds a virtual overhead to this??
I would be interested to know if that patch has any significant impact on a real user scenario.

> - isScrolledToBottom

This is a known issue. Still needs to be seriously looked into.
There is valuable comments about this in bug 1416714.
Long story short: we should avoid accessing/setting scrollTop/scrollHeight unless it is really *really* necessary. Julien suggested a way to try that.
This is tightly related to componentDidUpdate. We pile up even more reflows with this isScrolledToBottom and reflows are slow.

Reflows of netmonitor document are slow. This may be the topmost perf issue of netmonitor, as it was for webconsole.
And it is slow for weird/unexpected reasons. The profiler doesn't help figuring out why reflows are slow. Nor I know any tool to help figuring this out.
Via bug 1421926, we discovered that setting *one* particular column size in pixels rather than in percents, brings a 6% win:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=ee78e217207bf50b8cf31d85d652fc406221488c&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&showOnlyImportant=1&framework=1&selectedTimeRange=172800
After that, I tried simplifying DOM/CSS hierarchy of the netmonitor, and we get 5% win by simplifying the <table> hierarchy used in netmonitor:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=3e211f9bc9f945b884489c0bb4c69a1843300278&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&showOnlyImportant=1&framework=1&selectedTimeRange=172800
netmonitor is a complex hiearchy involving "position: absolute" elements with "flexbox" and advanced usages of "display: table-*".
Depends on: 1425964
Depends on: 1425978
Another lead: I noticed that the canvas is drawn synchronously in componentDidUpdate [1]. Even if there is a shallow check in WaterfallBackground [2] this could lead to too much canvas drawing, especially in a synchronous way. Maybe you could try using requestAnimationFrame like we do in perf.html [3]. Also we don't do this in perf.html, but I think we shouldn't register a new rAF request if another is already here.

[1] https://searchfox.org/mozilla-central/rev/ff462a7f6b4dd3c3fdc53c9bc0b328f42a5b3d2b/devtools/client/netmonitor/src/components/RequestListHeader.js#87
[2] https://searchfox.org/mozilla-central/rev/ff462a7f6b4dd3c3fdc53c9bc0b328f42a5b3d2b/devtools/client/netmonitor/src/widgets/WaterfallBackground.js#45
[3] https://github.com/devtools-html/perf.html/blob/be2bd0561f8561aa020afa54c982635265259041/src/components/shared/chart/Canvas.js#L58-L68
Depends on: 1431132
Depends on: 1431395
Depends on: 1432803
Depends on: 1434848
Depends on: 1434855
Product: Firefox → DevTools
Depends on: 1471158

Closing in favor of the newer meta.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE
Depends on: 1571286
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: