Closed Bug 1400000 Opened 7 years ago Closed 4 years ago

High memory usage in GPU process with many YouTube tabs

Categories

(Core :: Audio/Video: Playback, defect, P2)

Unspecified
Windows 10
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox55 --- wontfix
firefox56 --- fix-optional
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- ?

People

(Reporter: zxspectrum3579, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [MemShrink:P2][gfx-noted])

Crash Data

Attachments

(12 files)

This bug was filed from the Socorro interface and is report bp-51d561ad-b36d-4928-9af6-dd4e71170914.
=============================================================

WHAT I DID:

I had a regular long browsing session, but this time with many YouTube tabs opened during it -- probably as many as 50-100.


WHAT HAPPENED:

The browser GUI started to slow down, but this was expected, since Firefox still cannot freeze the background JavaScript of all active/loaded pages to prioritize responsiveness.

What was not expected is that at one point the pages stopped rendering and went blank, and Firefox simultaneously started to leak: its memory footprint grew out of control until the total allocated memory reached 64 GB, which forced Windows 10 to kill the biggest of the Firefox processes. See the full story as it progressed in the annotated screenshots attached to this bug.


WHAT SHOULD HAVE HAPPENED:

Normal continuous operation without a crash. (Ideally, the Firefox GUI should never slow down, since background JavaScript should be put to sleep/suspended at least for tabs that were last active ten or twenty tabs ago, but that is a different issue.)


STEPS TO REPRODUCE:

Just opening around 100 YouTube videos may not be enough: by the time I started opening those tabs, the session was already an old one.
Severity: critical → normal
Summary: Crash in moz_abort | arena_run_split | arena_malloc_large | je_malloc | NodePool::Builder::Add → A bad leak in FF 55.03 [Crash in moz_abort | arena_run_split | arena_malloc_large | je_malloc | NodePool::Builder::Add]
If you can reproduce this issue, it would be good to save an about:memory report (anonymized if you prefer) to get a sense of where the memory is going. You'd have to take the memory report long before the memory usage gets really bad. Maybe with so many YouTube videos open, we're keeping data around for all of them, and that adds up? Or you are experiencing a memory leak of some other sort.
Component: JavaScript: GC → XPCOM
Flags: needinfo?(zxspectrum3579)
Whiteboard: [MemShrink]
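For anyone digging into a saved memory-report.json.gz offline later, here is a minimal sketch. It assumes the about:memory JSON layout of this era -- a gzipped object with a top-level "reports" list whose entries carry "process", "path", "units", and "amount" fields -- so treat the field names as assumptions to verify against a real report.

    import gzip, json
    from collections import defaultdict

    # Load the gzipped JSON saved via about:memory's "Measure and save...".
    with gzip.open("memory-report.json.gz", "rt", encoding="utf-8") as f:
        report = json.load(f)

    # Sum the byte-valued "explicit" leaf reports per process to see which
    # process is ballooning. units == 0 is assumed to be UNITS_BYTES.
    totals = defaultdict(int)
    for r in report["reports"]:
        if r["path"].startswith("explicit/") and r["units"] == 0:
            totals[r["process"]] += r["amount"]

    for process, nbytes in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{nbytes / 2**20:10.1f} MB  {process}")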
It is a sort of tipping point. I opened and watched YouTube normally, but then the pages stopped rendering at all, and that is when I decided to look at Task Manager to see what was happening to Firefox; I saw it blowing up uncontrollably in real time.

Thus it seems that Firefox got stuck in some endless malloc loop. I already understood that it would crash soon due to OOM, hence I made the screenshots. I forgot to capture about:memory this time, but if I face this yet again I will try to get it.

By the way, is it possible to turn on a setting so that the memory structure is included in the crash data along with the raw dump, so it does not depend on whether I manage to do it manually next time?
Flags: needinfo?(zxspectrum3579)
By the way, Andrew, do you know how Firefox deals with memory management across processes?

Is it normal that I had three processes, that during the leak two of them became giant, and that at the end the bulk of one of those two processes appeared to move into the single process that became the sole giant of the three (as seen in one of the screenshots)?

Why would Firefox move such huge data chunks from one process to another, and how does it do it? Or is there no "move" at all, and it just coincided that one of the previously huge processes released all those gigabytes while the other blew up even more spectacularly?
I don't think we do anything as fancy as rebalancing pages that have already loaded from one process to another. If you are downloading a lot of data, it starts out in the parent process, then it is copied into the child process, then freed in the parent. Maybe that is what you are seeing.
(In reply to User Dderss from comment #5)
> By the way, is it possible to turn on a setting so that the memory
> structure is included in the crash data along with the raw dump, so it
> does not depend on whether I manage to do it manually next time?

We do periodically take a memory report, so it might be in your crash report. It won't be a snapshot from the time of the crash, though. But I'll try to take a look at the crash report.
I wonder if this is a dup of bug 1398188.

As Andrew said, an about:memory report would be really helpful [1]. Can you:

1) Attach an about:memory report
2) Provide the output from about:support
3) Test in safe-mode [2] to see if the problem reproduces with add-ons disabled

[1] https://developer.mozilla.org/docs/Mozilla/Performance/about:memory
[2] https://support.mozilla.org/kb/troubleshoot-firefox-issues-using-safe-mode
Flags: needinfo?(zxspectrum3579)
Thanks; please look at the crash report if you have time. I chose to generate full crash reports in the Firefox settings, so maybe you will see something there.

The description of bug 1398188 fits what happened to me, except that there is seemingly no gradual blow-up until a certain breaking point. The leak starts only after one of the tabs fails to render (I do not open any new tabs after that: it would be useless, they would not work), and it grows very quickly, not over a couple of hours. Though the reporter wrote, if I understand it right, that at times it gets to the point of Windows warning that it has to close some program quickly, which was my case.

It will take a few days before I am able to see this phenomenon again; I will post my about:memory and about:support output from that time.
Flags: needinfo?(zxspectrum3579)
Attached file memory-report.json.gz
I faced the leak again, and this time I had the chance to take a memory snapshot.
Though maybe I made a mistake: I chose to restart the browser (I did not want to wait for the crash) right after requesting the memory report, so the restart command, which begins shutting down allocations, might have tampered with the snapshot.
This is a case of me doing nothing other than watching YouTube videos and switching tabs. I had a temporary Firefox memory blow-up for no apparent reason, but on this occasion Firefox managed to scale it back before reaching an OOM crash.
I did nothing at all with the browser at the time, but it decided to blow up on me out of the blue.

This time it reached an OOM crash:

https://crash-stats.mozilla.com/report/index/f6aa9bfb-3bca-481c-ac6c-a987c0170918 -- it looks like this specific crash is the same as bug 1350042.

But maybe the bug I filed is also connected to it, even though the signature and symptoms are different? This bug 1400000 that I reported is not recoverable; it makes Firefox completely dead. The bug I am referencing in this post, my newest crash, only kills one of the running Firefox processes, not the others, so I am continuing to use the same Firefox session right now as I write this comment.
Thanks for the memory report. Your about:memory report contains this in the GPU process:
4,273.10 MB (100.0%) -- explicit
├──4,234.61 MB (99.10%) ── heap-unclassified
└─────38.49 MB (00.90%) ++ (5 tiny)

Your OOM crash is also in the GPU process, so I think they are related. That's different than bug 1398188.
Component: XPCOM → Graphics
Summary: A bad leak in FF 55.03 [Crash in moz_abort | arena_run_split | arena_malloc_large | je_malloc | NodePool::Builder::Add] → High memory usage in GPU process with many YouTube tabs
See Also: → 1350042
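A note on that 99% heap-unclassified figure: about:memory can only attribute allocations that have a registered memory reporter, so memory allocated by code without one (driver or decoder internals, for instance) shows up as dark matter. Mozilla's DMD tool exists for exactly this case. A rough sketch of launching a DMD-enabled build (this assumes the build was configured with --enable-dmd and that the DMD=1 environment variable is still how it is activated; details vary by version):

    import os, subprocess

    # Launch a DMD-enabled Firefox build with the DMD environment variable
    # set; once memory has grown, save the DMD analysis from about:memory.
    env = dict(os.environ, DMD="1")
    subprocess.run(["./firefox"], env=env)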
Thanks, but is it different from bug 1350042? The symptoms differ, but I am not a professional, so I am not sure.

Also, is it alright that a Firefox process becomes 100% busy with background scripts just from having many tabs open -- not just YouTube, but any tabs with an AJAX comment system or other background scripts -- and, what is worse, is unable to prioritize the GUI of the current tab over background tabs, which leaves Firefox poorly responsive?

For example, trying to pause a YT video playback only works immediately if you have just a few tabs open overall; it can take seconds (!) if you have a dozen or two dozen loaded tabs.

Or are the fundamental changes that would make the GUI of the current tab an absolute priority not yet implemented in version 55? If so, in which version will the quick degradation of the GUI -- all of the unresponsiveness and the spinners when you switch between tabs -- go away for good?

Thanks in advance.
Jean-Yves, did we change decoding with YouTube in 55?
Flags: needinfo?(jyavenard)
Priority: -- → P1
Whiteboard: [MemShrink] → [MemShrink][gfx-noted]
In 55 we enabled VP9 hardware decoder on Windows 10 machines.
We also changed the way surfaces are allocated, using nv12 D3D11 surfaces. Locking was also changed in 55. Bas can provide more details.
Flags: needinfo?(jyavenard)
Assignee: nobody → bas
Whiteboard: [MemShrink][gfx-noted] → [MemShrink:P1][gfx-noted]
Eric, can you please look at my question https://bugzilla.mozilla.org/show_bug.cgi?id=1400000#c17 above? Is the GUI slowdown I see during extensive YouTube browsing tied to this leak, or is it just a fundamental issue of not prioritizing the active tab over background JS in general? If it is tied to the leak, then maybe once this bug is resolved the YouTube player will not become so slow so quickly. (If not, then I will have to wait until the architectural changes to Firefox are done.)
A new crash with a different signature (OOM small), but I guess it is the same thing:
https://crash-stats.mozilla.com/report/index/ec38c739-1559-4e18-a1f2-370940170923
It turns out that the crash was really bad: it produced not one but two crash reports at the same time, with different signatures. The other one is: https://crash-stats.mozilla.com/report/index/6cce0e4d-cb03-4eda-9805-dd3590170923

It has "[@ OOM | unknown | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::BufferList<T>::AllocateSegment ]" signature, which is also cited in #bug 1402185 and #bug 1350042.
That looks like a compositor process crash, "followed" by a content process crash. Not sure the former caused the latter, but that seems to be the timing. The compositor process OOM is fairly large (1920x1168x4); the content process one is 4k.
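For scale: 1920x1168x4 = 8,970,240 bytes, roughly 8.6 MB, which is consistent with a single 4-bytes-per-pixel surface at about screen size. So the failing allocation is ordinary in size; presumably it fails only because the process's memory is already exhausted.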
Thanks, Milan. 

But what about the slowdown? Is it tied to this bug, or is it an architectural background-JS handling issue that is not yet addressed in the browser?
Keywords: regression
More than likely this is due to suspending of background video being disabled in 55 and 56, so every single video takes space.

It's supposed to be re-enabled in 57; I don't know the status of it.

Prior to 55, every time a video was running in the background we would disable the video decoder, and its memory usage would then drop close to 0.
Flags: needinfo?(bwu)
In my case the pref "media.autoplay.enabled" is manually set to false.
(In reply to Jean-Yves Avenard [:jya] from comment #25)
> More than likely this is due to suspending of background video being
> disabled in 55 and 56, so every single video takes space.
> 
> It's supposed to be re-enabled in 57; I don't know the status of it.
> 
> Prior to 55, every time a video was running in the background we would
> disable the video decoder, and its memory usage would then drop close to 0.
Yeah, and shutting down decoders is on in every Nightly version.
It has now been enabled in 57 in bug 1401909.
Flags: needinfo?(bwu)
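For reference, the prefs involved in this mechanism, with names as they stood in the 55-57 era (an assumption worth re-checking in about:config, since pref names change between releases):

    media.suspend-bkgnd-video.enabled    suspend decoders for videos in background tabs
    media.suspend-bkgnd-video.delay-ms   how long a tab must stay in the background first
    media.autoplay.enabled               autoplay, as mentioned above; separate from decoder suspend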
Are all the YT tabs playing? If only one tab is playing and others are paused, IIRC our dormant mechanism should be triggered on paused tabs.
Is this issue reproducible in 57?
Component: Graphics → Audio/Video: Playback
Flags: needinfo?(zxspectrum3579)
Priority: P1 → P2
I have not used Firefox 57 yet; I am on the release version, and so far on version 56 I have not seen this bug happen. However, this bug happens only when YouTube videos are played consecutively in large quantities, and I have not done that yet this month. I will try to carry out the experiment this weekend.
Flags: needinfo?(zxspectrum3579)
I can't speak for anyone else, but in 57b14 I've started getting almost hourly crashes that might be due to overloaded GPU memory. The first symptom is DOM objects disappearing -- first a few, then the entire page except the background -- then a tab crashes, and either the browser gets unusable enough that I shut it down, or it just terminates at some point (no crash popup, just gone). The numbers below are from after restarting, with only this bug, the search screen I had open, and a page with an embedded video (to check whether video increases usage much) reloaded, plus thumbnails from any unloaded tabs.

about:memory lies so hard. It says a grand total of 2.8 MB is allocated on the GPU. Process Explorer says that 30+169+227+266+476 MB are allocated across Firefox processes, and 4+12+26+65+151 MB are actually committed. That is over a gigabyte allocated for a handful of tabs. With Chrome and other processes also competing for GPU resources, they're quickly exhausted. I'll upload my snapshot anyway.

Firefox 56 used maybe a quarter of the GPU resources at most, all within one process, so this was never a problem before the new renderer.
Attached file memory-report.json.gz
David, any ideas why about:memory might be lying to us?
Flags: needinfo?(dvander)
Jean-Yves, what's the status of background video in 57? This looks like it may be a more serious problem than I initially thought.
Flags: needinfo?(jyavenard)
(In reply to Bas Schouten (:bas.schouten) from comment #34)
> Jean-Yves, what's the status of background video in 57? This looks like it
> may be a more serious problem than I initially thought.

Blake in comment 27 states it was enabled in 57... So maybe it's something else...

Comment 17, however, refers to the JS / AJAX slowdown; that wouldn't be affected by the decoder suspend...
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #35)
> (In reply to Bas Schouten (:bas.schouten) from comment #34)
> > Jean-Yves, what's the status of background video in 57? This looks like
> > it may be a more serious problem than I initially thought.
> 
> Blake in comment 27 states it was enabled in 57... So maybe it's something
> else...
> 
> Comment 17, however, refers to the JS / AJAX slowdown; that wouldn't be
> affected by the decoder suspend...

Right, I was referring mainly to comment 31. To be fair, this is the only bug report of this extent I've seen on the current beta. I'm going to unassign this bug just because I'm not clear whether there's any graphics work to be done here.
Assignee: bas → nobody
For this crash signature, I have not seen any reports since 57.
Can anyone still see this problem on 57 or later?
1445470 - [tracking] video playback performance
<https://bugzilla.mozilla.org/show_bug.cgi?id=1445470>
I did not witness the issue for a while, but then with Firefox 61 I started to occasionally experience a HUGE explosion of the memory footprint during some video playback on YouTube -- at one point I even had to restart the whole system, as it became unresponsive after eating all of the RAM and then the page file on the HDD.

A phenomenon similar to mine was reported on Reddit:

https://www.reddit.com/r/firefox/comments/8grw52/one_youtube_video_running_6gb_memory_41_cpu/

People in the thread advise installing an extension such as h264ify, but that should not be necessary. Even if the cause of Firefox processes blowing up is VP8/VP9, it should not have to be fixed by making people install extensions.
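One way to test the VP9 theory without installing an extension, assuming the media prefs of that era still apply (worth verifying the pref still exists): set media.mediasource.webm.enabled to false in about:config, which should make YouTube fall back to H.264 -- the same effect h264ify achieves. YouTube's right-click "Stats for nerds" overlay shows which codec a video is actually using, so it can confirm whether the blow-ups correlate with VP9 playback.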
The way I understand things going wrong is that YouTube video playback momentarily freezes and stumbles because the Firefox process/OS becomes unresponsive while doing some completely blocking operations.

If I do not stop the playback and just continue, the whole OS dies to the point where it does not react to mouse movements or anything else at all.

If I do stop the playback, the whole thing calms down until some later point when it "explodes" again. I will try to take a memory snapshot the next time things go crazy.
Attached file memory-report.json.gz
Blocks: 1202237
Whiteboard: [MemShrink:P1][gfx-noted] → [MemShrink:P2][gfx-noted]

Closing because no crashes reported for 12 weeks.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME