Open Bug 1407691 Opened 7 years ago Updated 1 year ago

Run away memory use sending ArrayBuffers to workers

Categories

(Core :: JavaScript: GC, defect, P2)

55 Branch
defect

Tracking Status

firefox57 --- ?
firefox58 --- affected

People

(Reporter: jonco, Unassigned, NeedInfo)

Details

(Whiteboard: [MemShrink:P3])

Attachments

(2 files)

STR:

 - in about:config create the gfx.canvas.willReadFrequently.enable preference and set it to true
 - download http://queze.net/goinfre/tmp/blob-testcase/Segment_0001-big.mp4
 - navigate to http://queze.net/goinfre/tmp/blob-testcase/extract-with-workers.html
 - drop video file on this page

Observed:

Memory use grows to > 5GB while the page is running.

Probably we are not updating memory counters correctly when transferring array buffers.
(In reply to Jon Coppeard (:jonco) from comment #0)
> Probably we are not updating memory counters correctly when transferring
> array buffers.

We are doing this here, so there must be something else going on:

http://searchfox.org/mozilla-central/source/js/src/vm/ArrayBufferObject.cpp#1064
The test case triggers a ton of TOO_MUCH_MALLOC GCs, both in workers and on the main thread.
Priority: -- → P2
The testcase in comment 0 uses a mix of copied and transferred array buffers to the worker. While trying to reduce a test case for another bug, I noticed that the bug happens both if all the array buffers are transferred to the workers without copying, and if the array buffers are all copied to the workers.
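The copy-vs-transfer distinction above can be seen without spinning up a worker. structuredClone (available in modern browsers and Node 17+; it postdates this bug report) uses the same structured clone machinery as postMessage, so this is a hedged stand-in, not the exact code path in the bug:

```javascript
const buffer = new ArrayBuffer(16);

// Copy: the source buffer is untouched, both hold 16 bytes.
const copied = structuredClone(buffer);

// Transfer: the source buffer is detached, its byteLength drops to 0.
const moved = structuredClone(buffer, { transfer: [buffer] });

console.log(copied.byteLength, moved.byteLength, buffer.byteLength); // 16 16 0
```

postMessage(buffer) behaves like the copy; postMessage(buffer, [buffer]) behaves like the transfer.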
Much simpler test case:

let worker = new Worker("data:text/javascript,");
let buffer = new ArrayBuffer(8000000);
setInterval(() => { worker.postMessage(buffer); }, 0);

You can try it at https://codepen.io/anon/pen/GMeKKR

This has growing memory usage until the content process crashes (likely an OOM crash).
I think Olli was looking at worker GC a while ago.  Maybe he has some ideas?
Summary: Run away memory use transferring video frames to workers → Run away memory use sending ArrayBuffers to workers
(In reply to Florian Quèze [:florian] [:flo] from comment #4)
> Much simpler test case

It's possible I minimized the test case a little too much: with this version, about:memory shows heap-unclassified growing faster than the ArrayBuffer objects in the worker.

In case it's helpful, here is where most of the heap-unclassified memory goes:

Unreported {
  8,878,969 blocks in heap block record 1 of 3,507
  36,368,257,024 bytes (36,368,257,024 requested / 0 slop)
  Individual block sizes: 4,096 x 8,878,969
  86.00% of the heap (86.00% cumulative)
  99.05% of unreported (99.05% cumulative)
  Allocated at {
    #01: replace_malloc (DMD.cpp:1303, in libdmd.dylib)
    #02: mozilla::BufferList<js::TempAllocPolicy>::AllocateSegment(unsigned long, unsigned long) (Utility.h:357, in XUL)
    #03: mozilla::BufferList<js::TempAllocPolicy>::WriteBytes(char const*, unsigned long) (BufferList.h:424, in XUL)
    #04: JSStructuredCloneWriter::writeArrayBuffer(JS::Handle<JSObject*>) (StructuredClone.cpp:972, in XUL)
    #05: JSStructuredCloneWriter::startWrite(JS::Handle<JS::Value>) (StructuredClone.cpp:1513, in XUL)
    #06: JSStructuredCloneWriter::write(JS::Handle<JS::Value>) (StructuredClone.cpp:1711, in XUL)
    #07: WriteStructuredClone(JSContext*, JS::Handle<JS::Value>, JSStructuredCloneData*, JS::StructuredCloneScope, JS::CloneDataPolicy, JSStructuredCloneCallbacks const*, void*, JS::Value const&) (StructuredClone.cpp:620, in XUL)
    #08: JSAutoStructuredCloneBuffer::write(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::CloneDataPolicy, JSStructuredCloneCallbacks const*, void*) (StructuredClone.cpp:2771, in XUL)
  }
}

First 4 DMD stacks: https://pastebin.mozilla.org/9070692
(In reply to Florian Quèze [:florian] [:flo] from comment #6)
I don't understand what this means, but needinfo'ing Steve because of structured cloning.
Flags: needinfo?(sphink)
(In reply to Florian Quèze [:florian] [:flo] from comment #4)
> Much simpler test case:
> 
> let worker = new Worker("data:text/javascript,");
> let buffer = new ArrayBuffer(8000000);
> setInterval(() => { worker.postMessage(buffer); }, 0);
> 
> You can try it at https://codepen.io/anon/pen/GMeKKR
> 
> This has growing memory usage until the content process crashes (likely an
> OOM crash).

Since nothing in the worker sees the MessageEvent or its ArrayBuffer payload, do we still have to wait for them to GC in this test case?
Whiteboard: [memshrink]
(In reply to Ben Kelly [:bkelly] from comment #8)
> (In reply to Florian Quèze [:florian] [:flo] from comment #4)
> > Much simpler test case:
> > 
> > let worker = new Worker("data:text/javascript,");
> > let buffer = new ArrayBuffer(8000000);
> > setInterval(() => { worker.postMessage(buffer); }, 0);
> > 
> > You can try it at https://codepen.io/anon/pen/GMeKKR
> > 
> > This has growing memory usage until the content process crashes (likely an
> > OOM crash).
> 
> Since nothing in the worker sees the MessageEvent or its ArrayBuffer
> payload, do we still have to wait for them to GC in this test case?

I don't understand the question. It seems to me like this is a case of a web page leaking memory. We don't know that the worker is never going to listen for those buffer messages, do we? Or some new worker started up later that decides to listen? (Sorry, I don't know the spec at all.)

needinfo? baku because this seems like a very plausible way to accumulate a ton of unclassified-heap memory, but I'm not sure where to inject the memory reporter.
Flags: needinfo?(sphink) → needinfo?(amarchesini)
(In reply to Steve Fink [:sfink] [:s:] from comment #9)

> It seems to me like this is a case of a web page leaking memory.

No, in comment 6 I said "It's possible I minimized the test case a little bit too much" and I updated the codepen to:

let worker = new Worker("data:text/javascript,onmessage=function(){}");
let buffer = new ArrayBuffer(8000000);
setInterval(() => { worker.postMessage(buffer); }, 0);

This listens for the messages in the worker, and still crashes with OOM.
Flags: needinfo?(sphink)
> This listens for the messages in the worker, and still crashes with OOM.

I managed to reproduce the crash, but it doesn't seem to happen during the structured clone algorithm. Instead I see the crash in new MessageEvent, or other 'new' allocations. OOM seems to be correctly handled by MessageEvents and communicated to content via onerror.

About having a memory reporter for non-delivered messages: yes, this seems nice to have. Note that a similar issue can be triggered with MessagePort.
Flags: needinfo?(amarchesini)
Our theory here is that we start triggering the many GCs (as noted in comment 2) and these GCs take longer than 4ms.  This then causes the worker thread to fall behind servicing its message event queue.  Since the main thread is running open loop (and at a higher priority than worker threads), it likely just completely overwhelms the worker thread.

Does this happen on a real site?  My expectation is that sites would have more of a closed loop where they wait for some result from the worker instead of constantly firing large buffers infinitely.  There is a temptation to tune GC to this test case, but right now it's not a realistic workload.

P3 for now.
Whiteboard: [memshrink] → [MemShrink:P3]
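The "closed loop" described above can be sketched as a bound on unacknowledged buffers in flight, instead of posting from a free-running setInterval. BoundedSender and its method names here are illustrative, not any API from this bug:

```javascript
// Cap how many buffers may be outstanding before the worker acknowledges them.
class BoundedSender {
  constructor(post, maxInFlight = 2) {
    this.post = post;            // e.g. (buf) => worker.postMessage(buf, [buf])
    this.maxInFlight = maxInFlight;
    this.inFlight = 0;
    this.queue = [];
  }
  send(buf) {
    if (this.inFlight < this.maxInFlight) {
      this.inFlight++;
      this.post(buf);
    } else {
      this.queue.push(buf);      // hold back until the worker catches up
    }
  }
  ack() {                        // call from the page's onmessage handler
    this.inFlight--;
    if (this.queue.length > 0) {
      this.inFlight++;
      this.post(this.queue.shift());
    }
  }
}
```

The worker would post a small ack after handling each buffer, so the producer can never outrun the consumer by more than maxInFlight buffers.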
Florian asked me to look at the case in comment 0 a bit more.

I modified it a bit:

1. To call revokeObjectURL() to avoid any extra leaked blob URL data in the memory report.
2. To keep the worker threads alive after the test completes

I also ran the test without the canvas pref suggested in comment 0.  Per my previous comment, I think part of the problem here is the main thread trying to run faster than the worker thread can keep up with.

With those changes I found that we still sometimes ended up with ArrayBuffer memory left in the worker thread.

Looking a bit more I noticed that many of the ArrayBuffer values are transferred back out of the worker, but some are ignored.  I further modified the worker script to:

3. Transfer ignored frames back out of the worker and ignore them in the window.

This allows the worker memory to reduce to ~zero.

So there does seem to be some issue with our worker GC either not clearing all values or possibly not clearing transferred values.  For now a workaround is to transfer the large arrays back out to get them properly cleaned up.  The proposed ArrayBuffer.transfer() mechanism would be even better, as it would immediately detach the buffer without a postMessage().
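A sketch of that detach-without-postMessage idea: ArrayBuffer.prototype.transfer() was only a proposal when this comment was written and was later standardized in ES2024, so this helper falls back to structuredClone's transfer list (which also detaches the source) on engines that predate it:

```javascript
// Detach a buffer's backing memory explicitly instead of waiting for worker GC.
const detach = (buf) =>
  typeof buf.transfer === 'function'
    ? buf.transfer()                             // standardized ES2024 path
    : structuredClone(buf, { transfer: [buf] }); // also detaches the source

const frame = new ArrayBuffer(8 * 1024 * 1024);
const moved = detach(frame);
console.log(frame.byteLength); // 0 — the backing memory can be reclaimed now
console.log(moved.byteLength); // 8388608
```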
Attached file index.html
This is my modified page based on comment 0 case.
Attached file worker.js
And the modified worker script.
Andrea, is it possible we are holding an extra ref to the ArrayBuffer somehow when it's transferred?  It seems very suspicious that these would not be GC'd until transferred out of the worker.
Flags: needinfo?(amarchesini)
Using minimize from about:memory seems to make the memory go away.  Are we just not running the cycle collector on worker threads?
I think a more damning test case is one without a loop at all:

let worker = new Worker("data:text/javascript,onmessage=function(){}");
let buffer = new ArrayBuffer(1024 * 1024 * 512);
worker.postMessage(buffer, [buffer]);

Gives a worker holding 512MB of memory that doesn't clean up until minimize memory is used in about:memory or the worker is terminated.

537.53 MB (100.0%) -- explicit
├──513.59 MB (95.55%) -- workers/workers(example.com)/worker(data:text/javascript,onmessage=function(){}, 0x2d9ca379000)
│  ├──512.36 MB (95.32%) -- zone(0x2d9ccad2000)
│  │  ├──512.14 MB (95.28%) -- compartment(web-worker)
│  │  │  ├──512.12 MB (95.27%) -- classes
│  │  │  │  ├──512.00 MB (95.25%) -- class(ArrayBuffer)/objects
│  │  │  │  │  ├──512.00 MB (95.25%) ── malloc-heap/elements/normal
│  │  │  │  │  └────0.00 MB (00.00%) ── gc-heap
│  │  │  │  └────0.12 MB (00.02%) ++ (2 tiny)
│  │  │  └────0.02 MB (00.00%) ++ (2 tiny)
│  │  └────0.22 MB (00.04%) ++ (6 tiny)
│  └────1.23 MB (00.23%) ++ (3 tiny)
├───12.04 MB (02.24%) ++ js-non-window
└───11.91 MB (02.22%) -- (22 tiny)
    ├───2.75 MB (00.51%) ── heap-unclassified
    ├───2.70 MB (00.50%) ++ heap-overhead
    ├───1.59 MB (00.30%) ── profiler/profiler-state
    ├───1.48 MB (00.27%) ++ script-preloader
    ├───0.83 MB (00.15%) ── xpti-working-set
    ├───0.61 MB (00.11%) ++ layout
    ├───0.48 MB (00.09%) ── preferences
    ├───0.45 MB (00.08%) ++ window-objects/top(http://example.com/, id=8589934593)
Calling this some time later also causes it to clean up:

worker.postMessage({});

But doing it immediately after the array buffer postMessage does not allow it to clean up:

let worker = new Worker("data:text/javascript,onmessage=function(){}");
let buffer = new ArrayBuffer(1024 * 1024 * 512);
worker.postMessage(buffer, [buffer]);
worker.postMessage({});  // does not cause GC

I wonder if postMessage() is somehow holding it alive for a while, preventing our GC from cleaning it up.  And then we don't try again until the next event is received.
We don't do anything special for ArrayBuffers; they are transferred internally by the JS engine:
https://searchfox.org/mozilla-central/source/dom/workers/Worker.cpp#83,85-86,107
StructuredCloneHolder doesn't receive ArrayBuffers:
https://searchfox.org/mozilla-central/source/dom/base/StructuredCloneHolder.cpp#1168-1233
It only receives what is not 'recognized' by the JS engine, mainly MessagePorts.

It could be that we schedule GC incorrectly in workers. I'll check it again today.
WorkerPrivate::GarbageCollectInternal is correctly executed by the IdleGCTimerCallback and by the PeriodicGCTimerCallback.

IdleGCTimerCallback is executed 5 seconds after the last nsIThread event. It runs with aShrinking=true and aCollectChildren=false, which means we end up executing:

https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#4947,4957
JS::PrepareForFullGC(aCx); JS::GCForReason(aCx, GC_NORMAL, JS::gcreason::DOM_WORKER);

About PeriodicGCTimerCallback: this runs every few seconds and ends up calling WorkerPrivate::GarbageCollectInternal with aShrinking=false and aCollectChildren=false. This means that we just execute JS_MaybeGC(aCx);

Maybe Jon can help here.
Flags: needinfo?(amarchesini) → needinfo?(jcoppeard)
Last time I looked at worker GC scheduling, the issue was that we cancelled the timers doing GC in many cases.
This related demo (regarding a problem with Safari) crashes Firefox Nightly 66.0a1 (2019-01-03, 64-bit) on Windows 10.
It also crashed Chrome Stable 71.0.3578.98, but doesn't crash Chrome Canary 73.0.3660.0.

https://measuredweighed.github.io/mapboxgl-js-mem-usage/arraybuffer-test.html

from https://github.com/mapbox/mapbox-gl-js/issues/4695#issuecomment-447362134
See Also: → 1643064
Depends on: 1651612
Severity: normal → S3

I use this feature pretty heavily, and found out that our site is crashing Firefox users' browsers.
I hid the feature behind a URL so you can test it out and reproduce the issue. I use a background web worker to render a spinning 2D planet, along with active weather effects; it's fairly simple, but it does rely on transferring data around. It has a couple of different modes depending on what the browser supports; I think it's using the main-thread rendering version and just transferring the data down via transfer on array buffers (since ImageBitmap/offscreen rendering wasn't reliable or working yet on Firefox, I can't remember).

You can enable it here: https://planetminecraft.com/?firefox_bug
The top of the page's header will have what we call the Active Header.

We do video-based augmented reality in JavaScript, so needless to say, we frequently need to send video frames to web workers in order to perform computer vision efficiently at a high frame rate. I was able to determine that ArrayBuffer garbage collection will not happen until the app is blurred, or in some cases it might happen intermittently when CPU usage has dropped significantly. It may be waiting to schedule GC until it seems like a good time, but it was hard to determine why or when. I implemented a pause in CPU and animation updates and could occasionally trigger it, while also verifying by process of elimination that this was not a memory leak of any kind. Memory would quickly accumulate to several GB and then kill the website.

The workaround is to serialize the data into a string when passing it to and from the web worker, but this is in no way ideal. I tried using transferable objects, and the issue still persisted. Only after implementing serialization did it stop being a problem.

For anyone interested, here is the serialization workaround, albeit for OpenCV matrices. object.data is the ArrayBuffer, and there's a function to get the channels for OpenCV's Mat constructor.

export const serializeImage = (image: Mat): ImageBuffer => {
    // Base64-encode the pixel bytes so they cross postMessage as a plain string
    const data = btoa(new Uint8Array(image.data).reduce((acc, byte) => acc + String.fromCharCode(byte), ''));
    return { width: image.cols, height: image.rows, data, channels: image.channels() };
};

export const deserializeImage = (buffer: ImageBuffer) => {
    // Decode the base64 string back into bytes and rebuild the OpenCV Mat
    const data = Uint8Array.from(atob(buffer.data), c => c.charCodeAt(0));
    return cv.matFromArray(buffer.height, buffer.width, getMatrixType(buffer), data);
};
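The same base64 round trip, stripped of the OpenCV types so it runs standalone (helper names here are illustrative, not from the comment above; btoa/atob exist in browsers and Node 16+):

```javascript
// Pack bytes into a base64 string, and back.
const serializeBytes = (bytes) =>
  btoa(bytes.reduce((acc, byte) => acc + String.fromCharCode(byte), ''));
const deserializeBytes = (text) =>
  Uint8Array.from(atob(text), (c) => c.charCodeAt(0));

const pixels = new Uint8Array([0, 127, 255]);
const restored = deserializeBytes(serializeBytes(pixels));
// restored equals pixels byte-for-byte
```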

It is entirely possible this is also related to CPU utilization, but I do not know anything about the GC algorithm.
