Open Bug 1381146 Opened 7 years ago Updated 2 years ago

Stop using nsCString for buffering incoming data channel messages

Categories

(Core :: WebRTC: Networking, enhancement, P3)

enhancement

Tracking

()

UNCONFIRMED

People

(Reporter: lgrahl, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36 OPR/46.0.2597.19 (Edition beta)
Using nsCString limits incoming data channel messages to ~2 GiB in size while we can support messages of arbitrary size.
Component: General → WebRTC: Networking
Rank: 29
Priority: -- → P2
(In reply to Lennart Grahl from comment #1)
> Using nsCString limits incoming data channel messages to ~2 GiB in size
> while we can support messages of arbitrary size.

Uh, do we really need contiguous buffers greater then 2GB?  That sounds like a good way to OOM.

Adding memshrink here because we have done a lot of work to avoid giant IPC buffers in the past.
Whiteboard: [MemShrink]
Oh, maybe this is not IPC.  Sorry for my confusion here.  Still, giant contiguous memory buffers can be a problem if there is any fragmentation.
I'd be happy if WebRTC would have a data channel "streaming" API, so we can hand out data chunks earlier and forget about them. It's rather low level and requires a carefully thought out API design. But without that, applications either have to limit large data transfers to ~2 GiB or do fragmentation/reassembly in the application (as they have to do at the moment). So, my personal opinion is that we either need to support messages of arbitrary size until we're OOM or have a proper streaming API for WebRTC data channels (which probably requires some persuasion towards the WebRTC W3C working group). The status quo really cannot be more but a temporary solution.

Regardless of the ~2 GiB limitation, I'm not really sure if using nsCString for buffering messages is such a good idea. It has a somewhat awkward API (e.g. the append method cannot return an error if there's no space left). But I'm not that familiar with classes available in the core, so I'll be thankful for any suggestions.
See nsSegmentedBuffer and possibly nsPipe if you don't need random access, etc.
Oh, and nsSegmentedVector.
So, this was something I typed into #memshrink: (trimmed for readability)

actually one of the things it receives is strings, which was why it was done that way originally (probably cloned from WebSockets code - both receive either strings, or ArrayBuffers/Blobs (depending on what the receiving code asks for).  Even binary things are typically <1500 bytes, and rarely >16K, but the api allows giving send() a 4gb (or larger) blob.  I've actually tested sending1GB files with it, and yes, it's quite subject to fragmentation with the current impl.  Some type of segmented buffer makes much more sense.

Also:   static nsresult CreateArrayBuffer(JSContext *aCx, const nsACString& aData, JSObject** aResult);  Blobs have that type (CreateStringBlob(... const nsACString&...)), though also  CreateMemoryBlob(..., void* aMemoryBuffer, uint64_t aLength,...), which requires a malloc-like buffer.

The lower-level code that accumulates doesn't know if the JS wants an ArrayBuffer or a Blob, and even if it did the application could change it's mind before the accumulation is done.  It can accumulate in a Segmented array of some sort, but will have to transfer it to linear memory before calling either of these.  So while inefficient to use an nsCString (periodic copies as it grows, instead of one copy from segmented to linear a the end), it's not that insane.

What we really need is either for those to support segmented storage backends (unlikely though possible for ArrayBuffer I'd guess), or (at least for blobs) have it be backed-to-temp-disk (though even that has it's problems).  Mostly the "huge" transfers would be blobs anyways (and then saved to disk by the user), so if Blob supported segmented storage, then it could be accumulated that way, converted to a blob, then written to disk and never be linear (or copied).

The other thing applications do is transfer (say) 16K chunks, accumulate them in JS (since Chrome doesn't support send(blob) yet), then convert to a blob and write it to a user-selected file.  "Convert to a blob" though requires N*(16K) + (N*16K) memory at the point of conversion, and one of those is a linear buffer of course.
Randell, I believe we should look at what we want to solve (and find a consensus on what we don't want to solve). There are several things:

1. A segmented buffer backend for Blob and ArrayBuffer (if possible). This would not prevent large buffers but should significantly reduce copying.
2. Connect a data channel directly to a file - incoming and outgoing. Not really sure what the status quo is here but IIRC at least piping incoming chunks from a message to a file requires some spec changes.
3. A disk-backed message buffer for gigantic messages.
4. A low-level streaming mode for data channels, so chunks are handed out directly. Requires spec changes. (This is actually an optional feature I've already implemented in RAWRTC because I needed it. Receiving: It hands out chunks instead of messages and has a flag to tell you once a message is complete or has been aborted. Sending: You provide chunks and the same flag when passing it to the stack. The chunk is being rejected in case there's no space in the buffer. One can also abort messages in case the channel is unreliable.)

Personally, I think 1. would increase efficiency, 2. would improve the high-level API for receiving large files easily, while 4. solves the need for a low-level streaming API that requires fragmentation/reassembly in the application but gives more control to the data (relaying to another peer, decrypting on-the-fly, implementing 2., ...).

However, I think 3. would become obsolete as I can't think of a use case that 1., 2. and 4. wouldn't solve.

What do you think?
NI mcmanus since this API was designed to mirror/ducktype WebSocket's send()/onmessage() api; so these are questions he may have some thoughts about.

#2 and #4 would require non-trivial spec changes, so more likely for v2 of the webrtc spec (which should start soon, as v1 is near completion).  Saving directly to a file would require the user to select the file to save in (for sec reasons), though we can buffer to a temp file while they're selecting, or make the user select the file before the transfer starts.
Flags: needinfo?(mcmanus)
> Saving directly to a file would require the user to select the file
> to save in (for sec reasons)

Why is that?  We store blobs in auto-named files all the time.
(In reply to Lennart Grahl from comment #4)
> Regardless of the ~2 GiB limitation, I'm not really sure if using nsCString
> for buffering messages is such a good idea. It has a somewhat awkward API
> (e.g. the append method cannot return an error if there's no space left).

On the assumption that "no space left" means "no memory to allocate a larger string", I'll point out that the various Append() methods take a `fallible_t` parameter and return `bool` to indicate whether they succeeded or not.  We could at least use those to prevent crashes in WebRTC code, assuming those aren't already being used.  If that's not what you meant, please let me know, and I can try to point you in the right direction.
Flags: needinfo?(lennart.grahl)
(In reply to Nathan Froyd [:froydnj] from comment #11)
> On the assumption that "no space left" means "no memory to allocate a larger
> string", I'll point out that the various Append() methods take a
> `fallible_t` parameter and return `bool` to indicate whether they succeeded
> or not.  We could at least use those to prevent crashes in WebRTC code,
> assuming those aren't already being used.  If that's not what you meant,
> please let me know, and I can try to point you in the right direction.

I couldn't find the `Append` method that includes the `fallible_t` parameter and returns a `bool`. It's not in the MDN of nsCString's description and I haven't seen it in the code either.
Crashing is already being prevented on the receiving side. There's another crash that might be related (see bug 1382779).
Flags: needinfo?(lennart.grahl)
Follow-up: But if there is such a method, please point me in the right direction. This should make the code less quirky.
(In reply to Lennart Grahl from comment #12)
> I couldn't find the `Append` method that includes the `fallible_t` parameter
> and returns a `bool`. It's not in the MDN of nsCString's description and I
> haven't seen it in the code either.

There are numerous Append methods here:

http://searchfox.org/mozilla-central/source/xpcom/string/nsTSubstring.h#585
Thanks, I'll try it out! The MDN should definitely be updated.
Whiteboard: [MemShrink]
I think datachannels and websockets would both benefit from #1 as a no brainer.. not sure about the API questions right now.
Flags: needinfo?(mcmanus)
(In reply to Ben Kelly [:bkelly] from comment #10)
> > Saving directly to a file would require the user to select the file
> > to save in (for sec reasons)
> 
> Why is that?  We store blobs in auto-named files all the time.

I was assuming that the user would need to select a destination directory (and filename) for the transfer; if we consider this a "download", it could go in downloads... but letting an app do so with either a random (or app-selected!) name without user action or knowledge would be a major problem/surprise.  Especially as the size is basically unlimited.
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.