Closed Bug 1767342 Opened 2 years ago Closed 2 years ago

Flip pref for User defined byte streams

Categories

(Core :: DOM: Streams, task, P1)

RESOLVED FIXED
102 Branch
Tracking Status
firefox102 --- fixed

People

(Reporter: mgaudet, Assigned: mgaudet)

Details

(Keywords: dev-doc-complete)

Attachments

(1 file)

To ship them by default.

Assignee: nobody → mgaudet
Status: NEW → ASSIGNED
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3e51e995cb7
Enable user defined byte stream by default r=smaug
Severity: -- → S3
Priority: -- → P1
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
Keywords: dev-doc-needed

FYI FF102 docs work for this can be tracked in https://github.com/mdn/content/issues/16816

Is there any good explainer or example code anywhere? I'll work it out from the spec if not.

I don't think there's an incredibly good explainer on this, though there are WPT tests that if nothing else guide the expected behaviour.

Thanks Matthew, I think "maybe" I now understand (normal) streams. I think my epiphany was that the controller is owned by the stream but is only used by the "underlying source" (to queue chunks and report stream completion/errors).

  1. Is that right?

Anyway, is this the best place to confirm my understanding? If so, as I see it:

  • pull(controller) is called whenever a new buffer is required (except if it returns a promise, in which case it will be called once the last promise completes/rejects)
  • The controller (ReadableByteStreamController) is owned by the ReadableStream (it knows to give you one of these because you set the type when you constructed that object)
  • the controller has ReadableByteStreamController.byobRequest, which is a ReadableStreamBYOBRequest.
    • The "underlying source" implementation of pull would access byobRequest in the controller and either:
      • write bytes to view, and then call respond() on the byobRequest.
      • create a buffer, write data to it, and then call respondWithNewView() on the byobRequest
    • either way, this tells the stream that it has new bytes to play with. It will transfer the content of the view that was used and make it unusable
  1. Is that correct?

  2. Spec says: [[autoAllocateChunkSize]]: A positive integer, when the automatic buffer allocation feature is enabled. In that case, this value specifies the size of buffer to allocate. It is undefined otherwise.

    • What is automatic buffer allocation, and which buffer is allocated? i.e. does this mean the view owned by ReadableByteStreamController.byobRequest?
    • If this is not specified presumably you just get whatever size data you're given, so you'd set it if you knew the size of data you wanted to transfer "efficiently", right?
    • What happens if automatic buffer allocation is not enabled?
  3. Spec says: [[byobRequest]] A ReadableStreamBYOBRequest instance representing the current BYOB pull request, or null if there are no pending requests.

    • I can't see any workflow that defines what a pending request is or who it is pending from?
    • Does this mean the controller creates a new instance of the request object every time it is about to call pull() and sets it to null after the underlying source has responded? So if you were to cache the controller in start() the value might be null if you were to somehow call it between pull() calls?
  4. ReadableStreamBYOBRequest represents a pull request. My understanding is that this is called whenever the readable stream thinks it is needed (and there is no promise outstanding).

    • If you have a push source and so you don't define a pull() how does BYOB work?

Apologies if these questions are dumb!

Flags: needinfo?(mgaudet)

Definitely not dumb questions!

I think my epiphany was that the controller is owned by the stream but is only used by the "underlying source" (to queue chunks and report stream completion/errors).

The underlying source is the most natural way to interact with the controller, on the pull and start callbacks; nothing stops you from saving a reference to the controller and doing with it what you will however.

The controller is really just a pluggable interface to the stream that allows you to perform actions against the stream. I wasn't around for the design of the Streams spec, but I suspect if there wasn't a desire to distinguish behaviour between byte streams and 'regular' streams, all of its functionality could be inlined into the stream class itself.

  • pull(controller) is called whenever a new buffer is required (except if it returns a promise, in which case it will be called once the last promise completes/rejects)
  • The controller (ReadableByteStreamController) is owned by the ReadableStream (it knows to give you one of these because you set the type when you constructed that object)
  • the controller has ReadableByteStreamController.byobRequest, which is a ReadableStreamBYOBRequest.

One point (and this might clarify some of the future questions too): byobRequest may or may not exist. It exists in two circumstances:

  1. You've created a reader with the byob mode (const byobReader = stream.getReader({ mode: 'byob' });)
  2. The stream was created with an autoAllocateChunkSize, in which case the byobRequest is always present.

The design goal around the BYOB stuff is to avoid copying.

So what you wrote was correct, but not universal -- using the BYOB features requires coordination between underlying source, stream and potentially reader.
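A minimal sketch of the first circumstance (a BYOB reader asking the source to fill its buffer), assuming an environment with byte stream support. The names `makeByobStream` and `readWithByobReader` are hypothetical, and the source just writes a counter into whatever view it is handed:

```javascript
// Hypothetical byte source: fills whatever view the BYOB reader supplied.
function makeByobStream() {
  let next = 0;
  return new ReadableStream({
    type: 'bytes',
    pull(controller) {
      // With a BYOB reader and a pending read(), byobRequest is non-null here.
      const request = controller.byobRequest;
      const view = request.view;
      const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
      for (let i = 0; i < bytes.length; i++) bytes[i] = next++;
      // Tell the stream how many bytes were written into the supplied view.
      request.respond(bytes.length);
    },
  });
}

async function readWithByobReader() {
  const reader = makeByobStream().getReader({ mode: 'byob' });
  // The buffer passed to read() is transferred, so use the returned value.
  const { value } = await reader.read(new Uint8Array(8));
  reader.releaseLock();
  return value;
}

readWithByobReader().then((value) => console.log(Array.from(value)));
```

The source never allocates a buffer of its own here; it only writes into the memory the reader handed over, which is the copy-avoidance the BYOB design is after.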

Spec says: [[autoAllocateChunkSize]]: A positive integer, when the automatic buffer allocation feature is enabled. In that case, this value specifies the size of buffer to allocate. It is undefined otherwise.

  • What is automatic buffer allocation, and which buffer is allocated? i.e. does this mean the view owned by ReadableByteStreamController.byobRequest?

Yep! That's the buffer.

  • If this is not specified presumably you just get whatever size data you're given, so you'd set it if you knew the size of data you wanted to transfer "efficiently", right?

If it's not set, you can enqueue an ArrayBuffer yourself, but it's more code. See the comment here in the spec, and specifically the two examples they compare-contrast: the one without auto-allocation support and the one with auto allocation support. The first example shows how you can write a pull callback that is flexible as to BYOB status.

  • What happens if automatic buffer allocation is not enabled?

Then you cannot use the .byobRequest getter, and have to go through the controller.enqueue interface.
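A pull callback that is flexible as to BYOB status could look roughly like the sketch below. This is an illustration written for this thread, not the spec's example; `makeFlexibleStream`, `comparePaths` and the `log` array are hypothetical names, and the byte values are arbitrary:

```javascript
// Hypothetical source that works with or without a byobRequest.
function makeFlexibleStream(log) {
  return new ReadableStream({
    type: 'bytes',
    pull(controller) {
      const request = controller.byobRequest;
      if (request) {
        // A buffer was supplied (BYOB reader or auto-allocation): fill it in place.
        const view = request.view;
        const length = view.byteLength;
        new Uint8Array(view.buffer, view.byteOffset, length).fill(1);
        log.push('respond');
        request.respond(length);
      } else {
        // No buffer was supplied: allocate one and enqueue it.
        log.push('enqueue');
        controller.enqueue(new Uint8Array(4).fill(2));
      }
    },
  });
}

async function comparePaths() {
  const log = [];
  // Default reader, no autoAllocateChunkSize: the enqueue branch runs.
  const first = await makeFlexibleStream(log).getReader().read();
  // BYOB reader: the byobRequest branch runs.
  const byobReader = makeFlexibleStream(log).getReader({ mode: 'byob' });
  const second = await byobReader.read(new Uint8Array(4));
  return { log, first: first.value, second: second.value };
}
```

Calling `comparePaths()` exercises both branches with the same underlying source, which is the point of checking `byobRequest` before deciding how to deliver data.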

Spec says: [[byobRequest]] A ReadableStreamBYOBRequest instance representing the current BYOB pull request, or null if there are no pending requests.

  • I can't see any workflow that defines what a pending request is or who it is pending from?

Pending requests will be queued each time you call ReadableStreamBYOBReader.prototype.read; but it's definitely hard to follow in the spec.

  • Does this mean the controller creates a new instance of the request object every time it is about to call pull() and sets it to null after the underlying source has responded? So if you were to cache the controller in start() the value might be null if you were to somehow call it between pull() calls?

I think there's a complicated set of behaviour at play here; I hesitate to answer with authority... but in general, unless the stream was created with automatic buffer creation enabled, I'd assume that byobRequest can be null.

ReadableStreamBYOBRequest represents a pull request. My understanding is that this is called whenever the readable stream thinks it is needed (and there is no promise outstanding).

  • If you have a push source and so you don't define a pull() how does BYOB work?

I... don't think it works? Experimentally, rs = new ReadableStream({start(controller) { console.log(typeof controller.byobRequest.view) }, autoAllocateChunkSize: 1024, type: 'bytes'}) throws, because byobRequest is undefined.
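The experiment can be extended into a small sketch that records what the getter returns in start() and in pull(). It assumes the getter behaves as specified, returning null while no read is pending, in which case it is the `.view` access on that null that throws in start(). `inspectByobRequest` is a hypothetical helper name:

```javascript
// Hypothetical helper: records what byobRequest looks like in start() and pull().
async function inspectByobRequest() {
  const seen = {};
  const stream = new ReadableStream({
    type: 'bytes',
    autoAllocateChunkSize: 1024,
    start(controller) {
      // Runs during construction, before anything has been requested.
      seen.inStart = controller.byobRequest;
    },
    pull(controller) {
      // Runs once a read() is waiting on an empty queue.
      const request = controller.byobRequest;
      seen.inPullIsNull = request === null;
      const view = request.view;
      new Uint8Array(view.buffer, view.byteOffset, 1)[0] = 42;
      request.respond(1);
    },
  });
  // A default reader is enough here because autoAllocateChunkSize is set.
  const { value } = await stream.getReader().read();
  return { seen, value };
}

inspectByobRequest().then(({ seen }) => console.log(seen));
```

Under those assumptions the request only shows up once a consumer has actually asked for data, which is why probing it from start() does not find one.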

Flags: needinfo?(mgaudet)

Thanks Matthew, you are much easier to read than the spec! FYI, if I don't respond, it is because this interface has no "like" button - and I only work some days in the week. I am appreciative!

So to paraphrase, my new understanding

  • the controller manages the internal queues of the stream.
  • A consumer (e.g. a reader or another stream in a pipe chain) requests data from the stream, for example using read. This creates a "pending request"
  • if the internal queue has data the stream will satisfy the pending request from that (and won't request more from the source using pull())
  • However if the stream has no data in the queue the controller will try to pull more. Essentially this means that it _may_ create a ReadableStreamBYOBRequest and call pull() if it is defined.
    • It will create ReadableStreamBYOBRequest IFF either of the reasons you gave are true (the reader was created as byob OR autoallocation buffer is defined). Otherwise this will be null. It will also be null when there is no pending request, which implies the item gets deleted whenever a pending request is satisfied.
  • A source has three choices.
    • It can always enqueue its data using the controller.
    • IFF the BYOBRequest is not NULL it can write to the supplied buffer using respond()
    • IFF the BYOBRequest is not NULL it can supply its own buffer and send that in respondWithNewView().
  • So if you have a BYOBRequest it means you're doing a zero-copy transfer to the consumer's buffer. If you don't then you're copying into an internal buffer and the data is later copied when needed.
  1. Sound about right?

The big epiphany for me here was what was meant by zero copy - not of source buffer into controller queue, but of source buffer into the final consumer's buffer.

So view is an auto-allocated buffer that you plan to give the final consumer with respond(), but you might instead give them your own buffer using respondWithNewView().

  1. Doesn't that mean that you may allocate a buffer for the view that you then throw away because you responded with your own buffer?

If you have a push source and so you don't define a pull() how does BYOB work?

I... don't think it works? Experimentally, rs = new ReadableStream({start(controller) { console.log(typeof controller.byobRequest.view) }, autoAllocateChunkSize: 1024, type: 'bytes'}) throws, because byobRequest is undefined.

So just FYI, I think this means this test is invalid, because you won't have a byobRequest unless the data is requested and unavailable.

But you answered this by pointing me to the first example.

My epiphany here was that the controller doesn't care if the source is push or pull. If it needs data it creates the request. So a push source should write to the request if it has one, and otherwise enqueue. A pull source differs only in that it will probably have the request.

  1. Make sense?

  2. Is the zero copy just to the immediate consumer or the ultimate sink? My assumption is the immediate consumer - i.e. a transform stream would need to provide a new buffer.

  3. Above I referred to zero copying to the consumer's buffer, and I was thinking this could be any type of consumer - i.e. pipe to a writable stream or a default reader or a ReadableStreamBYOBReader. As opposed to only being aware of byte consumers like ReadableStreamBYOBReader.
    Am I correct? I think the only special thing about a ReadableStreamBYOBReader is that it will supply the BYOBRequest when needed because it knows that you want that, whether or not the underlying source has specified itself as a byte source.
    For any other consumer, you'll get a buffer, it might come to you as zero copy. You really don't know or care. Right?

  4. You noted two conditions under which the request is created. Might it be more precise to say that those are the two conditions under which a ReadableByteStreamController is created (otherwise you get a default controller, which doesn't even have the request property)? That makes sense because you can create the stream using autoAllocateChunkSize and get the request, and still read it using the default reader (hope what I mean makes sense)

  5. Related to the previous point, you said "The stream was created with an autoAllocateChunkSize, in which case the byobRequest is always present". I think you mean "in this case you may get a byobRequest" - I don't think you can have a request unless the internal buffers are empty.
    So this supports the idea that maybe you mean a ReadableByteStreamController is supplied, which has the property.

I think I might actually get this. A miracle!

Would be good if you could sanity check the first doc based on above assumptions: ReadableStreamBYOBRequest

Flags: needinfo?(mgaudet)

(In reply to Hamish Willee from comment #8)

Thanks Matthew, you are much easier to read than the spec! FYI, if I don't respond, it is because this interface has no "like" button - and I only work some days in the week. I am appreciative!

Glad to help! In hindsight, maybe the right place for this conversation to have happened would have been on the streams standards github repo, if only because the designers of the streams standard could provide authoritative answers to places where I end up going 🤷‍♂️.

So to paraphrase, my new understanding

  • the controller manages the internal queues of the stream.
  • A consumer (e.g. a reader or another stream in a pipe chain) requests data from the stream, for example using read. This creates a "pending request"
  • if the internal queue has data the stream will satisfy the pending request from that (and won't request more from the source using pull())
  • However if the stream has no data in the queue the controller will try to pull more. Essentially this means that it _may_ create a ReadableStreamBYOBRequest and call pull() if it is defined.
    • It will create ReadableStreamBYOBRequest IFF either of the reasons you gave are true (the reader was created as byob OR autoallocation buffer is defined). Otherwise this will be null. It will also be null when there is no pending request, which implies the item gets deleted whenever a pending request is satisfied.
  • A source has three choices.
    • It can always enqueue its data using the controller.
    • IFF the BYOBRequest is not NULL it can write to the supplied buffer using respond()
    • IFF the BYOBRequest is not NULL it can supply its own buffer and send that in respondWithNewView().
  • So if you have a BYOBRequest it means you're doing a zero-copy transfer to the consumer's buffer. If you don't then you're copying into an internal buffer and the data is later copied when needed.
  1. Sound about right?

A couple of things:

  1. How pull needs to behave needs to be matched to the type of reader: So, if pull uses controller.enqueue, and you use a BYOBReader (getReader({mode: 'byob'})) you'll find that you don't get any sensible data, because the reader expected the underlying source would fill the view you passed. On the flip side, if you use a DefaultReader (getReader()), then you can either use enqueue, or if autoAllocateChunkSize is set, byobRequest+respond.

  2. respondWithNewView is an oddly named API (IMO), because the new view is not any arbitrary view. It must be a view of the same buffer as byobRequest.view, with the same byteOffset. Essentially, all this API lets you do is shorten the view by returning a subset; I guess if you are unable to fill the entire view passed to you, then you would use respondWithNewView.

The big epiphany for me here was what was meant by zero copy - not of source buffer into controller queue, but of source buffer into the final consumer's buffer.

So view is an auto-allocated buffer that you plan to give the final consumer with respond(), but you might instead give them your own buffer using respondWithNewView().

  1. Doesn't that mean that you may allocate a buffer for the view that you then throw away because you responded with your own buffer?

Given the limitations of respondWithNewView you'll see how that's not true: you can only respond with a new view of the same buffer.
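A short sketch of that "shortened view" usage, assuming a BYOB reader that offers 16 bytes and a source that only has 4 bytes available. The names `makePartialFillStream` and `readPartial` and the byte value 9 are hypothetical:

```javascript
// Hypothetical source that can only partly fill the view it is offered.
function makePartialFillStream() {
  return new ReadableStream({
    type: 'bytes',
    pull(controller) {
      const request = controller.byobRequest;
      const { buffer, byteOffset } = request.view;
      // Pretend only 4 bytes were available, whatever size was offered.
      const filled = new Uint8Array(buffer, byteOffset, 4).fill(9);
      // Same buffer and same byteOffset as request.view, just shorter.
      request.respondWithNewView(filled);
    },
  });
}

async function readPartial() {
  const reader = makePartialFillStream().getReader({ mode: 'byob' });
  const { value } = await reader.read(new Uint8Array(16));
  return value;
}

readPartial().then((value) => console.log(value.byteLength, value.buffer.byteLength));
```

The reader gets back a 4-byte view over the 16-byte buffer it supplied, so nothing was allocated by the source and nothing is thrown away.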

If you have a push source and so you don't define a pull() how does BYOB work?

I... don't think it works? Experimentally, rs = new ReadableStream({start(controller) { console.log(typeof controller.byobRequest.view) }, autoAllocateChunkSize: 1024, type: 'bytes'}) throws, because byobRequest is undefined.

So just FYI, I think this means this test is invalid, because you won't have a byobRequest unless the data is requested and unavailable.

But you answered this by pointing me to the first example.

My epiphany here was that the controller doesn't care if the source is push or pull. If it needs data it creates the request. So a push source should write to the request if it has one, and otherwise enqueue. A pull source differs only in that it will probably have the request.

  1. Make sense?

Yes, I think so

  1. Is the zero copy just to the immediate consumer or the ultimate sink? My assumption is the immediate consumer - i.e. a transform stream would need to provide a new buffer.

pipeTo and pipeThrough quite intentionally don't specify how all the piping happens in order to allow for copy reducing optimizations, but it's not entirely guaranteed or specified how all this will happen.

  1. Above I referred to zero copying to the consumer's buffer, and I was thinking this could be any type of consumer - i.e. pipe to a writable stream or a default reader or a ReadableStreamBYOBReader. As opposed to only being aware of byte consumers like ReadableStreamBYOBReader.
    Am I correct? I think the only special thing about a ReadableStreamBYOBReader is that it will supply the BYOBRequest when needed because it knows that you want that, whether or not the underlying source has specified itself as a byte source.
    For any other consumer, you'll get a buffer, it might come to you as zero copy. You really don't know or care. Right?

Yes, I think that sounds right.

  1. You noted two conditions under which the request is created. Might it be more precise to say that those are the two conditions under which a ReadableByteStreamController is created (otherwise you get a default controller, which doesn't even have the request property)? That makes sense because you can create the stream using autoAllocateChunkSize and get the request, and still read it using the default reader (hope what I mean makes sense)

No, I don't think so... You'll only get a ReadableByteStreamController if the underlying source has type: 'bytes' -- the confusing thing here is the distinction between controller and reader, and how these interfaces aren't universal (in the sense that they don't all translate into each other), so the stream/underlying source and reader need to cooperate to work correctly.

  1. Related to the previous point, you said "The stream was created with an autoAllocateChunkSize, in which case the byobRequest is always present". I think you mean "in this case you may get a byobRequest" - I don't think you can have a request unless the internal buffers are empty.
    So this supports the idea that maybe you mean a ReadableByteStreamController is supplied, which has the property.

pull(c) is only invoked when the internal queues don't have anything to provide, and pull is where you'd see byobRequest, so in a sense this is the same statement I think.

I think I might actually get this. A miracle!

Would be good if you could sanity check the first doc based on above assumptions: ReadableStreamBYOBRequest

I'd strongly recommend that we pull in the editors for the Streams spec to make sure we're accurate; I'll ping them on the PR after this.

Flags: needinfo?(mgaudet)

Thanks very much. I'll keep going on the docs, but that is again very helpful. Hopefully the editors come to check it out. If not I'll post a link there.

No, I don't think so... You'll only get a ReadableByteStreamController if the underlying source has type: 'bytes' -- the confusing thing here is the distinction between controller and reader, and how these interfaces aren't universal (in the sense that they don't all translate into each other), so the stream/underlying source and reader need to cooperate to work correctly.

Ah, that's useful. You only get a byobRequest if you have a ReadableByteStreamController, and you only get the controller if you specify that you're a byte source. So this is like a precondition on top of autoAllocateChunkSize being set or creating a byte stream reader.

Related to the previous point, you said "The stream was created with an autoAllocateChunkSize, in which case the byobRequest is always present". I think you mean "in this case you may get a byobRequest" - I don't think you can have a request unless the internal buffers are empty.
So this supports the idea that maybe you mean a ReadableByteStreamController is supplied, which has the property.

pull(c) is only invoked when the internal queues don't have anything to provide, and pull is where you'd see byobRequest, so in a sense this is the same statement I think.

Similar. A bit pedantic, but if you have a push source you might get data when byobRequest is null - i.e. between pulls. That is a case where autoAllocateChunkSize is set and byobRequest is not present.

How pull needs to behave needs to be matched to the type of reader: So, if pull uses controller.enqueue, and you use a BYOBReader (getReader({mode: 'byob'})) you'll find that you don't get any sensible data, because the reader expected the underlying source would fill the view you passed.
On the flip side, if you use a DefaultReader (getReader()), then you can either use enqueue, or if autoAllocateChunkSize is set, byobRequest+respond.

I think the generalisation is:

  • "if you're handed a byobRequest you must use it to send data rather than enqueue."
  • You will get that BYOB request on pull for a type: 'bytes' underlying source if autoAllocateChunkSize is set (with either reader), or if the reader is a BYOBReader
  • You will not have a BYOB request between pulls (in a push underlying source), if the buffer is full already, or if you are using the default reader and have not set autoAllocateChunkSize. In that case you just enqueue as normal.

So the instruction to users would be that if you're specifying a byte type underlying source you should always check for a byobRequest being non-null and use it if possible. That works for any reader or if the underlying byte source is pull or push.
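For a push source, that rule might be sketched as below. Everything here is hypothetical scaffolding: `makePushStream`, `deliver` (called whenever the source has data) and the `paths` log are invented for illustration, and the fallback to enqueue when the chunk does not fit the offered view is a simplification:

```javascript
// Hypothetical push source: deliver(bytes) is called whenever data arrives.
function makePushStream() {
  let controller;
  const paths = [];
  const stream = new ReadableStream({
    type: 'bytes',
    autoAllocateChunkSize: 16,
    start(c) { controller = c; }, // keep the controller for later pushes
  });
  function deliver(bytes) {
    const request = controller.byobRequest;
    if (request && request.view.byteLength >= bytes.length) {
      // A read is pending: write straight into the buffer it is waiting on.
      const { buffer, byteOffset } = request.view;
      new Uint8Array(buffer, byteOffset, bytes.length).set(bytes);
      paths.push('respond');
      request.respond(bytes.length);
    } else {
      // Nothing pending (data arrived between reads): enqueue as normal.
      paths.push('enqueue');
      controller.enqueue(bytes);
    }
  }
  return { stream, deliver, paths };
}

async function demoPush() {
  const { stream, deliver, paths } = makePushStream();
  deliver(Uint8Array.of(1, 2));       // no read pending yet
  const reader = stream.getReader();
  const first = await reader.read();  // served from the queue
  const pending = reader.read();      // queue is empty, so a request is pending
  deliver(Uint8Array.of(3, 4));       // fills the auto-allocated buffer
  const second = await pending;
  return { paths, first: first.value, second: second.value };
}
```

The first delivery arrives while byobRequest is null even though autoAllocateChunkSize is set, and the second finds a request because a read is waiting on an empty queue.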

We might be saying exactly the same thing :-). I just have a sneaking feeling that you mean if using a BYOBReader you can only ever use byobRequest to write data. You can never use enqueue. But that doesn't match the discussion we had above or the spec examples.

FYI Docs pretty much complete. Have been through one review from spec authors. Have marked as dev-doc-complete, though there might be some minor follow on work.
