Closed Bug 1435829 Opened 6 years ago Closed 5 years ago

Implement String.prototype.matchAll proposal

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- disabled

People

(Reporter: till, Assigned: anba)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete)

Attachments

(6 files, 10 obsolete files)

27.79 KB, patch
anba
: review+
Details | Diff | Splinter Review
11.86 KB, patch
anba
: review+
Details | Diff | Splinter Review
12.94 KB, patch
anba
: review+
Details | Diff | Splinter Review
10.94 KB, patch
anba
: review+
Details | Diff | Splinter Review
1.49 KB, patch
anba
: review+
Details | Diff | Splinter Review
52.93 KB, patch
anba
: review+
Details | Diff | Splinter Review
The String.prototype.matchAll proposal is at stage 3 and we should implement. I wouldn't rule out slight changes to the spec, so we shouldn't immediately let this ride the trains.
Priority: -- → P3
This implements the current proposal text including https://github.com/tc39/proposal-string-matchall/pull/33, except for the second IsRegExp call in MatchAllIterator because of [1] and [2].

[1] https://github.com/tc39/proposal-string-matchall/pull/33#pullrequestreview-94400839
[2] https://github.com/tc39/proposal-string-matchall/pull/33#discussion_r166352273
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Attachment #8956024 - Flags: review?(jorendorff)
Adds inlining support for IsRegExpStringIterator and NewRegExpStringIterator based on the existing inlining support for Array and String iterators.
Attachment #8956025 - Flags: review?(jorendorff)
Adds an optimisation to reuse the input RegExp of MatchAllIterator.

I've verified in µ-benchmarks that this optimisation does pay off. And, even though this is the initial implementation of the proposal, landing this optimisation now is useful because we can give implementer's feedback to the proposal author.
Attachment #8956026 - Flags: review?(jorendorff)
Attached patch bug1435829-part4-tests.patch (obsolete) — Splinter Review
And some initial tests, mostly focusing on the optimised path, because I expect the test262 tests will cover the functional tests more thoroughly.
Attachment #8956027 - Flags: review?(jorendorff)
Comment on attachment 8956024 [details] [diff] [review]
bug1435829-part1-implementation.patch

Review of attachment 8956024 [details] [diff] [review]:
-----------------------------------------------------------------

Crystal clear as always.

::: js/src/builtin/RegExp.js
@@ +1188,5 @@
> +        return callFunction(CallRegExpStringIteratorMethodIfWrapped, obj,
> +                            "RegExpStringIteratorNext");
> +    }
> +
> +    var result = { value: null, done: false };

If I'm reading this right, the proposal returns an object with `value: undefined` in step 4 and step 10.

::: js/src/vm/Iteration.h
@@ +148,5 @@
>  
>  StringIteratorObject*
>  NewStringIteratorObject(JSContext* cx, NewObjectKind newKind = GenericObject);
>  
> +class RegExpStringIteratorObject : public NativeObject

Please put all this somewhere in js/src/builtin, since it is standard library stuff, not part of the VM.

It's a little odd for GlobalObject::initRegExpStringIteratorProto to live in a different directory, but we already do that for some builtin objects: https://searchfox.org/mozilla-central/search?q=%5EGlobalObject%3A%3Ainit%5Cw%2B&case=false&regexp=true&path=js%2Fsrc%2Fbuiltin

I hope this is not onerous. (It's so hard to tell when something like this will be.) A separate patch would be fine, and if it's a pain, just file a followup bug...
Attachment #8956024 - Flags: review?(jorendorff) → review+
Comment on attachment 8956025 [details] [diff] [review]
bug1435829-part2-jit-support.patch

Review of attachment 8956025 [details] [diff] [review]:
-----------------------------------------------------------------

Seems obviously correct to me, but I'd better defer to a jit peer.
Attachment #8956025 - Flags: review?(jorendorff) → review?(jdemooij)
Comment on attachment 8956026 [details] [diff] [review]
bug1435829-part3-reuse-regexp.patch

Review of attachment 8956026 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with a few comments.

I guess I'm a little surprised this is a winning optimization. It looks like all that's being optimized away is a few straightforward method calls and a `new RegExp(existingRegExp)` ...which should be fairly quick, relative to the cost of actually searching strings.

::: js/src/builtin/RegExp.js
@@ +1142,5 @@
>          var C = SpeciesConstructor(R, GetBuiltinConstructor("RegExp"));
>  
> +        if (IsMatchAllIteratorOptimizable(R, C)) {
> +            // Step 2.c.
> +            source = UnsafeGetStringFromReservedSlot(R, REGEXP_SOURCE_SLOT);

I think the source slot of a RegExp is immutable (once it's initialized). If so, we don't need REGEXP_STRING_ITERATOR_SOURCE_SLOT. We can extract the source later, if we ever need it.

::: js/src/vm/Iteration.cpp
@@ +1109,2 @@
>      RegExpStringIteratorSlotFlags,
> +    RegExpStringIteratorSlotLastIndex,

Now these need short comments explaining why they aren't identical to what's in the spec, and how they correspond to the spec.

In particular, the comment on LastIndex should explain

- that it's used while we're on the fast path, and if we have to jump to the slower path, then this state is stored in the RegExp's .lastIndex property instead.

- how this field doubles as .[[Done]]
Attachment #8956026 - Flags: review?(jorendorff) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #7)
> I think the source slot of a RegExp is immutable (once it's initialized).

I don't remember ever encountering RegExp.prototype.compile() before. Amazing.
Comment on attachment 8956027 [details] [diff] [review]
bug1435829-part4-tests.patch

Review of attachment 8956027 [details] [diff] [review]:
-----------------------------------------------------------------

Nice tests. Thank you!

::: js/src/tests/non262/String/matchAll.js
@@ +30,5 @@
> +const RegExp_prototype_match = RegExp.prototype[Symbol.match];
> +
> +function assertEqIterMatchResult(actual, expected) {
> +    assertEq(actual.done, expected.done);
> +    if (actual.value === null || expected.value === null) {

undefined, not null

@@ +102,5 @@
> +    regexp.compile("b+");
> +    assertEqMatchResults(iterator, matchResults("aabb", /a+/));
> +}
> +
> +// Recompile RegExp (source) before first match.

I think this one comment should say (flags) rather than (source).
Attachment #8956027 - Flags: review?(jorendorff) → review+
Comment on attachment 8956025 [details] [diff] [review]
bug1435829-part2-jit-support.patch

Review of attachment 8956025 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. Looks great!
Attachment #8956025 - Flags: review?(jdemooij) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> ::: js/src/builtin/RegExp.js
> @@ +1188,5 @@
> > +        return callFunction(CallRegExpStringIteratorMethodIfWrapped, obj,
> > +                            "RegExpStringIteratorNext");
> > +    }
> > +
> > +    var result = { value: null, done: false };
> 
> If I'm reading this right, the proposal returns an object with `value:
> undefined` in step 4 and step 10.

Yup, this was changed per the review comments in the spec repo -> https://github.com/tc39/proposal-string-matchall/commit/f368659e372b37afe241496db3425209d24f14ec


> ::: js/src/vm/Iteration.h
> @@ +148,5 @@
> >  
> >  StringIteratorObject*
> >  NewStringIteratorObject(JSContext* cx, NewObjectKind newKind = GenericObject);
> >  
> > +class RegExpStringIteratorObject : public NativeObject
> 
> Please put all this somewhere in js/src/builtin, since it is standard
> library stuff, not part of the VM.
> 
> [...]
>
> I hope this is not onerous. (It's so hard to tell when something like this
> will be.) A separate patch would be fine, and if it's a pain, just file a
> followup bug...

js/src/vm/Iteration.{h,cpp} currently also contains Array and String iterators and the Iterator prototype object [1]. Do you think it's acceptable to have a single js/src/builtin/Iterators.{h,cpp} file or do you prefer separate files for each iterator class? 

[1] https://searchfox.org/mozilla-central/source/js/src/vm/Iteration.cpp#1035-1097,1370-1435
(In reply to André Bargull [:anba] from comment #11)
> Do you think it's
> acceptable to have a single js/src/builtin/Iterators.{h,cpp} file or do you
> prefer separate files for each iterator class? 

Or alternatively we can place the code in js/src/{Array,String,RegExp}.cpp so it matches the code organisation in js/src/{Array,String,RegExp}.js.
(In reply to Jason Orendorff [:jorendorff] from comment #7)
> I guess I'm a little surprised this is a winning optimization. It looks like
> all that's being optimized away is a few straightforward method calls and a
> `new RegExp(existingRegExp)` ...which should be fairly quick, relative to
> the cost of actually searching strings.

I haven't investigated this properly, but I guess there are just multiple little things which add up and lead to the performance difference between the optimised and non-optimised version.
<anba> jorendorff: There was/is still some discussion around tc39/proposal-string-matchall#37 (and the various linked issues/prs), which I wanted to wait for. But I can upload new patches including tc39/proposal-string-matchall#38.
Filed bug 1485457 for comment #5.
Re-requesting review because the changes are probably a bit too large to simply carry the r+.


This implements the current spec proposal, including <https://github.com/tc39/proposal-string-matchall/pull/38>.
Attachment #8956024 - Attachment is obsolete: true
Attachment #9003253 - Flags: review?(jorendorff)
Rebased and updated to match the current mechanics to check for object-types. Carrying r+.
Attachment #8956025 - Attachment is obsolete: true
Attachment #9003255 - Flags: review+
Rebased patch, carrying r+. (Decision to carry r+ a bit borderline given the changes necessary for part 1....)
Attachment #8956026 - Attachment is obsolete: true
Attachment #9003257 - Flags: review+
Attached patch bug1435829-part4-tests.patch (obsolete) — Splinter Review
Updated part 4, carrying r+.
Attachment #8956027 - Attachment is obsolete: true
Attachment #9003258 - Flags: review+
Attached patch bug1435829-part5-xrays.patch (obsolete) — Splinter Review
Adding properties to |RegExp.prototype| requires review from an XPConnect peer.

I don't know if any additional tests are necessary for Xrays or if it's sufficient to simply add |Symbol.matchAll| to the prototype properties of |RegExp.prototype|.
Attachment #9003262 - Flags: review?(peterv)
Attached patch bug1435829-part6-test262.patch (obsolete) — Splinter Review
Enable test262 tests for matchAll.
Attachment #9003263 - Flags: review?(jorendorff)
Attachment #9003262 - Flags: review?(peterv) → review+
Comment on attachment 9003253 [details] [diff] [review]
bug1435829-part1-implementation.patch

Review of attachment 9003253 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Symbol.cpp
@@ +70,5 @@
>          WellKnownSymbols* wks = cx->runtime()->wellKnownSymbols;
>          for (size_t i = 0; i < JS::WellKnownSymbolLimit; i++) {
> +#ifndef NIGHTLY_BUILD
> +            if (i == SymbolCode::matchAll)
> +                continue;

Wow, good catch. I wouldn't have looked for this!
Attachment #9003253 - Flags: review?(jorendorff) → review+
Comment on attachment 9003263 [details] [diff] [review]
bug1435829-part6-test262.patch

Review of attachment 9003263 [details] [diff] [review]:
-----------------------------------------------------------------

Great. Thank you, anba.

::: js/src/tests/test262/GIT-INFO
@@ +1,2 @@
>  commit ab436c465106be86719c4849c9cedecd7b570ff9
> +Merge: d901922 ff475fc

This doesn't seem right.
Attachment #9003263 - Flags: review?(jorendorff) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #23)
> ::: js/src/tests/test262/GIT-INFO
> @@ +1,2 @@
> >  commit ab436c465106be86719c4849c9cedecd7b570ff9
> > +Merge: d901922 ff475fc
> 
> This doesn't seem right.

The commit [1] has two parents and at least git 2.7.4 and 2.17.0 both include the merge info in the `git show` output for me. Not sure why it wasn't already present in GIT-INFO...

[1] https://github.com/tc39/test262/commit/ab436c465106be86719c4849c9cedecd7b570ff9
Updated part 1 to include <https://github.com/tc39/proposal-string-matchall/pull/41>.

Rebased part 1 to apply cleanly on inbound after the recent formatting changes. Carrying r+.
Attachment #9003253 - Attachment is obsolete: true
Attachment #9031374 - Flags: review+
Rebased part 2 to apply cleanly on inbound after the recent formatting changes. Carrying r+.
Attachment #9003255 - Attachment is obsolete: true
Attachment #9031376 - Flags: review+
Fixed a small bug where the optimized path didn't set |lastIndex| to 0 for global-or-sticky regular expressions.

Rebased part 3 to apply cleanly on inbound after the recent formatting changes. Carrying r+.
Attachment #9003257 - Attachment is obsolete: true
Attachment #9031377 - Flags: review+
Updated part 4 to include a regression test for the issue noted in the updated part 3 patch. Carrying r+.
Attachment #9003258 - Attachment is obsolete: true
Attachment #9031378 - Flags: review+
Rebased part 5. No changes, carrying r+.
Attachment #9003262 - Attachment is obsolete: true
Attachment #9031379 - Flags: review+
Regenerated the updated test262 to apply cleanly on inbound. Carrying r+.
Attachment #9003263 - Attachment is obsolete: true
Attachment #9031380 - Flags: review+
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/85654ecf0d0c
Part 1: Implement String.prototype.matchAll proposal. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/a32778df6583
Part 2: Add inline support for RegExp-String-Iterator. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd2f20cc6d23
Part 3: Add fast path to share input RegExp object. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/1608deab3d3d
Part 4: Tests for String.prototype.matchAll. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/f79c5d5bea25
Part 5: Update xray tests for RegExp.prototype. r=peterv
https://hg.mozilla.org/integration/mozilla-inbound/rev/5142cc07565a
Part 6: Enable test262 tests for String.matchAll. r=jorendorff
Keywords: checkin-needed
Depends on: 1514587
Performance improvement found 

== Change summary for alert #18224 (as of Fri, 14 Dec 2018 10:37:21 GMT) ==

Improvements:

  3%  sessionrestore opt e10s stylo     852.62 -> 828.50

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18224

Notes for anyone willing to help with writing MDN documentation:
(I started this work, because I thought this was on my docs to-do list for Firefox 65, but it is Nightly only for now)

matchAll is linked from (this is done)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/prototype#Methods
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#Methods_2
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/prototype#Methods
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#Methods_2
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol#Well-known_symbols
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Text_formatting#Methods_of_String
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Working_with_regular_expressions

Browser compat data PR: https://github.com/mdn/browser-compat-data/pull/3289
(data needs updating once Firefox actually ships in a release version)

New reference pages to write:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/@@matchAll
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/matchAll

New interactive examples for the reference pages to add here:
https://github.com/mdn/interactive-examples/tree/master/live-examples/js-examples/string (string-matchAll.html)
https://github.com/mdn/interactive-examples/tree/master/live-examples/js-examples/regexp (regexp-prototype-@@match.html)
https://github.com/mdn/interactive-examples/tree/master/live-examples/js-examples/symbol (string-matchAll.html)

If this makes the train in Firefox, we also need to add it to "Firefox 66 for developers" (or 67, or whenever it lands outside of Nightly)
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/66#JavaScript

Feel free to reach out to me if you want to document this and need help.

(In reply to Florian Scholz [:fscholz] (MDN) from comment #35)

... because I thought this was on my docs to-do list for Firefox 65, but it is Nightly only for now

is that still true? Chrome 73 ships it in March
https://www.chromestatus.com/feature/5520028858318848

Till, can we ship this as well or are we shipping already?

Flags: needinfo?(till)

(In reply to Florian Scholz [:fscholz] (MDN) from comment #37)

Till, can we ship this as well or are we shipping already?

Looks like it's in 66, so will be in release soon?

Flags: needinfo?(till)

It's still behind NIGHTLY_BUILD.

Oh, ok! Is there any reason we shouldn't unflag and ship it?

Flags: needinfo?(andrebargull)

I think it should be ready to ship. Do you agree Jason?

Flags: needinfo?(andrebargull) → needinfo?(jorendorff)

Yes, let's ship it. Filed bug 1531830 for this.

Flags: needinfo?(jorendorff)

setting "disabled" for Firefox 66 to avoid confusion

Doc work has been completed, see comment 35 for details.
Marked is shipping in 67 https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/67#JavaScript

Review appreciated.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: