Closed Bug 1519483 Opened 5 years ago Closed 3 years ago

Implement the RegExp Match Indices proposal

Categories

(Core :: JavaScript: Standard Library, enhancement, P3)

63 Branch
enhancement

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox88 --- fixed

People

(Reporter: alex.fdm, Assigned: iain)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete)

Attachments

(4 files)

The proposal is currently in Stage 2.

Priority: -- → P3

This proposal has been promoted to Stage 3.

Summary: implement the RegExp match array offsets proposal → Implement the RegExp Match Indices proposal
Component: JavaScript Engine → JavaScript: Standard Library
Depends on: 1367105
No longer depends on: 1367105

Is there any update or ETA on this implementation?

This proposal should be straightforward for us to implement. The regexp engine already returns matches in this form (an array of start/end indices), which we post-process to turn into match strings. All we have to do is expose them. We can probably do this while implementing named captures, which should be happening soon (either in 77 or in 78).

I will point out that we are using the same engine as V8 under the covers, so from the point of view of multiple implementations for TC39, it would be good to hear from JSC.

Depends on: 1367105
See Also: → 1362154

While working on named captures, I started looking at match indices. It turns out to be more work to implement efficiently than I had hoped.

Match indices add an additional property to every match result, which is an array containing other arrays. Eagerly allocating those objects is likely to add a noticeable performance penalty to regexp matches. Unlike named captures, which opt in to creating new objects using new syntax, match indices are required for every match, which has the potential to slow down the 99%+ of matches that don't use capture indices in favour of the tiny fraction that do.

It's probably necessary to generate these objects on-demand when the indices property is accessed. There's some tricky caching required to get this right, and reading through the comments on the bug tracking V8's implementation, it seems they've had problems with both performance and correctness.

Making this fast might involve a significant modification to the way we represent regexp results.

(Note: all of the interesting work for match indices happens outside the regexp engine proper.)

The January 2021 TC39 meeting changed the specification of match indices to only generate the indices property when a new /d flag is used. This fixes the performance concerns. We can go ahead and implement this.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

Before match indices, all match results had the same shape. (If there are no named capture groups, .groups exists but is undefined.) Match results with .indices have a distinct shape, so we need a distinct template object. We also need a template object for .indices itself.

Depends on D107135

The spec text is here: https://tc39.es/proposal-regexp-match-indices

Unfortunately, the diff algorithm used to generate the spec text for proposals assigns a number to deleted steps, so the step numbers in RegExpBuiltinExec will be different in the final spec. In an attempt to make the step numbers meaningful, I manually renumbered them. Here's the gist that I worked from: https://gist.github.com/iainireland/8e7541a95a7ec6900ce70fc44f9a3b76

match-indices-warp.js ensures that we have at least one testcase that calls a /d regexp from Warp. match-indices-dictionary ensures we have coverage of dictionary mode; it is based on regexp/bug1640473.js.

Depends on D107136

Depends on D107137

(In reply to Iain Ireland [:iain] from comment #8)

Unfortunately, the diff algorithm used to generate the spec text for proposals assigns a number to deleted steps, so the step numbers in RegExpBuiltinExec will be different in the final spec. In an attempt to make the step numbers meaningful, I manually renumbered them.

You can view the diff for the actual ECMA262 Pull Request for this feature here: https://arai-a.github.io/ecma262-compare/?pr=1713. This tool generates correct algorithm step numbers as part of the diff.

I've also updated the proposal repository to link to the official PR diff rather than the generated proposal text.

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9b26c3d0fe80
Add hasIndices to RegExpFlags r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/9b48010357a1
Support .indices in match result template objects r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/34ba1b4f06ca
Create match indices r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/054f16f96be2
Enable test262 tests r=mgaudet

What is needed for us to remove dev-doc-needed on this? From what I can see the main implication is documentation and compatibility data for hasIndices() - and that seems to have been already done.

Flags: needinfo?(iireland)

It looks good to me!

The only thing I would consider adding is a more prominent description (maybe on the hasIndices page?) of the structure of the output, similar to the examples here. In particular, it might be nice to document the interaction between /d and named capture groups.

Flags: needinfo?(iireland)

Thanks very much Iain. Updates as suggested are in https://github.com/mdn/content/pull/21966

Regressions: 1845715
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: