Closed Bug 773687 Opened 12 years ago Closed 9 years ago

RegExp flag /y (sticky) causes ^ anchor to apply to start of match attempt rather than entire target string

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: steves_list, Assigned: arai)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete, Whiteboard: [js:p2][DocArea=JS])

Attachments

(3 files)

Repro steps for Firefox 13.0.1:

var r = /^r/gy;
r.lastIndex = 2;
console.log(r.exec('str'));

Result: ['r']

Expected: null

If you remove the /y flag, you do get the correct result (null).

Because flag /y is (as-yet) nonstandard, I suppose I'm arguing for what the correct result should be, based on intuitive semantics and what would actually be useful.

Note that XRegExp <http://xregexp.com> uses flag /y behind the scenes for various perf optimizations, and shims it via its XRegExp.exec function via an optional 'sticky' argument (which uses native /y in browsers that support it, i.e., Firefox 3+). The described, incorrect behavior is therefore causing bugs in XRegExp.
Maybe I'm missing something, but isn't the exactly the semantics /y was designed for?  For example, if you were using regular expressions to parse header lines in an HTTP request, you could use a /y regex to extract each header line, without having to mess with creating substrings of the entire HTTP request.  If /y doesn't do what's demonstrated here, what's it supposed to do, and what behavior does it actually change?
For the use case you described, you should probably use the /m flag combined with ^ and /y, in which you would get the behavior you expect.

Flag /y should add an *implicit* non-/m ^ anchor that is applied to the beginning of the match attempt. However, an *explicit* (user-specified) ^ in the RegExp pattern should match only at the beginning of the string, or at the beginning of a line if used with /m.
Consider something like 'baaba'.match(/a|^b/gy). This should return ['b','a','a']. However, currently Firefox returns ['b','a','a','b','a'].
Really should have a https://bugs.ecmascript.org report on file instead of this, or upstream of this.

/be
Status: UNCONFIRMED → NEW
Ever confirmed: true
I've added a new bug at https://bugs.ecmascript.org/show_bug.cgi?id=526
Whiteboard: [js:p2]
But there are no way to test if a string "startsWith" a RegExp at position "position".
The current behaviour of FireFox with "y" flag allows this.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky
This pages tells:
"This allows the match-only-at-start capabilities of the character "^" to effectively be used at any location in a string by changing the value of the lastIndex property."
Also this page contains an example, where "g" flag should be used instead of "y".
I cannot think, when "y" can be usefull. But why not to extend "startsWith" to work with RegExpes at least?
(In reply to 4esn0k from comment #6)
> I cannot think, when "y" can be usefull.

I'm not sure I understand. The part you quite, and the way you quote it, sounds like it does what you want. Why can it then never be useful?

> But why not to extend "startsWith" to work with RegExpes at least?

The exact behavior of the "sticky" flag is now specified in ES6[1]. If you think that something is seriously wrong with the way it works, you should file a bug against the specification, as mentioned by brendan in comment 4.

The same applies if you want to have other functions extended, String.prototype.startsWith[2] in this case. Note that the specification of that function already contains a note about potentially supporting Regexps in the future.

[1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-regexpbuiltinexec
[2]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.startswith
sorry, i missed, that i need to use /y flag without "^" in pattern.
Assignee: general → nobody
The draft ES6 spec now lays down the rule:

21.2.2.6 ... Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.

But FF gets this wrong (tested in FF 32 on MacOSX):

var re = /^foo/y
re.lastIndex = 2
re.test("..foo")

This tests true, but should be false.
(In reply to Erik Corry from comment #9)
> But FF gets this wrong (tested in FF 32 on MacOSX):

Thanks for testing. It still does in current Nightly.
Blocks: es6
OS: Windows 7 → All
Hardware: x86_64 → All
Version: 13 Branch → unspecified
Blocks: 887016
Changed following:
  * In irregexp::CompilePattern, do not prepend nodes for ".*?" pattern when sticky flag is set,
    so pattern match starts directly from specified offset
  * remove hack for sticky flag in following:
    * In RegExpShared::compile do not enclose pattern with "^(?:)"
    * In RegExpShared::execute, do not modify the start of input text, and related other parameters, results
  * In RegExpShared::execute, call HasSubstringAt instead of StringFindPattern if sticky flag is set and pattern is simple string

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9df753d96a0

Currently ^, \b, and \B assertions are tested.  Do you know any other case that sticky flag bahaves differently between current SpiderMonkey and the spec?
Assignee: nobody → arai.unmht
Attachment #8663358 - Flags: review?(till)
Thank you Erik :)
I tested with latest revision:
  https://v8.googlecode.com/svn/trunk/test/mjsunit/harmony/regexp-sticky.js
and looks like the test itself has 2 bugs in lastIndex after match (or, maybe the spec changed after that?).
Except them, patched SpiderMonkey passed it, and I think the test in my patch covers almost cases :)
wow, it was old repository :P

anyway it's not changed:
  https://chromium.googlesource.com/v8/v8.git/+/master/test/mjsunit/harmony/regexp-sticky.js
Comment on attachment 8663358 [details] [diff] [review]
Fix assertion pattern in RegExp with sticky flag.

Review of attachment 8663358 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice, thank you. r=me with feedback addressed.

::: js/src/jsstr.h
@@ +218,5 @@
>  
>  extern int
>  StringFindPattern(JSLinearString* text, JSLinearString* pat, size_t start);
>  
> +/* Return true if the string contains a pattern at start. */

s/start/|start|/ to make clear that the value of that parameter is meant, not the start of the string.

::: js/src/tests/ecma_6/RegExp/sticky.js
@@ +2,5 @@
> +var summary = 'sticky flag should not break assertion behavior.';
> +
> +print(BUGNUMBER + ": " + summary);
> +
> +function test(re, text, results) {

I think I would call |results| |expectations|, and the |ret| member |matches|. And rename |m| to |match|. However, all that is utterly unimportant, so feel free to ignore.

@@ +3,5 @@
> +
> +print(BUGNUMBER + ": " + summary);
> +
> +function test(re, text, results) {
> +  // sanity check for test data itself

Nit: full-sentence comments should start with an upper-case letter and end with a "."

@@ +18,5 @@
> +    } else {
> +      assertEq(re.lastIndex, result.lastIndex);
> +      assertEq(m !== null, true);
> +      assertEq(m.length, result.ret.length);
> +      for (var j = 0; j < result.ret.length; j++) {

Nit: no braces needed for single-line body.

::: js/src/vm/RegExpObject.cpp
@@ +666,5 @@
>      if (canStringMatch) {
>          MOZ_ASSERT(pairCount() == 1);
> +        size_t sourceLength = source->length();
> +        if (sticky()) {
> +            if (sourceLength + start < sourceLength || sourceLength + start > length)

I don't think the first part of this condition can ever hold: |start| is unsigned, and this boils down to `start < 0`. Simply removing it should be fine, IINM.

@@ +753,5 @@
>          result = irregexp::InterpretCode(cx, byteCode, chars, start, length, matches);
>      }
>  
> +    if (result == RegExpRunStatus_Success && matches)
> +        matches->checkAgainst(length);

I think `MatchPairs::displace` is dead after this removal. If I'm right, please also remove the definition.
Attachment #8663358 - Flags: review?(till) → review+
(In reply to Tooru Fujisawa [:arai] from comment #11)
> Currently ^, \b, and \B assertions are tested.  Do you know any other case
> that sticky flag bahaves differently between current SpiderMonkey and the
> spec?

Forgot to comment on this: no, I can't think of more cases either.

And thanks, Erik, for the tests!
Comment on attachment 8663358 [details] [diff] [review]
Fix assertion pattern in RegExp with sticky flag.

Review of attachment 8663358 [details] [diff] [review]:
-----------------------------------------------------------------

Good point, I think V8 has a bug there (and the test) where it is not writing back lastIndex for non-global sticky regexps.
https://hg.mozilla.org/mozilla-central/rev/a567df6edc07
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Keywords: dev-doc-needed
Whiteboard: [js:p2] → [js:p2][DocArea=JS]
Flags: needinfo?(supriyantomaftuh)
Flags: needinfo?(supriyantomaftuh)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: