Closed Bug 378738 Opened 17 years ago Closed 17 years ago

\d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegExp/15.10.2.12.js)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: martin.honnen, Assigned: Waldo)

References

()

Details

Attachments

(1 file)

With Spidermonkey if you evaluate
  /\d/.test("\uFF11")
then it returns true but should return false as the ECMAScript specification clearly says:
  "15.10.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the
characters 0 through 9 inclusive."

This is a bug in Spidermonkey (already present in Netscape 4 builds but still there in a Firefox trunk nightly). Other engines like Rhino or MS JScript return false.

Bug was reported in mozilla.dev.tech.javascript by Igor Tandetnik.
Oops, there is code in browser.js (by me, although Mano has blame) that depends on this bug: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/browser.js&rev=1.778&mark=1434-1438#1421 checked in in bug 298004
Attached patch Obvious fixSplinter Review
This doesn't fix the browser dependency on the bug; that can be a separate patch.
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
Attachment #262776 - Flags: review?(mrbkap)
Attachment #262776 - Flags: review?(mrbkap) → review+
Should we fix the REOP_NONDIGIT case as well?  And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong here?
It is worth noting probably that \d and \D in character _classes_ do the right thing.  So maybe we should fix this inconsistency.
On WONTFIX: I feel it naturally follows from the principle of least surprise that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10) should return a value other than NaN (similarly for Number(s) ).

Currently this condition does not hold. Personally, I would be happy with either /\d/ changed to only match [0-9] (and \D to match [^0-9]), or parseInt changed to understand "fancy digits".
Please note that ChatZilla has a copy of the code in browser.js mentioned in comment 1, and other things might have copied it too.
(In reply to comment #5)
> On WONTFIX: I feel it naturally follows from the principle of least surprise
> that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10)
> should return a value other than NaN (similarly for Number(s) ).

I'm not sure I agree with this. Making parseInt handle digits other than in the ASCII range seems like a big can of worms. What should parseInt("१২੩") return? (that's a Devanagari 1, a Bengali 2, and a Gurmukhi 3).
I'll be happy with it returning 123. The probability of such a situation arising in practice is very near zero, I believe. Since the string is linguistically meaningless, parsing it to 123 is as good an outcome as any other.

But again, I would be equally happy with \d restricted to [0-9] and "१২੩" failing to parse as a number. What I object to is "१২੩" looking like a number lexically, but then failing to parse. Or, rather, I object to "१२३" looking like a number but failing to parse (here, all three characters are Devanagari digits).
(In reply to comment #3)
> Should we fix the REOP_NONDIGIT case as well?

Dangit, that's twice I've done this recently (not catch an obvious variant of a bug); I'll fix that on checkin (probably this evening).

> And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong
> here?

Everyone else gets it right.  Also, this contradicts most of ES3 by taking Unicode character classes into account; I'd leave it for ES4 if people want it.
Consider also code like this:

http://lxr.mozilla.org/mozilla/source/calendar/resources/content/datetimepickers/datetimepickers.xml

It uses \d for parsing and Number for converting all over the place. It would seem the practice is quite common.
Blocks: 379088
Fixed on trunk, with the browser bug filed as bug 379088.
Fixed on trunk, with the browser bug filed as bug 379088.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(I did add the fix for \D as well, to save people the trouble of checking.)
/cvsroot/mozilla/js/tests/ecma_3/RegExp/15.10.2.12.js,v  <--  15.10.2.12.js
initial revision: 1.1
Flags: in-testsuite+
verified fixed 1.9.0 linux/mac* shell.
Status: RESOLVED → VERIFIED
Summary: \d pattern matches characters other than the decimal digits 0-9 → \d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegExp/15.10.2.12.js)
Following document seems to say about JS 1.5 RegExp spec based on test result with Mozilla family on which patch for bug 298004 is applied.
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp
Please add description about "JS 1.9 backed out inappropriate spec alteration by bug 298004" in "New in JavaScript 1.9" of following page, when it'll be added.
  http://developer.mozilla.org/en/docs/JavaScript 
OS: Windows XP → All
Hardware: x86 → All
No longer blocks: 379088
Depends on: 379088
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: