Closed Bug 1806042 Opened 1 year ago Closed 1 year ago

Sites affected by the new date-time format change

Categories

(Core :: Internationalization, defect)

Desktop
All
defect

Tracking

()

RESOLVED FIXED
110 Branch
Webcompat Priority P1
Tracking Status
firefox-esr102 --- unaffected
firefox108 --- unaffected
firefox109 + fixed
firefox110 + fixed

People

(Reporter: ksenia, Assigned: jfkthame)

References

(Regression, )

Details

(Keywords: regression)

Attachments

(3 files)

Attached file 114746.html

We've received a report in https://github.com/webcompat/web-bugs/issues/114746 where store detail pages are broken on homedepot.com.

To reproduce, visit https://www.homedepot.com/l/Hampton/VA/Hampton/23666/4612 in Nightly 110.0a1 (2022-12-15) and observe the page.

The site is using the hour:min:sec part from the result of :

const e = (new Date).toLocaleString('en-US', {
        timeZone: 'America/Anchorage'
    });

and tries to extract only hour from this string with the following regex: match(/(\d+):(\d+):(\d+) (\w)/); and expecting a match, but since bug1792775 this expression returns null. I've attached a test case.

Mozregression:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=7a1faa41f831ac8c3e5e751bde276a6a1d3eff39&tochange=b12f89a6bbffe2c243b664c4a4f60cd8a7b1c65c

I'm not sure whether this is expected behavior since the page/testcase also breaks in Chrome Canary, so just filing this for visibility.

André, wonder if you could take a look at this, please?

Flags: needinfo?(andrebargull)

Set release status flags based on info from the regressing bug 1792775

This is a bug in their code. I18n results are best-effort and may change over time as i18n data improves. This is such case, and all web browsers will migrate to ICU 72 and won't work with such code.

We were aware that such bugs might arise, see bug 1792775, comment #16. That's why we timed this change, so it happens at the same time as in Chromium. Can we first try to notify the website owner, so they update their code to handle the new date-time format for en-US? If that fails we should consider adding an intervention.

Flags: needinfo?(andrebargull)

(In reply to André Bargull [:anba] from comment #4)

We were aware that such bugs might arise, see bug 1792775, comment #16. That's why we timed this change, so it happens at the same time as in Chromium. Can we first try to notify the website owner, so they update their code to handle the new date-time format for en-US? If that fails we should consider adding an intervention.

Thanks, I'll try to contact them then and file a bug for an intervention.

There is another report about group pages on meetup.com being broken as well in https://github.com/webcompat/web-bugs/issues/115508.
They use toLocaleTimeString and expecting third element to be present in an array after splitting a string with a space, but it's undefined (since the change in spaces):

   const t = (new Date).toLocaleTimeString('en-us', {
        timeZone: "US/Mountain",
        timeZoneName: 'short',
        hour12: !0
    }).split(' ') [2]
Webcompat Priority: --- → ?

Can this bug be closed now it had been resolved?

From https://bugs.chromium.org/p/chromium/issues/detail?id=1392814#c26:

Update here is that homedepot is aware of the issue and will work on a fix. But if we discover more broken sites, I suggest we consider rolling back the ICU update.

Looks like homedepot was contacted by the Chrome team and is working on a fix. So there is a remaining issue with meetup.com (so far), I'll reach out to them.

Wonder if we should have a plan in case Chrome rolls back the ICU update?

(In reply to hgreen.9 from comment #6)

Can this bug be closed now it had been resolved?

We should keep this bug open for now to track affected sites

Summary: Store detail page is broken on homedepot.com → Sites affected by the new date-time format change
Flags: needinfo?(dschubert)

Try push of the ICU72 update reverted from Beta:
https://treeherder.mozilla.org/jobs?repo=try&revision=1dcb26f3e14a58ccbdb2ca535121fb9ad7768b73

André, can you please sanity check the commit above just to make sure everything looks as expected? It did backout cleanly, modulo the CLOBBER bump which I did manually.

Flags: needinfo?(andrebargull)

Ugh, as I feared, there do appear to be other bugs which landed after the ICU72 update which also depended on it. We're going to need to backout some other things if we're to go this route.

Alright, this one is looking better:
https://treeherder.mozilla.org/jobs?repo=try&revision=5e209935659b73df8b7303a846daba04cd42ca6a

It maintains the tzdata 2022g update that landed after the ICU72 update landed. It also includes a revert of bug 1790163 and bug 1473911 as they also depended on the newer ICU version. With that, CI is all green from what I can see. I'd still love a sanity check of the resulting patches, André.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #11)

I'd still love a sanity check of the resulting patches, André.

The updated patch looks good to me! I've spot checked some files and validated the ICU data file has the tzdata 2022g update correctly applied.

Flags: needinfo?(andrebargull)

The bug is marked as tracked for firefox109 (beta) and tracked for firefox110 (nightly). However, the bug still isn't assigned.

:Amir, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahabibi)

After syncing with the other WebCompat folks, I unfortunately have to request to back out the ICU update for now. :(

While I understand that some changes in i18n APIs are expected, the current situation has us surprised about real-world web breakage, and we're currently not able to mitigate or investigate this in any meaningful way.

The update is riding Firefox 109, which would hit Release on January 17th. As noted in bug 1792775, this update is also shipping in Chrome, but they have slotted it for Chrome 110, which would be released on February 7th. So if we keep this update in, we'd be breaking the web three weeks before Chrome does. This already isn't a great spot to be in, since we can't rely on Web Developers fixing their sites because they're broken everywhere. Reading this comment on the homedepot crbug, and noting that Chrome also has issues on meetup.com, I wouldn't be surprised if Chrome decides to backout as well.

The second factor is timing: between now and January 17th, there isn't much time left, especially considering that a lot of people will be on vacation. We got a bit surprised by the fallout here, and we did not have time to properly investigate this situation, or to reach out to sites and/or find a browser-side mitigation for compat issues. And now it's a bit too late for that, especially since I'd assume that our outreach attempts are severely hindered by the fact that most webdevs probably are on vacation, too.

So a backout, for now, seems like the safest solution here.

After that, we should collaborate with Google and Unicode folks to see if we can find a web-compatible solution. If, for some reason, there is no way to keep ICU compatible, then we can at least spend some time on investigating the potential fallout, reach out to sites, and prepare WebCompat Interventions as a mitigation for big websites that don't get a fix out in time. In the best case, if Chrome sticks with their release plan, we don't even have to do any additional work, and just wait for webdevs to notice their sites being broken in Chrome.

Webcompat Priority: ? → P1
Flags: needinfo?(dschubert) → needinfo?(ryanvm)

As far as I'm aware, the only web-visible change in ICU72 that looked high-risk was the use of narrow no-break spaces as field separators in place of regular ASCII spaces, which confuses poorly-implemented code that wants to parse the result.

So an alternative mitigation for the ICU72-related webcompat breakage would be to post-process the date/time strings we get from ICU to replace U+202F separators with regular U+0020 spaces. That would allow us to keep the ICU72 update (including the IDN bugs that have landed on top, for example) and just patch up the data that's confusing websites.

@anba, do you know of any other changes ICU72 made that might be similarly risky, or are these spaces the only such issue?

Flags: needinfo?(andrebargull)

The ICU72 update has been backed out from Beta for 109.0b5 along with two other bugs which depended on it.
https://hg.mozilla.org/releases/mozilla-beta/rev/94c0c172ef0b
https://hg.mozilla.org/releases/mozilla-beta/rev/7deaef0e20e3

Flags: needinfo?(ryanvm)

FWIW, with this patch:

diff --git a/js/src/builtin/intl/DateTimeFormat.cpp b/js/src/builtin/intl/DateTimeFormat.cpp
--- a/js/src/builtin/intl/DateTimeFormat.cpp
+++ b/js/src/builtin/intl/DateTimeFormat.cpp
@@ -1128,6 +1128,14 @@ static bool intl_FormatDateTime(JSContex
     return false;
   }
 
+  // Replace any occurrences of U+202F in buffer with U+0020, to mitigate
+  // webcompat problems triggered by ICU 72.
+  for (auto& c : mozilla::Span(buffer.data(), buffer.length())) {
+    if (c == 0x202F) {
+      c = 0x0020;
+    }
+  }
+
   JSString* str = buffer.toString(cx);
   if (!str) {
     return false;

to the JS engine, the testcase attached here no longer throws an error (the original page on homedepot.com doesn't allow me access, maybe geo-blocked?), and the meetup page no longer goes blank after loading.

But I really have no idea whether this is an appropriate place to apply such a hack. Andre would be a better person to say....

(In reply to Jonathan Kew [:jfkthame] from comment #15)

@anba, do you know of any other changes ICU72 made that might be similarly risky, or are these spaces the only such issue?

I don't think any other change in ICU 72 is risky. For example we didn't (yet) see any reports about the other space change, where date range formatting is now using U+2009 instead of U+0020.

Flags: needinfo?(andrebargull)

(In reply to Jonathan Kew [:jfkthame] from comment #17)

But I really have no idea whether this is an appropriate place to apply such a hack. Andre would be a better person to say....

Yes, either directly before passing the formatted JSString to the user or earlier in mozilla::intl::DateTimeFormat, if we also need/want to cover other mozilla::intl::DateTimeFormat callers. If we want to restrict this fix-up phase to the Intl.DateTimeFormat JavaScript object, we should also modify the formatted string for Intl.DateTimeFormat.prototype.formatToParts (implemented in the intl_FormatToPartsDateTime C++ function) to ensure consistent results w.r.t. Intl.DateTimeFormat.prototype.format.

(In reply to André Bargull [:anba] from comment #18)

For example we didn't (yet) see any reports about the other space change, where date range formatting is now using U+2009 instead of U+0020.

Hmm; I do wonder if we should patch that similarly, if we're going to apply a hack to minimize compat risk here. Although it hasn't showed up yet, it's not too hard to imagine that a site might try formatting a date range and then use a (fragile) regex to take it apart. If we're temporarily reverting the change to one of these kinds of space separators, we should probably do both.

(In reply to André Bargull [:anba] from comment #19)

(In reply to Jonathan Kew [:jfkthame] from comment #17)

But I really have no idea whether this is an appropriate place to apply such a hack. Andre would be a better person to say....

Yes, either directly before passing the formatted JSString to the user or earlier in mozilla::intl::DateTimeFormat, if we also need/want to cover other mozilla::intl::DateTimeFormat callers.

Right, I wondered whether to do this in mozilla::intl::DateTimeFormat.

My thinking was that the problem with the U+202F spaces only affects JS code that gets a formatted date/time and tries to interpret it in some way (e.g. with a simple regex) that assumed "normal" space. A DateTimeFormat caller that simply wants to render the resulting string won't care what kind of spaces are used, they'll just be passed to the rendering code as-is.

But I'd be fine with doing it at the lower level of mozilla::intl, too. Maybe that would be the safer option, in case some other caller does care.

I've just pushed a try run with a patch to mozilla::intl::DateTimeFormat to "fix" both U+202F and U+2009, for consideration. This will presumably result in some test failures, where tests were updated in bug 1792775 to expect the new types of spaces, so if we proceed with this then corresponding adjustments will be needed there.

Patching mozilla::intl::DateTimeFormat seems like a simpler solution to me compared to doing it in SpiderMonkey, as long as we document it. My only small concern would be other locales were already using the characters we are swapping out, which I guess you could examine the old CLDR data.

OK, I propose that we go ahead and apply a patch to mozilla::intl::DateTimeFormat and mozilla::intl::DateIntervalFormat to restore "normal" spaces in Nightly, so we can see how that goes for the next couple of weeks.

The data for a bunch of locales was updated in ICU 72 to use U+202F and U+2009 in places where previously it had regular Space characters.
Unfortunately, this breaks some sites that attempt to parse the formatted output using naive regular expressions (or similar)
that just expect space, rather than "any whitespace", and fail to match against the new formatted output.

To mitigate this, until more browsers update to the newer ICU/CLDR data and pressure builds on sites to fix such fragile scripts,
we can post-process the formatted output from ICU to replace these "special" spaces with standard ASCII space characters.

This workaround is designed to be easily disabled at build time by just changing the DATE_TIME_FORMAT_REPLACE_SPECIAL_SPACES #define,
when we're ready to try re-enabling the updated formats.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

We'll need to revert this when we're ready to disable the workaround.

Depends on D165408

you can add foxnews.com to the list affected in latest Nightly and also the latest Chrome. Edge is not affected it seems.
Images on Foxnews will fail to load when site does its own page refresh. A manual refresh of the main page restores the images, but fail again on the next site refresh.

Attachment #9309601 - Attachment description: Bug 1806042 - Replace Narrow No-Break Space (U+202F) and Thin Space (U+2009) in DateTimeFormat/DateTimeIntervalFormat output with regular Space to mitigate breakage on fragile websites. r=dminor,gregtatum → Bug 1806042 - Replace Narrow No-Break Space (U+202F) and Thin Space (U+2009) in DateTimeFormat/DateTimeIntervalFormat output with regular Space to mitigate breakage on fragile websites. r=anba,dminor,gregtatum

(In reply to Jim Jeffery not reading bug-mail 1/2/11 from comment #25)

you can add foxnews.com to the list affected in latest Nightly and also the latest Chrome. Edge is not affected it seems.
Images on Foxnews will fail to load when site does its own page refresh. A manual refresh of the main page restores the images, but fail again on the next site refresh.

Interestingly, I wasn't able to reproduce a problem on foxnews.com. (Maybe they do IP geolocation and serve different content?) Anyhow, I have just triggered landing for the workaround patch here. So once that's in Nightly, if you could confirm it whether fixes the problem, that'd be good to know.

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/40e2c54d5618
Replace Narrow No-Break Space (U+202F) and Thin Space (U+2009) in DateTimeFormat/DateTimeIntervalFormat output with regular Space to mitigate breakage on fragile websites. r=anba
https://hg.mozilla.org/integration/autoland/rev/4b27c1d6a589
Update tests to expect normal spaces rather than U+202F/2009 in formatted date/time strings. r=gregtatum
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch
Regressions: 1807496

(In reply to Jonathan Kew [:jfkthame] from comment #26)

(In reply to Jim Jeffery not reading bug-mail 1/2/11 from comment #25)

you can add foxnews.com to the list affected in latest Nightly and also the latest Chrome. Edge is not affected it seems.
Images on Foxnews will fail to load when site does its own page refresh. A manual refresh of the main page restores the images, but fail again on the next site refresh.

Interestingly, I wasn't able to reproduce a problem on foxnews.com. (Maybe they do IP geolocation and serve different content?) Anyhow, I have just triggered landing for the workaround patch here. So once that's in Nightly, if you could confirm it whether fixes the problem, that'd be good to know.

Well it seems likely its something else causing the page to not load the thumbnails. So the fix you landed did not affect the issue, nor did
we think that it would. Just a hope that it was related. Biggest problem is I can't find a way to repo the problem reliably.

Thanks for your work... will keep digging, hoping it goes away.

Flags: needinfo?(ahabibi)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: