Closed Bug 730209 Opened 12 years ago Closed 12 years ago

Dictionary list should show raw language subtags when names are unavailable

Categories

(Core :: Spelling checker, defect)

9 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: GPHemsley, Assigned: GPHemsley)

References

(Blocks 3 open bugs, )

Details

(Whiteboard: [bcp47])

Attachments

(2 files, 5 obsolete files)

The spellcheck dictionary interface should not rely on the main localized list of language and region subtags to be displayed on the dictionary selection list.

The list of possible language subtags is magnitudes longer than the list we expect to make available to localizers (bug 716321) in the short term. (In the long term, we hope to have the full list available, if only in their default English names; see bug 666662 and 666731.)

The spellchecking dictionary list interface is not aware of this fact and thus provides a nearly-useless experience for a user who installs a spellchecking dictionary for a language whose language subtag is not available in languageNames.properties. In particular, the code is unable to find a name associated with the particular subtag, so it displays absolutely nothing in its place.

Instead, it should simply display the raw language subtag, which is certainly better than nothing. Although less likely to be a problem, the same should be done for the region subtag, as well.

The code in question should also be made aware of the possibility of an intervening script subtag between the language and region subtags. The code was originally written at a time when the script subtag was not a standard part of a language tag, so it simply assumes that the second subtag is always region tag. This problem has been seen in the wild with, for example, Macedonian. There are spellcheckers available for that language in both Cyrillic and Latin scripts. However, since Firefox does not support the valid xx-Xxxx-XX subtag, the authors of those dictionary addons have resorted to abusing the fact that the spellchecker code outputs the third subtag as a literal (I assume in anticipation of it being a variant subtag), resulting in dictionaries for the invalid mk-MK-Cyrl and mk-MK-Latn language tags.

One final note: Kevin and I seem to recall that this behavior was present in Firefox, at least for the language subtag, at some point in the past. (Bug 666731 comment 1 suggests it being present as of 2011-07-20, though I can't confirm that's what I meant at the time.) I thought I'd tracked the change down to either bug 338427 or bug 678842, but neither of those bugs changed the code in question here. So either Kevin and I are misremembering, or there was some change somewhere else that had a cascading effect on this behavior and might be worth investigating.

Oh, I should be clear. The code in question is here:
http://hg.mozilla.org/mozilla-central/file/cd120efbe4c6/toolkit/content/InlineSpellChecker.jsm#l207
The fix for this is pretty simple, at least in the short term. (That is, without localized script or variant subtags.) I've got a patch forthcoming for this. The long term implementation will happen in bug 666662 and bug 666731.

However, I am worried about what caused the regression, as it's clear that the code actually powering the list was never changed. I'm wondering if the string bundle service was changed so that it no longer returns the raw strings when a lookup failed. In particular, I worry what kind of cascading effects such a change might have had beyond just the spellchecker dictionary list.
Assignee: nobody → gphemsley
Here's a patch that brings the dictionary filename parsing up-to-date with BCP 47. In particular reference to this bug, it allows fallback to the raw subtags if a corresponding localized name cannot be found. It also adds primitive support for script subtags and multiple variants.
Attachment #600568 - Flags: review?(gavin.sharp)
Attachment #600568 - Flags: feedback?(l10n)
Attached image Screenshot of patch's behavior (obsolete) —
Here's a screenshot of the patch in action, parsing the following dictionaries:
haw-XA
en-US
qa-Qaaa-QZ-fonipa
dyo-SN
mk-MK-Latn
qa-Qaaa-QA-fonipa-fonxsamp
haw-US
mk-MK-Cyrl
haw-Cyrl-XA
csk-SN

(Note: Some of these are nonsensical test dictionaries; others are real dictionaries available in the wild.)
Blocks: 709930
Comment on attachment 600568 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way

>diff --git a/toolkit/content/InlineSpellChecker.jsm b/toolkit/content/InlineSpellChecker.jsm

>+      // Get the display name for this dictionary.
>+      let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(/([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i);

Can you put the regex into a local variable, at least, to make this slightly more readable?

>+      var displayName = ""; dump( languageTag + ", " + languageSubtag + ", " + scriptSubtag + ", " + regionSubtag + ", " + variantSubtags + "\n" );

You'll want to remove the dump before landing this.

>+      if (languageTag) {

Can you move all of this code into a getDictionaryDisplayName helper function?

>+          /*try {
>+            displayName += gScriptBundle.GetStringFromName(scriptSubtag.toLowerCase());
>+          } catch(e) {*/
>+            displayName += scriptSubtag; // Fall back to raw script subtag.
>+          /*}*/

Why is this commented out?

r- for these minor nits - if you address these comments I'll have a quicker turnaround, sorry for lagging behind on this request.
Attachment #600568 - Flags: review?(gavin.sharp) → review-
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #4)
> >+      // Get the display name for this dictionary.
> >+      let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(/([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i);
> 
> Can you put the regex into a local variable, at least, to make this slightly
> more readable?

So, you mean, something like this?

let languageTagMatch = /([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i;
let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(languageTagMatch);

> >+      var displayName = ""; dump( languageTag + ", " + languageSubtag + ", " + scriptSubtag + ", " + regionSubtag + ", " + variantSubtags + "\n" );
> 
> You'll want to remove the dump before landing this.

D'oh!

> >+      if (languageTag) {
> 
> Can you move all of this code into a getDictionaryDisplayName helper
> function?

So, something like getDictionaryDisplayName(matches)?

Out of curiosity, what benefit would that serve?

> >+          /*try {
> >+            displayName += gScriptBundle.GetStringFromName(scriptSubtag.toLowerCase());
> >+          } catch(e) {*/
> >+            displayName += scriptSubtag; // Fall back to raw script subtag.
> >+          /*}*/
> 
> Why is this commented out?

We don't currently have a dictionary for script subtags, but I wanted to indicate where it would go and what the code should look like when we got one.

> r- for these minor nits - if you address these comments I'll have a quicker
> turnaround, sorry for lagging behind on this request.

I'll try to get to it this week. Thanks for the review.
(In reply to Gordon P. Hemsley [:gphemsley] from comment #5)
> So, you mean, something like this?
> 
> let languageTagMatch =
> /([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-
> 9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i;
> let [languageTag, languageSubtag, scriptSubtag, regionSubtag,
> variantSubtags] = list[i].match(languageTagMatch);

Yep.

> So, something like getDictionaryDisplayName(matches)?
> 
> Out of curiosity, what benefit would that serve?

No, I meant getDictionaryDisplayName(list[i]). The purpose is to make addDictionaryListToMenu easier to understand.

> We don't currently have a dictionary for script subtags, but I wanted to
> indicate where it would go and what the code should look like when we got
> one.

Seems unnecessary - this code isn't very complicated, so leaving it in commented out is more confusing than useful IMO.
This should address the problems you mentioned.
Attachment #600568 - Attachment is obsolete: true
Attachment #609177 - Flags: review?(gavin.sharp)
Attachment #600568 - Flags: feedback?(l10n)
Comment on attachment 609177 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v2)

>diff --git a/toolkit/content/InlineSpellChecker.jsm b/toolkit/content/InlineSpellChecker.jsm

>+function getDictionaryDisplayName(rawLanguageTag) {

Put this at the end of the file, rather than between the definition of InlineSpellChecker and InlineSpellChecker.prototype.

>+  var displayName = "";
>+  if (languageTag) {

if (!languageTag)
  return rawLanguageTag;

and then unindent the rest of the function.

>-      var displayName = "";
>+      displayName = getDictionaryDisplayName(list[i]);

Looks like you lost the "var". But you can get rid of that variable entirely and just do:

item.setAttribute("label", getDictionaryDisplayName(list[i]));

below.

Given that the method is now factored out, it should be pretty easy to write a browser chrome test that verifies its output (ideally at least one test for all possible code paths). There's some info on browser chrome tests at https://developer.mozilla.org/en/Browser_chrome_tests, and an example browser chrome test at http://mxr.mozilla.org/mozilla-central/source/toolkit/content/tests/browser/browser_Geometry.js . 

r=me with those changes.
Attachment #609177 - Flags: review?(gavin.sharp) → review+
This should address the issues mentioned here, as well as the issues discussed via IRC.

This moves getDictionaryDisplayName() back into InlineSpellChecker.prototype, and makes the regex exhaustive. It also changes the check for invalid language tags into a try/catch block.

Most importantly, it adds a bazillion tests. :)
Attachment #609177 - Attachment is obsolete: true
Attachment #609953 - Flags: review?(gavin.sharp)
Here's an updated screenshot of the patch's behavior. (Includes updated list of language names proposed in bug 716321.)
Attachment #600573 - Attachment is obsolete: true
Blocks: 739861
Comment on attachment 609953 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v3)

>diff --git a/toolkit/content/tests/browser/browser_InlineSpellChecker.js b/toolkit/content/tests/browser/browser_InlineSpellChecker.js

>+let tempScope = {};
>+Components.utils.import("resource://gre/modules/InlineSpellChecker.jsm", tempScope);
>+let InlineSpellChecker = tempScope.InlineSpellChecker;

You can put this code inside the test() function (it's generally best to avoid running code in tests outside of test()).

Good job with the test coverage!
Attachment #609953 - Flags: review?(gavin.sharp) → review+
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #11)
> Comment on attachment 609953 [details] [diff] [review]
> Parse dictionary filename in a BCP 47–aware way (v3)
> 
> >diff --git a/toolkit/content/tests/browser/browser_InlineSpellChecker.js b/toolkit/content/tests/browser/browser_InlineSpellChecker.js
> 
> >+let tempScope = {};
> >+Components.utils.import("resource://gre/modules/InlineSpellChecker.jsm", tempScope);
> >+let InlineSpellChecker = tempScope.InlineSpellChecker;
> 
> You can put this code inside the test() function (it's generally best to
> avoid running code in tests outside of test()).

FTR, I copied that directly from the test you linked to me. ;)

But here's an updated patch.
Attachment #609953 - Attachment is obsolete: true
Attachment #609988 - Flags: checkin?(gavin.sharp)
(In reply to Gordon P. Hemsley [:gphemsley] from comment #12)
> FTR, I copied that directly from the test you linked to me. ;)

It's wrong there too :)
Rolled up v4 into a Hg Queue patch for easier checkin.
Attachment #609988 - Attachment is obsolete: true
Attachment #610334 - Flags: checkin?(gavin.sharp)
Attachment #609988 - Flags: checkin?(gavin.sharp)
Attachment #610334 - Flags: checkin?(gavin.sharp)
Flags: in-testsuite+
Keywords: checkin-needed
Target Milestone: --- → mozilla14
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/bb679e5939b9 - odds are most of the bustage from that push came from your push-neighbors, but it seems maybe not entirely impossible that the way bug 718316 went from something that hadn't happened a single time since the end of February to something that happened every single time starting with that push might be related to this. I'd highly recommend a try push (and a defensively placed link to tbpl for it put into the bug).
Pushed to try in https://tbpl.mozilla.org/?tree=Try&rev=a81db845ed05 while you were pushing https://tbpl.mozilla.org/?tree=Try&rev=34db65aca0e5 so between us we ought to get you cleared of suspicion.
(In reply to Phil Ringnalda (:philor) from comment #17)
> Pushed to try in https://tbpl.mozilla.org/?tree=Try&rev=a81db845ed05 while
> you were pushing https://tbpl.mozilla.org/?tree=Try&rev=34db65aca0e5 so
> between us we ought to get you cleared of suspicion.

If I'm reading these results correctly, this patch seems to be in the clear. :)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7737ca752860
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 741842
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: