Closed Bug 1235365 Opened 8 years ago Closed 8 years ago

Manifest and locale files are not parsed as UTF-8

Categories

(WebExtensions :: Untriaged, defect)

x86_64
macOS
defect
Not set
normal

Tracking

(firefox46 verified, firefox48 verified, firefox-esr45 wontfix)

VERIFIED FIXED
mozilla46

People

(Reporter: ato, Assigned: kmag)

Attachments

(5 files)

The description underneath each extension listed on about:addons does not support unicode characters from the "description" field in manifest.json files.

It appears the character ’ gets double encoded as â.
See the attached screenshot for an example.
Attached image Counterexample on Linux
UA:"Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0 SeaMonkey/2.43a1" ID:20151220153800 c-c:076c24f6d01d0a5f0b14fb1cc80c17ff24dc9074 m-c:388bdc46ba51ee31da8b8abe977e0ca38d117434 en-US

I can't reproduce the problem on this Linux build: see attached snapshot, where the β (Greek letter beta) in the description of abcTajpu is U+03B2.
Andreas, please paste your "User Agent" and "Build ID"; they can be found in the top section of the "Help → Troubleshooting Information" page.
Flags: needinfo?(ato)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20151218030232
Flags: needinfo?(ato)
I updated to the latest Nightly, which at the time of writing is 20151228030213 (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0), and the problem persists.

The attached screenshot shows an example of the characters €ß’, which get interpreted as â¬Ãâ.
Hm, somehow your add-ons manager apparently doesn't understand the charset of whatever it is that defines that "international" string.

In attachment 8702288 [details] I could find the string defined in the add-on's install.rdf, as follows:

[...]
  <em:localized>
    <Description>
      <em:locale>en-US</em:locale>
      <em:name>abcTajpu "a b c type-oo"</em:name>
      <em:description>a` b\ c^ -&gt; à β ĉ  (http://lingvo.org/abctajpu)</em:description>
      <em:homepageURL>http://lingvo.org/abctajpu/</em:homepageURL>
    </Description>
  </em:localized>

which AFAICT is a UTF-8 file without BOM inside the unpacked XPI in my <profile>/extensions/ directory.

In your first example, â<80><99>, or 0xE2 0x80 0x99, is the UTF-8 byte sequence for U+2019 RIGHT SINGLE QUOTATION MARK, but interpreted as if it were encoded in Latin1. Similarly, in your second example, â<82>¬Ã<9F>â<80><99> is a Latin1 interpretation of what would be €ß’ (U+20AC U+00DF U+2019) in UTF-8.

It looks like the texts are defined in UTF-8 but interpreted as if they were Latin1 (aka ISO_8859-1).
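
To illustrate the diagnosis above, here is a minimal Python sketch (not Firefox code; the variable names are made up for this example) that encodes €ß’ as UTF-8 and then decodes the same bytes as Latin1. It only shows the general mechanism and makes no claim about where in the add-ons code the wrong decoding happens.

```python
# Illustration only: reproduce the garbled text seen in the screenshots.
text = "\u20ac\u00df\u2019"  # the characters € ß ’

# What a UTF-8 file on disk would contain for these three characters.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes as Latin1 maps every byte to one code point,
# so each multi-byte UTF-8 sequence turns into two or three characters.
misread = utf8_bytes.decode("latin-1")

# Only some of the resulting characters are printable; the rest are
# C1 control characters (shown as <80>, <82>, <99>, <9F> above).
visible = "".join(ch for ch in misread if ch.isprintable())

# The damage is reversible as long as no bytes were dropped.
repaired = misread.encode("latin-1").decode("utf-8")

print(utf8_bytes.hex(" "))
print(visible)
print(repaired == text)
```

The printable part of the misread string is what shows up on about:addons, while the control characters in between are invisible there.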

What is your system locale? Mine is en_US.UTF-8, as shown by typing
    echo $LANG
in the xterm from which I launched SeaMonkey. ($LC_ALL, which is not set, would override all locale settings; $LC_CTYPE, which is also not set, would define only the charset; and $LANG is a fallback for anything not otherwise defined. This is the POSIX convention; I think it also applies on Mac OS X, but I can't be sure.)
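
The precedence described above can be sketched in Python as follows. This is a rough model of the POSIX lookup order only: the helper names `effective_ctype` and `charset_of` are hypothetical, the "C" fallback is an assumption (POSIX leaves the default implementation-defined), and whether Firefox consults these variables at all when reading extension files is not established in this bug.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character classification.

    Lookup order: LC_ALL, then LC_CTYPE, then LANG. Empty values are
    treated as unset. The final "C" fallback is an assumption.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def charset_of(locale_name):
    """Extract the codeset from a name like 'en_US.UTF-8@euro'.

    Returns None when the locale name carries no explicit codeset.
    """
    _, dot, rest = locale_name.partition(".")
    if not dot:
        return None
    return rest.partition("@")[0]


if __name__ == "__main__":
    name = effective_ctype(os.environ)
    print(name, charset_of(name))
```

With only LANG=en_US.UTF-8 set, as in the environment described above, this resolves to the UTF-8 codeset.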
Version: unspecified → Trunk
(In reply to Tony Mechelynck [:tonymec] from comment #6)
> It looks like the texts are defined in UTF-8 but interpreted as if they were
> Latin1 (aka ISO_8859-1).

True.
 
> What is your system locale?

My locale is en_GB.UTF-8.

I should add that this happens for a WebExtension.  I have not tried with the old SDK-based add-on format.  I’m attaching a sample WebExtension that shows this problem on Mac.

Interestingly, and perhaps unrelated to this bug, if I use the "a` b\ c^ -&gt; à β ĉ  (http://lingvo.org/abctajpu)" description you used above, Firefox thinks the extension altogether is corrupt because of the \ (backslash) character.  I filed bug 1235766 separately about this.
(In reply to Andreas Tolfsen (:ato) from comment #7)
[…]
> Interestingly, and perhaps unrelated to this bug, if I use the "a` b\ c^
> -&gt; à β ĉ  (http://lingvo.org/abctajpu)" description you used above,
> Firefox thinks the extension altogether is corrupt because of the \
> (backslash) character.  I filed bug 1235766 separately about this.

It might well be related. In both cases, your extension is a WebExtension whose description comes from some manifest.json while mine is a "classical" extension whose description comes straight from its install.rdf. I'm updating the bug Summary about this difference which wasn't clear from comment #0.
Summary: Extension description on about:addons does not support unicode → WebExtension description on about:addons does not support unicode
Component: Add-ons Manager → WebExtensions
Flags: blocking-webextensions?
Summary: WebExtension description on about:addons does not support unicode → Manifest and locale files are not parsed as UTF-8
Assignee: nobody → kmaglione+bmo
Attachment #8706054 - Flags: review?(wmccloskey)
Attachment #8706054 - Flags: review?(wmccloskey) → review+
https://hg.mozilla.org/mozilla-central/rev/0ae26b71481a
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
I was able to reproduce this issue on Firefox 46.0a1 (2016-01-10) under Windows 10 64-bit.

Verified fixed using the following xpi https://addons.allizom.org/en-US/firefox/addon/unicode-description/  on Firefox 48.0a1 (2016-03-31) and Firefox 46 beta 6 (20160328182534) under Windows 10 64-bit, Mac OS X 10.11.2 and Ubuntu 12.04 32-bit.
Status: RESOLVED → VERIFIED
This still affects Fx 45 ESR. Maybe the fix should be backported to esr45?
WebExtensions are officially pre-release (read: not supported) prior to 48.
(In reply to Kris Maglione [:kmag] from comment #15)
> WebExtensions are officially pre-release (read: not supported) prior to 48.

I'll go and bump the strict_min_version to 48.0, then. Thanks!
Product: Toolkit → WebExtensions