Closed Bug 925579 Opened 11 years ago Closed 11 years ago

SecReview: L20n 1.0

Categories

(mozilla.org :: Security Assurance: Review Request, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: stas, Assigned: freddy)

References

Details

(Whiteboard: [completed secreview][Web])

Initial Questions:

Project/Feature Name: L20n 1.0
Tracking  ID:
Description:
L20n is a new localization file format and framework developed by the l10n drivers team for the client-side.  It allows localizers to put small bits of logic in the localization resources to codify the grammar of the language.  This prevents the logic to be hard-coded into the main source code where it would unnecessarily affect all other locales.  You can find some examples of this on my blog: http://www.informationisart.com/21/.

Localizers can add logic to the translations using simple data structures (individual translations can be either strings or dictionaries of values) and custom macros (localizers can write their own plural macro for instance, using a C-like expression syntax).

For more information on the L20n syntax, you can refer to http://l20n.org/learn/ and http://l20n.github.io/tinker/.

As we're nearing the 1.0 release (we're currently feature-frozen), I'd like to ask for some help evaluating the security of the current JavaScript implementation.  The code can be found on GitHub:  https://github.com/l20n/l20n.js.  If you clone this repo, take a look at examples/multilingual-dev.html which should just work when opened in a browser.  The translations for this example are defined in examples/locales/.

The core concept is that we cannot trust the localization resources.  So we need to parse them in a way which doesn't allow malicious actions.  To that end, we currently parse the localization resource into a JSON AST stored in memory which is then "compiled" into JavaScript functions and these functions disallow unknown actions.  So if we find "1 + 2" in a macro, we'll tokenize it, parse and compile it into:

  function addOperator(left, right) {
    if ((typeof left !== 'number' || typeof right !== 'number') &&
        (typeof left !== 'string' || typeof right !== 'string')) {
      throw new RuntimeError('The + operator takes two numbers or two strings');
    }
    return left + right;
  };

https://github.com/l20n/l20n.js/blob/6196438ab3fc59bcf8ac7676c31df9bd5667f8d2/lib/l20n/compiler.js#L753-L760

I see three major areas which could be abused for an attack by a malevolent localizer.  I'll try to summarize my concerns below but there might be something I've missed.  My goal is to give you a rough idea of what L20n does.


1. Localization logic

The localizers can define custom macros (see http://l20n.org/learn/macros/) which should not be able to run arbitrary code from inside of the localization resource, nor render the browser unresponsive.

We support string interpolation and it should be impossible to try to programmatically produce huge strings which would make the system go out of memory.  See bug 803931 for some initial work done to prevent this and http://l20n.org/learn/string-values/ to see how to build interpolated strings.

You can interpolate other translation strings but also variables provided by developers to the localization context (e.g. number of unread email messages).  See http://l20n.org/learn/context-data/.  It should be impossible for localizers to tinker with these variables from inside of the localization resource.

Lastly, since all of this is implemented in JavaScript, it should be impossible to access built-in properties and methods of objects and their prototypes (length, watch etc).  See bug 815962 and bug 883664.


2. Resource imports

We allow one localization resource to import another one with an import() statement.  All resources are downloaded using XHR so they're limited to the current domain.  We disallow recursive or too deeply nested imports.  Are there any other concerns that we should be aware of?  Both security and privacy related.


3. HTML localization

We allow some HTML markup in translations.  See bug 921169 and bug 922576.  There's a whitelist of text-level tags like <em> and <strong>, and a whitelist of attributes based on http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate.

We use the <template> element to create an inert DOM out of the translation.  We then sanitize it using the above whitelists and insert into the main DOM.  This hopefully prevents things like:

  <img src="missing.png" onerror="alert('pwnd!');">
Additional Information:
I will be in San Francisco the whole week of October 14th and I'll be happy to meet in person to provide more information.  Gandalf is another person who can be reached and he's permanently based out of SF.
Key Initiative: Firefox OS
Release Date: 2013-10-31
Project Status: ready
Mozilla Data: No
Mozilla Related: Integration with Firefox OS is planned
Separate Party: No
Component: Project Review → Security Assurance: Review Request
Summary: L20n 1.0 → SecReview: L20n 1.0
Whiteboard: [pending secreview]
(In reply to Staś Małolepszy :stas from comment #0)
> 3. HTML localization
> 
> We allow some HTML markup in translations.  See bug 921169 and bug 922576. 
> There's a whitelist of text-level tags like <em> and <strong>, and a
> whitelist of attributes based on
> http://www.w3.org/html/wg/drafts/html/master/dom.html#attr-translate.

See https://groups.google.com/forum/#!topic/mozilla.tools.l10n/9JvxgTUwIrk for the discussion on the whitelists.
Group: mozilla-corporation-confidential
Whiteboard: [pending secreview] → [triage needed]
Assignee: nobody → fbraun
Whiteboard: [triage needed]
The part that whitelists HTML tags and attributes looks good. I checked the is*Allowed() functions, their use and the whitelist object defined on top of the gaia.js file.
Depends on: 931814
If I'm not mistaken, the HTML sanitization happens at the very end of the localization process, i.e. right before the content is inserted into the DOM. This is great as it ensures little to no cause for parsing mismatches (i.e. l20n thinks something is harmless because it lacks understanding of a certain feature).

I have tried things like this (first greater-than sign for indenting, not syntax):
> <foo "<img src='x'><pre">
> <bar ">check 123 </pre>">
> <kthxbye "Freddy prints a test message here: {{ foo }} {{ bar }}">

So it looks pretty good :)
Depends on: 935017
Please mark as fixed/resolved once all blockers are fixed/resolved :)
Whiteboard: [completed secreview][Web]
Blockers fixed, thank you so much for the help!  We should be able to get the 1.0 RC out this week which is great.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.