Closed Bug 921169 Opened 11 years ago Closed 11 years ago

Implement secure DOM localization

Categories

(L20n :: HTML Bindings, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stas, Assigned: stas)

References

Details

Attachments

(2 files, 2 obsolete files)

We should make our DOM localization logic secure and sound for the 1.0 release.  The initial discussion took place in the mailing list:

https://groups.google.com/d/msg/mozilla.tools.l10n/9JvxgTUwIrk/70ja-BE55kQJ

Goals
-----

    1. Make it possible to localize content in HTML elements and
       attributes without forcing developers to split strings into pre-
       and post- parts (definitely a bad practice).  For instance, it
       should be possible for an <a> element to be a part of the
       translation.

    2. Make it possible for localizers to apply text-level semantics to
       the translations and make use of HTML entities.  For instance,
       it should be possible for a localizer to use an <sup> element in
       "M<sup>me</me>" (an abbreviation of French "Madame").


Constraints
-----------

    1. Make the whole system secure and don't trust translations by
       default allowing a safe set of attributes on HTML elements.  For
       instance, the localizer should not be able to add an onclick
       handler or overwrite the target of href or src without the
       developer or the localization engineer knowingly allowing it.

    2. Don't break the Web;  in particular, until we get more feedback
       and gather more data, we should not break any two-way bindings
       the third-party libraries might have set up on existing DOM
       nodes.
Attached patch WIP 1 (obsolete) — Splinter Review
This is an early work-in-progress implementation.  It:

- allows safe elements in translations,
- keeps original elements instead of replacing them,
- uses a whitelist for elements,

- doesn't check attributes against whitelist yet,
- doesn't copy/overlay attributes yet,
- doesn't support ITS translateRules for extending the whitelists,
- currently breaks language switching,
- probably doesn't handle errors well yet.

Gandalf, can you take a look and see if you like the general direction?

Michał, would you mind applying this locally and verifying that it works well 
with Angular?  It should keep the nodes for which Angular creates two-way 
bindings.
Attachment #810757 - Flags: feedback?(m.goleb+mozilla)
Attachment #810757 - Flags: feedback?(gandalf)
Attachment #810757 - Attachment is obsolete: true
Attachment #810757 - Flags: feedback?(m.goleb+mozilla)
Attachment #810757 - Flags: feedback?(gandalf)
I'm quite happy with what I ended up today:

- allows safe elements in translations,
- keeps original elements instead of replacing them,
- uses a whitelist for elements,
- uses a whitelist for attributes,
- copies and overlays attributes,
- works with lang switching and retranslation,
- should handle corner cases well.

- doesn't support ITS translateRules for extending the whitelists (I'd like to 
  fix this in a follow-up bug to not increase the complexity of this patch).

One important change is that I removed the whole l20nSourceNode approach, since 
it's not trivial to make it work with the paradigm of keeping original nodes in 
the DOM.  However, I don't think this is bad.  The patch doesn't support 
reordering anyways, so retranslation based on the actual node instead of 
L20nSourceNode can't break too much.  The only case I can think of is when 
language 1 adds a title attribute, and language 2 doesn't.  Then the title attr 
will stay on the node and will be present in the language 2 localization.  I'd 
prefer to fix this in a follow-up if it turns out to be a problem.  I do think 
it will be quite rare an edge-case.
Attachment #811184 - Flags: review?(gandalf)
Attachment #811184 - Flags: feedback?(m.goleb+mozilla)
Attachment #811184 - Attachment is obsolete: true
Attachment #811184 - Flags: review?(gandalf)
Attachment #811184 - Flags: feedback?(m.goleb+mozilla)
I realized one thing:  creating a dettached div or body and assigning the 
unsanitized translation to innerHTML may trigger HTTP requests or even scripts 
to run.  Sanitizing only after the HTML was parsed is too late.

For instance, the translation might look like this:

    <img src="http://evil.com/track.png">  

Scripts in <script> tags are not executed, but you can easily run arbitrary 
code otherwise.  For instance:

    <img src="missing.png" onerror="alert('pwnd!');">

The overlayElement will correclty sanitize these translations and the <img> nor 
the 'onerror' callback won't be present in the final result, but they will 
actually be parsed and run the moment when the unsanitized translation is 
assigned to innerHTML.

To combat this, we can 1) use DOMParser and create a new document our of every 
translation (overkill; also 10x slower) or 2) take advantage of the <template> 
element, which uses DocumentFragments internally.  It parses the HTML but keeps 
it inert, so no requests are made and no scripts are run when it's created.

Because it uses DocumentFragments, I can use XPath any more to match elements 
in translation to elements in the source HTML (see bug 402129).  I tried using 
CSS selectors instead, but unfortunately I'm only interested in immediate 
children (deeper descendants are taken care of recursively), and CSS doesn't 
make it easy (you'll recall you can't just do querySelector('> span'); you need 
the LHS of the child operator).  There is the Selector level 4 spec in the 
works with the :scope  pseudo-class [1], but the support is scarce (Chrome 
+ Firefox but only on the testign channels) and from my testing :scope also 
doesn't work well with DocumentFragments.

We could go the Sizzle route, which is to assign a temporary id to the context 
node and use querySelector on it [2].  But again, DocumentFragments can't have 
attributes.

So in the end I went for a simple manual approach where I calculate the index 
among siblings of the same type myself and then use it in reverse to match an 
element in the translation.  This works well.

I'd love to get some feedback on the whole approach.  I still need to make 
necessary changes to the documentation.  I wish we had a client-side test 
harness, but I don't want to block on this.


[1] http://www.w3.org/TR/2012/ED-selectors-api2-20121115/ 
[2] https://github.com/jquery/sizzle/blob/c50c710b6411db2cc6d77d3b982ac6625605ea63/src/sizzle.js#L248-L268
Attachment #811518 - Flags: review?(gandalf)
Attachment #811518 - Flags: feedback?(m.goleb+mozilla)
(In reply to Staś Małolepszy :stas from comment #4)
> Created attachment 811518 [details] [diff] [review]
> Use template elements to force inertness of innerHTML

Great, great work! I've just tested it (sorry for the delay, busy Friday at work) and it works with Angular perfectly! :)
Attachment #811518 - Flags: feedback?(m.goleb+mozilla) → feedback+
I'm not too optimistic about it being fixed, though, docco issues seems to linger without attention. We might have to fix it ourselves. (and docco is written in Literate CoffeScript, what a pain)
Sorry for the comment, wrong ticket. :)
This is the same patch as before, but applied to both bindings/l20n/html.js and 
bindings/l20n/gaia.js.
I ran b2gperf tests on Gandalf's l20n-master with and without the patch (settings app)

without patch:  median: 1213, mean: 1221, std: 35
with patch:     median: 1220, mean: 1229, std: 29

The localization files don't have much HTML in them, so there isn't much difference.  I added <em>Foo</em> to all 500 entities to see how the performance changes.

without patch:  no change, but broken; <em> is visible in textContent
with patch:     median: 1283, mean: 1286, std: 33
Comment on attachment 811518 [details] [diff] [review]
Use template elements to force inertness of innerHTML

Review of attachment 811518 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good. It's pretty complex, I'd love to make it very clear for the reader of html.js which part does he need to understand without dom overlays, but it works and performance is ok.

median of 30 iterations on Settings app:
without patch: 1068
with patch: 1076
Attachment #811518 - Flags: review?(gandalf) → review+
Blocks: 922573
Blocks: 922576
(In reply to Staś Małolepszy :stas from comment #4)
> To combat this, we can 1) use DOMParser and create a new document our of
> every 
> translation (overkill; also 10x slower) or 2) take advantage of the
> <template> 
> element, which uses DocumentFragments internally.  It parses the HTML but
> keeps 
> it inert, so no requests are made and no scripts are run when it's created.

The template element has poor browser support for now so I don't believe this is a viable solution. How about a sort-of polyfill for that, i.e. script element with custom type?
Blocks: 922577
Landed in https://github.com/l20n/l20n.js/commit/892b1507a446080e80efebd24534744174561671.

I added docs to https://github.com/l20n/l20n.js/blob/892b1507a446080e80efebd24534744174561671/docs/html.md#making-html-elements-localizable

I also filed three follow-ups: bug 922573, bug 922576 and bug 922577.

(In reply to Michał Gołębiowski from comment #11)
> The template element has poor browser support for now so I don't believe
> this is a viable solution. How about a sort-of polyfill for that, i.e.
> script element with custom type?

Would you mind filing  follow-up bug about this?
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Staś Małolepszy :stas from comment #12)
> (In reply to Michał Gołębiowski from comment #11)
> > The template element has poor browser support for now so I don't believe
> > this is a viable solution. How about a sort-of polyfill for that, i.e.
> > script element with custom type?
> 
> Would you mind filing  follow-up bug about this?

Sorry, I totally forgot about that until I tested in IE recently... :/ I reported bug 942594 about that.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: