Closed Bug 548206 (DirAuto) Opened 14 years ago Closed 12 years ago

Implement the auto value for the HTML dir attribute

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: ehsan.akhgari, Assigned: smontagu)

References

(Blocks 2 open bugs, )

Details

(Keywords: dev-doc-complete, rtl, Whiteboard: [parity-webkit][wanted2.1?])

Attachments

(9 files, 9 obsolete files)

19.66 KB, patch
Details | Diff | Splinter Review
316 bytes, text/html
Details
237 bytes, text/html
Details
21.94 KB, patch
Details | Diff | Splinter Review
922 bytes, patch
Details | Diff | Splinter Review
240.01 KB, patch
smontagu
: review+
Details | Diff | Splinter Review
51.16 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
21.89 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
66.45 KB, patch
ehsan.akhgari
: review+
peterv
: review+
Details | Diff | Splinter Review
We need to support a mechanism to support automatic direction detection for web content which needs to display pieces of text with an unknown source (like comments in a weblog).  Our current thinking is to introduce the auto value for the dir element and make it so that it resolves to either ltr or rtl based on some heuristic.

Fantasai suggested this heuristic:

  Look at the first 63 alphabet characters of the text inside the element.  If it contains any strong RTL characters, then set the direction to rtl.  Otherwise, set it ltr.

Fantasai is going to start a discussion about this in the w3c i18n group, and will probably update this bug with the results.
I experimented with an implementation for this some years ago. I probably still have the patch in an old tree.
Attached patch Old patchSplinter Review
This is my old patch merged to current trunk.

Two points to note: 

* What I implemented was in fact not HTML "dir=auto" but CSS "direction: auto". I have now added the mapping from HTML to CSS in html.css and nsGenericHTMLElement.cpp

* The patch uses the UBA "first strong character" algorithm to determine the direction (which we already have code for in nsBidi.cpp).

This is about three years old, so I want to "review" it myself before asking anybody else to review it (but feedback is welcome, of course), and also reconsider it in the light of the new W3 bidi proposal. If I'm not mistaken, the patch only affects block-level elements. and the proposal doesn't seem to address the block/inline distinction at all -- all this requires further thought.
Another interesting way to test this is to add the attribute to sites with userContent.css. For example:

@-moz-document url-prefix("http://www.facebook.com") { 
 .comment_actual_text { 
   direction: auto;
   display: inline-block !important;
 }

 .UIStoryAttachment_Copy { 
   direction: auto;
 }
}
When we discussed this with Ehsan and jkew, we thought it would be best for the CSS 'direction' property to remain ltr|rtl only, and for HTML dir=auto to resolve to either ltr or rtl. That would allow CSS selectors :rtl and :ltr for directionally styling content.
I'm not sure that I buy that. Firstly, it doesn't address XUL (does XUL have a dir attribute?) Many of the use cases for auto-direction are in XUL (e.g. tab titles and live bookmarks). Secondly, :ltr and :rtl selectors are already not very useful because they don't work for elements with inherited direction.
(In reply to comment #7)
> I'm not sure that I buy that. Firstly, it doesn't address XUL (does XUL have a
> dir attribute?)

We could make XUL have a dir attribute.

> Secondly, :ltr and :rtl selectors are already not
> very useful because they don't work for elements with inherited direction.

The idea of the selectors is that they would work on elements with inherited direction, because the direction would inherit within the markup language.
... and thirdly, how would it work for elements whose text content changes, either programatically or by user action?

(In reply to comment #8)
> The idea of the selectors is that they would work on elements with inherited
> direction, because the direction would inherit within the markup language.

I don't think I understand well enough from the brief references here exactly what is being proposed for these selectors, so I will come back to this later.
:ltr would match any element whose directionality is left-to-right according to the rules of the markup language that the element is in.  HTML says the 'dir' attribute inherits, so an element whose nearest ancestor-or-self with a 'dir' attribute has dir="ltr" has ltr directionality according to HTML.

If HTML had dir="auto", HTML would have rules for assigning ltr or rtl directionality to the element (and its descendants that inherit that directionality; they wouldn't inherit the 'auto').


Doing it this way has the advantage of allowing use of :ltr and :rtl as CSS selectors, which is a possible alternative to adding lots of *-start and *-end properties (and can do things that they can't do).  When directionality is specified in CSS rather than in the markup, then CSS selectors can't match on the document language's concept of directionality.
(In reply to comment #7)
> Firstly, it doesn't address XUL

FWIW, XUL has a very different concept of directionality.  XUL has a locale direction, which is by default rtl for ar, fa and he, and ltr for all other locales.  XUL itself doesn't have any dir attribute in markup, but like dbaron said, we could add one to it.

Firefox uses an attribute called chromedir in some places which helps it tie specific CSS properties to the elements.  But because that is just an arbitrary attribute, it has no concept of inheritance.  If we had |direction: auto| in CSS, then we would probably be able to use the existing chromedir mechanism with it, but I'm not a XUL expert anyway.  CCing Neil to see what he thinks here.
The already existing dir attribute in xul relates to box direction and is only marginally related to rtl use.

chromedir isn't used anymore. localedir on a <window> can be used to override the default ui direction (not specifically the text direction). localedir might be able to be extended to apply to other elements as well.
Blocks: 260445
(In reply to comment #0)
> Fantasai suggested this heuristic:
> 
>   Look at the first 63 alphabet characters of the text inside the element.  If
> it contains any strong RTL characters, then set the direction to rtl. 
> Otherwise, set it ltr.

I agree with Aharon Lanin's response to this at http://lists.w3.org/Archives/Public/public-i18n-bidi/2010JanMar/0020.html:

'This is basically the any-RTL algorithm, and I think is generally less
useful than either first-strong or word-count. It fails on casual LTR text
"peppered" with some RTL words (e.g. a chat between expats from RTL
countries), as well as on scholarly LTR text that uses some RTL words for
precision (e.g. a discussion on biblical topics).'
(In reply to comment #13)
> (In reply to comment #0)
> > Fantasai suggested this heuristic:
> > 
> >   Look at the first 63 alphabet characters of the text inside the element.  If
> > it contains any strong RTL characters, then set the direction to rtl. 
> > Otherwise, set it ltr.
> 
> I agree with Aharon Lanin's response to this at
> http://lists.w3.org/Archives/Public/public-i18n-bidi/2010JanMar/0020.html:
> 
> 'This is basically the any-RTL algorithm, and I think is generally less
> useful than either first-strong or word-count. It fails on casual LTR text
> "peppered" with some RTL words (e.g. a chat between expats from RTL
> countries), as well as on scholarly LTR text that uses some RTL words for
> precision (e.g. a discussion on biblical topics).'

One thing we should keep in mind is that any algorithm will fail for some cases.  I think this algorithm has the least number of failures in real-world applications, and it's efficient in that it doesn't need to inspect the entire contents of text nodes.
Blocks: 542909
You can use character-count on the first 64 characters, and that will give you results similar to word-count, except it doesn't need to inspect the entire text and it doesn't rely on having a concept of "words". If you want, you can even skew it towards RTL by considering e.g. 30% or 40% RTL to resolve to RTL instead of 50%.
Filed bug 562169 on the :rtl / :ltr selector idea.
Blocks: 347502
As of Wednesday 9th June, here are the new values proposed for the dir attribute and associated rules:

The values for dir will also include uba, auto, and normal, and the values for unicode-bidi, will also include uba.

         1. the default dir for all elements is normal, with the exception of block elements whose parent is uba. These inherit uba.

         2. elements with dir=normal have the same resolved direction (both the internal HTML “property” used for CSS purposes and the actual CSS property) as the parent element. It also sets the unicode-bidi CSS property to normal (unless ubi is explicitly on for that element). The primary purpose for explicitly stating dir=normal is to break dir=uba inheritance from the parent.

         3. dir=uba sets the resolved direction (as defined above) of the element according to the UBA applied to its textual content. The textual content is the depth-first traversal of all text nodes (even if they have an explicit dir).

         4. In the application of the UBA to textual content, if the text contains no characters of the bidi classes L, AL, or R, the resolved direction of the text is inherited.

         5. dir=uba sets the unicode-bidi CSS property to uba.

         6. The base directionality of a UBA paragraph (which is distinct from CSS direction, which it does not have) whose containing block element has unicode-bidi:uba is set according to the paragraph’s content using the UBA. A UBA paragraph’s lines’ alignment is determined by the paragraph’s base directionality when the text-align of the containing block element is start or end.

         7. To clarify, when an inline element has dir=uba, its children do not inherit dir=uba, but do inherit the resolved direction of the inline element.

         8. dir=uba implies ubi by default. If ubi is explicitly off on this element, the unicode-bidi value is “uba embed”. Otherwise, unicode-bidi is “uba isolate”.

         9. TBD: what happens in <textarea> when the user sets an explicit direction via browser UI, for all dir values.

        10. auto sets the CSS direction to either ltr or rtl by a mechanism TBD.
I am working on a patch to be able to experiment with these new settings. The patch is currently no more advanced than Simon's old patch. So, I will attach it to this bug when it has a little more meat to it. I am facing one issue I am not sure about:

1. The new 'dir' values do not have directly corresponding css 'direction' values. Currently, the layout code runs off the css values. Internally the code gets the attribute, converts it to its equivalent css, and then uses the css values to calculate the internal blockFrame and context direction for GetStyleVisibility()->mDirection.

Now, when Gecko resolves the bidi of a layout in nsBidiPresUtils::Resolve I need to know the setting of the dir attribute as well as the CSS direction and unicode-bidi. If I can know the attribute value at the time of nsRuleNode::ComputeVisibilityData I can store this and use it in Resolve. Is that possible and the right way to do this?
Probably by coming up with a new -moz CSS property (which need not have parser support) and mapping your new dir values into it, then using that CSS property in layout.
In some cases what you want may depend on the desired inheritance behavior, though.  Depending on what inheritance behavior you want, you may want to do the mapping before it ends up in CSS.

The inheritance rules in comment 17 look complicated; I haven't digested them yet.
The CSS part of the proposal is already spec'd in
  http://dev.w3.org/csswg/css3-text-layout/#unicode-bidi
under the value name 'plaintext'. (Whoever's rewriting bidi resolution to deal with perf issues should look into that, since it allows sub-portions of a block to have different base directions.)

The HTML part is more complicated:

dir=rtl|ltr|auto can be handled by mapping to 'direction' on the element (in the 'auto' case, first resolving to either ltr or rtl and then doing the mapping). This is relatively straightforward.

dir=uba, however requires an inheritance mechanism in the HTML.
  1. Each HTML block-level child of a dir=uba block-level element must
     inherit the uba value. (Note that HTML inline-level elements do
     not participate in this inheritance scheme.)
  2. Then each element with a dir=uba value, whether explicit or inherited,
     must resolve it to either ltr or ltr, and
  3. translate that direction to a CSS 'direction' declaration on the
     element plus a CSS 'unicode-bidi: plaintext' declaration on the element.

A complication is elements with dir=uba that contain no direction-determining characters. If we are not implementing :rtl or :ltr selectors, this can be handled by posting 'direction: inherit' on that element in place of 'direction: ltr|rtl'.

However, if we want :rtl or :ltr to match, then we need a second pass through the HTML element tree that performs the inheritance of the resolved [dir] value and posts the correct CSS 'direction' declaration. The nasty case to consider is the one drawn up by Mark Davis: A two branch tree topped by dir=uba, with the first branch containing only neutral characters and the direction resolved halfway into the second branch.
Sorry, uba is equivalent to 'unicode-bidi: plaintext isolate', not 'unicode-bidi: plaintext'.
Aaaand by "is equivalent to" I meant "sets", since it is not equivalent...
I have put together a patch for dir=bda. This is a bit of a hack - e.g. I am explicitly setting styles and attributes - but it does let you experiment with this feature and gives a good idea of what is involved with patching gecko to support this feature. I list the issue I have encountered here: http://ironymark.diwan.com/2010/06/hacking-diruba/
Blocks: html5bidi
Blocks: 614535
Blocks: 627798
Whiteboard: [parity-webkit][wanted2.1?]
Blocks: 652550
What is the status of this bug ?
Do you have an expected date/release for its resolution ?

I would be glad to help if you need any assistance.
Blocks: 647755
Depends on: 662288
I still don't really grok how this needs to happen. Aharon pointed out in bug 562169 comment 13 that HTML5 defines the concept of element directionality (http://dev.w3.org/html5/spec/Overview.html#the-directionality), which is always ltr or rtl (even when dir=auto), and an algorithm to set it.

Things that are still unclear to me include but are not limited to:
 When and where we should be performing that algorithm?
 How do we account for dynamic changes to the content? In theory any change to the content of an element with dir=auto could change the directionality of the element, but in the most common use case (typing at the end of a textarea) there will be no change, and it will probably be bad for performance to run the directionality algorithm after every key press.
 As Adil pointed out, mapping values of dir to CSS unicode-bidi is problematic, because dir inherits but unicode-bidi does not.
(In reply to comment #26)
> I still don't really grok how this needs to happen. Aharon pointed out in
> bug 562169 comment 13 that HTML5 defines the concept of element
> directionality
> (http://dev.w3.org/html5/spec/Overview.html#the-directionality), which is
> always ltr or rtl (even when dir=auto), and an algorithm to set it.
> 
> Things that are still unclear to me include but are not limited to:
>  When and where we should be performing that algorithm?
>  How do we account for dynamic changes to the content? In theory any change
> to the content of an element with dir=auto could change the directionality
> of the element, but in the most common use case (typing at the end of a
> textarea) there will be no change, and it will probably be bad for
> performance to run the directionality algorithm after every key press.

I know nothing about Mozilla internals, but I would presume that a good optimization would be to keep for each element with dir=auto an index of some sort to the "determining" character, which is the current first character with strong direction in the in-order traversal of its text node descendants (except for those excluded as specified). The direction has to be recalculated only when a change in its descendant text nodes takes place at or before that index, which should be very rare.

Of course the whole trick is in the nature of this "index". Perhaps it could take the form of an array of child indexes defining a path to the text node, and then an index into the text node. Doing a less-than-or-equal-to comparison of two such indexes  is easy. When a change takes place in a text node, the index of the change can be calculated working backwards, going up in the DOM until hitting a script or style or bdi element, or an element with an explicit dir attribute. If this element is a script or style element or an element with dir other than auto (note that by default a bdi element's dir is auto), nothing needs to be done. Otherwise, the index collected for the change is compared with the element's determining character index. If the former is <= the latter, the element's directionality is recalculated.

There also must be an easy way to short-circuit the collection of the change index by knowing that it would terminate in a script or style element or an element with dir other than auto.

>  As Adil pointed out, mapping values of dir to CSS unicode-bidi is
> problematic, because dir inherits but unicode-bidi does not.

The dir attribute does not and never has had inheritance. It is only the CSS direction property (and now the HTML5 directionality) that inherit.

I can prove it to you. In HTML 4.01 / CSS2.1, the default value for unicode-bidi is "normal" (except for the block element's where it is by default embed, and for the bdo element, where it is by default bidi-override). However, when an inline element (other than bdo) has a dir attribute, unicode-bidi is supposed to become embed. If dir inherited, then there would be no elements with unicode-bidi:normal.

So, in HTML 4.01 / CSS2.1, the effect of the dir attribute is limited to its changing the default value for the element's unicode-bidi and direction CSS properties. Although direction inherits, it does not have any effect on layout except for elements that have a non-normal unicode-bidi, so dir's effect on unicode-bidi is crucial.

In HTML5 / CSS3, it is basically the same thing. The effect of the dir attribute is limited to its effects on the element's HTML5 directionality, and its changing the default value for the element's unicode-bidi and direction CSS properties. Namely, direction becomes set to the element's directionality, and unicode-bidi is set as follows (as taken form the HTML5 default stylesheet):

If the element is bdo, unicode-bidi becomes either "override isolate" if dir=auto or just override otherwise.

Otherwise, if the element is pre or textarea, unicode-bidi becomes either "plaintext" if dir=auto or "embed" otherwise.

Otherwise, if dir=auto or if the element is bdi or output or one of the old block elements, unicode-bidi becomes isolate.

Otherwise, unicode-bidi become embed.

Once again, although direction inherits, it does not have any effect on bidi ordering except in elements that have a non-normal unicode-bidi, so dir's effect on unicode-bidi is crucial.
Sorry for being commenting with not helpful information, but I have a theoretical question which might have some impact on this issue.

Given that students learn web development in schools and future web browsers will have html{direction:auto} in user-agent CSS, how do we make sure that the usage of direction:rtl won't be removed from their studies which could make it easier for the teacher to teach his/her students HTML, but at the same time these sites will be more problematic, will be broken in current browsers and could have some impact in the loading time due to the guessing algorithm?
Nobody should be teaching the 'direction' property to students. They should teach the [dir] HTML attribute only. Wrt using [dir=auto] on the root element, see the note on [dir=auto] in the HTML5 spec: it is not encouraged. </ot>
(In reply to comment #29)
> Nobody should be teaching the 'direction' property to students. They should
> teach the [dir] HTML attribute only. Wrt using [dir=auto] on the root
> element, see the note on [dir=auto] in the HTML5 spec: it is not encouraged.
> </ot>

Also:

- There is no such thing as direction:auto. With dir=auto, the direction CSS property is set to either ltr or rtl.

- Setting dir to auto on the html element will base that element's directionality (and CSS direction property) on the first character in a descendant text node (with certain exceptions) that is strongly LTR or RTL. (See the spec in http://dev.w3.org/html5/spec/Overview.html#the-dir-attribute.) That is all it will do. It will *not* flip a descendant element's direction the other way based on its content (unless that element has an explicit dir=auto of its own). The recommended way to use dir=auto is on an element tightly wrapping a single piece of content of unknown direction, and nothing else with the possible exception of children with their own dir attributes.
An example of good usage:
<div>also see <a dir="auto" href="...">ARTICLE TITLE</a></div>

An example of bad usage:
<div dir="auto">also see <a href="...">ARTICLE TITLE</a></div>
The current HTML5 spec for dir=auto is based on the element's descendant text nodes. Unfortunately, this does not give sufficient consideration to <input> and <textarea> nodes, for which dir=auto was always envisioned to work based on their current value, i.e. to potentially change the element's directionality when the user types into them. I have filed bug http://www.w3.org/Bugs/Public/show_bug.cgi?id=13482 on HTML5, asking that the spec be changed such that:

- For input and textarea elements with dir=auto, the directionality should be determined not by the element's descendants, but by the element's current value, and should change as the latter changes.

- For other elements, the text node descendants under a descendant textarea element should be excluded, just like those in script and style elements.
Blocks: 675943
I just noticed this bug...

Please note I have a direction setting heuristic implemented in BiDi Mail UI, which might be of interest here:
http://www.mozdev.org/source/browse/bidiui/source/tbird/chrome/content/bidimailpack/bidimailpack-common.js?rev=1.74
function BiDiMailUI.directionCheck
Line 533 and on
(In reply to Eyal Rozenberg from comment #33)
> I just noticed this bug...
> 
> Please note I have a direction setting heuristic implemented in BiDi Mail
> UI, which might be of interest here:
> http://www.mozdev.org/source/browse/bidiui/source/tbird/chrome/content/
> bidimailpack/bidimailpack-common.js?rev=1.74
> function BiDiMailUI.directionCheck
> Line 533 and on

The first-strong heuristic is certainly not ideal. Google rarely uses it, preferring either the any-RTL heuristic or a word-count heuristic (over 40% of words are RTL -> it's RTL). However, no known algorithm is perfect, and first-strong has the advantages of being very fast for long strings, and fairly easy for the user to figure out and  even control (provided LRM and RLM ever get onto the RTL keyboards - which is apparently in the works for the Hebrew keyboard at least). The HTML5 editor / WG decided to stick with first-strong and did not accept the proposal to add some way for the page to choose the algorithm. First-strong is now the spec (http://dev.w3.org/html5/spec/Overview.html#the-dir-attribute), and if you think that dir=auto should do something else, you need to take it up there.
As a regular user of both RTL and LTR text, I agree with the first-strong test. When providing support, it is easy to explain to people. Simple to understand is a sign of a good engineering decision! Additionally, only in convoluted speech would one ever start a sentence in an RTL language with LTR text, or vice versa. Nobody would ever speak or type like that, it doesn't even sound right.

I see that the BidiUi extension uses the first-strong test. In years of using that extension, it works _every_ time. That is a consequence of how people write. In fact, on my page [1] explaining RTL, I give this quote:
"For writing Email, the apparent directionality for the entire document can be set with Ctrl-Shift-X. For reading mail, I recommend the BiDi Mail UI, which gets it right so often that I don't even know how to change the directionality because I've never needed it."

First-strong is a proven real-world successful algorithm that fits naturally with the way people use language.


[1] http://dotancohen.com/howto/rtl_right_to_left.html
Note that the dir attribute now operates on the value of text form controls: http://www.w3.org/Bugs/Public/show_bug.cgi?id=13482
Wasn't this implemented in bug 662288? Can we close this bug now?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #37)
> Wasn't this implemented in bug 662288? Can we close this bug now?

Only for textarea and input controls, so no
(In reply to Simon Montagu from comment #38)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #37)
> > Wasn't this implemented in bug 662288? Can we close this bug now?
> 
> Only for textarea and input controls, so no

1. Did the fixing of bug 662288 include giving <textarea dir=auto> unicode-bidi:plaintext by default? But without setting its direction property?

2. Did the fixing of bug 662288 include giving <input dir=auto> unicode-bidi:plaintext by default? That was not part of the spec, and, as far as I know, does not add any advantage over setting its direction property, which it is supposed to do.
(In reply to Aharon (Vladimir) Lanin from comment #39)

Firstly, "textarea and input controls" was an error (a thinko rather than a typo), I should have written "<textarea> and <pre> elements".

> 1. Did the fixing of bug 662288 include giving <textarea dir=auto>
> unicode-bidi:plaintext by default? But without setting its direction
> property?

When you say "its direction property", do you mean the CSS "direction" property or the HTML5 "directionality"? I don't see from the HTML5 spec that setting dir=auto is supposed to affect the CSS property, and I'm not 100% sure that it should (although making it do so would have prevented bug 698291, and similar bugs that may exist but not have been reported yet). The HTML5 directionality feature will be implemented in this bug.
Both. CSS 'direction' should be keying off the HTML5 "directionality", i.e. it is set via

  :dir(rtl) { direction: rtl; }
  :dir(ltr) { direction: ltr; }

IIRC this is why dir=auto cannot be handled at the CSS layer alone: it not only sets unicode-bidi: plaintext, it also sets the directionality (and 'direction') of the element.
(In reply to fantasai from comment #41)
> Both. CSS 'direction' should be keying off the HTML5 "directionality"

Right.

> IIRC this is why dir=auto cannot be handled at the CSS layer alone: it not
> only sets unicode-bidi: plaintext,

(only for <textarea> and <pre>)

> it also sets the directionality (and
> 'direction') of the element.

Right. That's how it works for most elements. It may also have some effects on <textarea> and <pre>, e.g. on text-align:start and text-align:end, unless plaintext takes over there as well (which was left unspecified).
Assigned to Simon Montagu
Assignee: nobody → smontagu
Status: NEW → ASSIGNED
I have written a document to explain how I think we should implement support for dir=auto efficiently <https://etherpad.mozilla.org/dir-auto>.

Unfortunately, it's a lot more complicated than I would have liked, but I don't see any other way around the complexity, especially if we want to make sure that editing operations will be relatively efficient in the face of dir=auto.

I would appreciate if people can provide feedback on this proposal, especially see if my algorithms miss some case, or if they can be simplified in any way.
Depends on: 726420
The spec is sort of weird on the directionality of <textarea dir=auto> when its value does not have a strong character.  It currently says that the directionality must be LTR but I think the directionality should be resolved based on the directionality of the parent.  I filed a spec bug about this: https://www.w3.org/Bugs/Public/show_bug.cgi?id=16564
(In reply to Ehsan Akhgari [:ehsan] from comment #45)
> The spec is sort of weird on the directionality of <textarea dir=auto> when
> its value does not have a strong character.  It currently says that the
> directionality must be LTR but I think the directionality should be resolved
> based on the directionality of the parent.  I filed a spec bug about this:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=16564

Please note that unicode-bidi:plaintext is what is actually making <textarea dir=auto> behave the way it does in Mozilla (given that dir=auto in the general case is not yet implemented). It used to be implemented the way you wanted, and was changed to conform to the (CSS) spec in response to a bug I filed about that, https://bugzilla.mozilla.org/show_bug.cgi?id=726420.

While I do *not* agree that all-neutral paragraphs should follow the inherited direction, and have responded to that effect on https://www.w3.org/Bugs/Public/show_bug.cgi?id=16564, I do see the caret behavior you describe there for <textarea dir=auto> as problematic. However, it can be fixed by changing the (unicode-bidi:plaintext) definition for *empty* paragraphs, without running into the problems inherent in doing so for all-neutral paragraphs generally. Fantasai seems to be ok with changing the CSS spec to make empty paragraphs follow the direction style; an even better approach might be to make them follow the direction of the preceding paragraph (if any). If you want to pursue this course, start a thread on www-style and/or file a new bug on Mozilla (cc-ing me).
That's generally fine by me, except that as I noted in <https://www.w3.org/Bugs/Public/show_bug.cgi?id=16564>, the HTML directionality of the textarea itself does matter as well, and I'm not sure how fixing this at the CSS spec level is going to take care of those cases.
Blocks: 744659
Assignee: smontagu → nobody
Depends on: 562169
Target Milestone: --- → mozilla14
Version: Trunk → Other Branch
Component: Layout → Layout: Text
QA Contact: layout → layout.fonts-and-text
Version: Other Branch → Trunk
Attached patch Tests for this bug (obsolete) — Splinter Review
These tests are adapted from the tests at http://www.w3.org/International/tests/html-css/bidi-html5/results-bidi-html5#dirauto
Attached patch w-i-p patch (obsolete) — Splinter Review
This patch passes all the tests in the previous attachments. It doesn't include all the stuff for "handling dynamic changes" in https://etherpad.mozilla.org/dir-auto
Attached patch Dynamic tests (obsolete) — Splinter Review
Assignee: nobody → smontagu
Attached patch More tests for changing text (obsolete) — Splinter Review
Attached patch w-i-p patch, v.2 (obsolete) — Splinter Review
This still needs some polishing, and there maybe some corner cases that it doesn't handle, but I'd like some feedback before continuing.

There are tryserver builds with the patch available at ftp://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-bc5c2cb2639b/
Attachment #625028 - Attachment is obsolete: true
Attachment #633257 - Flags: feedback?(ehsan)
I'm off tomorrow, so I'll try to provide you with feedback early next week.  Does that sound good?
That will be great. It doesn't have to be too detailed at this stage either, I'm looking for a relatively high-level overview kind of feed back that this is heading in the right direction.
Comment on attachment 633257 [details] [diff] [review]
w-i-p patch, v.2

Review of attachment 633257 [details] [diff] [review]:
-----------------------------------------------------------------

(Note that my feedback here is very high-level.  I didn't read the implementation of many functions, and I just assumed that they work correctly for now...)

Firstly, this is obviously very good complex.  We have a good reason for this to be complex (performance) and we have a fairly good design for how this is supposed to work.  I know we usually don't do this, but I'd like the information in <https://etherpad.mozilla.org/dir-auto> to live in the tree somewhere, and maybe linked to from nsINode.h or something.

Overall, I like this patch very much.  I think it reflects what I had in mind very well, and is very similar to how I would write the patch, only a lot better.  :-)

If you have questions about my comments below, please don't hesitate to ask.

::: content/base/public/nsINode.h
@@ +164,5 @@
> +  // Set if the node has dir=auto
> +  NODE_HAS_DIR_AUTO             = 0x00400000U,
> +
> +  // Set if a node in the node's parent chain has dir=auto
> +  NODE_PARENT_HAS_DIR_AUTO      = 0x00800000U,

Exactly what I had in mind!

@@ +987,5 @@
> +      UnsetFlags(NODE_PARENT_HAS_DIR_AUTO);
> +    }
> +  }
> +
> +  bool HasDirAutoInDNA() const

Nit: DNA?

::: content/base/src/nsContentUtils.cpp
@@ +182,5 @@
>  #include "nsICharsetDetectionObserver.h"
>  #include "nsIPlatformCharset.h"
> +#include "nsIDOMNodeFilter.h"
> +#include "nsTreeWalker.h"
> +#include "nsIDOMMutationEvent.h"

I completely ignored everything in this file, as they all seemed like implementation helpers.

::: content/base/src/nsGenericDOMDataNode.cpp
@@ +300,5 @@
>      };
>      nsNodeUtils::CharacterDataWillChange(this, &info);
>    }
>  
> +  if (IsNodeOfType(nsINode::eTEXT)) {

This should probably be optimized a bit further.  For example, we may be able to tell whether changes happen after a first strong character, and in that case we can skip this whole block.  Also this doesn't seem to account for adjacent text nodes, which is a bug (and I hope we're going to have test cases for that).

::: content/base/src/nsNodeUtils.cpp
@@ +616,5 @@
> +nsNodeUtils::AddEntryToTextNodeDirectionalityMap(nsINode* aTextNode,
> +                                                 nsINode* aElement)
> +{
> +  NS_ASSERTION(aTextNode->IsNodeOfType(nsINode::eTEXT), "Must be a text node");
> +  nsTHashtable<nsPtrHashKey<nsINode> > *map;

Our hashtable classes are horrible.

It would be great if you could create a class called TextDirectionalityMap, put the nsTHashtable inside it, and provide easy to read methods on that class.  Objects of that class would be stored as node properties, and the code in this file (and elsewhere) would hopefully be a lot more readable.

@@ +644,5 @@
> +{
> +  NS_ASSERTION(aEntry->GetKey()->IsElement(), "Must be an Element");
> +  nsNodeUtils::AttributeWillChange(aEntry->GetKey()->AsElement(),
> +                                   kNameSpaceID_None, nsGkAtoms::dir,
> +                                   nsIDOMMutationEvent::MODIFICATION);

Not sure whether this is supposed to raise mutation events or not.

::: content/base/src/nsTextNode.cpp
@@ +173,5 @@
> +
> +  nsDirectionality dir = nsContentUtils::GetDirectionFromText(&mText);
> +  if (dir != eDir_NotSet) {
> +    nsContentUtils::SetAncestorDirectionIfAuto(this, aParent, dir);
> +  }

Food for thought: is it better to first check to see if we're dealing with auto directionality and then get the text direction?

::: content/html/content/src/nsHTMLInputElement.cpp
@@ +711,5 @@
>          CancelImageRequests(aNotify);
>        }
>      } else if (aNotify && aName == nsGkAtoms::disabled) {
>        mDisabledChanged = true;
> +    } else if (aName == nsGkAtoms::dir && 

Should nsHTMLTextAreaElement also receive the handling similar to the changes in this file?

::: layout/style/nsCSSRuleProcessor.cpp
@@ +1963,5 @@
>            }
> +          if ((elementDir == eDir_LTR && !dirString.EqualsLiteral("ltr")) ||
> +              (elementDir == eDir_RTL && !dirString.EqualsLiteral("rtl"))) {
> +            return false;
> +          }

This part I don't understand at all.
Attachment #633257 - Flags: feedback?(ehsan) → feedback+
(In reply to Ehsan Akhgari [:ehsan] from comment #56)
> > +  bool HasDirAutoInDNA() const
> 
> Nit: DNA?

HasDirAutoInDNA is meant to encapsulate the concept of "either this node has dir=auto or its parent does". If that's not obvious from the name, I need to think of a better one.

> > +  if (IsNodeOfType(nsINode::eTEXT)) {
> 
> This should probably be optimized a bit further.  For example, we may be
> able to tell whether changes happen after a first strong character, and in
> that case we can skip this whole block.

There is a test for that below - nsContentUtils::GetDirectionFromText has an outparam which returns the position of the first strong character. I'm not sure whether it's more efficient to test for that first or for the HAS_DIR_AUTO flags.

> Also this doesn't seem to account
> for adjacent text nodes, which is a bug (and I hope we're going to have test
> cases for that).

Can you expand on this?

> Our hashtable classes are horrible.

Yes!
 
> It would be great if you could create a class called TextDirectionalityMap,
> put the nsTHashtable inside it, and provide easy to read methods on that
> class.  Objects of that class would be stored as node properties, and the
> code in this file (and elsewhere) would hopefully be a lot more readable.

Will do
> > +  nsNodeUtils::AttributeWillChange(aEntry->GetKey()->AsElement(),
> > +                                   kNameSpaceID_None, nsGkAtoms::dir,
> > +                                   nsIDOMMutationEvent::MODIFICATION);
> 
> Not sure whether this is supposed to raise mutation events or not.

I'm not sure what you're saying here. There are test cases in attachment 633248 [details] [diff] [review] and attachment 633250 [details] [diff] [review] that fail without this line in SetNodeDirection and ResetNodeDirection

> Food for thought: is it better to first check to see if we're dealing with
> auto directionality and then get the text direction?

as above, I'm not sure.

> ::: content/html/content/src/nsHTMLInputElement.cpp
> 
> Should nsHTMLTextAreaElement also receive the handling similar to the
> changes in this file?

I think that textarea already gets everything it needs from bug 662288, but I'll double check.

> ::: layout/style/nsCSSRuleProcessor.cpp
...
> This part I don't understand at all.

It doesn't belong to this patch, I'll move it to the patch for bug 562169.
(In reply to Simon Montagu from comment #57)
> > Also this doesn't seem to account
> > for adjacent text nodes, which is a bug (and I hope we're going to have test
> > cases for that).
> 
> Can you expand on this?

Sure, consider this test case:

document.body.dir = "auto";
var text1 = document.createTextNode("   "); // no strong chars
var text2 = document.createTextNode("احسان"); // strong chars
document.body.appendChild(text1);
document.body.appendChild(text2);

Here, if all that your code does is look at the first child text node, it will determine that it doesn't have a strong char, but the adjacent text node does.

> > > +  nsNodeUtils::AttributeWillChange(aEntry->GetKey()->AsElement(),
> > > +                                   kNameSpaceID_None, nsGkAtoms::dir,
> > > +                                   nsIDOMMutationEvent::MODIFICATION);
> > 
> > Not sure whether this is supposed to raise mutation events or not.
> 
> I'm not sure what you're saying here. There are test cases in attachment
> 633248 [details] [diff] [review] and attachment 633250 [details] [diff] [review]
> [review] that fail without this line in SetNodeDirection and
> ResetNodeDirection

Oh, OK, then ignore me, please.  I was just not sure whether this was needed or not.

> > Food for thought: is it better to first check to see if we're dealing with
> > auto directionality and then get the text direction?
> 
> as above, I'm not sure.

I think it's fine to defer this kind of micro-optimization to follow-up bugs.
current w-i-p checkpoint
Attached patch w-i-p patch, v.3 (obsolete) — Splinter Review
Attachment #633257 - Attachment is obsolete: true
Attached patch Dynamic tests updated (obsolete) — Splinter Review
Attachment #633248 - Attachment is obsolete: true
Attachment #633250 - Attachment is obsolete: true
Attachment #643446 - Attachment is patch: true
I made changes to a couple of tests so as to test that text in a <bdi> both sets the direction of the bdi and does not set the direction of the enclosing element even if it has dir=auto.
Attachment #625026 - Attachment is obsolete: true
Attachment #652106 - Flags: review+
Attachment #643448 - Attachment is obsolete: true
Attachment #652107 - Flags: review?(ehsan)
Attachment #643449 - Attachment is obsolete: true
Attachment #652109 - Flags: review?(ehsan)
Attached patch w-i-p patch, v.4Splinter Review
Attachment #643445 - Attachment is obsolete: true
Attachment #643446 - Attachment is obsolete: true
Attachment #652110 - Flags: review?(ehsan)
Attachment #652110 - Flags: review?(bzbarsky)
Comment on attachment 652110 [details] [diff] [review]
w-i-p patch, v.4

Can you please tell me what level of review you're looking for here?  I can beat the patch to death if you want me to, but it's marked as WIP which makes me think maybe you're looking for a higher level review.  :-)
Comment on attachment 652107 [details] [diff] [review]
Dynamic tests updated

Review of attachment 652107 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/reftests/bidi/dirAuto/setDir.js
@@ +2,5 @@
> +{
> +    for (var i = 0; ; ++i) {
> +        try {
> +            theElement = document.getElementById("set" + i);
> +            theElement.dir = value;

It would also be interesting to have another version of these tests which uses setAttribute and removeAttribute to set the dir attribute directly to make sure that the behavior matches with setting the dir property.
Attachment #652107 - Flags: review?(ehsan) → review+
Comment on attachment 652109 [details] [diff] [review]
More tests for changing text, updated

Review of attachment 652109 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/reftests/bidi/dirAuto/setDir.js
@@ +45,5 @@
> +        }
> +        if (theElement.tagName == "INPUT") {
> +            theElement.value = newText;
> +        } else {
> +            theElement.firstChild.textContent = newText;

Please also have variations of these functions which tests what happens when you set the value and defaultValue properties for a textarea instead of manipulating its children directly.
Attachment #652109 - Flags: review?(ehsan) → review+
(In reply to Ehsan Akhgari [:ehsan] from comment #67)
> Can you please tell me what level of review you're looking for here?  I can
> beat the patch to death if you want me to, but it's marked as WIP which
> makes me think maybe you're looking for a higher level review.  :-)

Please beat it to death, hang draw and quarter it, feather and tar it. I only wrote w-i-p because I'm sure there will be a few more iterations.
Comment on attachment 652110 [details] [diff] [review]
w-i-p patch, v.4

Review of attachment 652110 [details] [diff] [review]:
-----------------------------------------------------------------

This looks very good!

I have one nit all over the patch: please get rid of all of the trailing spaces that you're adding!  Splinter now makes it tough to get away with this stuff.

r=me with the below addressed.

::: content/base/public/DirectionalityUtils.h
@@ +23,5 @@
>  } // namespace mozilla
>  
>  namespace mozilla {
>  
>  namespace directionality {

Any chance we can avoid using nested namespaces?  I think mozilla:: should serve perfectly well here...

@@ +29,5 @@
>  enum Directionality {
>    eDir_NotSet = 0,
>    eDir_RTL    = 1,
> +  eDir_LTR    = 2,
> +  eDir_Auto   = 3

Nit: what's the point of specifying the enum values?  You'll get the same values by default, and I don't think that the numerical values really matter...

@@ +66,5 @@
> + * After setting dir=auto on an element, walk its descendants in tree order.
> + * If the node doesn't have the NODE_PARENT_HAS_DIR_AUTO flag, set the
> + * NODE_PARENT_HAS_DIR_AUTO flag on all of its descendants.
> + * Resolve the directionality of the element by the "downward propagation
> + * algorithm"

Nit: would be helpful to point out where that algorithm is defined.

::: content/base/public/nsINode.h
@@ +156,5 @@
> +  // Set if the node has dir=auto
> +  NODE_HAS_DIR_AUTO             = 0x00400000U,
> +
> +  // Set if a node in the node's parent chain has dir=auto
> +  NODE_PARENT_HAS_DIR_AUTO      = 0x00800000U,

Nit: please add a comment saying that people should use the helper functions to (un)set these.

@@ +973,5 @@
> +  }
> +
> +  bool NodeOrParentHasDirAuto() const
> +  {
> +    return HasFlag(NODE_HAS_DIR_AUTO) || HasFlag(NODE_PARENT_HAS_DIR_AUTO);

/me grumbles about why HasFlag is not called HasFlags...

@@ +1321,5 @@
> +    // Set if node has a dir attribute with a fixed value (ltr or rtl, NOT auto)
> +    NodeHasFixedDir,
> +    // Set if the node has dir=auto and has a property pointing to the text
> +    // node that determines its direction
> +    NodeHasDirAutoSet,

Hmm, this name is rather confusing, because it only tells half of the story.  I tried to think of a better name and I thought of NodeHasDirAutoAndTextNodeDirectionalityMap, but I'm not convinced that is strictly better.  ;-)

@@ +1410,5 @@
> +    { SetBoolFlag(NodeHasTextNodeDirectionalityMap); }
> +  void ClearHasTextNodeDirectionalityMap()
> +    { ClearBoolFlag(NodeHasTextNodeDirectionalityMap); }
> +  bool HasTextNodeDirectionalityMap() const
> +    { return GetBoolFlag(NodeHasTextNodeDirectionalityMap); }

Please MOZ_ASSERT in each of the modifier methods to make sure that NodeHasFixedDir and NodeHasAutoDir are not set on text nodes, and NodeHasTextNodeDirectionalityMap is not set on non-text nodes.

::: content/base/src/DirectionalityUtils.cpp
@@ +220,5 @@
> + * It *does* include textarea, because even if a textarea has dir=auto, it has
> + * unicode-bidi: plaintext and is handled automatically in bidi resolution.
> + */
> +bool
> +DoesNotParticipateInAutoDirection(const Element* aElement)

Nit: please make the helper functions that are only used in this file static.

@@ +330,5 @@
> +
> +/**
> + * Set the directionality of a node with dir=auto as defined in
> + * http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-directionality
> +s *

Nit: s/s// :-)

@@ +334,5 @@
> +s *
> + * @param[in] aStartAfterNode as an optimization, a caller may pass in a node
> + *            from which to begin walking the descendants of aElement, if it is
> + *            known that all text nodes before this node do not contain any
> + *            strong directional characters

Please MOZ_ASSERT this precondition #ifdef DEBUG.

@@ +340,5 @@
> + */
> +nsINode*
> +WalkDescendantsSetDirectionFromContent(Element* aElement, bool aNotify = true,
> +                                       nsINode* aStartAfterNode = nullptr)
> +{

Please MOZ_ASSERT aElement.

@@ +379,5 @@
> +
> +class nsTextNodeDirectionalityMap
> +{
> +
> +static void

Nit: indentation.

@@ +386,5 @@
> +{
> +  nsTextNodeDirectionalityMap* map =
> +    reinterpret_cast<nsTextNodeDirectionalityMap * >(aPropertyValue);
> +  map->DeleteHashtable();
> +  delete map;

Hmm, shouldn't it be the job of the nsTextNodeDirectionalityMap's destructor to call DeleteHashtable()?  This looks fragile, and will leak if the objects of this class are freed in another way.

@@ +390,5 @@
> +  delete map;
> +}
> +
> +public:
> +  nsTextNodeDirectionalityMap(nsINode* aTextNode)

Nit: make this explicit please.

@@ +391,5 @@
> +}
> +
> +public:
> +  nsTextNodeDirectionalityMap(nsINode* aTextNode)
> +  {

Nit: please MOZ_ASSERT aTextNode.

@@ +394,5 @@
> +  nsTextNodeDirectionalityMap(nsINode* aTextNode)
> +  {
> +    MOZ_COUNT_CTOR(nsTextNodeDirectionalityMap);
> +    mTextNode = aTextNode;
> +    mHashtable = new nsTHashtable<nsPtrHashKey<Element> >();

Nit: please move these two assignments to the initializer list.

@@ +395,5 @@
> +  {
> +    MOZ_COUNT_CTOR(nsTextNodeDirectionalityMap);
> +    mTextNode = aTextNode;
> +    mHashtable = new nsTHashtable<nsPtrHashKey<Element> >();
> +    if (mHashtable) {

No need for a null check, new is infallible.

@@ +405,5 @@
> +  }
> +
> +  ~nsTextNodeDirectionalityMap()
> +  {
> +    MOZ_COUNT_DTOR(nsTextNodeDirectionalityMap);

As I said, I think this should call DeleteHashtable.

@@ +410,5 @@
> +  }
> +
> +  void AddEntry(Element* aElement)
> +  {
> +    if (mHashtable && !mHashtable->Contains(aElement)) {

I don't think the null check is needed here or in RemoveEntry.

@@ +431,5 @@
> +  }
> +
> +private:
> +  nsTHashtable<nsPtrHashKey<Element> >* mHashtable;
> +  nsINode* mTextNode;

I don't think holding a bare pointer is safe, since the text node might get destroyed...  Similarly I think you should be using nsRefPtrHashKey for the hashtable.

@@ +450,5 @@
> +  }
> +
> +  static PLDHashOperator SetNodeDirection(nsPtrHashKey<Element>* aEntry, void* aDir)
> +  {
> +    NS_ASSERTION(aEntry->GetKey()->IsElement(), "Must be an Element");

Is there a good reason why this is not a MOZ_ASSERT?

@@ +458,5 @@
> +  }
> +
> +  static PLDHashOperator ResetNodeDirection(nsPtrHashKey<Element>* aEntry, void* aData)
> +  {
> +    NS_ASSERTION(aEntry->GetKey()->IsElement(), "Must be an Element");

Ditto.

@@ +481,5 @@
> +public:
> +  void UpdateAutoDirection(Directionality aDir)
> +  {
> +    if (mHashtable && mHashtable->Count() > 0) {
> +      mHashtable->EnumerateEntries(SetNodeDirection, (void*)&aDir);

The void* casts are redundant here and in ResetAutoDirection.

@@ +495,5 @@
> +  }
> +
> +  static void RemoveElementFromMap(nsINode* aTextNode, Element* aElement)
> +  {
> +    NS_ASSERTION(aTextNode->IsNodeOfType(nsINode::eTEXT), "Must be a text node");

MOZ_ASSERT.  (Here and elsewhere, unless there is a good reason why you don't want fatal assertions.

@@ +545,5 @@
> +
> +    if (aTextNode->HasTextNodeDirectionalityMap()) {
> +      nsTextNodeDirectionalityMap* map =
> +        reinterpret_cast<nsTextNodeDirectionalityMap * >
> +        (aTextNode->GetProperty(nsGkAtoms::textNodeDirectionalityMap));

This hurts my eyes.  I'd have a GetDirectionalityMap() helper which does this ugly casting.  Also, wouldn't static_cast do the job?

@@ +723,5 @@
> +        resetDirection = true;
> +      } else {
> +        // If parent's direction is already set, we need to know if
> +        // aTextNode is before or after the text node that had set it.
> +        // We will walk parent's descendants in tree order staring from

Nit: starting.

::: content/base/src/nsGenericElement.cpp
@@ +2073,5 @@
>      rv = AfterSetAttr(aNamespaceID, aName, &aValueForAfterSetAttr, aNotify);
>      NS_ENSURE_SUCCESS(rv, rv);
> +
> +    if (aNamespaceID == kNameSpaceID_None && aName == nsGkAtoms::dir) {
> +      mozilla::directionality::OnSetDirAttr(this, &oldDirValue,

What guarantees that oldDirValue would be valid here?

::: content/base/src/nsTextNode.cpp
@@ +110,5 @@
>  
>    return NS_OK;
>  }
>  
> +typedef nsGenericDOMDataNode nsTextNodeBase;

Hrm, if you want to do this, please do it in the header and make nsTextNode actually inherit from nsTextNodeBase.

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +1903,5 @@
> +{
> +  if (aNamespaceID == kNameSpaceID_None) {
> +    if (aName == nsGkAtoms::dir && HasFlag(NODE_HAS_DIR_AUTO)) {
> +      // setting dir on an element that currently has dir=auto
> +      if (HasFlag(NODE_HAS_DIR_AUTO)) {

Nit: and'em'all!

::: content/html/content/src/nsGenericHTMLElement.h
@@ +52,5 @@
>      AddStatesSilently(NS_EVENT_STATE_LTR);
>      SetFlags(NODE_HAS_DIRECTION_LTR);
> +    if (mNodeInfo->Equals(nsGkAtoms::bdi)) {
> +      SetDirAutoFlag(true);
> +    }

Hmm, OK.  I don't really like doing this for every element...  Perhaps a better option would be to create a content class for bdi.  I'll let Boris arbitrate on that.

::: content/html/content/src/nsHTMLInputElement.cpp
@@ +3145,5 @@
> +      (mType == NS_FORM_INPUT_TEXT ||
> +       mType == NS_FORM_INPUT_SEARCH ||
> +       mType == NS_FORM_INPUT_TEL ||
> +       mType == NS_FORM_INPUT_URL ||
> +       mType == NS_FORM_INPUT_EMAIL)) {

Nit: please use nsIFormControl::IsSingleLineTextControl.
Attachment #652110 - Flags: review?(ehsan) → review+
Also, please make sure you test Talos numbers on this patch on try before landing.
I can't give a useful review ETA here.... You might want to try a different DOM peer.  :(
Attachment #652110 - Flags: review?(bzbarsky) → review?(peterv)
Any word on this r?
Comment on attachment 652110 [details] [diff] [review]
w-i-p patch, v.4

Review of attachment 652110 [details] [diff] [review]:
-----------------------------------------------------------------

We seem to be doing a lot of calculations even for nodes that are not in a document. If this is only for layout it makes more sense to only do it if the node is in the document (check IsInDoc()), but I might be missing how the computed value is exposed.

::: content/base/public/nsINode.h
@@ +156,5 @@
> +  // Set if the node has dir=auto
> +  NODE_HAS_DIR_AUTO             = 0x00400000U,
> +
> +  // Set if a node in the node's parent chain has dir=auto
> +  NODE_PARENT_HAS_DIR_AUTO      = 0x00800000U,

Hmm, so this really is NODE_ANCESTOR_HAS_DIR_AUTO. I think I prefer that name if that's what it means.

@@ +971,5 @@
> +      UnsetFlags(NODE_PARENT_HAS_DIR_AUTO);
> +    }
> +  }
> +
> +  bool NodeOrParentHasDirAuto() const

Again, it seems this should be about ancestors, not parent.

::: content/base/src/DirectionalityUtils.cpp
@@ +47,5 @@
> +
> +  Each text node should maintain a list of elements which have their
> +  directionality determined by the first strong character of that text node.
> +  This is useful to make dynamic changes more efficient.  One way to implement
> +  this is to have a per-document hash table mapping a text node to a set of

You might want to document somewhere that we implement this using a per-textnode property. I was a bit confused since this says per-document, but of course properties are a per-document hash of nodes to something :-).

@@ +225,5 @@
> +{ 
> +  nsINodeInfo* nodeInfo = aElement->NodeInfo();
> +  return (nodeInfo->Equals(nsGkAtoms::script) ||
> +          nodeInfo->Equals(nsGkAtoms::style) ||
> +          nodeInfo->Equals(nsGkAtoms::textarea));

Do we really want to do this for any namespace? You could use aElement->IsHTML(...) if not. Not sure if you need to support multiple namespaces for script and style, if so you can check aElement->GetNameSpaceID() and aElement->Tag(). Please check all the places where you check tagnames in this patch, they probably need similar fixes (except probably in nsGenericHTMLElement).

@@ +257,5 @@
> +               value is eDir_NotSet). 
> + * @return the directionality of the string
> + */
> +Directionality
> +GetDirectionFromContent(const PRUnichar* aContent, const PRUint32 aLength,

FromContent ususally means "from nsIContent", might want to call these GetDirectionFromText too.

@@ +430,5 @@
> +    }
> +  }
> +
> +private:
> +  nsTHashtable<nsPtrHashKey<Element> >* mHashtable;

Use a nsAutoPtr around the nsTHashtable.
But, how often do we think we'll have more than one item in this hash? If it's rare then you should consider using nsCheapSet.

@@ +431,5 @@
> +  }
> +
> +private:
> +  nsTHashtable<nsPtrHashKey<Element> >* mHashtable;
> +  nsINode* mTextNode;

Ehsan wants to use a strong pointers here. For the text node: since these objects are stored as a property of the text node itself, using a raw pointer seems better. We clear properties if a node is destroyed and we don't want to add cycles between nodes and their properties. For the elements, if the element dies we unbind its descendants, which should delete the hashtables, so using a raw pointer seems ok there too (the tables with a reference to an element should only exist as properties on its descendants, right?). If we do end up adding strong pointers we'll need to traverse those from nsINode::Traverse.

@@ +435,5 @@
> +  nsINode* mTextNode;
> +
> +  void DeleteHashtable()
> +  {
> +    if (mHashtable && mHashtable->Count() > 0) {

No need for the null-check.

@@ +480,5 @@
> +
> +public:
> +  void UpdateAutoDirection(Directionality aDir)
> +  {
> +    if (mHashtable && mHashtable->Count() > 0) {

No need for the null-check.

@@ +487,5 @@
> +  }
> +
> +  void ResetAutoDirection(nsINode* aStartAfterNode)
> +  {
> +    if (mHashtable && mHashtable->Count() > 0) {

No need for the null-check.

@@ +500,5 @@
> +    if (aTextNode->HasTextNodeDirectionalityMap()) {
> +      nsTextNodeDirectionalityMap* map =
> +        reinterpret_cast<nsTextNodeDirectionalityMap * >
> +        (aTextNode->GetProperty(nsGkAtoms::textNodeDirectionalityMap));
> +      if (map) {

No need for the null-check.

@@ +518,5 @@
> +    } else {
> +      map = new nsTextNodeDirectionalityMap(aTextNode);
> +    }
> +
> +    if (map) {

No need for the null-check.

@@ +531,5 @@
> +    if (aTextNode->HasTextNodeDirectionalityMap()) {
> +      nsTextNodeDirectionalityMap* map = 
> +        reinterpret_cast<nsTextNodeDirectionalityMap * >
> +        (aTextNode->GetProperty(nsGkAtoms::textNodeDirectionalityMap));
> +      if (map) {

No need for the null-check.

@@ +546,5 @@
> +    if (aTextNode->HasTextNodeDirectionalityMap()) {
> +      nsTextNodeDirectionalityMap* map =
> +        reinterpret_cast<nsTextNodeDirectionalityMap * >
> +        (aTextNode->GetProperty(nsGkAtoms::textNodeDirectionalityMap));
> +      if (map) {

No need for the null-check.

@@ +558,5 @@
>  RecomputeDirectionality(Element* aElement, bool aNotify)
>  {
> +  NS_ASSERTION(!aElement->HasFlag(NODE_HAS_DIR_AUTO),
> +               "RecomputeDirectionality called with dir=auto");
> +  if (aElement->HasDirAutoSet()) {

Given the assert above, shouldn't this just check for NODE_PARENT_HAS_DIR_AUTO?

@@ +731,5 @@
> +          static_cast<nsINode*>(parent->GetProperty(nsGkAtoms::dirAutoSetBy));
> +        if (!directionWasSetByTextNode) {
> +          resetDirection = true;
> +        } else if (directionWasSetByTextNode != aTextNode) {
> +          nsIContent* child = aTextNode->AsContent();

You should set child to aTextNode->GetNextNode(parent) here. You know aTextNode is not an element and it's not equal to directionWasSetByTextNode, so you're doing a useless iteration through the loop below.

@@ +732,5 @@
> +        if (!directionWasSetByTextNode) {
> +          resetDirection = true;
> +        } else if (directionWasSetByTextNode != aTextNode) {
> +          nsIContent* child = aTextNode->AsContent();
> +          if (nsContentUtils::ContentIsDescendantOf(child, parent)) {

How can this not be true? Parent is set to aTextNode->GetElementParent() first and then to parent->GetElementParent(). Drop this check unless I'm missing something.

@@ +764,5 @@
> +
> +      // Since we found an element with dir=auto, we can stop walking the
> +      // parent chain: none of its ancestors will have their direction set by
> +      // any of its descendants.
> +      break;

I'd return here, otherwise one has to figure out which loop is being broken and that nothing happens after it.

::: content/base/src/nsGenericDOMDataNode.cpp
@@ +303,5 @@
>      nsNodeUtils::CharacterDataWillChange(this, &info);
>    }
>  
> +  if (IsNodeOfType(nsINode::eTEXT)) {
> +    mozilla::directionality::SetDirectionFromChangedTextNode(this, aOffset,

no need for |mozilla::directionality::|, you've imported the namespace with |using namespace|

::: content/base/src/nsTextNode.cpp
@@ +110,5 @@
>  
>    return NS_OK;
>  }
>  
> +typedef nsGenericDOMDataNode nsTextNodeBase;

I'd rather we don't do this, but if we must then you should change all nsGenericDOMDataNodes to nsTextNodeBase in this file (might be better to do that in a separate patch).

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +1934,5 @@
>        Directionality dir;
>        if (aValue &&
>            (aValue->Equals(nsGkAtoms::ltr, eIgnoreCase) ||
> +           aValue->Equals(nsGkAtoms::rtl, eIgnoreCase) ||
> +           aValue->Equals(nsGkAtoms::_auto, eIgnoreCase))) {

I think you can check for |aValue->Type() == nsAttrValue::eEnum|

@@ +1939,2 @@
>          SetHasValidDir();
> +        if (aValue->Equals(nsGkAtoms::_auto, eIgnoreCase)) {

and compare aValue->GetEnumValue() with the enum here.

@@ +2126,5 @@
>  {
>    if (aNamespaceID == kNameSpaceID_None) {
>      if (aAttribute == nsGkAtoms::dir) {
> +      bool rv = aResult.ParseEnumValue(aValue, kDirTable, false);
> +      SetDirAutoFlag(rv && aValue.LowerCaseEqualsLiteral("auto"));

Check aResult.GetEnumValue(). It seems wrong to do it here though, isn't the code in AfterSetAttr enough to deal with it?

::: content/html/content/src/nsGenericHTMLElement.h
@@ +52,5 @@
>      AddStatesSilently(NS_EVENT_STATE_LTR);
>      SetFlags(NODE_HAS_DIRECTION_LTR);
> +    if (mNodeInfo->Equals(nsGkAtoms::bdi)) {
> +      SetDirAutoFlag(true);
> +    }

I agree with Ehsan, let's not do this for every HTML element. It seems like you could do this in nsHTMLUnknownElement (please double-check), or add a class for bdi elements (inherit from nsGenericHTMLElement, override constructor, add a NS_DECLARE_NS_NEW_HTML_ELEMENT and HTML_TAG, that should be it I think).
Attachment #652110 - Flags: review?(peterv) → review+
> +  if (IsHTML() && !HasFlag(NODE_HAS_DIR_AUTO) && !HasDirAutoSet()) {

If !NODE_HAS_DIR_AUTO, we couldn't have HasDirAutoSet() anyway, right?

I assume dir="auto" is not the default state, right?

Might be better to use a NodeType() test instead of the virtual IsNodeOfType to test for text nodes.

SetDirectionFromChangedTextNode seems to be doing possibly-expensive GetDirectionFromText calls before it even knows whether anyone cares.  Why can that call not move into the "I have an ancestor that cares" if block?

If I'm right that GetDirectionFromText can be expensive, then why are we unconditionally doing it on every bind/unbind of a textnode?  Those are typically perf-sensitive; ideally we'd avoid even non-inlined function calls in the common case of bind/unbind (e.g. I think ResetDirectionSetByTextNode should have the initial if check inlined).  And we _may_ want to just have a boolean on the nodeinfo for "DoesNotParticipateInAutoDirection" if we care.
Comment on attachment 652110 [details] [diff] [review]
w-i-p patch, v.4

Review of attachment 652110 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/DirectionalityUtils.cpp
@@ +431,5 @@
> +  }
> +
> +private:
> +  nsTHashtable<nsPtrHashKey<Element> >* mHashtable;
> +  nsINode* mTextNode;

I forgot: given that we store this class as a property of the textnode, callers that use it need to look it up on the textnode so they have a pointer to that already. Seems like whatever uses mTextNode could have the textnode passed in, and we could reduce the size of this class by a pointer. You could add some helper functions that takes a textnode, looks up the nsTextNodeDirectionalityMap and calls the relevant function on it passing it the text node.
(In reply to comment #77)
> Comment on attachment 652110 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=652110
> w-i-p patch, v.4
> 
> Review of attachment 652110 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/base/src/DirectionalityUtils.cpp
> @@ +431,5 @@
> > +  }
> > +
> > +private:
> > +  nsTHashtable<nsPtrHashKey<Element> >* mHashtable;
> > +  nsINode* mTextNode;
> 
> I forgot: given that we store this class as a property of the textnode, callers
> that use it need to look it up on the textnode so they have a pointer to that
> already. Seems like whatever uses mTextNode could have the textnode passed in,
> and we could reduce the size of this class by a pointer. You could add some
> helper functions that takes a textnode, looks up the
> nsTextNodeDirectionalityMap and calls the relevant function on it passing it
> the text node.

That seems like a great idea!
Since bug 801609 (well, lots of other things too, but that was the straw that broke the camel's back) there is no longer room to add two new NODE_FLAG_BITs. What should I do about that?
As noted on IRC, mBoolFlags is your friend!
Indeed.  The right thing to do is to find boolean flags that are incorrectly in NODE_FLAG_BIT stuff and move them to mBoolFlags.
And what are the criteria for which flags belong where? FTR, mBoolFlags is not far off full either.
> And what are the criteria for which flags belong where?

Generally speaking, flags that are gotten or set as a group, or "flags" that are multibit, should live in mFlags.  Flags that are only gotten/set individually and are boolean should live in mBoolFlags.

At first glance, we have 12 flags free in mBoolFlags...  If we get desperate we could free up some, but we're ok for now.
Depends on: 809446
(In reply to Ehsan Akhgari [:ehsan] from comment #72)
> Also, please make sure you test Talos numbers on this patch on try before
> landing.

I did a try run with Talos at https://tbpl.mozilla.org/?tree=Try&rev=859d08759875, but I have no clue how to assess the results.
(In reply to comment #84)
> (In reply to Ehsan Akhgari [:ehsan] from comment #72)
> > Also, please make sure you test Talos numbers on this patch on try before
> > landing.
> 
> I did a try run with Talos at
> https://tbpl.mozilla.org/?tree=Try&rev=859d08759875, but I have no clue how to
> assess the results.

Try http://perf.snarkfest.net/compare-talos/.  You can look at the central/inbound revision for the base of this push, trigger a number of Talos jobs on them in order to get a useful baseline.
Flags: in-testsuite+
Seems like a pixel difference.  Simon, you should be able to easily update the reftest manifest to adjust for that if you concur that this is not a real bug.
Target Milestone: mozilla14 → mozilla20
Depends on: 815477
Depends on: 815500
Blocks: 712603
Depends on: 816253
Depends on: 822723
Depends on: 819014
Depends on: 824719
Depends on: 827190
Depends on: 828054
Alias: DirAuto
Depends on: 819623
Depends on: 829428
Depends on: 830098
Depends on: 832644
Are we tracking the various regressions from this for landing on Aurora 20?
Depends on: 836890
Depends on: 836925
(In reply to Boris Zbarsky (:bz) from comment #91)
> Are we tracking the various regressions from this for landing on Aurora 20?

Blocking bugs that we've fixed in FF21 are also fixed on FF20 at this point or is tracked. I've asked Simon to nominate future critical regressions for tracking-firefox20.
Depends on: 838489
Depends on: 831287
Depends on: 841205
Depends on: 844404
Depends on: 845093
Depends on: 849745
Depends on: 849727
Depends on: 849732
Depends on: 859014
Depends on: 859016
Depends on: CVE-2013-1686
Depends on: CVE-2013-1724
Depends on: CVE-2014-1567
Depends on: 1471872
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: