Closed Bug 702121 Opened 13 years ago Closed 12 years ago

Incorrect hyphenation patterns used by CSS moz-hyphens in XHTML content (en-US patterns used, regardless of lang declaration)

Categories

(Core :: Layout: Text and Fonts, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla17
Tracking Status
relnote-firefox --- -

People

(Reporter: frfxtst, Assigned: dbaron)

References

Details

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111104165243

Steps to reproduce:

I used -moz-hyphens in a xhtml file which has no language specified (no lang attribute used).


Actual results:

According to the specification (https://developer.mozilla.org/en/CSS/hyphens), a hyphenation rule should only be applied, if a language is specified using the "lang" attribute.

Which hyphenation rule is applied if no language is specified?
Summary: CSS moz-hyphens → CSS moz-hyphens / no language set
None?
Component: General → Style System (CSS)
Product: Firefox → Core
QA Contact: general → style-system
Sorry, my description was not clear. The actual result is, that a hyphenation rule is applied but it shouldn't.
Attachment #574156 - Attachment mime type: application/octet-stream → text/xml
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
This is not a duplicate of bug 702123. The topic of bug 702123 is, that a hyphenation rule is applied but shouldn't. The topic of this bug is that different hyphenation rules are applied in html and xhtml files.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Summary: CSS moz-hyphens / no language set → -moz-hyphens should not apply hyphenation rules when no language is specified
> According to the specification (https://developer.mozilla.org/en/CSS/hyphens)

That's not the specification.  That's documentation.  And it was just wrong.  I've fixed it.

What hyphenation dictionary is used depends on the encoding of the page and the declared language.

> The topic of this bug is that different hyphenation rules are applied in html and xhtml
> files.

The attached testcase is using UTF-8 as the encoding.  If I load it a UTF-8 HTML (as opposed to ISO-8859-1 HTML), it hyphenates exactly the same as the attached testcase.

So as far as I can tell the behavior is correct; over to layout:text to verify.
Component: Style System (CSS) → Layout: Text
QA Contact: style-system → layout.fonts-and-text
Summary: -moz-hyphens should not apply hyphenation rules when no language is specified → CSS moz-hyphens / no language set
I guess the difference between ISO-8859-1 and UTF-8 could be considered a problem.  Is that the concern?
Oh, and what the actual _spec_ at http://dev.w3.org/csswg/css3-text/#hyphens says is this:

  The UA is therefore only required to automatically hyphenate text for which the author
  has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an
  appropriate hyphenation resource. 

Note that the UA is allowed to hyphenate other texts too; it's just not required to.
It looks to me like we're not correctly detecting the declared language (or lack thereof) in XHTML documents, and always treating the text as being en-US. Even after adding a specific declaration such as xml:lang="af", I still see the en-US hyphenation dictionary being used.

Not sure yet whether this is a problem specifically in the hyphenation code, or a higher-level issue with xml:lang support.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 7 → All
Hardware: x86_64 → All
Summary: CSS moz-hyphens / no language set → Incorrect hyphenation patterns used by CSS moz-hyphens in XHTML content (en-US patterns used, regardless of lang declaration)
Version: 8 Branch → Trunk
In this testcase, the content is explicitly tagged with xml:lang="ar", so we should definitely *not* be applying English rules. (And as we don't ship any rules for Arabic, the net result should be no auto-hyphenation.)

However, it seems that when BuildTextRunsScanner::SetupBreakSinksForTextRun looks at GetStyleVisibility()->mLanguage to find the appropriate language, it's getting "en" despite the "ar" tag in the document.
I think this is a duplicate of bug 234485.  I thought we'd implemented support for xml:lang, but I don't think we have.
... except for the part where we should store whether the language information came from something explicit, and only do language-based hyphenation when it is.
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/4e1148566cbf/explicit-language is an untested patch for the style system side of tracking whether the language is explicitly set.
Attachment #644757 - Attachment description: Track whether nsStyleVisibility::mLanguage came from explicit information in the document. () UNTESTED → Track whether nsStyleVisibility::mLanguage came from explicit information in the document. (, patch 1)
Comment on attachment 644757 [details] [diff] [review]
Track whether nsStyleVisibility::mLanguage came from explicit information in the document.  (, patch 1)

Review of attachment 644757 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsStyleStruct.cpp
@@ +139,5 @@
>        language.FindChar(PRUnichar(',')) == kNotFound) {
>      mLanguage = do_GetAtom(language);
> +    // REVIEW: Should this count as explicit, or should only the lang
> +    // attribute count as explicit?
> +    mExplicitLanguage = true;

My inclination would be to treat only the lang attribute as explicit. That's often the only thing that an author will control (or even be aware of), I suspect. And given that this (for now, at least) is targeted at hyphenation, which in turn is author-controlled via CSS, it seems right to keep the language control also firmly and solely in the hands of the page author, without involving the server configuration.

In which case the initialization of mExplicitLanguage (always false) could be moved to the initializer lists for the non-copy constructors, instead of needing to be included in the Init() method.

::: layout/style/nsStyleStruct.h
@@ +113,5 @@
>    nscoord mScriptMinSize;        // [inherited] length
>    float   mScriptSizeMultiplier; // [inherited]
>    nsCOMPtr<nsIAtom> mLanguage;   // [inherited]
> +  // was mLanguage set based on something in the document?
> +  bool mExplicitLanguage;        // [inherited]

It looks to me like the struct will pack better if you insert this alongside the existing PR[U]Int8 fields, rather than appending it.

(The initializer in nsStyleStruct.cpp would then need to be re-ordered to match.)
Comment on attachment 644758 [details] [diff] [review]
Only do hyphenation when the language was specified explicitly, rather than using an encoding-inferred language.  (, patch 2)

Review of attachment 644758 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me (assuming it passes the tests, of course!)

If it's easy to do, maybe it'd be nice to split the variable-renaming stuff that doesn't involve any functional change (langGroup -> hyphenationLanguage, etc) into a separate patch, and then the actual behavior-changing patch would become much smaller. But not a big deal if you don't want to bother.
Attachment #644758 - Flags: review?(jfkthame) → review+
Attachment #644757 - Attachment is obsolete: true
Attachment #644757 - Flags: review?(jfkthame)
Attachment #652991 - Flags: review?(jfkthame) → review+
Assignee: nobody → dbaron
https://hg.mozilla.org/mozilla-central/rev/198f6c4784cb
https://hg.mozilla.org/mozilla-central/rev/67f1eff2cc7c
Status: NEW → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Depends on: 822554
Attachment #703339 - Flags: review?(jfkthame) → review+
Not rel-noting it as majority of the feature  landed in mozilla-17. The patch that recently landed seems to be reftest's hence not tracking for rel-noting
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: