Closed Bug 619521 Opened 14 years ago Closed 9 years ago

Send font-needed notification for scripts/languages not supported by the default system fonts

Categories

(Core :: Graphics: Text, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
fennec + ---

People

(Reporter: jfkthame, Assigned: jfkthame, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed, Whiteboard: [read from comment 151 for documentation])

User Story

Documentation:

These changes add a new "gfx.missing_fonts.notify" preference option, disabled by default. When enabled, the graphics code will check characters that can not be drawn (replaced with "Tofu" boxes) for lack of appropriate fonts and send notifications via the nsIObserverService service to inform observers about missing characters.

Each Unicode character is assigned a Unicode script identified by 4 letters (https://hg.mozilla.org/mozilla-central/file/9696d1c4b3ba/intl/unicharutil/util/nsUnicodeScriptCodes.h#l89, https://en.wikipedia.org/wiki/Script_%28Unicode%29#Table_of_scripts_in_Unicode). The notifications sent have aSubject=null, aTopic="font-needed" and aData is a stringified array of 4-letter scripts corresponding to the missing characters detected. In order to limit the number of notifications, a missing script is only notified once per browser session (see discussion in this bug).

Clients willing to observe the notification should register via the nsIObserverService. Note that for e10s applications, the notifications are sent from the content process. See https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/The_message_manager#Loading_scripts_per_child_process and https://github.com/mrbkap/e10sbug/tree/observer for a way to register observers from that process.

Attachments

(6 files, 25 obsolete files)

1.08 KB, text/csv
Details
81.89 KB, image/png
Details
4.78 KB, patch
jtd
: review+
fredw
: checkin+
Details | Diff | Splinter Review
16.75 KB, application/x-xpinstall
Details
3.17 KB, patch
Details | Diff | Splinter Review
49.62 KB, patch
Details | Diff | Splinter Review
(followup to bug 609649)

We now have glyph-shaping support for Arabic/Persian, but many devices have no font that supports these characters and so the user will still see "missing glyph" boxes.

We need to figure out how to enable users to display content in such scripts. Possibilities might include:
- ship a font with wide coverage (such as DejaVu Sans) as part of the Fennec product (but this impacts installer/download size);
- provide a mechanism for users to download additional fonts and add to the product, perhaps via an add-on (but need to figure out how to inform users of the possibility);
- have a download-on-demand mechanism that detects when we try to display text in a script for which no font is available, and offers to automatically fetch the necessary font.
I strongly support the third option: offer to download the font when needed. I am really concerned about the size of our installer package and wouldn't want to add to it. Users should also have an option to *remove* fonts they don't need anymore. Maybe that could be done via an add-on "installed fonts"?
Blocks: 631159
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0-
Whiteboard: [fennec-6]
Assignee: nobody → mark.finkle
This is a platform bug. Bug 648548 is the front-end part
Assignee: mark.finkle → nobody
so what's the plan here and who is going to own it?
(In reply to comment #3)
> so what's the plan here and who is going to own it?

Is there any way for the platform to notify the front-end that fonts are needed? We can try to create a mozilla.org repo of fonts and make fennec try to download the required fonts when needed. But front-end don't know when fonts are needed.
Blocks: 653911
tracking-fennec: - → 6+
Whiteboard: [fennec-6]
I've looked for a solution, and here is what I've found : 
- Most languages, except Chinese, use normal fonts (e.g. Arial), and UTF-8, hence we cannot simply parse the CSS and search for a font.
- Chinese use a specific encoding.

Plan : 
- Register to the observer in the font rendering code.
- When characters cannot be rendered, notify the observer, providing the values of the characters that couldn't be rendered. This can be done after the page rendering, or when the page is rendering, or elsewhere. We can intercept the calls to DrawMissingGlyph() for that purpose.
- The front sends these values to the server, which determines the fonts to offer (based on the subset of unicode characters needed). There is a possibility to need several ranges of characters, for example on the indian Google Search page (http://www.google.co.in/) where several distinct languages are proposed. As stated above, we need wide coverage font, split by language or something, to offer a fine granularity for the download.
- The client download these fonts.
(In reply to comment #6)
> I've looked for a solution, and here is what I've found : 
> - Most languages, except Chinese, use normal fonts (e.g. Arial), and UTF-8,
> hence we cannot simply parse the CSS and search for a font.
> - Chinese use a specific encoding.
> 
> Plan : 
> - Register to the observer in the font rendering code.
> - When characters cannot be rendered, notify the observer, providing the
> values of the characters that couldn't be rendered. This can be done after
> the page rendering, or when the page is rendering, or elsewhere. We can
> intercept the calls to DrawMissingGlyph() for that purpose.
> - The front sends these values to the server, which determines the fonts to
> offer (based on the subset of unicode characters needed). There is a
> possibility to need several ranges of characters, for example on the indian
> Google Search page (http://www.google.co.in/) where several distinct
> languages are proposed. As stated above, we need wide coverage font, split
> by language or something, to offer a fine granularity for the download.
> - The client download these fonts.

Sounds like you want to use the nsIObserverService to send notifications from the backend to the front-end. Sounds good to me.
Yes, I forgot to add that I'll use the nsIObserverService. I will start writing a patch tomorrow.
Status: NEW → ASSIGNED
(In reply to comment #6)
> - When characters cannot be rendered, notify the observer, providing the
> values of the characters that couldn't be rendered. This can be done after
> the page rendering, or when the page is rendering, or elsewhere. We can
> intercept the calls to DrawMissingGlyph() for that purpose.

DrawMissingGlyph isn't the right place for this.  Font matching is for each character.  First the fonts in the font-family list are looked up and the cmap checked, then the specified pref font and then system font fallback.  So the final check ends up in gfxAndroidPlatform::FindFontForChar():

http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxAndroidPlatform.cpp#631

So catching characters that don't match should occur at the point where the "no match?" comment is.

But I think before you start coding this you probably want to put some thought into how you want this feature to work.  Some "can't find a font" situations will be because of a missing script, others will be because of data errors (e.g. text mime type for binary data).  From a UI standpoint you probably don't want to be putting up a slew of dialogs about "Install a font for language x?" all the time.  So here's my suggestion:

- set up a list of font families for various unsupported but commonly used scripts (whatever is missing from DroidFallback)
- as part of the build process, construct a cmap that unions together the cmaps of all those fonts
- in FindFontForChar, use the union of the cmaps to test for whether the current list of available fonts for download supports a given character
- if available, pull down and install a font family that supports the character into a "fennec fonts" folder
- include the "fennec fonts" folder in the list of fonts iterated over at startup

Nothing too hard but there are tricky details here for sure.

Note that this bug really has *3* components, the UI part, the backend work here and a server component (i.e. a server-side service that given a codepoint and/or script provides a zipfile with a family of appropriate fonts).

One slight variation on the solution here might be to create a new category of add-on's, font packages for specific languages/scripts.  Then the UI would simply need to be something similar to how add-ons are installed, with an option to "Disable Auto Activation" for those not interested in this feature.
This .csv file (attached) contains all the language not supported by a stock Android device (a Xoom tablet has been used to check the support of languages), as well as the Unicode range(s) that has to be available for this language to be rendered correctly. Other fields are the country where the language is in use, as well as the ISO language code.

I guess we could use Deja Vu's wide coverage of Unicode as the font of choice to provide the support for these languages.
(In reply to comment #10)
> Created attachment 534787 [details]
> Unicode ranges for common missing glyphs
> 
> This .csv file (attached) contains all the language not supported by a stock
> Android device (a Xoom tablet has been used to check the support of
> languages)

Note that many Android devices will have more gaps in coverage than this; for example, not all manufacturers include a font with Arabic characters (see bug 609649).
I've added arabic Unicode range in the file, but since I don't have many android devices, I cannot be exhaustive.
Attachment #534787 - Attachment is obsolete: true
Attached patch Patch v1 for bug 619521 (obsolete) — Splinter Review
Here is a patch showing a possible solution.

I've set up a test page at http://paul.cx/unicode.html, which contains several characters that are not available on a stock android.
This page listen to nsIObserverService events, which are sent by the backend. Whenever a character is not drawable, but can be downloaded, an alert() pops-up.

It is working quite fine on a stock Motorola Xoom Tablet.

For this demo to work, remember to change the |signed.applets.codebase_principal_support| preference to |true| in about:config, in order to be able to use XPCOM component in a remote web page.

The discutable kDownloadableUnicodeRanges array in the IsFontDownloadable method is in fact generated by a python script from the attached csv file. I don't know how such array should be integrated in the Mozilla codebase, so I'll just leave it here for now, as it facilitates the reading of the patch.
Attachment #535370 - Flags: feedback?(jdaggett)
(In reply to comment #13)
> Created attachment 535370 [details] [diff] [review] [review]
> Patch v1 for bug 619521
> 
> Here is a patch showing a possible solution.
> 
> I've set up a test page at http://paul.cx/unicode.html, which contains
> several characters that are not available on a stock android.
> This page listen to nsIObserverService events, which are sent by the
> backend. Whenever a character is not drawable, but can be downloaded, an
> alert() pops-up.

I don't think you want to be sending notifications to an observer from FindFontForChar. In the (common) case where there are a whole bunch of characters from a specific script for which no font is installed, that will spam the observer with a whole lot of notifications, and will be a drag on what is a fairly perf-critical part of page rendering.

IMO, the most logical place to hook in would probably be gfxFontGroup::InitScriptRun, after it has called ComputeRanges. At this point, you know whether there are any missing-glyph font ranges in the run. And you also know which script is being handled, so you can directly notify the observer "font needed for script X", without needing to look up each character in some separately-maintained table of Unicode ranges.
I've read a lot of code from this module (since I'm new to all this), and I've found a nice unicode table that contains all the information we need (http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.cpp#82). We could definitely get the info we need from there.

Jonathan, I'm trying to implement what you suggests. Would it be acceptable to use a set of PRUnichar keep track of all the characters we can't draw, and then, notify the frontend of the scripts we need to support, using the name of the Unicode ranges looked up in the table linked above ? That way, there will be only one notification by text run, I guess.
(In reply to comment #15)
> I've read a lot of code from this module (since I'm new to all this), and
> I've found a nice unicode table that contains all the information we need
> (http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFontUtils.
> cpp#82). We could definitely get the info we need from there.
> 
> Jonathan, I'm trying to implement what you suggests. Would it be acceptable
> to use a set of PRUnichar keep track of all the characters we can't draw,
> and then, notify the frontend of the scripts we need to support, using the
> name of the Unicode ranges looked up in the table linked above ? That way,
> there will be only one notification by text run, I guess.

I'd prefer to base this on the Unicode script involved, which is available in gfxFontGroup::InitScriptRun as the aRunScript parameter; these are values from the HB_SCRIPT_* enum. (There's API forthcoming in harfbuzz to convert these to strings, etc., which may be preferable for notification purposes.)

The ranges in gfxFontUtils break things up into contiguous ranges of Unicode codepoints, which is not really the most useful taxonomy of characters for this purpose, as the incremental development of the standard means that related characters may have ended up in separate ranges. The "script" property is defined on the individual characters regardless of when they got added or what range they ended up being allocated in, and as such it provides a more logical categorization.

So I don't think you need to collect individual characters in a set; just collect scripts for which there are missing fonts (which will typically be an order of magnitude (or two) less than the total number of characters). Maybe notifying on a per-script-run basis would be ok, in which case you don't really have to keep track of anything, just set a flag if the matchedFont is ever null in InitScriptRun, and then notify with the script code at the end of the function. Or to coalesce notifications further, you could keep track of the script codes involved (which will be a small set), and just notify once at the end of reflow with a list of missing scripts.
I've tried to minimize the number of notifications sent to the front end, by putting the actual notification on the method that calls (gfxFontGroup::InitTextRun), but this is useful only if a page is written using different scripts that are unavailable in the same text run.

I've added an example of mixed script in the same text run at the bottom of http://bde.insa-lyon.fr/orgaif/test/unicode.html
Attachment #535370 - Attachment is obsolete: true
Attachment #536330 - Flags: review?(jfkthame)
Attachment #535370 - Flags: feedback?(jdaggett)
Comment on attachment 536330 [details] [diff] [review]
Patch v2 - implements Jonathan Kew's idea

Review of attachment 536330 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +2429,5 @@
>      // split into script runs so that script can potentially influence
>      // the font matching process below
>      gfxScriptItemizer scriptRuns(aString, aLength);
> +    nsString scriptsUnavailable;
> +    nsCOMPtr<nsIObserverService> service = mozilla::services::GetObserverService();

I don't like having to do a GetService call every time we enter InitTextRun; we should find a way to call it once and hold on to the reference, so that we don't add this overhead every time. Maybe some object such as the gfxPlatform could own a reference to the observer service.

@@ +2462,5 @@
>          }
>  #endif
>  
> +        if ( ! InitScriptRun(aContext, aTextRun, aString, aLength,
> +                      runStart, runLimit, runScript))

Please adjust spacing and indents to match the surrounding coding style - no spaces around the "!", and indent the second line so that the parameters line up.

@@ +2466,5 @@
> +                      runStart, runLimit, runScript))
> +        {
> +            nsString script;
> +            if(nsTextFormatter::ssprintf(script, NS_LITERAL_STRING("%d,").get(), runScript) == 0) {
> +              scriptsUnavailable.Append(script);

Going through nsTextFormatter seems like overkill here; how about just

    if (!scriptsUnavailable.IsEmpty()) {
        scriptsUnavailable.Append(PRUnichar(','));
    }
    scriptsUnavailable.AppendInt(script);

(untested, but seems like it ought to work).

But see next comment.

@@ +2468,5 @@
> +            nsString script;
> +            if(nsTextFormatter::ssprintf(script, NS_LITERAL_STRING("%d,").get(), runScript) == 0) {
> +              scriptsUnavailable.Append(script);
> +            }
> +        }

It's not uncommon to have "interleaved" script runs (think of an English page that mentions several foreign words in the course of a sentence, with English words/phrases connecting them; each of those will become a separate script run). The approach here will end up appending the same script identifier to the "unavailable" string several times in this case, which seems non-optimal.

I'd suggest using an nsHashSet<PRInt32> to collect the set of missing script codes, so that the notification can be sent without duplicates. Then at the end, if the hashset is non-empty, enumerate its entries to create the notification.

@@ +2473,4 @@
>      }
>  
> +    if ( ! scriptsUnavailable.IsEmpty() )
> +      service->NotifyObservers(nsnull, "fontneeded", scriptsUnavailable.get());

Spacing & indents. Also, mozilla style guide calls for braces around the controlled block, even when it's a single statement.

Need to check that service is non-null before dereferencing it; I'm sure the observer service is always supposed to be available, but better to err on the side of caution.

(Actually, a cheap way to mitigate the issue of calling GetService on every text run would be to move that inside the if-condition here, so that you don't bother locating the service unless there actually are missing characters. But I'd still like to explore the option of holding a reference to the service somewhere more central rather than getting it on a per-textrun basis.)

::: gfx/thebes/gfxFont.h
@@ +2419,5 @@
>                       PRUint32 aLength);
>  
>      // InitTextRun helper to handle a single script run, by finding font ranges
>      // and calling each font's InitTextRun() as appropriate
> +    // Return true if a script is unavailable, false otherwise

This comment is incorrect - what you've implemented is to return true if all needed font(s) _are_ available, and false if any characters are missing.
(Also a nit: please fix up the indentation of the following lines.)
(In reply to comment #18)
> ::: gfx/thebes/gfxFont.cpp
> @@ +2429,5 @@
> >      // split into script runs so that script can potentially influence
> >      // the font matching process below
> >      gfxScriptItemizer scriptRuns(aString, aLength);
> > +    nsString scriptsUnavailable;
> > +    nsCOMPtr<nsIObserverService> service = mozilla::services::GetObserverService();
> 
> I don't like having to do a GetService call every time we enter InitTextRun;
> we should find a way to call it once and hold on to the reference, so that
> we don't add this overhead every time. Maybe some object such as the
> gfxPlatform could own a reference to the observer service.

The mozilla::services getters are already cached internally, so this is just a simple addref call.
(In reply to comment #19)
> The mozilla::services getters are already cached internally, so this is just
> a simple addref call.

Fair enough, that'll do! Though I'd still like to move it to within the conditional at the end - no point in even addref'ing unless we're actually going to use it.
Attached patch Patch v3 - Using a HashSet (obsolete) — Splinter Review
It appears that nsHashSet<PRInt32> doesn't exist. I used a nsTHashtable<nsUint32HashKey> instead.

I've also addressed your other remarks.
Attachment #536330 - Attachment is obsolete: true
Attachment #539490 - Flags: review?(jfkthame)
Attachment #536330 - Flags: review?(jfkthame)
tracking-fennec: 6+ → 7+
Comment on attachment 539490 [details] [diff] [review]
Patch v3 - Using a HashSet

Review of attachment 539490 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, modulo a few small nits.

Do we have any code yet that can listen to this notification and do something useful with it?

::: gfx/thebes/gfxFont.cpp
@@ +2417,5 @@
> +PLDHashOperator gfxFontGroup::EnumerateScripts(nsUint32HashKey* aEntry, void* aUserArg)
> +{
> +    nsString* scriptsList = static_cast<nsString*>(aUserArg);
> +    scriptsList->AppendInt(aEntry->GetKey());
> +    scriptsList->Append(PRUnichar(','));

To avoid a trailing comma, I'd prefer to reverse the order here, and only append the comma if the string is non-empty.

@@ +2440,5 @@
>  
>      PRUint32 runStart = 0, runLimit = aLength;
>      PRInt32 runScript = HB_SCRIPT_LATIN;
> +		nsTHashtable<nsUint32HashKey> scriptsSet;
> +		scriptsSet.Init();

No tabs, please!
Given that the normal situation is that there aren't any missing characters, let's defer the initialization of scriptsSet to the conditional block, so that it doesn't happen at all in most cases....

@@ +2464,5 @@
>          }
>  #endif
>  
> +        if (!InitScriptRun(aContext, aTextRun, aString, aLength,
> +                           runStart, runLimit, runScript)) {

....so, test here whether the set is initialized, and call Init() if not....

@@ +2471,3 @@
>      }
>  
> +    if (scriptsSet.Count()) {

....which means testing for IsInitialized() instead of Count() here, too.

@@ +2474,5 @@
> +        nsString scriptsUnavailable;
> +        nsCOMPtr<nsIObserverService> service = mozilla::services::GetObserverService();
> +        scriptsSet.EnumerateEntries(gfxFontGroup::EnumerateScripts, &scriptsUnavailable);
> +        service->NotifyObservers(nsnull, "fontneeded", scriptsUnavailable.get());
> +		}

Tabs again!
Line length is supposed to be limited to 80 chars, so please break appropriately to ensure this. (I know existing code doesn't always conform to this, but we're trying....)

::: gfx/thebes/gfxFont.h
@@ +2419,5 @@
>                       PRUint32 aLength);
>  
>      // InitTextRun helper to handle a single script run, by finding font ranges
>      // and calling each font's InitTextRun() as appropriate
> +    // Return false if a script is unavailable, true otherwise

More accurate would be "Return true if we have font coverage for all characters in the script run, false if there are any missing characters."

@@ +2429,5 @@
> +                         PRUint32 aScriptRunEnd,
> +                         PRInt32 aRunScript);
> +
> +    // Callback function to enumerate the set of unavailable scripts
> +		static PLDHashOperator EnumerateScripts(nsUint32HashKey* aEntry, void* aUserArg);

Use spaces, not tabs.
Attachment #539490 - Attachment is obsolete: true
Attachment #540065 - Flags: review?(jfkthame)
Attachment #539490 - Flags: review?(jfkthame)
Comment on attachment 540065 [details] [diff] [review]
Patch v4 - Adressed Jonathan Kew remarks

Review of attachment 540065 [details] [diff] [review]:
-----------------------------------------------------------------

Just a couple tiny style nits below that should ideally be fixed before landing, but this looks good to me.

::: gfx/thebes/gfxFont.cpp
@@ +2417,5 @@
> +PLDHashOperator gfxFontGroup::EnumerateScripts(nsUint32HashKey* aEntry, void* aUserArg)
> +{
> +    nsString* scriptsList = static_cast<nsString*>(aUserArg);
> +    if(!scriptsList->IsEmpty())
> +      scriptsList->Append(PRUnichar(','));

Space after "if", braces around the controlled statement, and we use 4-space indent in gfx.

@@ +2466,5 @@
>  
> +        if (!InitScriptRun(aContext, aTextRun, aString, aLength,
> +                           runStart, runLimit, runScript)) {
> +            if (!scriptsSet.IsInitialized()) {
> +              scriptsSet.Init();

4-space indent.
Attachment #540065 - Flags: review?(jfkthame) → review+
Attached patch Patch v5 -- Fixed indentation (obsolete) — Splinter Review
Attachment #540065 - Attachment is obsolete: true
Attachment #540414 - Flags: review?(jfkthame)
Note : 
I'm noticing the issue in a list as well on the language selection in the drop down for www.google.com/chrome

Mozilla/5.0 (Android; Linux armv71; rv6.0a2) Gecko/20110624 Firefox/6.0a2 Fennec/6.0a2
Device: Thunderbolt
OS: Android 2.2

Mozilla/5.0 (Android; Linux armv71; rv6.0a2) Gecko/20110624 Firefox/6.0a2 Fennec/6.0a2
Device: HTC Flyer
OS: Android 2.3
Whiteboard: [testday-2011-06-24]
(In reply to comment #26)
> Note : 
> I'm noticing the issue in a list as well on the language selection in the
> drop down for www.google.com/chrome
what issue are you noticing?
Attached image comparison screenshot
Some languages are not showing up correctly in the list.  They show up in blocks.  Screenshot shows blocks on left, a comparison to the desktop list on the right.
Can this be landed as it is, or should it be wrapped inside Android only #ifdef ?
Comment on attachment 540414 [details] [diff] [review]
Patch v5 -- Fixed indentation

The code seems fine to me, though we may find we want to make adjustments once we see how it interacts with the front-end. I don't have a strong opinion on whether we should #ifdef this so that it's Android-only for now - John, any thoughts? I doubt there'll be a detectable impact on text performance (though we should watch out for that when it lands, in case I'm guessing wrong!)
Attachment #540414 - Flags: review?(jfkthame)
Attachment #540414 - Flags: review?(jdaggett)
Attachment #540414 - Flags: review+
tracking-fennec: 7+ → 8+
jdaggett, review ping?
(In reply to Brad Lassey [:blassey][blassey@mozilla.com] from comment #31)
> jdaggett, review ping?

jdaggett, review ping #2?
Comment on attachment 540414 [details] [diff] [review]
Patch v5 -- Fixed indentation

+    nsTHashtable<nsUint32HashKey> scriptsSet;

+PLDHashOperator gfxFontGroup::EnumerateScripts(nsUint32HashKey* aEntry, void* aUserArg)
+{
+    nsString* scriptsList = static_cast<nsString*>(aUserArg);
+    if (!scriptsList->IsEmpty())
+      scriptsList->Append(PRUnichar(','));
+    scriptsList->AppendInt(aEntry->GetKey());
+    return PL_DHASH_NEXT;
+}

I realize Jonathan suggested this but I think that using a hashtable that is then converted to a string seems like overkill to me.  I doubt you'll ever get more than a handful of missing scripts on a page.  I would instead suggest just using a simple nsTArray of ints and manually iterating through it to assure no duplicates.  Then just pass the array to the observer, rather than converting it to a string (unless there's some limitation that requires strings here).
Attachment #540414 - Flags: review?(jdaggett) → review-
(In reply to John Daggett (:jtd) from comment #33)
> I realize Jonathan suggested this but I think that using a hashtable that is
> then converted to a string seems like overkill to me.  I doubt you'll ever
> get more than a handful of missing scripts on a page.  I would instead
> suggest just using a simple nsTArray of ints and manually iterating through
> it to assure no duplicates.

We could go either way on this, I suppose. It might be interesting to compare performance of hashtable lookup vs iterating over an array, to see how they compare with various (smallish) numbers of entries. Although we are unlikely to get more than a handful of missing scripts per page (maybe up to a dozen or so, in the not-uncommon cases where a site like wikipedia or facebook presents a list of alternate languages, each written in its own script), there may also be multiple text runs for a specific missing script, so we may end up doing the same lookup repeatedly.
Or just using a fixed array of 4 longs as a bitvector, that way you wouldn't need to check for duplicates, just set the bit for script n each time.  That's probably even simpler.
(In reply to John Daggett (:jtd) from comment #35)
> Or just using a fixed array of 4 longs as a bitvector, that way you wouldn't
> need to check for duplicates, just set the bit for script n each time. 
> That's probably even simpler.

I rather like that, except that there's a possibility we might want to switch from the current sequentially-allocated script codes to 4-byte script tags at some point (as that's what harfbuzz has decided to use).... if we follow that pattern then the fixed-size bitvector won't work any more, whereas the array or hash versions wouldn't care.
Let's keep things simple for now rather than making a "future-proof" version.
Have we agreed on which solution to implement here ?
(In reply to Paul ADENOT :padenot from comment #38)
> Have we agreed on which solution to implement here ?

Paul, why don't you code up the patch based on comment 35 and I'll take responsibility for making sure that Jonathan and I agree on the eventual patch.  That will make the patch simpler I think.
tracking-fennec: 8+ → +
tracking-fennec: + → 9+
Kurdish Language akso need this good Idea , .

thanks FF we are waiting On ANdroid.
Assignee: paul → blassey.bugs
tracking-fennec: 9+ → +
I am still not clear about what to do to add these fonts to my smartphobne?
This bug is still open. There is no way to add additional fonts at this time.
As this has been dormant for a while, I decided to try and update the patch so that we can get it landed. This version is rebased to current tip, and uses a small array of PRUint32 as a bitvector in gfxFontGroup::InitTextRun to keep track of any scripts that are lacking font coverage.

The notification sent is "font-needed", with a comma-separated list of script tags (not numeric codes).

Note that notifications will be sent repeatedly as multiple text-runs are constructed, so front-end code that listens to this will need to keep track of which missing scripts it has already dealt with, and avoid doing expensive or intrusive stuff for repeated notifications of the same script.
Attachment #540414 - Attachment is obsolete: true
Attachment #629103 - Flags: review?(jdaggett)
To support extra fonts that may be downloaded as needed, in addition to the standard fonts on the device, we need the Android font list to load fonts from additional location. One option - probably a pretty reasonable one - is to store such fonts in the user's profile. So this patch loads any font files found in the profile into the font list, along with the device's original fonts.
Attachment #631153 - Flags: review?(jdaggett)
The front end will need to be able to inform the platform when it has downloaded new fonts that should be included in the font list. This patch makes the font list observe an "update-font-list" notification, and rebuild the list in response; it also flushes the font caches and reflows the page, so that the newly-added fonts will be used where appropriate.
Attachment #631157 - Flags: review?(jdaggett)
Assignee: blassey.bugs → nobody
OS: All → Android
Product: Fennec → Fennec Native
Hardware: All → ARM
Assignee: nobody → blassey.bugs
Previous version of the patch caused some oranges on tryserver; this version resolves those and is also a bit simpler.
Attachment #631157 - Attachment is obsolete: true
Attachment #631157 - Flags: review?(jdaggett)
Attachment #633830 - Flags: review?(jdaggett)
Review ping? Would be nice to get this stuff landed so the Mobile front-end guys can move forward...
Re-ping.
Attachment #631153 - Flags: review?(jdaggett) → review+
Comment on attachment 629103 [details] [diff] [review]
reworked patch to send notification for scripts lacking font coverage


For me, the fundamental problem with these patches is that they're
doing notifications within InitTextRun, that seems *way* too low-level
a place to do this, for a page containing scripts that lack fonts the
code will end up calling the observers many, many times.  You noted
precisely this in comment 14.  The logic of checking for missing
scripts and notifying observers should happen once per page and thus
is really something that belongs in layout/content handling code
somewhere, not in gfx code.  The only thing in gfx code should be the
simple recording of when no font could be found for a given character.

In addition, I don't think the "bring up UI to download font when
missing scripts are present" logic is sufficient.  Situations such as
encoding errors, language selection menus such as the one used on
Wikipedia will often produce lots of missing scripts on pages.  In
these cases we probably *don't* want to bring up a "Do you want to
install fonts?" dialog.  In order to avoid a "nagging dialogs" type of
problem, there needs to be some higher-level heuristics used here. 
For example, "if 50% of the text runs on the page contain missing
scripts, ask whether to download fonts".

>      PRUint32 numRanges = fontRanges.Length();
> +    bool allCharsPresent = true;
>  
>      for (PRUint32 r = 0; r < numRanges; r++) {
>          const gfxTextRange& range = fontRanges[r];
>          PRUint32 matchedLength = range.Length();
>          gfxFont *matchedFont = (range.font ? range.font.get() : nsnull);
>  
>          // create the glyph run for this range
>          if (matchedFont) {
> @@ -3586,16 +3641,17 @@ gfxFontGroup::InitScriptRun(gfxContext *
>                  if (sizeof(T) == sizeof(PRUnichar)) {
>                      if (NS_IS_HIGH_SURROGATE(ch) &&
>                          index + 1 < aScriptRunEnd &&
>                          NS_IS_LOW_SURROGATE(aString[index + 1]))
>                      {
>                          aTextRun->SetMissingGlyph(index,
>                                                    SURROGATE_TO_UCS4(ch,
>                                                                      aString[index + 1]));
> +                        allCharsPresent = false;
>                          index++;
>                          aTextRun->SetIsLowSurrogate(index);
>                          continue;
>                      }
>  
>                      // check if this is a known Unicode whitespace character that
>                      // we can render using the space glyph with a custom width
>                      gfxFloat wid = mainFont->SynthesizeSpaceWidth(ch);
> @@ -3622,21 +3678,24 @@ gfxFontGroup::InitScriptRun(gfxContext *
>  
>                  if (IsInvalidChar(ch)) {
>                      // invalid chars are left as zero-width/invisible
>                      continue;
>                  }
>  
>                  // record char code so we can draw a box with the Unicode value
>                  aTextRun->SetMissingGlyph(index, ch);
> +                allCharsPresent = false;

Do this in gfxTextRun::SetMissingGlyph and set a flag/array within the
text run, passing in runScript if that's what's needed.
Attachment #629103 - Flags: review?(jdaggett) → review-
Comment on attachment 633830 [details] [diff] [review]
part 3 v2 - update the Android font list in response to an "update-font-list" notification

> +NS_IMETHODIMP
> +FontListUpdateObserver::Observe(nsISupports     *aSubject,
> +                                const char      *aTopic,
> +                                const PRUnichar *aData)
> +{
> +    NS_ASSERTION(!strcmp(aTopic, "update-font-list"), "invalid topic");
> +    gfxPlatformFontList::PlatformFontList()->ClearPrefFonts();
> +    gfxFontCache::GetCache()->AgeAllGenerations();
> +    gfxFontCache::GetCache()->FlushShapedWordCaches();
> +    gfxPlatformFontList::PlatformFontList()->InitFontList();
> +
> +    // modify a gfx.font_rendering.* pref to trigger reflow everywhere
> +    PRInt32 i = Preferences::GetInt("gfx.font_rendering.font_list_generation");
> +    Preferences::SetInt("gfx.font_rendering.font_list_generation", ++i);
> +
> +    return NS_OK;
> +}
> +
>  gfxFT2FontList::gfxFT2FontList()
>  {
> +    nsCOMPtr<nsIObserverService> obs = mozilla::services::GetObserverService();
> +    if (obs) {
> +        obs->AddObserver(new FontListUpdateObserver, "update-font-list", false);
> +    }
>  }

This will work but it's somewhat nuclear.  Why do we need to
completely iterate over all fonts when this is only intended for
additional fonts?
Blocks: 788450
can we just put the file path of the new font in the data field of the notification (and then only read in that font file)? Would that alleviate your concerns with the third patch John?
I don't see any harm in rebuilding the list from scratch; in fact, I think it's the safest approach. Remember that this will happen asynchronously, and only when a new font has been downloaded for a previously-unsupported script, which will generally be a handful of times over the entire installed life of the browser. It takes a matter of milliseconds to recreate the list, and it avoids the risk that we end up with fonts listed in a different order in the font list depending on the order of downloads, which could then affect font-matching behavior.
Please leave this bug open when merging to mozilla-central, as only one part has been landed so far.
Whiteboard: [testday-2011-06-24] → [testday-2011-06-24][leave open]
Blocks: 797662
This is obsolete/unused, so we may as well remove it rather than bother fixing it up in the following patch.
Attachment #669071 - Flags: review?(jdaggett)
Here's a version of the notification patch that collects missing-font info during reflow, and then fires the notification of scripts that lack coverage when reflow is finished; that will greatly reduce the notification chatter. It's still up to the front-end code to decide what (if anything) to do with the notifications, and to avoid bothering the user excessively.
Attachment #669073 - Flags: review?(jdaggett)
(In reply to John Daggett (:jtd) from comment #51)
> This will work but it's somewhat nuclear.  Why do we need to
> completely iterate over all fonts when this is only intended for
> additional fonts?

In addition to the points in comment 53, this update-font-list notification would also be used if the front-end decides to offer UI for disabling or removing locally-added fonts (see also bug 798199), in which case it needs to remove the no-longer-available font(s) from the list.

I believe it's simpler and more reliable to rebuild the list from scratch on any change, rather than attempt to modify it in-place, and the minor performance difference is not an issue for these rare operations.
Attachment #669071 - Flags: review?(jdaggett) → review+
jdaggett, review ping?
(In reply to Jonathan Kew (:jfkthame) from comment #53)
> I don't see any harm in rebuilding the list from scratch; in fact, I think
> it's the safest approach. Remember that this will happen asynchronously, and
> only when a new font has been downloaded for a previously-unsupported
> script, which will generally be a handful of times over the entire installed
> life of the browser. It takes a matter of milliseconds to recreate the list,
> and it avoids the risk that we end up with fonts listed in a different order
> in the font list depending on the order of downloads, which could then
> affect font-matching behavior.

The order of fonts in the font list does not affect font matching
behavior, when more than one font match for a given character the name
is used to determine the font chosen and not the order within the font
list.

The time taken to load will be dependent on the i/o system at the time
along with the number of fonts involved.  On any particular device it
may end up being only a handful of milliseconds but I don't think
history has shown that's *always* the case.  We should do simply
append the newly installed fonts on to the existing global font list.
Attachment #633830 - Flags: review?(jdaggett) → review-
I still doubt the added complexity is worthwhile. While the usual scenario we're envisioning here is that these added fonts will be entirely new to the device, probably just a single font per script, we can't guarantee that will always be the case. It's possible a newly-added font might belong to an existing family, or even be a replacement for an existing face.

So for each new font, we'd need to check whether it belongs to a family that's already in the font list, and if so, insert it appropriately there - potentially converting a "simple" family to an "extended" one in the process. (Unlikely? Yes, but if we're going to do in-place munging of the font list, we need to handle all eventualities.)

And then we'll probably end up wanting a separate notification to handle the removal of fonts, so that the front-end can offer users a way to disable or delete downloaded fonts.

IMO, the "nuclear" update-font-list notification is simpler, safer, more general, and likely to be more than adequate. I think we should defer more complex solutions until we have clear evidence of need, such as telemetry showing that users are actually suffering from significant jank due to font-list reconstruction.
FYI, I've just created a small add-on that places a Khmer script font into the user profile directory. It's available here: https://addons.mozilla.org/en-US/firefox/addon/khmer-fonts-package/

I've tested it using Aurora v19, and it works like a charm. This is a real good news, it effectively means that Android users (v2.3 -> v4.2) now have an easy way to read Khmer on their phones without need for rooting his/her device. 

Few questions/comments:
- Upon placing a font into the user profile directory, Firefox will need to be restarted for that font, right? If so, people developing font packages should be made aware of this and inform users via an alert box.
- If there's a way to trigger a font list refresh to avoid a restart, that'd be nice. (I.e. after add-on copies fonts into user profile, it calls a function that forces firefox to refresh its fonts list)
- I am wondering whether things wouldn't be cleaner by having Firefox compile fonts located in a subfolder of user profile (instead of root directory). maybe <user profile>/fonts ?
Also, while developing this add-on, it occurred to me that I was wondering whether it might be good to create a new add-on type (<em:type>) value for font packages. The creation of such package could be greatly eased by this. A simple install.rdf, and a bunch of fonts. This would allow firefox to be the one in charge of placing fonts where it wants, as well as refreshing font list, making it possible to add fonts without a restart.
(In reply to Mathieu Pellerin from comment #64)
> Few questions/comments:
> - Upon placing a font into the user profile directory, Firefox will need to
> be restarted for that font, right? If so, people developing font packages
> should be made aware of this and inform users via an alert box.

Right - we don't currently have a way to inform the browser there's a new font available, short of quitting and restarting.

> - If there's a way to trigger a font list refresh to avoid a restart, that'd
> be nice. (I.e. after add-on copies fonts into user profile, it calls a
> function that forces firefox to refresh its fonts list)

The "part 3" patch here was intended to provide a notification mechanism that would trigger a refresh, but hasn't been accepted at this point. (I don't actually know about firing such a notification from an add-on, but I expect it's possible...?)

> - I am wondering whether things wouldn't be cleaner by having Firefox
> compile fonts located in a subfolder of user profile (instead of root
> directory). maybe <user profile>/fonts ?

Yes, on the whole I think that would be preferable. And the sooner we make that change, the better - as it'll mean anyone making a font add-on (like yours) will need to adjust it accordingly.
Depends on: 822048
Comment on attachment 669073 [details] [diff] [review]
pt 1 - send a notification of any scripts for which font coverage is lacking.

jdaggett: review ping?
Good news, everyone!

It turns out there is already a way to trigger an update to the fonts list via script. It's done via the nsIFontEnumerator. For e.g.:


   var fonts = Cc["@mozilla.org/gfx/fontenumerator;1"].getService(Components.interfaces.nsIFontEnumerator);
   fonts.updateFontList();

AMO reviewer Nils Maier pointed this out, kudos to him.

The Khmer fonts package (https://addons.mozilla.org/en-US/developers/addon/khmer-fonts-package/) has been updated to 1.0 and make use of the above function. That should be a pretty good resource for people wanting to come up with their own font(s) add-on.
(In reply to Mathieu Pellerin from comment #68)
> It turns out there is already a way to trigger an update to the fonts list
> via script. It's done via the nsIFontEnumerator. 

Cool - so there is! That means we don't really need Part 3 here; any future front-end code (or add-on) that installs extra fonts can simply use that method.
Attachment #633830 - Attachment is obsolete: true
Attachment #629103 - Attachment is obsolete: true
Updated to current trunk; re-pinging for review.
Attachment #699680 - Flags: review?(jdaggett)
Jonathan, out of interest for this feature, what are you aiming at with this notification bit? I thought it was rendered useless due to the possibility to trigger an update to font list via add-on.
(In reply to Mathieu Pellerin from comment #71)
> Jonathan, out of interest for this feature, what are you aiming at with this
> notification bit? I thought it was rendered useless due to the possibility
> to trigger an update to font list via add-on.

No, this is a separate issue. The idea here is for the core layout code to send a notification about any scripts that occur in the text being rendered, but for which fonts are lacking. This makes it possible for the browser front-end to detect this and take appropriate action - such as automatically downloading additional fonts. (See bug 648548.) After fetching such fonts, the front-end would then need to trigger a font-list update, which as you pointed out can already be done via the font-enumerator interface so we don't need to add a notification for use in that direction.
Attachment #669073 - Attachment is obsolete: true
Attachment #669073 - Flags: review?(jdaggett)
(In reply to Jonathan Kew (:jfkthame) from comment #70)
> Created attachment 699680 [details] [diff] [review]
> pt 1 - send a notification of any scripts for which font coverage is lacking.
> 
> Updated to current trunk; re-pinging for review.

Review ping...?
(In reply to Jonathan Kew (:jfkthame) from comment #73)
> (In reply to Jonathan Kew (:jfkthame) from comment #70)
> > Created attachment 699680 [details] [diff] [review]
> > pt 1 - send a notification of any scripts for which font coverage is lacking.
> > 
> > Updated to current trunk; re-pinging for review.
> 
> Review ping...?

<re-ping>
Comment on attachment 699680 [details] [diff] [review]
pt 1 - send a notification of any scripts for which font coverage is lacking.

Looks fine overall but I'm wondering if this needs to be effectively
"always on". Depending upon the platform, there won't be any observers
in many situations so there's no point in passing in gfxMissingFontRecorder objects for text run creation.

+void
+gfxMissingFontRecorder::Flush()
+{
+    nsAutoString fontNeeded;
+    for (uint32_t i = 0; i < kNumScriptBitsWords; ++i) {
+        if (!mMissingFonts[i]) {
+            continue;
+        }
+        for (uint32_t j = 0; j < 32; ++j) {
+            if (!(mMissingFonts[i] & (1 << j))) {
+                continue;
+            }
+            if (!fontNeeded.IsEmpty()) {
+                fontNeeded.Append(PRUnichar(','));
+            }
+            uint32_t tag = GetScriptTagForCode(i * 32 + j);
+            fontNeeded.Append(PRUnichar(tag >> 24));
+            fontNeeded.Append(PRUnichar((tag >> 16) & 0xff));
+            fontNeeded.Append(PRUnichar((tag >> 8) & 0xff));
+            fontNeeded.Append(PRUnichar(tag & 0xff));
+        }
+        mMissingFonts[i] = 0;
+    }
+    if (!fontNeeded.IsEmpty()) {
+        nsCOMPtr<nsIObserverService> service = GetObserverService();
+        service->NotifyObservers(nullptr, "font-needed", fontNeeded.get());
+    }
+}

This code is putting all the script codes into a single long string
and passing it into the someData parameter of NotifyObservers.  The
observers will need to iterate over the strings.  Better to set up a
proper array of strings and pass it in via the aSubject parameter.

+    ~gfxMissingFontRecorder()
+    {
+        Flush();
+    }

I really don't like this, this calls observers at teardown.  Why is
this needed if the nsPresContext::NotifyMissingFonts is already
calling Flush()?  I think the flush operation should be explicit and
not done in the destructor.

>  nsTransformedTextRun*
>  nsTransformingTextRunFactory::MakeTextRun(const PRUnichar* aString, uint32_t aLength,
>                                            const gfxTextRunFactory::Parameters* aParams,
>                                            gfxFontGroup* aFontGroup, uint32_t aFlags,
> -                                          nsStyleContext** aStyles, bool aOwnsFactory)
> +                                          nsStyleContext** aStyles,
> +                                          bool aOwnsFactory)
>  {
>    return nsTransformedTextRun::Create(aParams, this, aFontGroup,
>                                        aString, aLength, aFlags, aStyles, aOwnsFactory);
>  }
>  
>  nsTransformedTextRun*
>  nsTransformingTextRunFactory::MakeTextRun(const uint8_t* aString, uint32_t aLength,
>                                            const gfxTextRunFactory::Parameters* aParams,
>                                            gfxFontGroup* aFontGroup, uint32_t aFlags,
> -                                          nsStyleContext** aStyles, bool aOwnsFactory)
> +                                          nsStyleContext** aStyles,
> +                                          bool aOwnsFactory)

If you're fixing line length, please fix it for all parameters (or
simply omit these sorts of edits from the patch).
Attachment #699680 - Flags: review?(jdaggett) → review-
Assignee: blassey.bugs → nobody
Whiteboard: [testday-2011-06-24][leave open] → [testday-2011-06-24][leave open][mentor=blassey]
Comment on attachment 631153 [details] [diff] [review]
part 2 - load fonts from the current profile as well as the android system fonts

So IIUC, this part has already landed.
Attachment #631153 - Flags: checkin+
Comment on attachment 669071 [details] [diff] [review]
pt 0 - remove obsolete MakeTextRun function from nsCanvasRenderingContext2D.

nsCanvasRenderingContext2D::MakeTextRun no longer exists so this part can be ignored.
Attachment #669071 - Attachment is obsolete: true
Refreshing Jonathan's patch.

This is not tested.
Attachment #699680 - Attachment is obsolete: true
Attachment #8413159 - Flags: review?(jdaggett)
Attachment #8413159 - Attachment description: Part 1 - Part 1 - send a notification of any scripts for which font coverage is lacking. → Part 1 - send a notification of any scripts for which font coverage is lacking.
(In reply to John Daggett (:jtd) away 25 - 29 april from comment #75)
> Comment on attachment 699680 [details] [diff] [review]
> pt 1 - send a notification of any scripts for which font coverage is lacking.
> 
> Looks fine overall but I'm wondering if this needs to be effectively
> "always on". Depending upon the platform, there won't be any observers
> in many situations so there's no point in passing in gfxMissingFontRecorder
> objects for text run creation.

So you mean doing an #ifdef?

I guess this will be useful for all mobile platforms where the set of pre-installed fonts is limited.
Also, this is needed for Linux (bug 467729). I wonder if other platforms are likely to implement a similar mechanism (or already have one).

> This code is putting all the script codes into a single long string
> and passing it into the someData parameter of NotifyObservers.  The
> observers will need to iterate over the strings.  Better to set up a
> proper array of strings and pass it in via the aSubject parameter.

done.

> +    ~gfxMissingFontRecorder()
> +    {
> +        Flush();
> +    }
> 
> I really don't like this, this calls observers at teardown.  Why is
> this needed if the nsPresContext::NotifyMissingFonts is already
> calling Flush()?  I think the flush operation should be explicit and
> not done in the destructor.

Jonathan might give a better answer, but my understanding is that Flush() is not explicitly called for missing font object of CanvasRenderingContext2D and nsRenderingContext (not sure when it should be done otherwise). Flush() clears the mMissingFonts member and only sends the notification if the list of missing font is nonempty. So I believe the final Flush() call will not send the notification if it was done by nsPresContext::NotifyMissingFonts before.
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #79)
> (In reply to John Daggett (:jtd) away 25 - 29 april from comment #75)
> > Comment on attachment 699680 [details] [diff] [review]
> > pt 1 - send a notification of any scripts for which font coverage is lacking.
> > 
> > Looks fine overall but I'm wondering if this needs to be effectively
> > "always on". Depending upon the platform, there won't be any observers
> > in many situations so there's no point in passing in gfxMissingFontRecorder
> > objects for text run creation.
> 
> So you mean doing an #ifdef?
> 
> I guess this will be useful for all mobile platforms where the set of
> pre-installed fonts is limited.
> Also, this is needed for Linux (bug 467729). I wonder if other platforms are
> likely to implement a similar mechanism (or already have one).

Thinking further, I think it would be useful for Desktop platforms to have a Firefox add-on that listens for missing fonts and then displays a warning, points the users to font installation page or even calls packagekit (this would be possible with some XPCOM interface like the one added in the patch of bug 467729). I'd like to have that for MathML fonts, but that's certainly useful for other languages. So in my opinion, it is worth having it "always on".
Attached file font-needed-observer.xpi (obsolete) —
This is a restartless add-on to observe the font-needed notifications and print the script array.
To help with bug 930504, I'd like gfxMissingFontRecorder::Flush to return 'math' for the "Mathematical Alphanumeric Symbols", as defined in the Open Font Format script tags. However, I'm a bit confused by bug 953385 comment 15. Would that make sense to make GetScriptTagForCode recognize the 'math' tag?
Flags: needinfo?(mozilla)
Flags: needinfo?(khaledhosny)
Flags: needinfo?(jfkthame)
Flags: needinfo?(jdaggett)
Assignee: nobody → fred.wang
Blocks: 1001570
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #79)
> (In reply to John Daggett (:jtd) away 25 - 29 april from comment #75)
> > Comment on attachment 699680 [details] [diff] [review]
> > pt 1 - send a notification of any scripts for which font coverage is lacking.

> > This code is putting all the script codes into a single long string
> > and passing it into the someData parameter of NotifyObservers.  The
> > observers will need to iterate over the strings.  Better to set up a
> > proper array of strings and pass it in via the aSubject parameter.
> 
> done.
> 

Actually, I don't think this is a good idea: ISTM that the work of creating a separate nsISupportsString for each script code will adding more overhead to the gfxMissingFontRecorder's Flush() method, which is a potential performance hit on every reflow when any missing glyphs are encountered. Better to keep this as lightweight as possible. It's not hard for the front-end observer code to split the string on commas; and if desired, we could delegate all the work of processing the notifications to a background thread so that it happens asynchronously, rather than blocking the main thread.

> > +    ~gfxMissingFontRecorder()
> > +    {
> > +        Flush();
> > +    }
> > 
> > I really don't like this, this calls observers at teardown.  Why is
> > this needed if the nsPresContext::NotifyMissingFonts is already
> > calling Flush()?  I think the flush operation should be explicit and
> > not done in the destructor.
> 
> Jonathan might give a better answer, but my understanding is that Flush() is
> not explicitly called for missing font object of CanvasRenderingContext2D
> and nsRenderingContext (not sure when it should be done otherwise). Flush()
> clears the mMissingFonts member and only sends the notification if the list
> of missing font is nonempty. So I believe the final Flush() call will not
> send the notification if it was done by nsPresContext::NotifyMissingFonts
> before.

This sounds right to me. However, if you'd prefer, we can explicitly call mMissingFonts.Flush() from the context destructor instead; maybe this makes things clearer?

In the case of CanvasBidiProcessor, this is explicitly designed to be a stack-based class, so doing finalization/cleanup in the destructor makes perfect sense.

For nsRenderingContext, it's not clear to me whether it is always used in such a way that nsPresContext::NotifyMissingFonts() will have been called; this certainly should account for the majority of cases, but in the event that there are any exceptions, calling Flush() from the destructor will ensure that we don't miss anything.

In that case, the gfxMissingFonts destructor can instead assert that the bitset should be empty by the time the missing fonts recorder is actually deleted.
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #82)
> To help with bug 930504, I'd like gfxMissingFontRecorder::Flush to return
> 'math' for the "Mathematical Alphanumeric Symbols", as defined in the Open
> Font Format script tags. However, I'm a bit confused by bug 953385 comment
> 15. Would that make sense to make GetScriptTagForCode recognize the 'math'
> tag?

For MathML purposes, I think it would make sense to send a missing-font notification from a higher level in the MathML code, based on the availability of a font that we recognize as having the appropriate MATH support. So this wouldn't necessarily depend on per-glyph font matching.
Flags: needinfo?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #84)
> For MathML purposes, I think it would make sense to send a missing-font
> notification from a higher level in the MathML code, based on the
> availability of a font that we recognize as having the appropriate MATH
> support. So this wouldn't necessarily depend on per-glyph font matching.

For MathML, there are two levels: one for missing stretchy font (and more generally MATH font) and one for missing Mathematical Alphanumeric Symbols. For bug 930504, we only need the per-glyph font matching to prevent e.g. italic variables to display as missing boxes. However, we can probably assume that all MATH fonts have Mathematical Alphanumeric Symbols so perhaps only the first level is needed.

> Actually, I don't think this is a good idea: ISTM that the work of creating a separate nsISupportsString for each script code will adding more overhead to the gfxMissingFontRecorder's Flush() method, which is a potential performance hit on every reflow when any missing glyphs are encountered. Better to keep this as lightweight as possible. It's not hard for the front-end observer code to split the string on commas; and if desired, we could delegate all the work of processing the notifications to a background thread so that it happens asynchronously, rather than blocking the main thread.

Yes, that makes sense to me. Also while writing the test add-on I realized that handling the nsIArray was actually harder for the front-end too.
Updated as per discussion above.
Attachment #8413184 - Flags: review?(jdaggett)
Assignee: fred.wang → jfkthame
Flags: needinfo?(khaledhosny)
Attached file font-needed-observer.xpi (obsolete) —
Updated add-on that expects a string for the list of missing scripts.
Attachment #8413159 - Attachment is obsolete: true
Attachment #8413170 - Attachment is obsolete: true
Attachment #8413159 - Flags: review?(jdaggett)
(In reply to Jonathan Kew (:jfkthame) from comment #83)
> For nsRenderingContext, it's not clear to me whether it is always used in
> such a way that nsPresContext::NotifyMissingFonts() will have been called;
> this certainly should account for the majority of cases, but in the event
> that there are any exceptions, calling Flush() from the destructor will
> ensure that we don't miss anything.
> 
> In that case, the gfxMissingFonts destructor can instead assert that the
> bitset should be empty by the time the missing fonts recorder is actually
> deleted.

I get the new ASSERTION quite frequently. Adding Flush() in the nsRenderingContext destructor seems to solve that, though.

BTW, in addition to Wikipedia, one can test missing fonts on the xml-entity-names spec:

http://www.w3.org/TR/xml-entity-names/1EE.html

(I believe only Khaled's amiri/XITS fonts cover the  characters on this page)
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #88)
> (In reply to Jonathan Kew (:jfkthame) from comment #83)
> > For nsRenderingContext, it's not clear to me whether it is always used in
> > such a way that nsPresContext::NotifyMissingFonts() will have been called;
> > this certainly should account for the majority of cases, but in the event
> > that there are any exceptions, calling Flush() from the destructor will
> > ensure that we don't miss anything.
> > 
> > In that case, the gfxMissingFonts destructor can instead assert that the
> > bitset should be empty by the time the missing fonts recorder is actually
> > deleted.
> 
> I get the new ASSERTION quite frequently. Adding Flush() in the
> nsRenderingContext destructor seems to solve that, though.

Interesting; maybe we'll need to do that, although first I'd like to look at where it's happening to see if there's somewhere else we should add a Flush() call.
Flags: needinfo?(mozilla)
Flags: needinfo?(jdaggett)
Blocks: 467729
Comment on attachment 8413373 [details] [diff] [review]
Part 3 - send a notification for missing math fonts

Review of attachment 8413373 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +4565,5 @@
> +    if (mMissingMathFont) {
> +        if (!fontNeeded.IsEmpty()) {
> +            fontNeeded.Append(PRUnichar(','));
> +        }
> +        fontNeeded.AppendLiteral("math");

I wonder if it would make sense to use "Zmth" here, which is used for "Mathematical notation" (ISO 15924 Code Lists) albeit is not a script in Unicode.

In particular, I guess that would simplify the mapping with PackageKit & fontconfig (bug 467729), since all the scripts will be handled the same way.
Comment on attachment 8413184 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking.

>+void
>+gfxMissingFontRecorder::Flush()
...
>+        service->NotifyObservers(nullptr, "font-needed", fontNeeded.get());
...
> void
>+nsPresContext::NotifyMissingFonts()
>+{
>+  mMissingFonts.Flush();
I just discovered this bug, and I was just wondering whether it would be possible to arrange to pass a DOM window or document through (presumably as an nsISupports, which is all NotifyObservers needs), so that the UI can correlate a missing font with a browser tab?
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #91)
> Comment on attachment 8413373 [details] [diff] [review]
> Part 3 - send a notification for missing math fonts
> 
> Review of attachment 8413373 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/gfxFont.cpp
> @@ +4565,5 @@
> > +    if (mMissingMathFont) {
> > +        if (!fontNeeded.IsEmpty()) {
> > +            fontNeeded.Append(PRUnichar(','));
> > +        }
> > +        fontNeeded.AppendLiteral("math");
> 
> I wonder if it would make sense to use "Zmth" here, which is used for
> "Mathematical notation" (ISO 15924 Code Lists) albeit is not a script in
> Unicode.

I'd support the idea of adding MOZ_SCRIPT_MATHEMATICAL_NOTATION as a script constant in Gecko, which would map to the 'Zmth' tag via GetScriptTagForCode(). We could easily add this to the genUnicodePropertyData.pl tool that maintains the list of script codes and the mapping to tags. Think I'll file a separate bug to do that...

Then you wouldn't need special methods to handle the math notification at all; you could just call gfxMissingFontRecorder::RecordScript(MOZ_SCRIPT_MATHEMATICAL_NOTATION) and let the existing script bitset take care of things.
Depends on: 1002425
just updating the patch per previous comments...
Attachment #8413373 - Attachment is obsolete: true
Attachment #8413843 - Flags: review?(karlt)
Attachment #8413843 - Flags: review?(jfkthame)
Comment on attachment 8413843 [details] [diff] [review]
Part 3 - send a notification for missing math fonts

Review of attachment 8413843 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +5348,5 @@
> +                        // basic mathvariants, so we record the fact that a math
> +                        // font is missing. Note that italic h is handled below.
> +                        if (aMFR && 0x1D400 <= ucs4ch && ucs4ch <= 0x1D49B) {
> +                            aMFR->RecordScript(MOZ_SCRIPT_MATHEMATICAL_NOTATION);
> +                        }

Why not include the other math alphabets (up to 1D7FF) here as well?
(In reply to Jonathan Kew (:jfkthame) from comment #95)
> Why not include the other math alphabets (up to 1D7FF) here as well?

I'm not sure how much we want to do among the characters from http://www.w3.org/TR/xml-entity-names/#blocks... The italic/bold/bold-italic are almost all at the same place (with the exception of the italic h) so it was not too much a burden to check them in the gfx code. Currently these mathvariants work without specific MATH fonts (using CSS style) so if we move to MATH fonts, I think we still want to be sure we support them...
I experimented with this patch today and I think the functionality really needs to be rethought, along the lines of what I wrote in comment 50.  There needs to be more heuristics here, not simply recording script codes for missing glyphs.

I really think any sort of notification should only occur *once* per page load (not on every reflow as with the current patch) and the code handling the notification needs to be able to distinguish incidental usage from primary usage.  That's very difficult with just a list of script codes.

Couple missing fonts, don't really need them for the page
http://www.wikipedia.org/

Page unreadable without the right fonts:
http://dv.wikipedia.org/wiki/%DE%89%DE%A6%DE%87%DE%A8_%DE%9E%DE%A6%DE%8A%DE%B0%DE%99%DE%A7
http://bug.wikipedia.org/wiki/Mappadec%C3%A9%C5%8B

For some scripts with a small set of defined codepoints, missing a single codepoint probably indicates the need for a font that supports that script. But for scripts with larger ranges it indicates the use of an unusual character that may not even be supported by the font that will be downloaded, such as a relatively rare Chinese character in the Han range.  

PUA codepoints need to be ignored completely. For example, downloadable icon fonts often involve the use of PUA codepoints. The current patch notifies with the script code 'Zzzz', signifying "script unknown".

Pages which use a downloadable font for handling an unusual script need to be distinguished from pages that don't.  Currently, the initial fallback pass will cause a missing font notification to occur, even though the page provides a font.

I think the MathML use case could be handled via a smaller set of patches, since the fonts needed for MathML need to be ones with specific characteristics, i.e. ones that support the 'MATH' table.  The heuristics for handling the MathML case is much simpler, since it's very clear when a page contains MathML content.  That's different from the general font fallback case.

>    // text performance metrics
>    nsAutoPtr<gfxTextPerfMetrics>   mTextPerf;
>  
> +  gfxMissingFontRecorder mMissingFonts;

Given that this is perf-sensitive code, I think this should be behind a pointer and only enabled when explicitly used.  Right now the patch code is always recording scripts that are missing fonts within performance sensitive code.
Additionally, I think there's a resource management issue with storing fonts in the profile.  How does a user manage those and/or know when they are taking up a significant amount of space? Do we intend to just support a small set of scripts or a wide range of scripts?  I think whatever usage scenario we consider needs to define the desired behavior on a page like www.wikipedia.com, which contains a wide variety of scripts, many of which a single given user may have no interest in reading.
(In reply to John Daggett (:jtd) from comment #98)
> I experimented with this patch today and I think the functionality really
> needs to be rethought, along the lines of what I wrote in comment 50.  There
> needs to be more heuristics here, not simply recording script codes for
> missing glyphs.
> 
> I really think any sort of notification should only occur *once* per page
> load (not on every reflow as with the current patch)

Seems like that wouldn't work very well in the case of a page that is dynamically updated - filling in new results as the user enters a search phrase, or expanding initially-hidden comments on an article, etc - and this causes content in a previously-unused language to appear without a new page load.

> and the code handling
> the notification needs to be able to distinguish incidental usage from
> primary usage.  That's very difficult with just a list of script codes.

I'm having a hard time imagining a way to distinguish "incidental" from "primary" usage that would lead to good behavior. A font/script can be crucial to the reader of the page even if it's used only for a small minority of the text runs. To take an example, if I do a simple search for "hindi news" on my Android device (running 2.3, with no Indic fonts), several of the first results that appear include Hindi text, which gets displayed as boxes. This is the sort of situation where downloading a suitable font would be a huge benefit - yet the Hindi content on the search results page is little enough that a heuristic based on percentage of characters or textruns would probably consider it "incidental".

If we tracked the exact collection of missing characters rather than just the affected scripts, then in theory we could probably do some kind of heuristics to guess how important a given script is, but ISTM that would create an unacceptable level of overhead - we'd need a hashset or similar and it could end up with many hundreds, even thousands, of entries. And analysing that data would also have a greater cost.

The approach based on tracking scripts is intended to provide a balance between the cost and complexity of tracking, and the desire for content to "just work" for users in the simplest possible way.

> PUA codepoints need to be ignored completely. For example, downloadable icon
> fonts often involve the use of PUA codepoints. The current patch notifies
> with the script code 'Zzzz', signifying "script unknown".

Yes, it'd make sense to ignore PUA characters.

> Pages which use a downloadable font for handling an unusual script need to
> be distinguished from pages that don't.  Currently, the initial fallback
> pass will cause a missing font notification to occur, even though the page
> provides a font.

Good point, we should avoid tracking missing fonts while there's a pending download.


> > +  gfxMissingFontRecorder mMissingFonts;
> 
> Given that this is perf-sensitive code, I think this should be behind a
> pointer and only enabled when explicitly used.  Right now the patch code is
> always recording scripts that are missing fonts within performance sensitive
> code.

This -only- happens when font fallback has failed to find a font that supports a given character, so it's not in the code path for normal text rendering; the (small) perf impact of recording script codes will only occur when we're rendering missing-glyph boxes. At that point, we've already failed to deliver an optimal user experience; and I doubt that the cost of recording will make a measurable difference even then.

But I suppose we could put this behind a pointer, and provide some kind of API for the front-end to enable or disable the feature - e.g. Fennec could have an option in Settings to determine whether it's used.

(In reply to John Daggett (:jtd) from comment #99)
> Additionally, I think there's a resource management issue with storing fonts
> in the profile.  How does a user manage those and/or know when they are
> taking up a significant amount of space? Do we intend to just support a
> small set of scripts or a wide range of scripts?  I think whatever usage
> scenario we consider needs to define the desired behavior on a page like
> www.wikipedia.com, which contains a wide variety of scripts, many of which a
> single given user may have no interest in reading.

This is about user interface, and managing the tradeoff between simplicity and detailed control. E.g. we could envisage a UI that lists all the scripts and lets the user check/uncheck them to indicate which ones they want supported; but do we want to burden Fennec with that level of UI complexity? I think these are questions for bug 648548 rather than here.
(In reply to neil@parkwaycc.co.uk from comment #92)
> Comment on attachment 8413184 [details] [diff] [review]
> Part 1 - send a notification of any scripts for which font coverage is
> lacking.
> 
> >+void
> >+gfxMissingFontRecorder::Flush()
> ...
> >+        service->NotifyObservers(nullptr, "font-needed", fontNeeded.get());
> ...
> > void
> >+nsPresContext::NotifyMissingFonts()
> >+{
> >+  mMissingFonts.Flush();
> I just discovered this bug, and I was just wondering whether it would be
> possible to arrange to pass a DOM window or document through (presumably as
> an nsISupports, which is all NotifyObservers needs), so that the UI can
> correlate a missing font with a browser tab?

Hmmmm. It looks like it would be easy to do that for notifications triggered from reflow in an nsPresContext, but it's not so obvious how we'd do it for the cases where the missing-font recorder is created and owned by a temporary nsRenderingContext. This happens when dealing with XUL textbox, for example, and I think in a few other cases. We could probably add plumbing to pass a relevant document through to such places, but I'm not entirely sure it's worth it?

Note that unlike fonts linked via @font-face, fonts downloaded via this mechanism would become available globally and persistently to all documents; they're not restricted to the document that initially triggered the request. And I'd expect the front end to coalesce multiple notifications - possibly coming from multiple documents - into a single prompt for the user. As such, I'm not sure trying to associate them with a specific document or browser tab is necessarily going to be needed or useful.
Updated patch to take account of several of the concerns, as per comment 100. In particular, PUA missing characters will no longer be tracked, and pending font downloads will cause us to ignore missing chars. Also, the whole missing-font tracking code is conditional on a preference; I've set it to true by default for now, but we can make it false by default on desktop platforms where the front-end does not actually observe the notifications; and the fennec UI could offer a switch to toggle it, if desired.
Attachment #8416107 - Flags: review?(jdaggett)
Attachment #8413184 - Attachment is obsolete: true
Attachment #8413184 - Flags: review?(jdaggett)
(In reply to Jonathan Kew (:jfkthame) from comment #100)
> (In reply to John Daggett (:jtd) from comment #98)
> > I experimented with this patch today and I think the functionality really
> > needs to be rethought, along the lines of what I wrote in comment 50.  There
> > needs to be more heuristics here, not simply recording script codes for
> > missing glyphs.
> > 
> > I really think any sort of notification should only occur *once* per page
> > load (not on every reflow as with the current patch)
> 
> Seems like that wouldn't work very well in the case of a page that is
> dynamically updated - filling in new results as the user enters a search
> phrase, or expanding initially-hidden comments on an article, etc - and this
> causes content in a previously-unused language to appear without a new page
> load.

The flip side is that the current patch is doing a *lot* of notifications.  After logging the notifications, loading the Wikipedia page below causes 10 notifications:

http://dv.wikipedia.org/wiki/%DE%89%DE%A6%DE%87%DE%A8_%DE%9E%DE%A6%DE%8A%DE%B0%DE%99%DE%A7

(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Syrc,Thaa,Bugi
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Syrc,Thaa,Bugi
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Thaa
(textrun-missing-scripts) Thaa

> > and the code handling
> > the notification needs to be able to distinguish incidental usage from
> > primary usage.  That's very difficult with just a list of script codes.
> 
> I'm having a hard time imagining a way to distinguish "incidental" from
> "primary" usage that would lead to good behavior. A font/script can be
> crucial to the reader of the page even if it's used only for a small
> minority of the text runs. To take an example, if I do a simple search for
> "hindi news" on my Android device (running 2.3, with no Indic fonts),
> several of the first results that appear include Hindi text, which gets
> displayed as boxes. This is the sort of situation where downloading a
> suitable font would be a huge benefit - yet the Hindi content on the search
> results page is little enough that a heuristic based on percentage of
> characters or textruns would probably consider it "incidental".

I agree it's not easy.  But for this feature to work, there needs to be a balance between noisy UI requests and requesting only "when needed". I don't think noisy UI requests are going to work.  There is a benefit to having a Hindi font in the situation above, the problem is distinguishing that one from situations where the user doesn't care at all about seeing Hindi text. There are all sorts of situations where random scripts appear on a page that a user doesn't care about.  When traveling, Google ads show up in the local locale so in Taiwan you'll see ads in Chinese and I imagine the same is true in India.  Many pages also provide UI options for multiple localizations -- as a user I really don't need to pull down fonts for all those localizations.

Unless the majority of the content on the page is in a script for which there's no font, I don't think it makes sense to pull down fonts for that page. I think a good model for this is Chrome's translate feature, it doesn't propose translating the page unless the content seems predominantly in another language. And it allows a "never" option to be selected so that requests to translate for that language don't occur again.

The textperf logging already gives us a hint that the Wikipedia page needs a font - for the 7215 characters shaped, 4187 required system font fallback.

NSPR_LOG_MODULES=textperf:5

(textperf-loaddone) 0x127543c00 time-ms:    4178 reflow: 14 chars: 7215 [http://dv.wikipedia.org/wiki/%DE%89%DE%A6%DE%87%DE%A8_%DE%9E%DE%A6%DE%8A%DE%B0%DE%99%DE%A7] content-textruns: 1525 chrome-textruns: 2 max-textrun-len: 288 word-cache-lookups: 1322 word-cache-hit-ratio: 0.567 word-cache-space: 0 word-cache-long: 0 pref-fallbacks: 231 system-fallbacks: 4187 textruns-const: 1036 textruns-destr: 467 cumulative-textruns-destr: 467

> If we tracked the exact collection of missing characters rather than just
> the affected scripts, then in theory we could probably do some kind of
> heuristics to guess how important a given script is, but ISTM that would
> create an unacceptable level of overhead - we'd need a hashset or similar
> and it could end up with many hundreds, even thousands, of entries. And
> analysing that data would also have a greater cost.
> 
> The approach based on tracking scripts is intended to provide a balance
> between the cost and complexity of tracking, and the desire for content to
> "just work" for users in the simplest possible way.

I wasn't suggesting tracking all missing codepoints. But to avoid noisy UI requests, you need better heuristics than "there were missing glyphs in these scripts".
With the latest patch I'm still getting assertions periodically.

Steps:

1. set default page as http://dv.wikipedia.org/wiki/%DE%89%DE%A6%DE%87%DE%A8_%DE%9E%DE%A6%DE%8A%DE%B0%DE%99%DE%A7
2. restart the browser
3. wait 10 minutes
4. quit

Result: assertion fires
(In reply to John Daggett (:jtd) from comment #103)

> I agree it's not easy.  But for this feature to work, there needs to be a
> balance between noisy UI requests and requesting only "when needed". I don't
> think noisy UI requests are going to work.

I suspect the number of notifications we send (e.g. from that wikipedia page) could be substantially reduced if we suppress sending them for XUL textboxes - many of those are coming from the text in the URL bar.

However, if you actually try this in conjunction with the proposed patch from bug 648548, you'll see that it does *not* result in "noisy UI requests". The front-end code that observes the notifications will coalesce them into a single doorhanger prompt.

>  There is a benefit to having a
> Hindi font in the situation above, the problem is distinguishing that one
> from situations where the user doesn't care at all about seeing Hindi text.
> There are all sorts of situations where random scripts appear on a page that
> a user doesn't care about.  When traveling, Google ads show up in the local
> locale so in Taiwan you'll see ads in Chinese and I imagine the same is true
> in India.  Many pages also provide UI options for multiple localizations --
> as a user I really don't need to pull down fonts for all those localizations.
> 
> Unless the majority of the content on the page is in a script for which
> there's no font, I don't think it makes sense to pull down fonts for that
> page. I think a good model for this is Chrome's translate feature, it
> doesn't propose translating the page unless the content seems predominantly
> in another language. And it allows a "never" option to be selected so that
> requests to translate for that language don't occur again.

And the patch from 648548 similarly remembers when the user has declined the font for a given script, and doesn't offer it again during the session. (But again, exactly how to manage that - once per session? once per day? what if the user declines and then changes his mind? - is a UX issue, separate from this bug.)

The gfx.missing_fonts.notify pref also enables the front-end to easily offer an option to turn the feature off altogether.

> But to avoid noisy UI
> requests, you need better heuristics than "there were missing glyphs in
> these scripts".

To repeat, the proposed patches here do *not* lead to noisy UI requests. Gecko sends notifications of what it encounters; the front-end collects these and presents one simple prompt to the user.

The prompt will only be repeated if, after dismissing it, the user then visits a page that requires *other* new script(s). It's not going to be constantly reappearing for every wikipedia page loaded, or whatever.


(In reply to John Daggett (:jtd) from comment #104)
> With the latest patch I'm still getting assertions periodically.
> 
> Steps:
> 
> 1. set default page as
> http://dv.wikipedia.org/wiki/
> %DE%89%DE%A6%DE%87%DE%A8_%DE%9E%DE%A6%DE%8A%DE%B0%DE%99%DE%A7
> 2. restart the browser
> 3. wait 10 minutes
> 4. quit
> 
> Result: assertion fires

OK, I think I know where that's coming from. Will fix.
As I said on the FontConfig mailing list, perhaps an option to improve the heuristics would be to rely on the "lang" attribute of the current document and check whether missing scripts correlate with that language. However, that won't in general since we don't always know the document nor its lang attribute.

In general, doing a more clever behavior seems a bit complicated because the desired notifications strongly depend on the context. The approach of keeping things simple at the low level and deferring the more advanced work to the front-end sounds good to me...

Maybe the "missing fonts notify" preference could have one bit for each script, so that for example one could ask to the backend to never raise notification for Chinese script when traveling but still receive other notifications for difference scripts. I think only four 32-bits words preference options will be enough to store the existing scripts. BTW, when working on bug 947654, I saw that nsPresContext does some stuff to cache the preference. I guess we want an optimization here too...

(In reply to John Daggett (:jtd) from comment #98)
> I think the MathML use case could be handled via a smaller set of patches,
> since the fonts needed for MathML need to be ones with specific
> characteristics, i.e. ones that support the 'MATH' table.  The heuristics
> for handling the MathML case is much simpler, since it's very clear when a
> page contains MathML content.  That's different from the general font
> fallback case.

At the moment, the "missing math font" is raised when a stretch fails and we don't have a MATH table ; or when we are missing some basic mathvariants that are necessary for bug 930504. I wonder if we want to be even stricter and always require a font with a MATH table (which in practice implies good unicode coverage for math) for MathML layout. That would make the detection simpler. However, I think I still want to rely on Jonathan's code, to share the code and to get consistent behavior with notifications ; so that the front-end code can handle it the same.

(In reply to Jonathan Kew (:jfkthame) from comment #105)
> However, if you actually try this in conjunction with the proposed patch
> from bug 648548, you'll see that it does *not* result in "noisy UI
> requests". The front-end code that observes the notifications will coalesce
> them into a single doorhanger prompt.

Is it always true? I see that they are grouped while the JSON file is loading, but it seems to me that they could still go in separate prompts? (perhaps it was the rationale for the "available-fonts-update" notification, so that we collect other "font-needed" notifications first?)
(In reply to Frédéric Wang (:fredw) |away 27/04 to 06/05 from comment #106)

> > However, if you actually try this in conjunction with the proposed patch
> > from bug 648548, you'll see that it does *not* result in "noisy UI
> > requests". The front-end code that observes the notifications will coalesce
> > them into a single doorhanger prompt.
> 
> Is it always true? I see that they are grouped while the JSON file is
> loading, but it seems to me that they could still go in separate prompts?

Possibly, although in experimenting with it I haven't seen any problems of spammy prompts - IIRC, what can happen if additional notifications arrive while the first prompt is being shown is that it gets automatically replaced by a new one with the extended script list. The doorhanger UX apparently handles this quite neatly; the user just sees additional content appear in it, while it's waiting for a response.

Anyhow, potentially refining this is an issue for bug 648548. Here at the Gecko level, we can't entirely solve the problem of multiple notifications, given that the content could be arriving gradually or changing dynamically; the front end must be the part responsible for deciding when and how to interact with the user.
Comment on attachment 8413843 [details] [diff] [review]
Part 3 - send a notification for missing math fonts

Review of attachment 8413843 [details] [diff] [review]:
-----------------------------------------------------------------

Removing review request. I'll wait to see how the notification for other scripts is finally implemented.

The change to nsMathMLChar::StretchEnumContext::TryParts has been moved to bug 1005657.
Attachment #8413843 - Flags: review?(karlt)
Attachment #8413843 - Flags: review?(jfkthame)
Cleaned up slightly, and AFAICT no longer hits a potential assertion when shutting down.
Attachment #8417058 - Flags: review?(jdaggett)
Attachment #8416107 - Attachment is obsolete: true
Attachment #8416107 - Flags: review?(jdaggett)
Excuse my ignorance; I haven't studied the patch / approach, but IMO font selection / fallback / etc should be per language, not per script.  Here's what I wrote in response to Frédéric's email on fontconfig list:

On 14-04-30 03:45 AM, Frédéric WANG wrote:
>
> I've recently worked on automatic font installation in Gecko-based
> applications using PackageKit [1]. The current patch to add notifications for
> missing glyphs in Gecko relies on script tags as defined in iso15924 [2] [3].

The problem with choosing fonts based on script tags is that it doesn't work.

Really.  Find me a font that supports script=Arab.  I don't think you can find
any font right now that supports **all** Unicode characters for Arabic
scripts.  If you did, it probably would be a junk font like GNU Unifont or
some other "quantity-over-quality" font that should be buried.

Fonts can only be meaningfully chosen per language tag, not script tag.

Fix your design.  You should resolve to a language tag given the script and
any hinted languages (HTML tags, etc).  Pango has code for that, and it's easy
to replicate.  I'd be happy to elaborate if you care.

behdad
In the places where we have a style context, I think we could do nsStyleContext::StyleFont()->mLanguage to get the current language provided by the HTML lang attribute. However, that won't necessarily be related to the missing glyphs.
(In reply to Behdad Esfahbod from comment #110)
> Excuse my ignorance; I haven't studied the patch / approach, but IMO font
> selection / fallback / etc should be per language, not per script.  Here's
> what I wrote in response to Frédéric's email on fontconfig list:
> 
> On 14-04-30 03:45 AM, Frédéric WANG wrote:
> >
> > I've recently worked on automatic font installation in Gecko-based
> > applications using PackageKit [1]. The current patch to add notifications for
> > missing glyphs in Gecko relies on script tags as defined in iso15924 [2] [3].
> 
> The problem with choosing fonts based on script tags is that it doesn't work.
> 
> Really.  Find me a font that supports script=Arab.  I don't think you can
> find
> any font right now that supports **all** Unicode characters for Arabic
> scripts.  If you did, it probably would be a junk font like GNU Unifont or
> some other "quantity-over-quality" font that should be buried.
> 
> Fonts can only be meaningfully chosen per language tag, not script tag.
> 
> Fix your design.  You should resolve to a language tag given the script and
> any hinted languages (HTML tags, etc).  Pango has code for that, and it's
> easy
> to replicate.  I'd be happy to elaborate if you care.

Font preferences should be configured (or at least configurable) per language, yes. And language is an input to the font selection process.

But here we're at the final fallback level, where the question comes down to whether the browser has access to *any* font that supports the characters, or is going to render missing-glyph hexboxes. At this level, what we want is one reasonable default font that can render the Unicode codepoints present - regardless of any language "hints" - and if the device doesn't have one, we want to try and download one.

(Choosing fonts based on script *is* the correct approach, in general. As a user, I want to specify a font for Latin script, not a font for English, one for French, one for German, one for Polish, one for Vietnamese, one for Ewe, etc., etc.

Overriding the script default for specific languages - e.g. Urdu or Sindhi or Fulfulde may prefer a different style than the default Arabic-script font - is a valuable refinement when it comes to specifying font preferences, but this bug is about providing a single "last-resort" font that can render each script, rather than ending up with hexboxes.)
Mentor: blassey.bugs
Whiteboard: [testday-2011-06-24][leave open][mentor=blassey] → [testday-2011-06-24][leave open]
Any update on the review, here?
Flags: needinfo?(jdaggett)
Comment on attachment 8417058 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking.

Review of attachment 8417058 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsPresShell.cpp
@@ +8265,5 @@
>      mCaret->UpdateCaretPosition();
>    }
>  
> +  mPresContext->NotifyMissingFonts();
> +

As detailed in comment 98, this is much too fine-grained an approach. Even with event coalescing and careful mechanisms for handling notifications async, this is simply unnecessary for the required functionality. We should not be notifying observers about missing fonts per reflow.
Attachment #8417058 - Flags: review?(jdaggett) → review-
Flags: needinfo?(jdaggett)
filter on [mass-p5]
Priority: -- → P5
(In reply to John Daggett (:jtd) from comment #114)
> As detailed in comment 98, this is much too fine-grained an approach. Even
> with event coalescing and careful mechanisms for handling notifications
> async, this is simply unnecessary for the required functionality. We should
> not be notifying observers about missing fonts per reflow.

Doing "per page load" instead of "per reflow" does not seem to address the case mentioned in comment 100 of dynamic pages. There are many pages using ASCIIMathML.js or MathJax containing LaTeX or AsciiMath source code that is only converted into MathML via Javascript. How would we handle this case?
Flags: needinfo?(jdaggett)
(In reply to Frédéric Wang (:fredw) from comment #116)
> (In reply to John Daggett (:jtd) from comment #114)
> > As detailed in comment 98, this is much too fine-grained an approach. Even
> > with event coalescing and careful mechanisms for handling notifications
> > async, this is simply unnecessary for the required functionality. We should
> > not be notifying observers about missing fonts per reflow.
> 
> Doing "per page load" instead of "per reflow" does not seem to address the
> case mentioned in comment 100 of dynamic pages. There are many pages using
> ASCIIMathML.js or MathJax containing LaTeX or AsciiMath source code that is
> only converted into MathML via Javascript. How would we handle this case?

I think my point is that the mechanism should be coarse-grained, not something that calls observers per reflow. Basically, during a browser run I don't see the need to call an observer more than once for a given script which lacks a font.
Flags: needinfo?(jdaggett)
(In reply to Jonathan Kew (:jfkthame) from comment #112)
> (In reply to Behdad Esfahbod from comment #110)
> > Excuse my ignorance; I haven't studied the patch / approach, but IMO font
> > selection / fallback / etc should be per language, not per script.  Here's
> > what I wrote in response to Frédéric's email on fontconfig list:
> > 
> > On 14-04-30 03:45 AM, Frédéric WANG wrote:
> > >
> > > I've recently worked on automatic font installation in Gecko-based
> > > applications using PackageKit [1]. The current patch to add notifications for
> > > missing glyphs in Gecko relies on script tags as defined in iso15924 [2] [3].
> > 
> > The problem with choosing fonts based on script tags is that it doesn't work.
> > 
> > Really.  Find me a font that supports script=Arab.  I don't think you can
> > find
> > any font right now that supports **all** Unicode characters for Arabic
> > scripts.  If you did, it probably would be a junk font like GNU Unifont or
> > some other "quantity-over-quality" font that should be buried.
> > 
> > Fonts can only be meaningfully chosen per language tag, not script tag.
> > 
> > Fix your design.  You should resolve to a language tag given the script and
> > any hinted languages (HTML tags, etc).  Pango has code for that, and it's
> > easy
> > to replicate.  I'd be happy to elaborate if you care.
> 
> Font preferences should be configured (or at least configurable) per
> language, yes. And language is an input to the font selection process.
> 
> But here we're at the final fallback level, where the question comes down to
> whether the browser has access to *any* font that supports the characters,
> or is going to render missing-glyph hexboxes. At this level, what we want is
> one reasonable default font that can render the Unicode codepoints present -
> regardless of any language "hints" - and if the device doesn't have one, we
> want to try and download one.
> 
> (Choosing fonts based on script *is* the correct approach, in general. As a
> user, I want to specify a font for Latin script, not a font for English, one
> for French, one for German, one for Polish, one for Vietnamese, one for Ewe,
> etc., etc.
> 
> Overriding the script default for specific languages - e.g. Urdu or Sindhi
> or Fulfulde may prefer a different style than the default Arabic-script font
> - is a valuable refinement when it comes to specifying font preferences, but
> this bug is about providing a single "last-resort" font that can render each
> script, rather than ending up with hexboxes.)

Fair enough.  Feel free to hook into Noto by default.  We can provide metadata for it but the mapping from Unicode scripts to files are rather trivial.
(In reply to John Daggett (:jtd) from comment #117)
> (In reply to Frédéric Wang (:fredw) from comment #116)
> > (In reply to John Daggett (:jtd) from comment #114)
> > > As detailed in comment 98, this is much too fine-grained an approach. Even
> > > with event coalescing and careful mechanisms for handling notifications
> > > async, this is simply unnecessary for the required functionality. We should
> > > not be notifying observers about missing fonts per reflow.
> > 
> > Doing "per page load" instead of "per reflow" does not seem to address the
> > case mentioned in comment 100 of dynamic pages. There are many pages using
> > ASCIIMathML.js or MathJax containing LaTeX or AsciiMath source code that is
> > only converted into MathML via Javascript. How would we handle this case?
> 
> I think my point is that the mechanism should be coarse-grained, not
> something that calls observers per reflow. Basically, during a browser run I
> don't see the need to call an observer more than once for a given script
> which lacks a font.

So what about having a

static uint32_t gNotifiedMissingFonts[kNumScriptBitsWords];

in gfxFont.cpp that will keep track of the scripts that have already been notified at least once during a browser run so that we can ignore them in gfxMissingFontRecorder::Flush?

This will ensure that we never "call an observer more than once for a given script which lacks a font" and will certainly reduce the number of calls to NotifyObservers. However, if missingScript1 is used in the initial page and missingScript2 is used after some later DOM insertion we will still notify observers twice, which I believe is the right thing to do.

Actually, this is even better than a per page load notification, since it's also very common that people open several tabs with a same missing script.
Flags: needinfo?(jfkthame)
Flags: needinfo?(jdaggett)
(In reply to Frédéric Wang (:fredw) from comment #119)
> So what about having a
> 
> static uint32_t gNotifiedMissingFonts[kNumScriptBitsWords];
> 
> in gfxFont.cpp that will keep track of the scripts that have already been
> notified at least once during a browser run so that we can ignore them in
> gfxMissingFontRecorder::Flush?

Yeah, that seems reasonable.
Flags: needinfo?(jdaggett)
Attachment #8413843 - Attachment is obsolete: true
Attachment #8417058 - Attachment is obsolete: true
Attachment #8512933 - Attachment description: Bug 619521: Part 1 - send a notification of any scripts for which font coverage is lacking. → Part 1 - send a notification of any scripts for which font coverage is lacking.
Attachment #8512934 - Attachment is obsolete: true
Comment on attachment 8512933 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking.

I tried to unbitrot Jonathan's patch. Hopefully, I've not broken anything. From a quick test, the font-needed notification still seems to work and is no longer repeated for a given script.

Does such a patch look good?
Attachment #8512933 - Flags: feedback?(jdaggett)
Comment on attachment 8512933 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking.

At a quick glance, looks okay.
Attachment #8512933 - Flags: feedback?(jdaggett) → feedback+
There is a small conflict in part 1 and some tests are failing

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2be88ac56488

I'll check why today.
Component: General → Graphics: Text
Product: Firefox for Android → Core
OS: Android → All
Hardware: ARM → All
OK, I think the problem was at

aTextRun->SetMissingGlyph(aOffset + index, SURROGATE_TO_UCS4(ch,		SURROGATE_TO_UCS4(ch, aString[index + 1]) ...

where "aOffset" was added since Jonathan's last patch. The reftests pass on my machine.
Attachment #8512933 - Attachment is obsolete: true
Comment on attachment 8514916 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9fc7fba0e3f6
Attachment #8514916 - Flags: review?(jdaggett)
Attachment #8512985 - Flags: review?(karlt)
No longer blocks: 631159
No longer blocks: 788450
No longer blocks: 797662
No longer blocks: 1001570
No longer blocks: 653911
Flags: needinfo?(jfkthame)
Summary: need a way to provide fonts for scripts/languages not supported by the standard fonts on the device → Send font-needed notification for scripts/languages not supported by the default system fonts
Comment on attachment 8514916 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking

Review of attachment 8514916 [details] [diff] [review]:
-----------------------------------------------------------------

r+ but I think this should be off by default, since it's only useful when an observer exists.

::: modules/libpref/init/all.js
@@ +559,5 @@
>  #endif
>  
> +// Do we fire a notification about missing fonts, so the front-end can decide
> +// whether to try and do something about it (e.g. download additional fonts)?
> +pref("gfx.missing_fonts.notify", true);

I think this should be off by default, since some form of observer is required for this to be useful. An addon can enable this when necessary.
Attachment #8514916 - Flags: review?(jdaggett) → review+
(In reply to John Daggett (:jtd) from comment #129)
> > +// Do we fire a notification about missing fonts, so the front-end can decide
> > +// whether to try and do something about it (e.g. download additional fonts)?
> > +pref("gfx.missing_fonts.notify", true);
> 
> I think this should be off by default, since some form of observer is
> required for this to be useful. An addon can enable this when necessary.

That makes sense. Thanks.
Unbitrot again and disable the notification by default.
Attachment #8514916 - Attachment is obsolete: true
Attachment #8413230 - Attachment is obsolete: true
Attachment #8512985 - Flags: review?(jkitch.bug)
Comment on attachment 8512985 [details] [diff] [review]
Part 3 - send a notification for missing math fonts

Review of attachment 8512985 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/MathMLTextRunFactory.cpp
@@ +680,5 @@
>            // We fallback to the original character.
>            ch2 = ch;
> +          if (aMFR) {
> +            aMFR->RecordScript(MOZ_SCRIPT_MATHEMATICAL_NOTATION);
> +          }

This will only get triggered for bold, italic and bold-italic mathvariants.  

Is this the intention?  Missing Arabic math fonts will never be detected here.
(In reply to jkitch.bug@internode.on.net from comment #133)
> This will only get triggered for bold, italic and bold-italic mathvariants.  
> 
> Is this the intention?

Yes, this was the intention because these are the most important missing MathAlphaNum characters and because we do a special handling to redirect to non-plane 1 char, so we can't detect them otherwise. For the others, I expect the code to follow the normal path and end up in gfxFontGroup::InitScriptRun (attachment 8518913 [details] [diff] [review]).

For the Arabic mathvariants this will raise a notification for the "Arab" script. I expect the frontend code to suggest Khaled's Amiri fonts (which are the only one I know covering the math Arabic chars).

For other mathvariants (Fraktur) as well as other math symbols we should ideally raise a "Zmth" notification but this is not handled by these patches (because "Zmth" is not an official unicode script and we would have to decide which unicode char to put into that category). We can do that in a follow-up bug.
Comment on attachment 8512985 [details] [diff] [review]
Part 3 - send a notification for missing math fonts

Review of attachment 8512985 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for clarifying that.

r=me
Attachment #8512985 - Flags: review?(karlt)
Attachment #8512985 - Flags: review?(jkitch.bug)
Attachment #8512985 - Flags: review+
s/karl/jkitch/
Attachment #8512985 - Attachment is obsolete: true
Whiteboard: [testday-2011-06-24][leave open] → [testday-2011-06-24]
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #140)
> ASAN too:
> https://treeherder.mozilla.org/ui/logviewer.
> html#?job_id=3770173&repo=mozilla-inbound

mmh, this didn't happen when I submitted

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2be88ac56488

Probably I badly fix these merge conflicts again :-(
Comment on attachment 8518913 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking

Review of attachment 8518913 [details] [diff] [review]:
-----------------------------------------------------------------

@Jonathan: do you have any idea why your patch is leaking? I tried

https://treeherder.mozilla.org/#/jobs?repo=try&revision=826b766713fc and
https://treeherder.mozilla.org/#/jobs?repo=try&revision=86c90f93dd33

but that didn't help...
Attachment #8518913 - Flags: feedback?(jfkthame)
Comment on attachment 8518913 [details] [diff] [review]
Part 1 - send a notification of any scripts for which font coverage is lacking

Review of attachment 8518913 [details] [diff] [review]:
-----------------------------------------------------------------

I think I've tracked down the leaks; see below. Will post an updated patch after confirming on tryserver.

::: layout/generic/nsTextFrame.cpp
@@ +2141,5 @@
>      const char16_t* text = static_cast<const char16_t*>(textPtr);
>      if (transformingFactory) {
>        textRun = transformingFactory->MakeTextRun(text, transformedLength, &params,
> +                                                 fontGroup, textFlags, styles.Elements(),
> +                                                 mMissingFonts);

Here's the bug that's causing the leak: nsTransformingTextRunFactory::MakeTextRun doesn't take a missing-fonts recorder parameter, but it does take a trailing ownsFactory bool (with a default value of true), and so this didn't result in a compile-time error.

Here, the mMissingFonts pointer is silently converted to a bool (which will be false if the pref is off), and this means we lose track of factory ownership: hence, a leak.

This demonstrates what a footgun default parameter values can be. I suggest we should eliminate the default "= true" from that param (and pass the desired value explicitly), making it less likely we'll make similar mistakes in future.

I've pushed a try run with this bug fixed, to confirm that it solves the leaks.

@@ +2156,5 @@
>      textFlags |= gfxFontGroup::TEXT_IS_8BIT;
>      if (transformingFactory) {
>        textRun = transformingFactory->MakeTextRun(text, transformedLength, &params,
> +                                                 fontGroup, textFlags, styles.Elements(),
> +                                                 mMissingFonts);

Same issue here.
Attachment #8518913 - Flags: feedback?(jfkthame)
It seems that these calls to transformingFactory->MakeTextRun were already in attachment 669073 [details] [diff] [review], so it's weird that we didn't notice that before... But thank you!
(In reply to Frédéric Wang (:fredw) from comment #145)
> It seems that these calls to transformingFactory->MakeTextRun were already
> in attachment 669073 [details] [diff] [review], so it's weird that we didn't
> notice that before...

Yes, it's been a bug in Part 1 all along. But until you added the gfx.missing_fonts.notify pref and switched it to false by default, there was always a missing-fonts recorder. So the pointer was never null, and the bool parameter always ended up being |true|, accidentally resulting in the correct behavior. Then, turning the pref off exposed the bug.
(In reply to Jonathan Kew (:jfkthame) from comment #146)
> Yes, it's been a bug in Part 1 all along. But until you added the
> gfx.missing_fonts.notify pref and switched it to false by default, there was
> always a missing-fonts recorder. So the pointer was never null, and the bool
> parameter always ended up being |true|, accidentally resulting in the
> correct behavior. Then, turning the pref off exposed the bug.

Right, that makes sense to me now!
Flags: needinfo?(careydavis42)
Flags: needinfo?(careydavis42)
Flags: in-testsuite-
Flags: needinfo?(careydavis42)
Flags: needinfo?(careydavis42)
Flags: needinfo?(careydavis42)
Flags: needinfo?(careydavis42)
Attachment #8518913 - Attachment is obsolete: true
I verified the two patches and submitted a try build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6fc4e68e41e1

I actually just noticed that the add-ons I attached to the bug only seemed to work with non-e10s Firefox windows (I have previously been testing with a SeaMonkey build, which does not have that issue). For some reason, e10s windows are not able to receive the notifications...
Documentation:

These changes add a new "gfx.missing_fonts.notify" preference option, disabled by default. When enabled, the graphics code will check characters that can not be drawn (replaced with "Tofu" boxes) for lack of appropriate fonts and send notifications via the nsIObserverService service to inform observers about missing characters.

Each Unicode character is assigned a Unicode script identified by 4 letters (https://hg.mozilla.org/mozilla-central/file/9696d1c4b3ba/intl/unicharutil/util/nsUnicodeScriptCodes.h#l89, https://en.wikipedia.org/wiki/Script_%28Unicode%29#Table_of_scripts_in_Unicode). The notifications sent have aSubject=null, aTopic="font-needed" and aData is a stringified array of 4-letter scripts corresponding to the missing characters detected. In order to limit the number of notifications, a missing script is only notified once per browser session (see discussion in this bug).

Clients willing to observe the notification should register via the nsIObserverService. Note that for e10s applications, the notifications are sent from the content process. See https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/The_message_manager#Loading_scripts_per_child_process and https://github.com/mrbkap/e10sbug/tree/observer for a way to register observers from that process.
Keywords: dev-doc-needed
Whiteboard: [read comment 151 for documentation]
Oh, and I forgot to say: the MathML code also has heuristics to detect missing math fonts (missing glyphs for bold/italic/bold-italic mathvariants or missing OpenType MATH table for stretchy <mo>). To notify this case, we use the 4-letter code "Zmth" (Mathematical notation).
Whiteboard: [read comment 151 for documentation] → [read from comment 151 for documentation]
Blocks: 1134141
User Story: (updated)
Depends on: 1194497
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: