Closed Bug 1220502 Opened 9 years ago Closed 8 years ago

Firefox hang/delay when opening a youtube.com page, due to large list element

Categories

(Core :: Layout, defect)

41 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla47
Tracking Status
firefox45 + verified
firefox46 + fixed
firefox47 --- fixed
firefox-esr45 --- verified

People

(Reporter: lamiejang, Assigned: surkov)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 1 obsolete file)

Attached file signedin_yt.json
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20151014143721

Steps to reproduce:

Visit any youtube.com page where the toggle-able Sidebar is present on the left. You must be logged into an account that is subscribed to a large number of channels (i.e. 200+)


Actual results:

During layout of the page, the population of the Subscriptions section within the sidebar causes extreme load and locking up of the Firefox interface for several seconds. Once completely loaded the page responds as normal.


Expected results:

I believe the layout engine should be able to quickly handle a large styled UL element, each LI only having a name SPAN and small logo IMG.

Using the "Stylish" addon for Firefox to allow a custom CSS rule that sets display:none to the UL in the sidebar, there is no lag on loading any Youtube pages.

I believe there are many reports of this behavior on the Mozilla Firefox user support site, with people confused as to what is causing this (eg. https://support.mozilla.org/en-US/questions/1083820, https://support.mozilla.org/en-US/questions/1084042). It's often referred to as being triggered after the user signs in because the subscription list is not present when signed out.

Attempted to isolate the list to a standalone .html file for you - but it's difficult for me to see the same delay in effect there.

I have included a json recording using the F12 Performance tab illustrating the delay, there appear to be several block-waiting Graphics phases without the list hidden.
the isolated subscription list from the Youtube sidebar, unsure if the delay is noticable here
Confirmed on a clean profile.
Component: Untriaged → Layout
Product: Firefox → Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
We should probably find someone to look at this.
Flags: needinfo?(bugs)
I can also confirm. Using a script to remove the subscription list solves the problem. This should definitely be looked at since many people migrate to different browsers since this issue is overlooked.
I revisited this briefly, can anyone confirm the hang is also mitigated by forcing the font Youtube uses to arial instead of "Roboto"?

Using developer tools to record Performance of the pageload, this also removes the 3 Graphics phases.

with Stylish to edit CSS the rule is:
@-moz-document domain("youtube.com") {
  body, input, button, textarea, select{
    font-family: arial !important;
  }
}
(In reply to lamiejang from comment #5)
> I revisited this briefly, can anyone confirm the hang is also mitigated by
> forcing the font Youtube uses to arial instead of "Roboto"?
> 
> Using developer tools to record Performance of the pageload, this also
> removes the 3 Graphics phases.
> 
> with Stylish to edit CSS the rule is:
> @-moz-document domain("youtube.com") {
>   body, input, button, textarea, select{
>     font-family: arial !important;
>   }
> }

This method did not resolve the hang for me.
(In reply to David Baron [:dbaron] ⌚UTC-8 from comment #3)
> We should probably find someone to look at this.

Xidorn's having a look at the profile for this one.
Flags: needinfo?(bugs) → needinfo?(quanxunzhen)
I dug into this for a while, but I did not actually observe any significant hang/delay with Youtube with ~300 channels present. Probably need a less powerful machine to reproduce the performance issue.

And I failed to find any performance hot spot during profiling either. The overall performance of restyle and layout could be an issue but as far as I can see, there wasn't any single part seriously slow things down.

(Probably want performance team to have a look here.)
Flags: needinfo?(quanxunzhen)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #8)
> I dug into this for a while, but I did not actually observe any significant
> hang/delay with Youtube with ~300 channels present. Probably need a less
> powerful machine to reproduce the performance issue.
> 
> And I failed to find any performance hot spot during profiling either. The
> overall performance of restyle and layout could be an issue but as far as I
> can see, there wasn't any single part seriously slow things down.
> 
> (Probably want performance team to have a look here.)
AMD FX-6300 @4GHz here and the issue is present on my system. Please note that this bug report has been created only after numerous report on support Mozilla and elsewhere. I'll try to reproduce this ug again on FF 43 this weekend and I'll report back.
I was not denying that this bug exists. I just failed to reproduce it and thus have no idea what can be done here.

Sorry, jet, could you ask someone else to profile this issue?
Flags: needinfo?(bugs)
I installed Nightly (46.01) and do not see the same locking up, so something may already be different. It does still happen in Firefox 43.01.

Gecko profiler is foreign to me but I tried: hit start/analyze, view tab, reload the YT subscriptions page, wait for the clunky loading, then stop and share. Without, and then with Stylish hiding the list element. (can only really tell it has finished lagging when it responds to my scrollwheel)

without stylish:
https://cleopatra.io/#report=bdb736a1359703d7a2f79d4e034e920ab7cfea26
with stylish:
https://cleopatra.io/#report=19878422a35c8ad0e0e57cc7b4c9722ff2892f04

I can't see anything in those tbh, the 3 big "Graphics" chunks in the F12 performance report was a lot more obvious. I'm using i5-3230M 2.6Ghz, 4000M/650M ANGLE, W10.
I can confirm that the problem does occur on a Intel Xeon E3 1231 v3 @ 3.40GHz in Firefox 43.0.1. 
Additionally, I could NOT recreate the issue within Nightly version 46.0a1 on the same machine.
Nightly by default enables e10s, which puts web content and the browser UI in two different processes, and thus the UI won't be frozen even if the page is slow. So not happening on Nightly doesn't mean this bug has been fixed. You probably could try it with "Non-e10s window" which you can open from the main menu.
In the options my "Enable multi-process Nightly" checkbox is greyed out so guessing it's off? (disabled: An accessibility tool is or was active. Bug 1198459).
Troubleshooting info says "Multiprocess Windows 	0/1 (default: false)", and looks like one process in task manager, unlike Chrome.
¡Hola!

FWIW YouTube folks have allegedly been trying to fix this for a while per https://www.reddit.com/r/youtube/comments/3lcxvj/firefox_started_freezing_on_youtube_a_lot_in/

¡Gracias!
ni? jst to make sure that this is on his radar
Flags: needinfo?(jst)
FWIW, I observed hanging on another website which also has a long list with images (like Youtube's sidebar), and I found that hanging is related to HTTPS Everywhere extension. After I disable this addon, things become much better. But interesting thing is that, a fresh profile with a fresh HTTPS Everywhere installed doesn't have the same issue shows up.

Here is my sampling: https://cleopatra.io/#report=87c320e51b8616c4097c9437aa0196718f552935 and from this, it seems the majority of the time is spent on findRootInChain() function of HTTPS Everywhere.

If you also have HTTPS Everywhere installed, you probably could try disabling it (from the addon manager, not the menu item the addon provides), and see whether it becomes better.
Unfortunately, the hang occurs for me even on a fresh Firefox install without any extensions.
I am experiencing hang when visiting Youtube with a relatively unpopulated subscription bar.
(In reply to jt.jag from comment #20)
> I am experiencing hang when visiting Youtube with a relatively unpopulated
> subscription bar.

This bug is about performance issues on youtube due to a large list element, unless perhaps you have a very large amount of Playlists, you may be experiencing another problem.
Please consider posting to https://support.mozilla.org/ or troubleshooting on IRC (as bugzilla is not good for troubleshooting).
adding www.youtube.com###guide-container to any adblocker solved the freezing Problem and disabled the left side Menu. Found this on redit to anyone having issues, I cam across this post in google. So tired of FF freezing. Latest FF, latest AMD drivers, Windows 10, 24 gigs of ram and it can't even play a video. Come on Mozilla - this shouldn't even be a bug
I have just helped another user having a very poor experience on youtube to fix this problem.

Do we have any new data on this issue?

Additionally, on the workaround solution side, blocking www.youtube.com###guide-container will block the entire sidebar, which is unnecessary. However blocking www.youtube.com###guide-subscriptions-section will block just the offending subscription list.

I have posted a list of ways for users to mitigate this problem here: https://www.reddit.com/r/firefox/comments/3zmu55/poor_performance_when_youtube_is_open_it_may_be/
Could anyone help providing a profiling shows what actually happens when it freezes? I have a testing Youtube account with ~360 subscriptions, but do not observe any hanging issue.

Please follow https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler to profile, and paste the url here, thanks.
I saw there are some profiling data in comment 11... but don't see any big difference between the two, so still have no idea what's wrong.

If there is anything significant in the performance report, please save the data and upload it here as well, which could also be helpful.
As mentioned there was no lockup using Nightly. Here are two reports loading subscriptions page using v43.04 with no css shim:
https://cleopatra.io/#report=380409828b6dff34044e0eee5c483151fe12e931
https://cleopatra.io/#report=46f570021f7981d6858b30b6ea4d1433e3c9dfdd

The most obvious thing I've seen was still the F12 Performance json attached to the first post, where it seemed the blue Graphics phases in the JS Flame Chart corresponded to the delay.
In these reports, I do see something might be useful. When the hanging happens, the majority of the time is spent in nsRefreshDriver::Tick. Under which, there are four major subcalls:
* ExplicitChildIterator::GetNextChild()
* FramePropertyTable::Get()
* PLDHashTable::Search()
* nsAccessibilityService::GetOrCreateAccessible()

The complete stack is lost here, so it is hard to guess what function in-between is causing this. But it is a good start point anyway.
Ryan, do you have any QA resources that might be able to try to reproduce this? The reports consistently mention having at least 200 subscribed channels.
Flags: needinfo?(ryanvm)
OK I think I've successfully reproduced this issue.

And I guess the freezing issue could actually be from different reason for different people. For some people, it might due to some problematic extensions (HTTPS Everywhere, Skype Click, etc.), and for some other people, it could be related to accessibility tools. The reporter and probably Stanley (in comment 19) could be affected by the latter.

When I enable Narrator on Windows, that Youtube make the browser freeze like what is reported. And if I block the subscription list, it becomes normal again, like what is reported as well.

I wonder is there any other widely-used software which could trigger accessibility feature on Firefox, and whether it is really the main reason of the massive complaints on Reddit.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #29)
> I wonder is there any other widely-used software which could trigger
> accessibility feature on Firefox, and whether it is really the main reason
> of the massive complaints on Reddit.

We know for a fact that touchscreen-enabled Windows tends to enable a11y for the virtual keyboard.

See bug 1163004 and bug 1108607
That sounds like a serious issue then...
Well, that might be the case! I'm using a Wacom tablet and the touch keyboard turns on automatically when I plug it in. I'll do some tests and report back if that was the issue.
Here is a rather useful profiling report which is generated from Nightly 46.0a1 (2016-01-21) for Win64:
https://cleopatra.io/#report=3b820502ef3f8fdd8db19ec801f43997fe87b63d&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A53506,%22end%22%3A59901}]

In this we can see clearly that the refresh observer a11y::NotificationController takes the majority (>98%) of the time in the hanging.

I'm not quite familiar with a11y part, but I think with this profiling, there is something actionable anyway. davidb, could you find someone to look at this issue? I think it should have a higher priority as Windows users with touch screen could be widely affected.
Flags: needinfo?(dbolter)
Okay, I can confirm that upon unplugging the graphical tablet and restarting the machine, pen and touch mode is disabled and the Firefox hang is gone. Upon connecting a device with pen input, Windows' pen and toch status becomes "Pen input" and the hang reoccurs.
Thanks lamiejang for the reports in comment 26 which provide important clue for locating the actual issue.
Nice; fyi I also have a Wacom tablet attached.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #33)
> Here is a rather useful profiling report which is generated from Nightly
> 46.0a1 (2016-01-21) for Win64:
> https://cleopatra.io/
> #report=3b820502ef3f8fdd8db19ec801f43997fe87b63d&filter=[{%22type%22%3A%22Ran
> geSampleFilter%22,%22start%22%3A53506,%22end%22%3A59901}]

I couldn't open this. Is it uploaded?
(In reply to David Bolter [:davidb] from comment #38)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #33)
> > Here is a rather useful profiling report which is generated from Nightly
> > 46.0a1 (2016-01-21) for Win64:
> > https://cleopatra.io/
> > #report=3b820502ef3f8fdd8db19ec801f43997fe87b63d&filter=[{%22type%22%3A%22Ran
> > geSampleFilter%22,%22start%22%3A53506,%22end%22%3A59901}]
> 
> I couldn't open this. Is it uploaded?

OK I was able to load after I cropped the link to https://cleopatra.io/#report=3b820502ef3f8fdd8db19ec801f43997fe87b63d

Choosing a busy range by hand I do see a11y in the stacks. Alex can you take a look?
Flags: needinfo?(dbolter) → needinfo?(surkov.alexander)
(In reply to David Bolter [:davidb] from comment #39)
> (In reply to David Bolter [:davidb] from comment #38)
> > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #33)
> > > Here is a rather useful profiling report which is generated from Nightly
> > > 46.0a1 (2016-01-21) for Win64:
> > > https://cleopatra.io/
> > > #report=3b820502ef3f8fdd8db19ec801f43997fe87b63d&filter=[{%22type%22%3A%22Ran
> > > geSampleFilter%22,%22start%22%3A53506,%22end%22%3A59901}]
> > 
> > I couldn't open this. Is it uploaded?
> 
> OK I was able to load after I cropped the link to
> https://cleopatra.io/#report=3b820502ef3f8fdd8db19ec801f43997fe87b63d
> 
> Choosing a busy range by hand I do see a11y in the stacks. Alex can you take
> a look?

basically it's tree traversal, for example, nsLayoutUtils::GetBeforeFrameContent takes 9% and nsAccessibilityService::GetOrCreateAccessible takes 12%, a lot of time is spent on hash tables lookup. I guess the problem there's a huge amount of content insertions that we have to process.

I think we can stop processing if it takes time to continue it on a next tick (or after timeout).

I'd be good to have opinion from layout/content folks who was fighting with performance there.
Flags: needinfo?(surkov.alexander)
Blocks: a11yperf
Flags: needinfo?(ryanvm)
I would assume there is some intrinsic algorithm issue which costs O(N^2) to handle the content insertion rather than O(N) which it should be. Otherwise, it cannot justify why ~300 items could freeze the browser for several seconds.

After a quick glance at functions involved, I guess this loop in NotificationController::WillRefresh() is the first N:
> for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next())
Basically it iterates over each content node which has a text frame reflowed. 

And this loop in Accessible::CacheChildren() is the second N:
> while ((child = walker.NextChild()) && AppendChild(child));

I'll try to gather more data to prove my hypothesis tomorrow.
Flags: needinfo?(quanxunzhen)
I was wrong.

I did a simple call counting, and this is the result:
> 2016-01-25 03:23:56.531000 UTC (NotificationController::WillRefresh takes 11.28s):
> 	DocAccessible::ProcessContentInserted: 2463
> 	DocAccessible::UpdateTreeOnInsertion: 2463
> 	Accessible::EnsureChildren: 2463
> 	HyperTextAccessible::CacheChildren: 2463
> 	TreeWalker::NextChild: 2463
> 	TreeWalker::Next: 50166384
> 2016-01-25 03:23:56.722000 UTC (NotificationController::WillRefresh takes 0.01s):
> 	DocAccessible::ProcessContentInserted: 1
> 	DocAccessible::UpdateTreeOnInsertion: 1
> 	Accessible::EnsureChildren: 1
> 	HyperTextAccessible::CacheChildren: 1
> 	TreeWalker::NextChild: 1
> 	TreeWalker::Next: 20368
> 2016-01-25 03:24:08.897000 UTC (NotificationController::WillRefresh takes 11.53s):
> 	DocAccessible::ProcessContentInserted: 2475
> 	DocAccessible::UpdateTreeOnInsertion: 2475
> 	Accessible::EnsureChildren: 2475
> 	HyperTextAccessible::CacheChildren: 2475
> 	TreeWalker::NextChild: 2503
> 	TreeWalker::Next: 50146165

This counter result clearly shows that, the bottleneck is TreeWalker::NextChild(), which needs to calls TreeWalker::Next() for roughly all nodes each time, which is definitely unacceptable. There must be something wrong.

surkov, could you fix it?
Flags: needinfo?(quanxunzhen) → needinfo?(surkov.alexander)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #42)

> This counter result clearly shows that, the bottleneck is
> TreeWalker::NextChild(), which needs to calls TreeWalker::Next() for roughly
> all nodes each time, which is definitely unacceptable. There must be
> something wrong.
> 
> surkov, could you fix it?

We have to traverse all DOM nodes to create the accessible tree. If you appended a node A then we traverse its whole subtree.
Flags: needinfo?(surkov.alexander)
Okay, but why do you have to traverse the whole tree 2000+ times in a single flush?
Flags: needinfo?(surkov.alexander)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #44)
> Okay, but why do you have to traverse the whole tree 2000+ times in a single
> flush?

We don't have to traverse same node twice for a single flush, but we run all nodes that layout gives us. Do you have any numbers about repeating runs? We probably need some coalescence algorithm on our site.
Flags: needinfo?(surkov.alexander)
My data in comment 42 shows that, NextChild() calls Next() 20k+ times when itself is called. And in some cases, NextChild() can be called ~2.5k times in a flush for handling inserted content. This whole makes Next() be called ~50M times in the single flush.
s/when itself is called/every time itself is called
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #46)
> My data in comment 42 shows that, NextChild() calls Next() 20k+ times when
> itself is called. And in some cases, NextChild() can be called ~2.5k times
> in a flush for handling inserted content. This whole makes Next() be called
> ~50M times in the single flush.

I'm not sure what solution you keep in mind. If the user has inserted a 2.5k DIVs, and each of them has 20k child DIVs then you will get the numbering you have.
Xidorn, Alex, I believe my bug 1223826 might be very closely related, if not a dupe, of this bug. As I mentioned in that bug, I am seeing a big hang and further slowdown of Firefox with NVDA when I am in IRCCloud and switch to a channel that has lots of members, like #developers. The freezing goes away if I switch back to a channel with only a small number of members, like #accessibility.
I can also reliably reproduce the performance getting progressively worse if I go to twitter.com, sign in, and just continue to add new tweets to the top or read through the timeline, which by infinite scrolling, adds more items as I move down the list with the j key. The bigger the list grows, and sub lists are added, the slower it gets. At about a hundred items, it becomes literally unusable.
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #45)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #44)
> > Okay, but why do you have to traverse the whole tree 2000+ times in a single
> > flush?
> 
> We don't have to traverse same node twice for a single flush, but we run all
> nodes that layout gives us. Do you have any numbers about repeating runs? We
> probably need some coalescence algorithm on our site.

Realy I think we should stop going through all the child content of the container looking for new children.  We should be able to just look for children in the newly inserted content.


(In reply to alexander :surkov from comment #48)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #46)
> > My data in comment 42 shows that, NextChild() calls Next() 20k+ times when
> > itself is called. And in some cases, NextChild() can be called ~2.5k times
> > in a flush for handling inserted content. This whole makes Next() be called
> > ~50M times in the single flush.
> 
> I'm not sure what solution you keep in mind. If the user has inserted a 2.5k
> DIVs, and each of them has 20k child DIVs then you will get the numbering
> you have.

I haven't looked, but it seems rather unlikely that youtube is using a couple hundred k divs.  Really it seems pretty odd that we look at 20k elements just for one insertion.
(In reply to Trevor Saunders (:tbsaunde) from comment #50)

> Realy I think we should stop going through all the child content of the
> container looking for new children.  We should be able to just look for
> children in the newly inserted content.

indeed, it's a long standing problem, it may require huge rework, see bug 690417
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #51)
> (In reply to Trevor Saunders (:tbsaunde) from comment #50)
> 
> > Realy I think we should stop going through all the child content of the
> > container looking for new children.  We should be able to just look for
> > children in the newly inserted content.
> 
> indeed, it's a long standing problem, it may require huge rework, see bug
> 690417

Yeah we do too much walking. Marking bug 690417 a dependency (at least for the a11y side).
Depends on: 690417
Attached patch patch (obsolete) — Splinter Review
keep content insertions in a hash rather than in an array, should help with o(n^2) for children insertions under same container.
Attachment #8711764 - Flags: review?(dbolter)
Comment on attachment 8711764 [details] [diff] [review]
patch

Review of attachment 8711764 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to swap in Trevor as he is closer to this code.

::: accessible/base/NotificationController.cpp
@@ +123,5 @@
> +    // this case early.
> +    if (node->GetPrimaryFrame()) {
> +      if (list->AppendElement(node))
> +        needsProcessing = true;
> +    }

Please brace the inner if as well.
Attachment #8711764 - Flags: review?(dbolter) → review?(tbsaunde+mozbugs)
(In reply to alexander :surkov from comment #48)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #46)
> > My data in comment 42 shows that, NextChild() calls Next() 20k+ times when
> > itself is called. And in some cases, NextChild() can be called ~2.5k times
> > in a flush for handling inserted content. This whole makes Next() be called
> > ~50M times in the single flush.
> 
> I'm not sure what solution you keep in mind. If the user has inserted a 2.5k
> DIVs, and each of them has 20k child DIVs then you will get the numbering
> you have.

The number is from loading Youtube with ~300 subscriptions. It is certainly not true that there are so many nodes. I believe there are only 20k nodes in the page in total, and 2.5k new nodes inserted.
(In reply to alexander :surkov from comment #53)
> Created attachment 8711764 [details] [diff] [review]
> patch
> 
> keep content insertions in a hash rather than in an array, should help with
> o(n^2) for children insertions under same container.

This patch doesn't fix this issue. Next() is still called 20k times every time NextChild() is called.

FWIW, I'm not sure what happened between yesterday and today, but there was two big hanging yesterday, but only one today. This is not affected by this patch anyway.
Attached file reduced testcase
OK now I have a reduced testcase for this issue.

Steps to reproduce:
1. enable accessibility
2. open the page and click the "Add Content" button


This issue happens because nodes are added into an element with 'visibility: hidden'. Since the invisible element doesn't have its accessible object setup properly, this element is traversed for each new text node inside it.

In this testcase, there are 5000 new text nodes. The invisible element has 5000 children, and each of the child has one text node inside. So TreeWalker::Next() is called (5000 children + 5000 * (1 text node + 1 nullptr)) * 5000 new text nodes = 75,000,000 times in one flush.


So then I have a very simple one-line fix for this issue:
> diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> --- a/accessible/base/NotificationController.cpp
> +++ b/accessible/base/NotificationController.cpp
> @@ -228,17 +228,17 @@ NotificationController::WillRefresh(mozi
>      nsINode* containerNode = textNode->GetParentNode();
>      if (!containerNode) {
>        NS_ASSERTION(!textAcc,
>                     "Text node was removed but accessible is kept alive!");
>        continue;
>      }
> 
>      nsIFrame* textFrame = textNode->GetPrimaryFrame();
> -    if (!textFrame) {
> +    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {
>        NS_ASSERTION(!textAcc,
>                     "Text node isn't rendered but accessible is kept alive!");
>        continue;
>      }
> 
>      nsIContent* containerElm = containerNode->IsElement() ?
>        containerNode->AsElement() : nullptr;
> 

It works, but I'm not quite sure whether it is the right way to fix this issue. :surkov?
Flags: needinfo?(surkov.alexander)
I'm no longer CC'ed for this bug and yet I keep getting email notifications on each new comment. How can I disable that?
(In reply to rimbotede from comment #58)
> I'm no longer CC'ed for this bug and yet I keep getting email notifications
> on each new comment. How can I disable that?

You voted for it.

Check the box to "Ignore Bugmail (never email me about this bug)" below the CC list and hit save (or remove your vote from the bug).
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> (In reply to alexander :surkov from comment #53)
> > Created attachment 8711764 [details] [diff] [review]
> > patch
> > 
> > keep content insertions in a hash rather than in an array, should help with
> > o(n^2) for children insertions under same container.
> 
> This patch doesn't fix this issue. Next() is still called 20k times every
> time NextChild() is called.

yes, but I think it may reduce the number of times CacheChildren() is called which is good since it means NextChild() is called less even if it doesn't completely fix this particular issue.

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #57)
> Created attachment 8712012 [details]
> reduced testcase

ah thanks :) I wondered if it was something like that.

> It works, but I'm not quite sure whether it is the right way to fix this
> issue. :surkov?

I suspect with out thinking much that its not, because it doesn't handle somebody adding a subtree like <div id="foo"><div style="visibility: overflow">f</div></div>
(In reply to Trevor Saunders (:tbsaunde) from comment #60)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> > (In reply to alexander :surkov from comment #53)
> > > Created attachment 8711764 [details] [diff] [review]
> > > patch
> > > 
> > > keep content insertions in a hash rather than in an array, should help with
> > > o(n^2) for children insertions under same container.
> > 
> > This patch doesn't fix this issue. Next() is still called 20k times every
> > time NextChild() is called.
> 
> yes, but I think it may reduce the number of times CacheChildren() is called
> which is good since it means NextChild() is called less even if it doesn't
> completely fix this particular issue.

Well, I don't see a significant reduction on the amount of calls there. (And actually the main source of the massive calling in this bug is not the loop touched in this patch.)

I would suggest that, if it isn't proven to improve the actual performance, we shouldn't replace an array with a hash table, since iterating and adding item to the latter are considered more expensive.

I'm not a peer anyway, so you can decide. It's just my opinion.

> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #57)
> > It works, but I'm not quite sure whether it is the right way to fix this
> > issue. :surkov?
> 
> I suspect with out thinking much that its not, because it doesn't handle
> somebody adding a subtree like <div id="foo"><div style="visibility:
> overflow">f</div></div>

What is "visibility: overflow"? AFAICS, visibility only has three possible values: visible, hidden, and collapse.

With this patch, newly added text nodes are skipped if they are not visible, which matches the behavior in nsAccessibilityService::GetOrCreateAccessible. [1]

[1] https://dxr.mozilla.org/mozilla-central/rev/aa90f482e16db77cdb7dea84564ea1cbd8f7f6b3/accessible/base/nsAccessibilityService.cpp#1060-1065
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #57)

> >      nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > -    if (!textFrame) {
> > +    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {
> >        NS_ASSERTION(!textAcc,
> >                     "Text node isn't rendered but accessible is kept alive!");
> >        continue;
> >      }
> > 
> >      nsIContent* containerElm = containerNode->IsElement() ?
> >        containerNode->AsElement() : nullptr;
> > 
> 
> It works, but I'm not quite sure whether it is the right way to fix this
> issue. :surkov?

that should works I think, I didn't realized that not visible text frame may have rendered text. Can you upload a patch please, and make sure that our tests feel ok?
Flags: needinfo?(surkov.alexander)
(In reply to Trevor Saunders (:tbsaunde) from comment #60)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> > (In reply to alexander :surkov from comment #53)
> > > Created attachment 8711764 [details] [diff] [review]
> > > patch
> > > 
> > > keep content insertions in a hash rather than in an array, should help with
> > > o(n^2) for children insertions under same container.
> > 
> > This patch doesn't fix this issue. Next() is still called 20k times every
> > time NextChild() is called.
> 
> yes, but I think it may reduce the number of times CacheChildren() is called
> which is good since it means NextChild() is called less even if it doesn't
> completely fix this particular issue.

maybe I should have a new bug for it, since it doesn't improve things here
(In reply to alexander :surkov from comment #62)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #57)
> 
> > >      nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > > -    if (!textFrame) {
> > > +    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {
> > >        NS_ASSERTION(!textAcc,
> > >                     "Text node isn't rendered but accessible is kept alive!");
> > >        continue;
> > >      }
> > > 
> > >      nsIContent* containerElm = containerNode->IsElement() ?
> > >        containerNode->AsElement() : nullptr;
> > > 
> > 
> > It works, but I'm not quite sure whether it is the right way to fix this
> > issue. :surkov?
> 
> that should works I think, I didn't realized that not visible text frame may
> have rendered text. Can you upload a patch please, and make sure that our
> tests feel ok?

I'm gonna sleep now... Could you do that for me please?

Since elements with visibility: hidden still affects layout, so we must have frames for them.

You reminds me that, alternatively we can just not add them to the list if they are invisible from the very beginning in nsTextFrame::Reflow. That should make things even faster.
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #63)
> (In reply to Trevor Saunders (:tbsaunde) from comment #60)
> > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> > > (In reply to alexander :surkov from comment #53)
> > > > Created attachment 8711764 [details] [diff] [review]
> > > > patch
> > > > 
> > > > keep content insertions in a hash rather than in an array, should help with
> > > > o(n^2) for children insertions under same container.
> > > 
> > > This patch doesn't fix this issue. Next() is still called 20k times every
> > > time NextChild() is called.
> > 
> > yes, but I think it may reduce the number of times CacheChildren() is called
> > which is good since it means NextChild() is called less even if it doesn't
> > completely fix this particular issue.
> 
> maybe I should have a new bug for it, since it doesn't improve things here

Yeah I think it makes sense to have a separate bug and discuss the performance influence of this patch there.
(In reply to alexander :surkov from comment #63)
> (In reply to Trevor Saunders (:tbsaunde) from comment #60)
> > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> > > (In reply to alexander :surkov from comment #53)
> > > > Created attachment 8711764 [details] [diff] [review]
> > > > patch
> > > > 
> > > > keep content insertions in a hash rather than in an array, should help with
> > > > o(n^2) for children insertions under same container.
> > > 
> > > This patch doesn't fix this issue. Next() is still called 20k times every
> > > time NextChild() is called.
> > 
> > yes, but I think it may reduce the number of times CacheChildren() is called
> > which is good since it means NextChild() is called less even if it doesn't
> > completely fix this particular issue.
> 
> maybe I should have a new bug for it, since it doesn't improve things here

sure, though it would be nice to get some numbers about how many merges it does.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #61)
> (In reply to Trevor Saunders (:tbsaunde) from comment #60)
> > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #56)
> > > (In reply to alexander :surkov from comment #53)
> > > > Created attachment 8711764 [details] [diff] [review]
> > > > patch
> > > > 
> > > > keep content insertions in a hash rather than in an array, should help with
> > > > o(n^2) for children insertions under same container.
> > > 
> > > This patch doesn't fix this issue. Next() is still called 20k times every
> > > time NextChild() is called.
> > 
> > yes, but I think it may reduce the number of times CacheChildren() is called
> > which is good since it means NextChild() is called less even if it doesn't
> > completely fix this particular issue.
> 
> Well, I don't see a significant reduction on the amount of calls there. (And
> actually the main source of the massive calling in this bug is not the loop
> touched in this patch.)

yeah I see what's going on now, and I wouldn't expect it to help this test case.

> I would suggest that, if it isn't proven to improve the actual performance,
> we shouldn't replace an array with a hash table, since iterating and adding
> item to the latter are considered more expensive.

On first thought it should help with somethings, but actually I wonder if layout usually only notifies a11y once for those cases already.

> > (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #57)
> > > It works, but I'm not quite sure whether it is the right way to fix this
> > > issue. :surkov?
> > 
> > I suspect with out thinking much that its not, because it doesn't handle
> > somebody adding a subtree like <div id="foo"><div style="visibility:
> > overflow">f</div></div>
> 
> What is "visibility: overflow"? AFAICS, visibility only has three possible
> values: visible, hidden, and collapse.

clearly I need more caffeine before trying to work.  visible would work, but I see now we're talking about only text nodes so its not a concern.
Comment on attachment 8711764 [details] [diff] [review]
patch

filed bug 1242989
Flags: needinfo?(surkov.alexander)
Attachment #8711764 - Flags: review?(tbsaunde+mozbugs)
Attached patch patchSplinter Review
Attachment #8711764 - Attachment is obsolete: true
Attachment #8712175 - Flags: review?(tbsaunde+mozbugs)
Comment on attachment 8712175 [details] [diff] [review]
patch

>   inline void ScheduleTextUpdate(nsIContent* aTextNode)
>   {
>-    if (mTextHash.PutEntry(aTextNode))
>+    // Ignore nodes that are not in the DOM tree or not visible.
>+
>+    // If the text node is not in tree or doesn't have frame then this case should
>+    // have been handled already by content removal notifications.
>+    nsINode* containerNode = aTextNode->GetParentNode();
>+    if (!containerNode) {
>+      NS_ASSERTION(!mDocument->HasAccessible(aTextNode),
>+                   "Text node was removed but accessible is kept alive!");
>+      return;
>+    }

Can the containerNode ever be null? it would mean we are reflowing a node layout already told us it removed, that doesn't make much sense.

>+    nsIFrame* textFrame = aTextNode->GetPrimaryFrame();
>+    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {

I don't see how textFrame can be null, and it seems like it would be faster to avoid calling nsAccessibilityService::UpdateText at all if the frame is hidden.

>+    if (mTextHash.PutEntry(aTextNode)) {

that can never return null.
(In reply to Trevor Saunders (:tbsaunde) from comment #71)
> Comment on attachment 8712175 [details] [diff] [review]
> patch
> 
> >   inline void ScheduleTextUpdate(nsIContent* aTextNode)
> >   {
> >-    if (mTextHash.PutEntry(aTextNode))
> >+    // Ignore nodes that are not in the DOM tree or not visible.
> >+
> >+    // If the text node is not in tree or doesn't have frame then this case should
> >+    // have been handled already by content removal notifications.
> >+    nsINode* containerNode = aTextNode->GetParentNode();
> >+    if (!containerNode) {
> >+      NS_ASSERTION(!mDocument->HasAccessible(aTextNode),
> >+                   "Text node was removed but accessible is kept alive!");
> >+      return;
> >+    }
> 
> Can the containerNode ever be null? it would mean we are reflowing a node
> layout already told us it removed, that doesn't make much sense.
> 
> >+    nsIFrame* textFrame = aTextNode->GetPrimaryFrame();
> >+    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {
> 
> I don't see how textFrame can be null, and it seems like it would be faster
> to avoid calling nsAccessibilityService::UpdateText at all if the frame is
> hidden.

if you are sure those are never null then I'm good to convert them to fatal assertions.

> 
> >+    if (mTextHash.PutEntry(aTextNode)) {
> 
> that can never return null.

ok
(In reply to alexander :surkov from comment #72)
> (In reply to Trevor Saunders (:tbsaunde) from comment #71)
> > Comment on attachment 8712175 [details] [diff] [review]
> > patch
> > 
> > >   inline void ScheduleTextUpdate(nsIContent* aTextNode)
> > >   {
> > >-    if (mTextHash.PutEntry(aTextNode))
> > >+    // Ignore nodes that are not in the DOM tree or not visible.
> > >+
> > >+    // If the text node is not in tree or doesn't have frame then this case should
> > >+    // have been handled already by content removal notifications.
> > >+    nsINode* containerNode = aTextNode->GetParentNode();
> > >+    if (!containerNode) {
> > >+      NS_ASSERTION(!mDocument->HasAccessible(aTextNode),
> > >+                   "Text node was removed but accessible is kept alive!");
> > >+      return;
> > >+    }
> > 
> > Can the containerNode ever be null? it would mean we are reflowing a node
> > layout already told us it removed, that doesn't make much sense.
> > 
> > >+    nsIFrame* textFrame = aTextNode->GetPrimaryFrame();
> > >+    if (!textFrame || !textFrame->StyleVisibility()->IsVisible()) {
> > 
> > I don't see how textFrame can be null, and it seems like it would be faster
> > to avoid calling nsAccessibilityService::UpdateText at all if the frame is
> > hidden.
> 
> if you are sure those are never null then I'm good to convert them to fatal
> assertions.

I really suspect the first one is never null, and a fatal assert would only hurt debug builds so that seems fine, in the worst case we'll learn something.

For the second one it'll just go away if you move the check to nsTextFrame right?
(In reply to Trevor Saunders (:tbsaunde) from comment #73)

> For the second one it'll just go away if you move the check to nsTextFrame
> right?

I don't follow you. I wasn't going to modify nsTextFrame.
(In reply to alexander :surkov from comment #74)
> (In reply to Trevor Saunders (:tbsaunde) from comment #73)
> 
> > For the second one it'll just go away if you move the check to nsTextFrame
> > right?
> 
> I don't follow you. I wasn't going to modify nsTextFrame.

why not? it'd be simpler and faster.
Attached patch patchSplinter Review
Attachment #8712692 - Flags: review?(tbsaunde+mozbugs)
Attachment #8712692 - Flags: review?(roc)
Comment on attachment 8712692 [details] [diff] [review]
patch

>diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
>--- a/accessible/base/NotificationController.cpp
>+++ b/accessible/base/NotificationController.cpp
>@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
>     }
>   }
>   mContentInsertions.Clear();
> 
>   // Process rendered text change notifications.
>   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
>     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
>     nsIContent* textNode = entry->GetKey();
>-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> 
>-    // If the text node is not in tree or doesn't have frame then this case should
>-    // have been handled already by content removal notifications.
>     nsINode* containerNode = textNode->GetParentNode();
>     if (!containerNode) {
>-      NS_ASSERTION(!textAcc,
>-                   "Text node was removed but accessible is kept alive!");
>       continue;
>     }
>-
>     nsIFrame* textFrame = textNode->GetPrimaryFrame();
>     if (!textFrame) {
>-      NS_ASSERTION(!textAcc,
>-                   "Text node isn't rendered but accessible is kept alive!");
>       continue;
>     }

I'm confused why are you changing this at all now, and since you are what is the point in keeping these checks? can they ever happen?

>   inline void ScheduleTextUpdate(nsIContent* aTextNode)
>   {
>-    if (mTextHash.PutEntry(aTextNode))
>-      ScheduleProcessing();
>+    // Make sure we are not called with a node that is not in the DOM tree or
>+    // not visible.
>+    MOZ_ASSERT(aTextNode->GetParentNode() && aTextNode->GetPrimaryFrame() &&
>+               aTextNode->GetPrimaryFrame()->StyleVisibility()->IsVisible(),
>+               "A text node is not in DOM or invisible");

I think it'd be better if these were separate asserts so it'd be easier to know which failed.

>-  ReflowTextA11yNotifier(presContext, mContent);
>+  if (StyleVisibility()->IsVisible()) {
>+    ReflowTextA11yNotifier(presContext, mContent);

side note it seems like this class is kind of useless now and we might as well just inline it here.
(In reply to Trevor Saunders (:tbsaunde) from comment #78)
> Comment on attachment 8712692 [details] [diff] [review]
> patch
> 
> >diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> >--- a/accessible/base/NotificationController.cpp
> >+++ b/accessible/base/NotificationController.cpp
> >@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
> >     }
> >   }
> >   mContentInsertions.Clear();
> > 
> >   // Process rendered text change notifications.
> >   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
> >     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
> >     nsIContent* textNode = entry->GetKey();
> >-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> > 
> >-    // If the text node is not in tree or doesn't have frame then this case should
> >-    // have been handled already by content removal notifications.
> >     nsINode* containerNode = textNode->GetParentNode();
> >     if (!containerNode) {
> >-      NS_ASSERTION(!textAcc,
> >-                   "Text node was removed but accessible is kept alive!");
> >       continue;
> >     }
> >-
> >     nsIFrame* textFrame = textNode->GetPrimaryFrame();
> >     if (!textFrame) {
> >-      NS_ASSERTION(!textAcc,
> >-                   "Text node isn't rendered but accessible is kept alive!");
> >       continue;
> >     }
> 
> I'm confused why are you changing this at all now, and since you are what is
> the point in keeping these checks? can they ever happen?

I can keep them. Nothing prevents them from happening.


> >   inline void ScheduleTextUpdate(nsIContent* aTextNode)
> >   {
> >-    if (mTextHash.PutEntry(aTextNode))
> >-      ScheduleProcessing();
> >+    // Make sure we are not called with a node that is not in the DOM tree or
> >+    // not visible.
> >+    MOZ_ASSERT(aTextNode->GetParentNode() && aTextNode->GetPrimaryFrame() &&
> >+               aTextNode->GetPrimaryFrame()->StyleVisibility()->IsVisible(),
> >+               "A text node is not in DOM or invisible");
> 
> I think it'd be better if these were separate asserts so it'd be easier to
> know which failed.

either way you have to debug it
> > >diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> > >--- a/accessible/base/NotificationController.cpp
> > >+++ b/accessible/base/NotificationController.cpp
> > >@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
> > >     }
> > >   }
> > >   mContentInsertions.Clear();
> > > 
> > >   // Process rendered text change notifications.
> > >   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
> > >     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
> > >     nsIContent* textNode = entry->GetKey();
> > >-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> > > 
> > >-    // If the text node is not in tree or doesn't have frame then this case should
> > >-    // have been handled already by content removal notifications.
> > >     nsINode* containerNode = textNode->GetParentNode();
> > >     if (!containerNode) {
> > >-      NS_ASSERTION(!textAcc,
> > >-                   "Text node was removed but accessible is kept alive!");
> > >       continue;
> > >     }
> > >-
> > >     nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > >     if (!textFrame) {
> > >-      NS_ASSERTION(!textAcc,
> > >-                   "Text node isn't rendered but accessible is kept alive!");
> > >       continue;
> > >     }
> > 
> > I'm confused why are you changing this at all now, and since you are what is
> > the point in keeping these checks? can they ever happen?
> 
> I can keep them. Nothing prevents them from happening.

That doesn't answer the first question at all.

I guess the case that they happen is that the text frame gets reflowed and then later gets removed?  Do we have a test for that case?

> > >   inline void ScheduleTextUpdate(nsIContent* aTextNode)
> > >   {
> > >-    if (mTextHash.PutEntry(aTextNode))
> > >-      ScheduleProcessing();
> > >+    // Make sure we are not called with a node that is not in the DOM tree or
> > >+    // not visible.
> > >+    MOZ_ASSERT(aTextNode->GetParentNode() && aTextNode->GetPrimaryFrame() &&
> > >+               aTextNode->GetPrimaryFrame()->StyleVisibility()->IsVisible(),
> > >+               "A text node is not in DOM or invisible");
> > 
> > I think it'd be better if these were separate asserts so it'd be easier to
> > know which failed.
> 
> either way you have to debug it

not necessarily in conditions where its easy to figure out which failed, and what does it hurt?
(In reply to Trevor Saunders (:tbsaunde) from comment #80)
> > > >diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> > > >--- a/accessible/base/NotificationController.cpp
> > > >+++ b/accessible/base/NotificationController.cpp
> > > >@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
> > > >     }
> > > >   }
> > > >   mContentInsertions.Clear();
> > > > 
> > > >   // Process rendered text change notifications.
> > > >   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
> > > >     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
> > > >     nsIContent* textNode = entry->GetKey();
> > > >-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> > > > 
> > > >-    // If the text node is not in tree or doesn't have frame then this case should
> > > >-    // have been handled already by content removal notifications.
> > > >     nsINode* containerNode = textNode->GetParentNode();
> > > >     if (!containerNode) {
> > > >-      NS_ASSERTION(!textAcc,
> > > >-                   "Text node was removed but accessible is kept alive!");
> > > >       continue;
> > > >     }
> > > >-
> > > >     nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > > >     if (!textFrame) {
> > > >-      NS_ASSERTION(!textAcc,
> > > >-                   "Text node isn't rendered but accessible is kept alive!");
> > > >       continue;
> > > >     }
> > > 
> > > I'm confused why are you changing this at all now, and since you are what is
> > > the point in keeping these checks? can they ever happen?
> > 
> > I can keep them. Nothing prevents them from happening.
> 
> That doesn't answer the first question at all.
> 
> I guess the case that they happen is that the text frame gets reflowed and
> then later gets removed?  

that's a possible scenario

> Do we have a test for that case?

not sure, I guess it might be not trivial to catch.

> > > >   inline void ScheduleTextUpdate(nsIContent* aTextNode)
> > > >   {
> > > >-    if (mTextHash.PutEntry(aTextNode))
> > > >-      ScheduleProcessing();
> > > >+    // Make sure we are not called with a node that is not in the DOM tree or
> > > >+    // not visible.
> > > >+    MOZ_ASSERT(aTextNode->GetParentNode() && aTextNode->GetPrimaryFrame() &&
> > > >+               aTextNode->GetPrimaryFrame()->StyleVisibility()->IsVisible(),
> > > >+               "A text node is not in DOM or invisible");
> > > 
> > > I think it'd be better if these were separate asserts so it'd be easier to
> > > know which failed.
> > 
> > either way you have to debug it
> 
> not necessarily in conditions where its easy to figure out which failed, and
> what does it hurt?

then I guess you are assignee of the next bug when it asserts. It doesn't hurt of course.
Comment on attachment 8712692 [details] [diff] [review]
patch

Review of attachment 8712692 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/base/NotificationController.h
@@ +125,5 @@
> +    // Make sure we are not called with a node that is not in the DOM tree or
> +    // not visible.
> +    MOZ_ASSERT(aTextNode->GetParentNode() && aTextNode->GetPrimaryFrame() &&
> +               aTextNode->GetPrimaryFrame()->StyleVisibility()->IsVisible(),
> +               "A text node is not in DOM or invisible");

I agree with Trevor, there's no harm and some benefit to splitting this up into separate assertions. r+ with that.
Attachment #8712692 - Flags: review?(roc) → review+
(In reply to alexander :surkov from comment #81)
> (In reply to Trevor Saunders (:tbsaunde) from comment #80)
> > > > >diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> > > > >--- a/accessible/base/NotificationController.cpp
> > > > >+++ b/accessible/base/NotificationController.cpp
> > > > >@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
> > > > >     }
> > > > >   }
> > > > >   mContentInsertions.Clear();
> > > > > 
> > > > >   // Process rendered text change notifications.
> > > > >   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
> > > > >     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
> > > > >     nsIContent* textNode = entry->GetKey();
> > > > >-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> > > > > 
> > > > >-    // If the text node is not in tree or doesn't have frame then this case should
> > > > >-    // have been handled already by content removal notifications.
> > > > >     nsINode* containerNode = textNode->GetParentNode();
> > > > >     if (!containerNode) {
> > > > >-      NS_ASSERTION(!textAcc,
> > > > >-                   "Text node was removed but accessible is kept alive!");
> > > > >       continue;
> > > > >     }
> > > > >-
> > > > >     nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > > > >     if (!textFrame) {
> > > > >-      NS_ASSERTION(!textAcc,
> > > > >-                   "Text node isn't rendered but accessible is kept alive!");
> > > > >       continue;
> > > > >     }
> > > > 
> > > > I'm confused why are you changing this at all now, and since you are what is
> > > > the point in keeping these checks? can they ever happen?
> > > 
> > > I can keep them. Nothing prevents them from happening.
> > 
> > That doesn't answer the first question at all.
> > 
> > I guess the case that they happen is that the text frame gets reflowed and
> > then later gets removed?  
> 
> that's a possible scenario

In what case could they happen? It seems to me with this patch, we further restrict what text node would get here, so previous assertions shouldn't be broken, should they?

IIRC I didn't observe the assertions even with my simple patch in comment 57. So I suppose it is fine to keep the assertions. Optionally, we could probably even strengthen the assertions to MOZ_ASSERT so that we can really catch any violation of it.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #83)

> > > I guess the case that they happen is that the text frame gets reflowed and
> > > then later gets removed?  
> > 
> > that's a possible scenario
> 
> In what case could they happen? It seems to me with this patch, we further
> restrict what text node would get here, so previous assertions shouldn't be
> broken, should they?

If something triggers the text reflow, and then a script removes the DOM node from the tree, then we process the DOM node removal, and then after timeout the text reflow notification. Anyway for after-timeout processings nobody can guarantee the node is still in DOM, I guess Firefox is not that smart yet.

> IIRC I didn't observe the assertions even with my simple patch in comment
> 57. So I suppose it is fine to keep the assertions. Optionally, we could
> probably even strengthen the assertions to MOZ_ASSERT so that we can really
> catch any violation of it.

it asserts when text node is no longer valid but it has accessible object, which means we have some a tree update problem.
> > IIRC I didn't observe the assertions even with my simple patch in comment
> > 57. So I suppose it is fine to keep the assertions. Optionally, we could
> > probably even strengthen the assertions to MOZ_ASSERT so that we can really
> > catch any violation of it.
> 
> it asserts when text node is no longer valid but it has accessible object,
> which means we have some a tree update problem.

are you going to fix that? or just paper over it by removing the asserts?  at least now I have an explanation of why that part of the patch is there.

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #83)
> (In reply to alexander :surkov from comment #81)
> > (In reply to Trevor Saunders (:tbsaunde) from comment #80)
> > > > > >diff --git a/accessible/base/NotificationController.cpp b/accessible/base/NotificationController.cpp
> > > > > >--- a/accessible/base/NotificationController.cpp
> > > > > >+++ b/accessible/base/NotificationController.cpp
> > > > > >@@ -225,42 +225,35 @@ NotificationController::WillRefresh(mozi
> > > > > >     }
> > > > > >   }
> > > > > >   mContentInsertions.Clear();
> > > > > > 
> > > > > >   // Process rendered text change notifications.
> > > > > >   for (auto iter = mTextHash.Iter(); !iter.Done(); iter.Next()) {
> > > > > >     nsCOMPtrHashKey<nsIContent>* entry = iter.Get();
> > > > > >     nsIContent* textNode = entry->GetKey();
> > > > > >-    Accessible* textAcc = mDocument->GetAccessible(textNode);
> > > > > > 
> > > > > >-    // If the text node is not in tree or doesn't have frame then this case should
> > > > > >-    // have been handled already by content removal notifications.
> > > > > >     nsINode* containerNode = textNode->GetParentNode();
> > > > > >     if (!containerNode) {
> > > > > >-      NS_ASSERTION(!textAcc,
> > > > > >-                   "Text node was removed but accessible is kept alive!");
> > > > > >       continue;
> > > > > >     }
> > > > > >-
> > > > > >     nsIFrame* textFrame = textNode->GetPrimaryFrame();
> > > > > >     if (!textFrame) {
> > > > > >-      NS_ASSERTION(!textAcc,
> > > > > >-                   "Text node isn't rendered but accessible is kept alive!");
> > > > > >       continue;
> > > > > >     }
> > > > > 
> > > > > I'm confused why are you changing this at all now, and since you are what is
> > > > > the point in keeping these checks? can they ever happen?
> > > > 
> > > > I can keep them. Nothing prevents them from happening.
> > > 
> > > That doesn't answer the first question at all.
> > > 
> > > I guess the case that they happen is that the text frame gets reflowed and
> > > then later gets removed?  
> > 
> > that's a possible scenario
> 
> In what case could they happen? It seems to me with this patch, we further
> restrict what text node would get here, so previous assertions shouldn't be
> broken, should they?

Well, one obvious case is that there is a WillRefresh observer before the a11y on that runs script, and of course that could cause this to happen.
(In reply to Trevor Saunders (:tbsaunde) from comment #85)
> > > IIRC I didn't observe the assertions even with my simple patch in comment
> > > 57. So I suppose it is fine to keep the assertions. Optionally, we could
> > > probably even strengthen the assertions to MOZ_ASSERT so that we can really
> > > catch any violation of it.
> > 
> > it asserts when text node is no longer valid but it has accessible object,
> > which means we have some a tree update problem.
> 
> are you going to fix that?

there's nothing to fix here, I'm not aware of those assertions were ever triggered

> or just paper over it by removing the asserts? 

the latest patch doesn't have those assertions, but still it would be good to have them, I need to revert those changes
Attached patch patchSplinter Review
Attachment #8714392 - Flags: review?(tbsaunde+mozbugs)
Attachment #8714392 - Flags: review?(tbsaunde+mozbugs) → review+
Flags: needinfo?(jst)
Flags: needinfo?(bugs)
https://hg.mozilla.org/mozilla-central/rev/c3dd7dab0b00
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Attachment #8712692 - Flags: review?(tbsaunde+mozbugs)
Attachment #8712175 - Flags: review?(tbsaunde+mozbugs)
Alexander, I've been told by one or two devs that this would likely be safe for uplift to 45 beta.  Would that be prudent to request?
Flags: needinfo?(surkov.alexander)
Since it affects quite a lot of Youtube users, I'd suggest we try uplifting it anyway.

It seems to me that even this would somehow regress a11y behavior, it would be much better than letting Youtube users who have touch screen suffered from this performance issue for additional months.
Assignee: nobody → surkov.alexander
I meant, "even if this would ...".
I also meant to mention that I was spurred to comment because I received a reply from someone who found my month+ old reddit post on the workaround for the issue thanking me because they'd been experiencing it for months.
It's unknown how many users may be affected or have even left because of it.
[Tracking Requested - why for this release]:
See comment #92 and comment #93.

I support uplifting this. Accessibility regressions don't appear to be present, judging from daily use of nightly builds with screen readers and not finding any regressions since this landed.
Comment on attachment 8714392 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]:none
[User impact if declined]:the browser hangs for youtube users
[Describe test coverage new/current, TreeHerder]: all a11y mochitest don't regress
[Risks and why]: low - extra condition to ignore early not relevant changes
[String/UUID change made/needed]:no
Flags: needinfo?(surkov.alexander)
Attachment #8714392 - Flags: approval-mozilla-aurora?
Alexander, any reason why you didn't requested uplift to 45? It seems a good patch.
Jean-Yves, would you agree?
Flags: needinfo?(surkov.alexander)
Flags: needinfo?(jyavenard)
Comment on attachment 8714392 [details] [diff] [review]
patch

Taking in aurora anyway.
Attachment #8714392 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Caspy7 from comment #23)
> I have just helped another user having a very poor experience on youtube to
> fix this problem.
> 
> Do we have any new data on this issue?
> 
> Additionally, on the workaround solution side, blocking
> www.youtube.com###guide-container will block the entire sidebar, which is
> unnecessary. However blocking www.youtube.com###guide-subscriptions-section
> will block just the offending subscription list.
> 
> I have posted a list of ways for users to mitigate this problem here:
> https://www.reddit.com/r/firefox/comments/3zmu55/
> poor_performance_when_youtube_is_open_it_may_be/

Yes I can also confirm a fix after adding 'www.youtube.com###guide-subscriptions-section' in adblocker plus.
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #30)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #29)
> > I wonder is there any other widely-used software which could trigger
> > accessibility feature on Firefox, and whether it is really the main reason
> > of the massive complaints on Reddit.
> 
> We know for a fact that touchscreen-enabled Windows tends to enable a11y for
> the virtual keyboard.
> 
> See bug 1163004 and bug 1108607

My computer is a Fujitsu TS901 touchscreen laptop.
Not my area of expertise other than dealing with YouTube often, but seems to be worth uplifting to beta...
Flags: needinfo?(jyavenard)
(In reply to Sylvestre Ledru [:sylvestre] from comment #99)
> Alexander, any reason why you didn't requested uplift to 45? It seems a good
> patch.
> Jean-Yves, would you agree?

definitely, however I'm not sure what is uplifting procedure now, whether we should give the patch a couple spins making sure it doesn't regress.
Flags: needinfo?(surkov.alexander)
Had yet another reddit user a few minutes ago simply wondering why Chrome is so much better on Youtube - it turned out this was their problem.

Have we determine what might be the procedure for uplifting this to 45? It's been on 46 for 6 days now. I just don't want time to slip by and miss the 45 release window.
Alex can you request uplift to 45?
Flags: needinfo?(surkov.alexander)
This is too late for 45, release day is tomorrow.
I find this very frustrating. 
Since my last comment expressing concern about missing the 45 window, I've seen additional users with the same issue.  We have no idea how many users are affected by this or are leaving as a result (at the very least getting a very poor experience, adding to the argument to leave).

I had previously encountered more than one developer who believed this issue would be safe to fix in 45, but nothing happened towards an uplift and now we have six more weeks of this - likely shedding some unknown number of users.

Another user having very poor performance (after upgrading to Win10) reports that the youtube workaround didn't fix their issue but disabling accessibility did. This makes me think that this is biting users on more sites than youtube.  https://www.reddit.com/r/firefox/comments/49g5jo/firefox_performance_killed_after_clean_install_of/d0srpey?context=3

On another topic, I'm guessing we didn't have any testing that would have detected this issue? Should a bug be filed to ensure that such a similar regression is avoided?  It would also be nice if we could go back and see when this regression was introduced.
We could consider this patch as a ride along if we do a dot release for 45.

I understand your frustration but, afaik, the impact is limited. We had this issue for most of the 45 cycle and the impact on the 45 beta cycle was none or limited.
This patch missed 45 (my fault). David and others, I am considering taking it in 45.0.1 and it didn't cause any regressions in 46. Any objections? Thanks
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(dbaron)
Sounds fine to me.
Flags: needinfo?(dbaron)
Comment on attachment 8714392 [details] [diff] [review]
patch



[Triage Comment]
Attachment #8714392 - Flags: approval-mozilla-release+
Attachment #8714392 - Flags: approval-mozilla-esr45+
Added to the release notes with "Fix a potential performance regression (Youtube for example) (1220502)" as wording

By the way, does this impact fennec too?
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0

Hi guys,

I have verified the fix made on 45.0.1 candidate and 45.0.1 ESR candidate builds (both x32&x64) and definitely there is an improvement with it. Used the reduced testcase from comment 57 and while I had the virtual keyboard opened from windows accessibility features there was no hang or delay after tapping the "Add Content" button as seen on 45.0 / 45.0 ESR builds.

However, I have managed to produce some short periods of "Not Responding" on the ESR candidate builds while spamming the button. But this probably happened because of high amount of used RAM, reached 1.3GB memory used by Firefox while spamming.

Considering this I'm inclining to mark the issue as verified. Is this the desired behavior Alexander ?
(In reply to Paul Oiegas [:pauloiegas] from comment #117)
> However, I have managed to produce some short periods of "Not Responding" on
> the ESR candidate builds while spamming the button. But this probably
> happened because of high amount of used RAM, reached 1.3GB memory used by
> Firefox while spamming.
> 
> Considering this I'm inclining to mark the issue as verified. Is this the
> desired behavior Alexander ?

Clicking that button many times could certainly make things slow enough to be "Not Responding". Each click of the button adds 5k new elements to the document, which is quite a lot. So I'd say it is expected behavior, though probably not desired (we always hope to be able to handle whatever amount of data as fast as possible... which is simply impossible :)

Spamming the button could easily make the number of elements in the page exceed a normal reasonable amount. (As a reference, single page HTML spec has ~145k elements, which is sometimes considered as an upper bound of a normal page.)

You would probably observe a linear increase of time / memory used with each click after a certain thresold. (If it is not increased linearly, but bilinear or even worse, that probably indicates another potential performance issue.)
agree, it'd be good to have another bug for "not responding" issue, at least to investigate it.
Flags: needinfo?(surkov.alexander)
Update, verified this on 45.0.1 ESR candidate (build2) on Windows 10 x64 and I can confirm that reported behavior is no longer occurring.
I'm marking this as verified and I will log the "Not responding" issue separately and provide the bug number.
Status: RESOLVED → VERIFIED
The release notes for 45.0.1 currently make no mention of this fix:
https://www.mozilla.org/en-US/firefox/45.0.1/releasenotes/

Per comment 116 this should show up.

Sylvestre, can you clarify on this? Did this not land or is it just not in the notes?
Flags: needinfo?(sledru)
Alexander, I have logged the "Not responding" issue in bug 1257788 and added you at CC.
Fixed, thanks for the ping!
Flags: needinfo?(sledru)
Flags: needinfo?(tbsaunde+mozbugs)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: