Closed Bug 1368434 Opened 7 years ago Closed 7 years ago

Missing page load events between frame script reloaded and load listener registration

Categories

(Remote Protocol :: Marionette, defect, P1)

Version 3
defect

Tracking

(firefox-esr52 fix-optional, firefox55 wontfix, firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- fix-optional
firefox55 --- wontfix
firefox56 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [spec][17/07])

Attachments

(2 files)

It looks like that there is a small chance to hit a race when a navigation command is active and the framescript gets reloaded.

Reason is that the loadListener isn't getting initialized immediately, but via the driver once the framescript is done registering. Also it might take some extra milliseconds to send the request to the listener. 

Maybe we should make the loadListener global, and always active. Such a move might not be that easy due it would need some kind of syncing.

Andreas, what do you think?
Flags: needinfo?(ato)
(In reply to Henrik Skupin (:whimboo) from comment #0)

> It looks like that there is a small chance to hit a race when a
> navigation command is active and the framescript gets reloaded.

Maybe I’m missing some context here, but why is the message to begin
navigation sent from chrome space before the content frame script has
registered?  What is your definition of ‘active’ here?

> Reason is that the loadListener isn't getting initialized immediately,
> but via the driver once the framescript is done registering. Also
> it might take some extra milliseconds to send the request to the
> listener.

Not entirely sure what you mean that is initialised “via
the driver”.  Do you mean that a navigation attempt is
in progress when driver.js calls get/back/forward through
this.curBrowser.pendingCommands?

> Maybe we should make the loadListener global, and always active.

That sounds like a good idea.  Navigation may in principle happen at any
time, and not as a direct result of a Marionette-initiated command.  We
want to be able to detect this.

Ideally, we would be able to receive all the relevant navigation events
in chrome by attaching a web progress listener to the <xul:browser>, but
I believe you had problems with this in the past.  I don’t remember
the details, but as far as I understood we received some events, but not
everything?  Is attaching a web progress listener to <xul:browser> in
chrome supposed to have feature parity with doing it in a frame script?
Is there a bug tracking the missing events?

> Such a move might not be that easy due it would need some kind of
> syncing.

Maybe a combination of Marionette-internal CustomEvents and a way to
query the current state would be a good idea.  We could tie this state
and the events to the various page load strategies we need to fulfill.
Flags: needinfo?(ato)
(In reply to Andreas Tolfsen ‹:ato› from comment #1)
> > It looks like that there is a small chance to hit a race when a
> > navigation command is active and the framescript gets reloaded.
> 
> Maybe I’m missing some context here, but why is the message to begin
> navigation sent from chrome space before the content frame script has
> registered?  What is your definition of ‘active’ here?

Active means that we currently run any command which causes a navigation like `get`, or `back`. And in such a case the framescript gets reloaded, which can happen due to a remoteness change.

> > Reason is that the loadListener isn't getting initialized immediately,
> > but via the driver once the framescript is done registering. Also
> > it might take some extra milliseconds to send the request to the
> > listener.
> 
> Not entirely sure what you mean that is initialised “via
> the driver”.  Do you mean that a navigation attempt is
> in progress when driver.js calls get/back/forward through
> this.curBrowser.pendingCommands?

Correct. Right after the framescript has been registered itself.

> > Maybe we should make the loadListener global, and always active.
> 
> That sounds like a good idea.  Navigation may in principle happen at any
> time, and not as a direct result of a Marionette-initiated command.  We
> want to be able to detect this.
> 
> Ideally, we would be able to receive all the relevant navigation events
> in chrome by attaching a web progress listener to the <xul:browser>, but
> I believe you had problems with this in the past.  I don’t remember
> the details, but as far as I understood we received some events, but not
> everything?  Is attaching a web progress listener to <xul:browser> in
> chrome supposed to have feature parity with doing it in a frame script?
> Is there a bug tracking the missing events?

I wouldn't do the move to the parent process together with making it global. I would first like to care about the global running state. Then we can check what happens in the parent process. As mentioned on another bug I haven't tested the new loadListener with the `contentBrowser` yet, so I cannot say what's exactly missing. 

> > Such a move might not be that easy due it would need some kind of
> > syncing.
> 
> Maybe a combination of Marionette-internal CustomEvents and a way to
> query the current state would be a good idea.  We could tie this state
> and the events to the various page load strategies we need to fulfill.

As long as we are performing the checks in the content script, I don't see a requirement to have custom events. It only makes it more difficult.
Blocks: 1360446
I did a check yesterday by already defining every possible event when the frame script gets reloaded. But as it looks like not a singly load event is recognized. I will have a chat with Olli next week about our page load algorithm and what could be improved.
So I may have more info now. While working on bug 1332122 I noticed that in some cases the listener gets reloaded a couple of times during a single navigation command! I could imagine this happens because of multiple content processes, and that the tab gets moved between processes. I still cannot reproduce it locally!

What could happen is that with each reload the listener gets a new outer window id, and the currently active navigation command sends messages with the previously known curFrame id. That will end-up in the nirvana. With my work on bug 1332122 I will have to change when a new browser gets registered. So it will not only happen in case of a remoteness change, but also when the listener just gets reloaded. This ensures that we always track the current instance of the listener and not a non-existent anymore. Lets see if that helps.
Depends on: 1332122
Blocks: 1332122
No longer depends on: 1332122
No longer blocks: 1332122
Depends on: 1332122
Actually bug 1378810 could have an influence here, but not so sure. Lets wait after it got landed.
Depends on: 1378810
Blocks: 1379061
Blocks: 1379133
Blocks: 1379206
Raising to P1 because we are getting more and more test failures due to reloads of the listener script. Probably this also happens due to the enabling of multiple content processes, and the frame script gets reloaded more often for a better performance.

I will have a look into this bug soon.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: -- → P1
I can reproduce this very rarely locally on OS X. In most cases after 55 repeats of the navigation tests specifically for file:// urls. Lets see what's actually causing this.
So I caught the same situation again today when loading a file:// URL. As attached you can find the tracing output and some additional debug output from Marionette, which I added locally.

In both cases the framescript gets reloaded right after triggering the load request. And as such once it has been registered by the parent process the page load events are getting added again (REGISTERED PAGELOAD LISTENERS). But that happens kinda late in the failing case (600ms) compared to the passing test (300ms). To see if we miss any events between the framescript got reloaded and the listeners attached, I directly registered listeners in `registerSelf`. As you can see in the failing case we clearly miss the necessary `DOMContentLoaded` and `pageshow` events by around 300ms.
So there are two possible options for us:

1) Make the load listener to work globally, so that page load events can be received and handled at any time.

2) Move the page load handling completely to the parent process. That's something which I tried already on bug 1333458 but failed at this time because of missing events. Since then lots of changes have been done, and it might be helpful to check again.

Olli, if we would go with option 2) here, would we get all the following events from the contentBrowser?

* hashchange
* beforeunload, pagehide, unload
* DOMContentLoaded, pageshow
* The above for specific frames if a frameset is open in the contentBrowser
* if one of those is not available would a webprogress listener be more effective, or is that too much overhead?
Flags: needinfo?(bugs)
Summary: Possible race between frame script reloaded and load listener registration → Missing page load events between frame script reloaded and load listener registration
It looks like that in failing cases our registration of the framescript takes way too long. Compared to the ~30ms in usual cases it takes up to 250ms, with the additional time to register the load listener which also adds ~40ms on top:

1500049142847	Marionette	DEBUG	Register listener.js for window 313532612609
1500049143087	Marionette	DEBUG	Registered listener.js
1500049143124	Marionette	INFO	 *********** REGISTERED PAGELOAD LISTENERS *********

So this explains the ~300ms delay. I just wonder if we can speed up the registration by killing all the back and forth messaging before it is ready. But well, even then we might see this race happening when something slows down Firefox.
I have no idea what frame script reloading means here. There isn't such thing in Gecko. Is that something some code explicitly does?
If one loads framescript before loading any page to a tab and adds event listeners to the global object of the framescript, one would get all the events.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #11)
> I have no idea what frame script reloading means here. There isn't such
> thing in Gecko. Is that something some code explicitly does?

Well, if it's not reloading it's something related to the outer window id change. In case of loading file:// URLs we always use a restricted sandbox now, and as such our loaded framescript gets re-initiated (reloaded). Not sure what the correct wording is actually. Maybe that explains it better for you? Also an another case are remoteness changes, which also causes the same.

> If one loads framescript before loading any page to a tab and adds event
> listeners to the global object of the framescript, one would get all the
> events.

Right, until eg. a remoteness change happens. Then the event listeners have to be registered again.
Flags: needinfo?(bugs)
I think another easier fix for now is when we explicitly check the document readyState before attaching any kind of listeners in case of a known frame script reload. In such a case we are always aware that we are in a navigation request for the given commands. 

Not sure if this could break behavior for redirects, but that is something we can check once I work on those bugs.
oh, sure, if a new child process is created to load file:// or so, then frame scripts are loaded to that process.

It is still unclear to me when you load the frame scripts. Since you should be able to load them before any page has been loaded to a tab.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #14)
> It is still unclear to me when you load the frame scripts. Since you should
> be able to load them before any page has been loaded to a tab.

We load the frame script explicitly once at the beginning and let Firefox do the loading of it into new child processes and browsers itself. This is done via

> mm.loadFrameScript(FRAME_SCRIPT, true);

Sadly Marionette takes longer to register such a new content process, and because of that we miss the events. I could refactor all that, but before I do so it would be good to know the answers of my questions posted on comment 9. Thanks!
Flags: needinfo?(bugs)
So I checked how we could handle the situation with checking the readyState, and so far I was able to get rid of this race! I think that this solution has the least impact in terms of causing regressions. 

The only thing which is annoying is that we cannot rely on the `interactive` state because for a new content process we still load `about:blank` first, which would trigger an early return. So we will have to wait for a `complete` ready state, in such a situation.
Attachment #8887460 - Flags: review?(ato)
Comment on attachment 8887460 [details]
Bug 1368434 - Fix race condition for slow frame script registrations.

https://reviewboard.mozilla.org/r/158298/#review163688

::: commit-message-5e73b:2
(Diff revision 1)
> +Bug 1368434 - Missing page load events between frame script reloaded and load listener registration
> +

I don't have the time today anymore to add more details here, but I will do tomorrow. I would like to get your review as early as possible so just requesting it now.
Blocks: 1381939
Attachment #8887460 - Flags: review?(ato) → review?(dburns)
Given that this is blocking so many intermittent failures, I am excited to see a review request posted on this bug.
Comment on attachment 8887460 [details]
Bug 1368434 - Fix race condition for slow frame script registrations.

https://reviewboard.mozilla.org/r/158298/#review164768
Attachment #8887460 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f240a4f635b
Fix race condition for slow frame script registrations. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/4f240a4f635b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
David, not sure if you want that for the 55.0 release. IMO it might be kinda helpful. But in this case it would need a backport patch because it is in conflict when bug 1332122 is not uplifted. The latter I feel is too risky to uplift that late in the cycle. So if we want that for 55, someone else has to create the backport patch. When I'm back from PTO it will be too late.
Flags: needinfo?(dburns)
Due to PTOs this will not be backported.
Flags: needinfo?(dburns)
Too late for 55. Mark 55 won't fix.
Blocks: webdriver
Whiteboard: [spec][17/07]
Flags: needinfo?(bugs)
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: