Closed Bug 1347458 Opened 7 years ago Closed 4 years ago

In case of remoteness changes the driver has to re-establish the communication with the framescript and not hang

Categories

(Remote Protocol :: Marionette, defect, P3)

Version 3
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whimboo, Unassigned)

References

Details

As seen at least for bug 1347456 from Treeherder but also noticed locally a couple of times, we have hangs in the driver when a remoteness change occurred and the communiction with the framescript happened via the dispatcher while the command hasn't queued in pendingCommands.

This is clearly a serious issue and can happen at various times when a command triggers a page load and Marionette doesn't wait for it to be finished. Please note that right now we only wait for that in get(), goBack(), and goForward(). But nowhere else.

I will try to setup a testcase for that which actually might not be that hard.
Blocks: 1347112
Trying to reproduce with current code is hard, but if we exclude a page load for file:// urls in navigate.js, the following snippet will promptly cause a hang all the times:

    def test(self):
        self.marionette.navigate("file:///")
        self.marionette.get_url()

The call to get_url() never returns, and Marionette runs into the socket timeout.
Btw. there seem to be events we might be able to use to get informed about a remoteness change. Those are BeforeTabRemotenessChange and TabRemotenessChange.

Maybe we should listen to the events and re-send the message to the listener once the remoteness change has been completed.
Blocks: 1360466
Something which also helps here is to ensure that every possible navigation request is clearly routed through the loadListener. But this is something we cannot always expect, eg. if the user presses enter on a submit button. In such a case a new page is most likely to get loaded, and the following command will fail to execute due to a possible remoteness change.
No longer blocks: 1362744
Priority: -- → P3
The underlying issue here is that at any point, although more likely
during navigation, we cannot trust the content frame script to send
back a reply to the IPC message because the content frame might have
suffered a remoteness change and the frame script will have been reloaded.

When the frame script gets reloaded it loses all context.  After it
is has finished re-registering we are forced to re-send the IPC
message so that it can finish whatever work it was doing before,
and then send back an ack to the original IPC request, resolving
the promise in testing/marionette/proxy.js.

The window tracking refactoring is not going to solve this fundamental
limitation of the IPC system.  Instead I think we need to look at
other alternatives, such as perhaps building in a message queue
system in testing/marionette/proxy.js that always re-sends IPC
messages when we notice a remoteness change occurring.  This means
we would need a reliable way of detecting remoteness changes as
well as reworking Marionette:waitForPageLoaded because it is not
exactly re-sending the same Marionette:get IPC message.

Because the window tracking patches will contain some of the
tools we need to build such a system, I will leave the dependency
in place.  I should end by saying that I’m not 100% sure this is
the right approach, but it seems plausible that we can get rid of
the this.curBrowser.pendingCommands.push(() => { … }) paradigm
by enqueing _all_ IPC messages.
Please note that we can NOT send the original IPC message multiple times. At least not for those types of commands which actually trigger an activity by the browser. Sending the same message again would trigger that action again, and should clearly cause unexpected behavior.

This is indeed tricky, and I'm also not sure without thinking too much about all that, what's best to do.
(In reply to Henrik Skupin (:whimboo) from comment #5)

> Please note that we can NOT send the original IPC message multiple
> times. At least not for those types of commands which actually
> trigger an activity by the browser. Sending the same message
> again would trigger that action again, and should clearly cause
> unexpected behavior.

Indeed, that’s not exactly what I meant.  It is true that the two
messages are distinct.  In other words, certain commands have a
two-step round trip between chrome and the frame script before they
succeed.
Blocks: 1664968
No longer blocks: 1664968

We are not going to fix the framescript case. Switching to JSWindowActor should actually make this work. See also bug 1519354 as reference.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.