WebDriver:NewWindow with type "tab" hangs in upstream wpt stability jobs
Categories
(Remote Protocol :: Marionette, defect, P2)
Tracking
(firefox97 fixed)
Tracking | Status | |
---|---|---|
firefox97 | --- | fixed |
People
(Reporter: whimboo, Assigned: whimboo)
References
Details
(Whiteboard: [bidi-m3-mvp])
Attachments
(3 files)
See https://github.com/web-platform-tests/wpt/issues/31499 for the upstream issue.
The problem seems to only happen with the tests test_no_top_browsing_context
for most of the WebDriver commands. In these cases the New Window
command with type tab
doesn't return. Interestingly this only happens upstream and not in our own CI.
Here an excerpt:
0:29.61 pid:1401 1636025190236 Marionette DEBUG 0 -> [0,241,"WebDriver:NewWindow",{"type":"tab"}]
0:29.62 pid:1401 1636025190241 Marionette TRACE Received observer notification domwindowopened
0:29.74 pid:1401 1636025190366 Marionette TRACE Received event beforeunload for about:blank
0:29.75 pid:1401 1636025190376 Marionette TRACE Received event pagehide for about:blank
0:29.76 pid:1401 1636025190381 Marionette TRACE Received event DOMContentLoaded for about:blank
0:29.76 pid:1401 1636025190381 Marionette TRACE Received event pageshow for about:blank
0:59.37 TEST_END: TIMEOUT, expected OK
Most likely we hang in waiting for the load event of the newly opened tab:
Maybe it's caused by a remoteness change that we miss to detect and wait forever.
Assignee | ||
Comment 1•3 years ago
|
||
Sadly we print out the load events before checking the browsing context:
https://searchfox.org/mozilla-release/rev/67bec9f7c690bbc72cfefe652a9d5fffb961b698/remote/marionette/sync.js#550-557
As such it's possible that the browsing context has been changed and we did not detect that. Maybe it would be good to also use the WebProgressListener here similar to what we have in mind for bug 1739008.
Assignee | ||
Comment 2•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
I uploaded a small patch to improve the logging. We need to get it landed first and reach Nightly before I can test upstream again.
Assignee | ||
Comment 4•3 years ago
|
||
Lets wait for the code from bug 1739008 has been landed. It will make things easier also for this command.
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/00c3528c6e15 [marionette] Improve logging for load events in waitForLoadEvent. r=webdriver-reviewers,jdescottes
Comment 6•3 years ago
|
||
bugherder |
Assignee | ||
Comment 7•3 years ago
|
||
With the patch landed and on Nightly I ran another nightly stability job on the wpt CI and the WebDriver:NewWindow
command hangs with:
0:26.87 pid:1407 1636132910074 Marionette DEBUG 0 -> [0,241,"WebDriver:NewWindow",{"type":"tab"}]
0:26.88 pid:1407 1636132910080 Marionette TRACE Received observer notification domwindowopened
0:56.60 TEST_END: TIMEOUT, expected OK
As such it's clear that we do not listen on the correct browsing context for the page load events and as such wait forever for the tab to be loaded.
It's not clear why we do not see that in our own CI and cannot reproduce locally, but a fix is needed and it can be done equally to bug 1739008.
Assignee | ||
Comment 8•2 years ago
|
||
I pushed a try build for that bug:
https://treeherder.mozilla.org/jobs?repo=try&revision=9d21c3194a35a139258bd02a44b94a3360ac034d&selectedTaskRun=IGXt3JFJSs6nENfbCNtyJw.0
Sadly it's always failing the test test_navigation.py TestPageLoadStrategy.test_none
with the first patch applied. But that one only moves the already used code from driver.js
into the navigate.js
module, and doesn't change anything else. So this failure shouldn't actually happen.
Olli, are there any side effects with webProgress
listeners that we should be aware of? We register a listener but the onStateChange
method is never called, and as such our Promise doesn't resolve.
Comment 9•2 years ago
|
||
Is the listener registered early enough, and is it registered on the right object? Not sure what else I could say about any side effects.
Assignee | ||
Comment 10•2 years ago
|
||
Based on Olli's reply I had another look at these changes, and AFAICS there is actually a race here...
Without my patch we run the browsingContext.webProgress.isLoadingDocument
check in driver.js and that returns true
. It means that we triggered the page load, immediately destroyed the Marionette session, and the check in session creation triggers the code path that actually handles onStateChange
events and finally resolves the promise:
1639469808970 Marionette DEBUG 4 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
** url: http://127.0.0.1:60099/slow_resource.html
** initial document: false
** is loading: true
** await for navigation
By moving the code into navigate.js the situation is different. While all the state as printed is completely the same, there is no onStateChange
event fired and as such the promise never resolved:
** url: http://127.0.0.1:60198/slow_resource.html
** initial document: false
** is loading: true
** await for navigation
The URL is correct and as such I assume we attach the progress listener to the correct browsing context. But why it doesn't fire events even that I can see that the page finished loading is unclear.
Olli, what do you mean with registered early enough
? We attach it when the page is already loading, but also before it has been finished loading. Are there any MOZ_LOG capabilities that could be used to log more detailed information?
Assignee | ||
Comment 11•2 years ago
|
||
Even when subscribing for all notifications for this WebProgress I never receive an onStateChange
event. But when using the BCWebProgress:5
logger I can clearly see that there should be an incoming state change:
1640207319580 Marionette DEBUG 5 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
** add webprogress to browsing context 44 for http://127.0.0.1:62932/slow_resource.html
** initial document: false
** is loading: true
** await for navigation
[Parent 62110: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1…") on {top:1, id:2c, url:http://127.0.0.1:62932/slow_resource.html}
[Parent 62110: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:62932/slow_resource.html, originalURI:http://127.0.0.1:62932/slow_resource.html}, STATE_STOP|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2c, url:http://127.0.0.1:62932/slow_resource.html}
Here is what we have without my patch:
1640207843183 Marionette DEBUG 4 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[Parent 67098: Main Thread]: I/BCWebProgress OnStatusChange(<null>, <null>, NS_OK, "Transferring data from 127.0.0.1…") on {top:1, id:2c, url:http://127.0.0.1:63410/slow_resource.html}
[Parent 67098: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:http://127.0.0.1:63410/slow_resource.html, originalURI:http://127.0.0.1:63410/slow_resource.html}, STATE_STOP|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:2c, url:http://127.0.0.1:63410/slow_resource.html}
1640207847136 Marionette DEBUG 4 <- [1,1,null,{"sessionId":"45f27c35-d06f-43ee-ba40-6f021c4b0a5a","capabilities":{"browserName":"firefox","browserV
So it's exactly the same but why does it not work when the handling of the webProgress is part of a different module (navigate.js instead of driver.js)? Maybe this is somewhat related to bug 1746483.
Assignee | ||
Comment 12•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #11)
Maybe this is somewhat related to bug 1746483.
Actually not. In this case we do not even see an onStateChange
with the logging module.
Assignee | ||
Comment 13•2 years ago
|
||
Assignee | ||
Comment 14•2 years ago
|
||
Depends on D134541
Assignee | ||
Comment 15•2 years ago
•
|
||
Hi Steve and Jon, could one of you please have a look at comment 11? With this patch as attached as WIP I'm moving the handling of a webProgress into an async method. The registered webProgressListener is then defined within this async method. But that is causing problems because the listener is a weak reference and get GC'ed before the page that the progresslistener is waiting for finishes loading.
I would be interested to know why the GC actually garbage collects the listener inside the async method which at this point is still running and the included promise hasn't been resolved?
Thanks a lot!
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 16•2 years ago
|
||
Running on try shows issues on Android where we seem to hang with the isInitialDocument
check included. We could leave it out or just disable it for WebDriver:NewSession
(see the proposal from Julian).
But before I can make a decision I would like to test myself locally on Android. Sadly various things are currently broken so that I cannot run the tests, and at least bug 1748626 is still required.
Updated•2 years ago
|
Comment 17•2 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c90687f11d9b [marionette] Move wait for initial page load logic into shared navigate module. r=webdriver-reviewers,jdescottes https://hg.mozilla.org/integration/autoland/rev/ebff6a7ca24b [marionette] Use webProgress to listen for the initial page being loaded in "WebDriver:NewWindow". r=webdriver-reviewers,jdescottes
Comment 18•2 years ago
|
||
bugherder |
Assignee | ||
Comment 19•2 years ago
|
||
Marking as fixed for now given that all necessary patches got landed. We see low frequent regressions that I'm going to observe the next days.
Assignee | ||
Comment 20•2 years ago
|
||
I checked the stability jobs of wpt and it looks to work fine now:
0:31.60 pid:1442 1641829191459 Marionette DEBUG 0 -> [0,23,"WebDriver:NewWindow",{"type":"tab"}]
0:31.61 pid:1442 1641829191466 Marionette TRACE Received DOM event TabOpen for [object XULElement]
0:31.63 pid:1442 1641829191493 RemoteAgent TRACE Initial navigation already completed
0:31.64 pid:1442 1641829191494 Marionette DEBUG 0 <- [1,23,null,{"handle":"d9522b07-d575-403b-8242-8bf144cb0365","type":"tab"}]
We still have the regressions, which might let us return a bit too early. I'll keep an eye on it.
Assignee | ||
Comment 21•2 years ago
|
||
An upcoming patch on bug 1747359 should fix all the known regressions.
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•1 year ago
|
Comment 22•3 months ago
|
||
Looks like this got resolved.
Description
•