Marionette incompatiblities during page navigation with multiple content processes
Categories
(Remote Protocol :: Marionette, defect, P3)
Tracking
(firefox58 affected, firefox59 affected)
People
(Reporter: whimboo, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
1.44 KB,
text/plain
|
Details |
The failure happens mainly for test_navigation.py, and here in `setUp()`:
> self.marionette.navigate(self.marionette.absolute_url("windowHandles.html"))
> --> self.new_tab = self.open_tab(open_with_link)
In `open_tab` a click on an element is performed which should be immediately present when the loading of "windowHandles.html" has been done. But the following is visible:
05:21:34 INFO - [Parent 1750, Main Thread] WARNING: [nsFrameLoader] ReallyStartLoadingInternal tried but couldn't show remote browser.
05:21:34 INFO - : file /builds/worker/workspace/build/src/dom/base/nsFrameLoader.cpp, line 904
It means the content browser is not there.
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 1•7 years ago
|
||
I added a screenshot as attachment 8934897 [details] accidentally to bug 1422741. Mike, can you help us with this failure in Firefox?
Reporter | ||
Comment 2•7 years ago
|
||
Since my patch on bug 1417051 has been landed we see it kinda often now, mostly for debug builds across platforms for the test: https://dxr.mozilla.org/mozilla-central/rev/a928be5dacc3b544e29c0612b3f8cda6447df802/testing/marionette/harness/marionette_harness/tests/unit/test_navigation.py#571 Due to the high intermittent failure rate I have to skip this test now via bug 1422741.
Reporter | ||
Comment 3•7 years ago
|
||
I can now finally reproduce it locally by just running TestPageLoadStrategy.test_strategy_after_remoteness_change in a loop. So this test is doing something which following tests stumble over and cause Firefox to misbehave.
Reporter | ||
Comment 4•7 years ago
|
||
Meanwhile I'm not sure if the warning as logged by Firefox is the real underlying issue. I base this on the following trace:
> 1512560287116 Marionette TRACE 183 -> [0,11,"findElement",{"using":"id","value":"new-blank-tab"}]
> 1512560287136 Marionette TRACE 183 <- [1,11,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"b926a9dd-6be0-4b47-b63b-b05d3e4079b8","ELEMENT":"b926a9dd-6be0-4b47-b63b-b05d3e4079b8"}}]
> 1512560287139 Marionette TRACE 183 -> [0,12,"clickElement",{"id":"b926a9dd-6be0-4b47-b63b-b05d3e4079b8"}]
> **** Failed with: Web element reference not seen before: b926a9dd-6be0-4b47-b63b-b05d3e4079b8
> stacktrace:
> WebDriverError@chrome://marionette/content/error.js:172:5
> NoSuchElementError@chrome://marionette/content/error.js:400:5
> get@chrome://marionette/content/element.js:231:13
> clickElement@chrome://marionette/content/listener.js:1318:14
As you can see we find the element with the id `new-blank-tab` but then fail to click on the exact same element. Note that we are still in the same window - there was no call to `switch_to_window` happening.
Now the interesting fact... When this failure gets thrown we have 2(!!) tab open. The first one with the wanted URL containing the element we want to click on, and a second blank tab. More importantly this second tab is selected and in the foreground. It feels that we have raised this exception, but at the same time successfully clicked on the element, which should have opened a new blank tab.
I will add more logging to the test, so that we might get an idea what's up with it.
Reporter | ||
Comment 5•7 years ago
|
||
Something is broken in the element store so the exception gets raised, but we still click the link, so that a new tab with about:blank opens right away: 1512562569959 Marionette TRACE 57 -> [0,11,"getWindowHandles",{}] 1512562569962 Marionette TRACE 57 <- [1,11,null,["2147483649"]] 1512562569983 Marionette TRACE 57 -> [0,12,"getWindowHandle",{}] 1512562569985 Marionette TRACE 57 <- [1,12,null,{"value":"2147483649"}] 1512562569988 Marionette TRACE 57 -> [0,13,"findElement",{"using":"id","value":"new-blank-tab"}] 1512562570018 Marionette TRACE 57 <- [1,13,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"e1efd178-568b-e748-bed1-3a682fda5f06","ELEMENT":"e1efd178-568b-e748-bed1-3a682fda5f06"}}] 1512562570023 Marionette TRACE 57 -> [0,14,"getWindowHandles",{}] 1512562570025 Marionette TRACE 57 <- [1,14,null,["2147483649"]] 1512562570029 Marionette TRACE 57 -> [0,15,"getWindowHandle",{}] 1512562570031 Marionette TRACE 57 <- [1,15,null,{"value":"2147483649"}] 1512562570044 Marionette TRACE 57 -> [0,16,"clickElement",{"id":"e1efd178-568b-e748-bed1-3a682fda5f06"}] 1512562570069 Marionette TRACE 57 <- [1,16,{"error":"no such element","message":"Web element reference not seen before: e1efd178-568b-e748-bed1-3a682fda5f06","sta ... :5\nget@chrome://marionette/content/element.js:231:13\nclickElement@chrome://marionette/content/listener.js:1318:14\n"},null] **** Failed with: Web element reference not seen before: e1efd178-568b-e748-bed1-3a682fda5f06 stacktrace: WebDriverError@chrome://marionette/content/error.js:172:5 NoSuchElementError@chrome://marionette/content/error.js:400:5 get@chrome://marionette/content/element.js:231:13 clickElement@chrome://marionette/content/listener.js:1318:14 1512562570082 Marionette TRACE 57 -> [0,17,"getWindowHandles",{}] 1512562570084 Marionette TRACE 57 <- [1,17,null,["2147483649"]] [Parent 93387, Main Thread] WARNING: [nsFrameLoader] ReallyStartLoadingInternal tried but couldn't show remote browser. : file /builds/worker/workspace/build/src/dom/base/nsFrameLoader.cpp, line 904 1512562570160 Marionette TRACE 57 -> [0,18,"getWindowHandle",{}] 1512562570161 Marionette TRACE 57 <- [1,18,null,{"value":"2147483649"}] 1512562570161 Marionette DEBUG Register listener.js for window 2147483728 1512562573171 Marionette TRACE 57 -> [0,19,"getWindowHandles",{}] 1512562573173 Marionette TRACE 57 <- [1,19,null,["2147483649","2147483728"]] 1512562573177 Marionette TRACE 57 -> [0,20,"getWindowHandle",{}] 1512562573178 Marionette TRACE 57 <- [1,20,null,{"value":"2147483649"}] In that case we should not throw the exception because the element has to be present. So I wonder why the element cannot be found. Most likely because it was wrongly or not stored. On the other side the click should not happen at all if the element retrieval fails! I don't think that this is a problem with Firefox anymore, but that Marionette is misbehaving here. I will further investigate. Given that this causes misbehavior for users of Marionette/Geckodriver, and is the reason for one of the top intermittents of Mn jobs since yesterday, this should be a P1.
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 6•7 years ago
|
||
Ok, so it seems that this is only the click command which sometimes fail: 1512563284055 Marionette TRACE 189 -> [0,16,"getElementText",{"id":"63ef8e25-5f7d-824b-bcf1-07f147f26788"}] New blank Tab 1512563284115 Marionette TRACE 189 <- [1,16,null,{"value":"New blank Tab"}] 1512563284119 Marionette TRACE 189 -> [0,17,"clickElement",{"id":"63ef8e25-5f7d-824b-bcf1-07f147f26788"}] 1512563284139 Marionette TRACE 189 <- [1,17,{"error":"no such element","message":"Web element reference not seen before: 63ef8e25-5f7d-824b-bcf1-07f147f26788","sta ... :5\nget@chrome://marionette/content/element.js:231:13\nclickElement@chrome://marionette/content/listener.js:1318:14\n"},null] I added a dump statement to listener.js:clickElement to check if it is present, and got the following: 1512565871143 Marionette TRACE 153 -> [0,17,"clickElement",{"id":"cf44f498-6faf-c04d-89a1-c229d7eb742b"}] ** has element [object ContentWebElement uuid=cf44f498-6faf-c04d-89a1-c229d7eb742b]: true ** has element [object ContentWebElement uuid=cf44f498-6faf-c04d-89a1-c229d7eb742b]: false It means that `clickElement` in listener.js is getting run twice right after each other. The first time we have a valid window reference and element store, but for the second time it fails. Interestingly I also see duplicated DEBUG output of the page load listener: 1512565870337 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:64698/windowHandles.html" 1512565870338 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:64698/windowHandles.html" 1512565870340 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:64698/windowHandles.html" 1512565870341 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:64698/windowHandles.html" ++DOMWINDOW == 19 (0x11173b400) [pid = 94161] [serial = 207] [outer = 0x110547e20] 1512565870366 Marionette DEBUG Received DOM event "DOMContentLoaded" for "http://127.0.0.1:64698/windowHandles.html" 1512565870367 Marionette DEBUG Received DOM event "DOMContentLoaded" for "http://127.0.0.1:64698/windowHandles.html" 1512565870375 Marionette DEBUG Received DOM event "pageshow" for "http://127.0.0.1:64698/windowHandles.html" 1512565870376 Marionette DEBUG Received DOM event "pageshow" for "http://127.0.0.1:64698/windowHandles.html" Going further back (specifically to the beginning of the test) that output is shown: 1512565870044 Marionette DEBUG Register listener.js for window 2147483649 1512565870078 Marionette DEBUG Register listener.js for window 2147483656 1512565870090 Marionette TRACE 153 <- [1,1,null,{"sessionId":"4f16ee80-6943-5e45-9c3f-0ede90db2e90","capabilities":{"browserName":"firefox","browserVersion":"59.0a ... ssID":94159,"moz:profile":"/var/folders/4k/sf4gz5fn3kl9hr3nd7pzbvhc0000gn/T/tmpyGHFva.mozrunner","moz:webdriverClick":true}}] 1512565870102 Marionette TRACE 153 -> [0,2,"setTimeouts",{"script":30000}] 1512565870106 Marionette TRACE 153 <- [1,2,null,{}] 1512565870115 Marionette DEBUG Register listener.js for window 2147483649 1512565870116 Marionette TRACE 153 -> [0,3,"setTimeouts",{"pageLoad":300000}] 1512565870120 Marionette TRACE 153 <- [1,3,null,{}] 1512565870128 Marionette TRACE 153 -> [0,4,"setTimeouts",{"implicit":0}] 1512565870131 Marionette TRACE 153 <- [1,4,null,{}] 1512565870136 Marionette DEBUG Register listener.js for window 2147483656 1512565870141 Marionette TRACE 153 -> [0,5,"getCurrentChromeWindowHandle",{}] 1512565870143 Marionette TRACE 153 <- [1,5,null,{"value":"3"}] 1512565870151 Marionette TRACE 153 -> [0,6,"getChromeWindowHandles",{}] 1512565870155 Marionette TRACE 153 <- [1,6,null,["3"]] 1512565870159 Marionette TRACE 153 -> [0,7,"getWindowHandle",{}] 1512565870160 Marionette TRACE 153 <- [1,7,null,{"value":"2147483649"}] There are multiple lines for "Register listener.js" which means that our framescript is loaded more than once while the maybe the tab is getting moved around between different content processes?
Reporter | ||
Comment 7•7 years ago
|
||
Here the different registrations for our framescript: 1512567917769 Marionette DEBUG Register listener.js for window 2147483649 - http://127.0.0.1:50369/windowHandles.html 1512567917852 Marionette DEBUG Register listener.js for window 2147483656 - about:newtab 1512567917890 Marionette DEBUG Register listener.js for window 2147483649 - http://127.0.0.1:50369/windowHandles.html 1512567917915 Marionette DEBUG Register listener.js for window 2147483656 - about:newtab When everything works fine the following output is seen: 1512567915722 Marionette DEBUG Register listener.js for window 2147483649 - http://127.0.0.1:50369/windowHandles.html 1512567915734 Marionette DEBUG Register listener.js for window 2147483656 - about:newtab The framescript is getting loaded twice that much as usual in the failing case.
Reporter | ||
Comment 8•7 years ago
|
||
The failure here is clearly related to multiple content processes. When I set `dom.ipc.processCount` to `1` it's no longer visible, and as higher its value is as better I can reproduce it. So moving the tabs several times between content processes, which causes our framescript to get registered again and again, creates a situation where we start working with invalid references. As attached you can find a Marionette test, which should be executed with the following command as best with a debug build: `mach marionette test %path_to_test_file% --run-until-failure`
Reporter | ||
Comment 9•7 years ago
|
||
I have an explanation of what's going on here...
In Marionette we have the framescript (listener.js) which gets loaded and registered whenever a tab is opened, changes its remoteness state, or when it gets moved to another process. In all those situations the outer window id of the browser changes, and as such the id of the framescript, and all its message listeners. Further we use the broadcast message manager right now to communicate between the parent and the framescript, which is not ideal.
For commands which have to survive a re-registering of the framescript (like navigation commands, and click), a queue called pendingCommands is used. It stores data which has to be communicated to the framescript in case of a reload. This works fine for a single reload of the framescript. But with multiple content processes it can happen that a tab gets moved around between processes more than once during the command execution. In such a situation the pendingCommands queue has been flushed, and we no longer let the framescript know about a still ongoing command.
There are currently two different situations we end-up in:
1) For commands like execute_async_script() Marionette would wait forever and never send a response to the client, until it hits the global timeout and force kills Firefox.
2) Under some situations our framescript is getting loaded twice for the same tab. Here an example when calling newSession with two open tabs:
> 1512648283578 Marionette DEBUG Register listener.js for window 2147483649 ([object ContentFrameMessageManager @ 0x117df71c0 (native @ 0x117df14c0)]
> 1512648283585 Marionette DEBUG **** start listeners in listener.js
> 1512648283590 Marionette TRACE 84 <- [1,1,null,{"sessionId":"eff0b5c7-f894-9f48-bf75-eacdb7e94d92","capabilities":{"browserName":"firefox","browserVersion":"59.0a ... essID":1828,"moz:profile":"/var/folders/4k/sf4gz5fn3kl9hr3nd7pzbvhc0000gn/T/tmpF6xB7k.mozrunner","moz:webdriverClick":true}}]
> 1512648283593 Marionette DEBUG Register listener.js for window 2147483656 ([object ContentFrameMessageManager @ 0x121730280 (native @ 0x12172ad00)]
> 1512648283600 Marionette TRACE 84 -> [0,2,"setTimeouts",{"script":30000}]
> 1512648283601 Marionette TRACE 84 <- [1,2,null,{}]
> 1512648283602 Marionette DEBUG **** start listeners in listener.js
> 1512648283603 Marionette TRACE 84 -> [0,3,"setTimeouts",{"pageLoad":300000}]
> 1512648283606 Marionette TRACE 84 <- [1,3,null,{}]
> 1512648283610 Marionette TRACE 84 -> [0,4,"setTimeouts",{"implicit":0}]
> 1512648283612 Marionette TRACE 84 <- [1,4,null,{}]
> 1512648283612 Marionette DEBUG Register listener.js for window 2147483649 ([object ContentFrameMessageManager @ 0x117df71c0 (native @ 0x117df14c0)]
> 1512648283614 Marionette TRACE 84 -> [0,5,"getCurrentChromeWindowHandle",{}]
> 1512648283615 Marionette TRACE 84 <- [1,5,null,{"value":"3"}]
> 1512648283615 Marionette DEBUG **** start listeners in listener.js
> 1512648283619 Marionette TRACE 84 -> [0,6,"getChromeWindowHandles",{}]
> 1512648283620 Marionette TRACE 84 <- [1,6,null,["3"]]
> 1512648283622 Marionette DEBUG Register listener.js for window 2147483656 ([object ContentFrameMessageManager @ 0x121730280 (native @ 0x12172ad00)]
For both tabs the framescript is loaded twice, and as such all the event listeners are getting registered twice!
Reporter | ||
Comment 10•7 years ago
|
||
It looks similar to what Andreas fixed on bug 1239552, but that code is still in place. So I wonder what else could cause a reload of the framescript in the same tab without a window id change. Not sure what the best here is. Just to wait for bug 1311041 being fixed, or even to temporarily disable multiple content processes in Marionette until then. Andreas, given that you refactor that much code in this area right now, could you have a look?
Comment 11•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [mostly away Dec 11th - Jan 3rd] from comment #10) > It looks similar to what Andreas fixed on bug 1239552, but that > code is still in place. So I wonder what else could cause a reload > of the framescript in the same tab without a window id change. > > Not sure what the best here is. Just to wait for bug 1311041 being > fixed, or even to temporarily disable multiple content processes > in Marionette until then. First of all, thanks for doing the legwork on researching this. > Andreas, given that you refactor that much code in this area right > now, could you have a look? I don’t have a strong opinion… I do think that the window refactoring will fix, or make it easier to fix, this bug. If we think it will provide a better user experience to momentarily disable multiprocessing I would be fine with that. We could set a recommended preference in geckodriver’s testing/geckodriver/src/prefs.rs, or are you suggesting disabling e10s also for m-c?
Reporter | ||
Comment 12•6 years ago
|
||
With bug 1426154 being fixed I tried out my testcase as attached to the bug and was first happy that nothing failed. I retriggered a local test run with repeat 100 a couple of times and all was fine. Then I just wanted to see what --repeat 500 brings us. I started this command, and then it failed: > 1515668925204 Marionette TRACE 1116 -> [0,13,"get",{"url":"http://127.0.0.1:63310/windowHandles.html"}] > 1515668925209 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925217 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925218 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925228 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925228 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925229 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925230 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925235 Marionette DEBUG Received DOM event "DOMContentLoaded" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925247 Marionette DEBUG Received DOM event "DOMContentLoaded" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925253 Marionette DEBUG Received DOM event "pageshow" for "http://127.0.0.1:63310/windowHandles.html" > 1515668925253 Marionette DEBUG Received DOM event "pageshow" for "http://127.0.0.1:63310/windowHandles.html" > [..] > NoSuchElementException: Failed to trigger opening a new tab: Web element reference not seen before: d96649a4-a0ba-9a4b-9917-ce91b409665a > [..] > passed: 371 > failed: 1 > todo: 0 It is very very seldom that this failure is happening now, and this is fantasic! But Marionette still seems to fail, or maybe what we could also consider is a bug in IPC core. But that this hard to tell, and I don't have the time to further dig into this problem. I would say we first wait for window tracking patch to be landed. Then we can continue here.
Reporter | ||
Comment 13•6 years ago
|
||
I'm going to lower the priority given that this problem doesn't appear that often anymore.
Reporter | ||
Comment 14•6 years ago
|
||
I think that the underlying problem here is bug 1503274.
Reporter | ||
Comment 15•6 years ago
|
||
I will leave this bug open until we have bug 1311041 fixed, given that it will implement a sane tracking of open windows and tabs.
Reporter | ||
Comment 16•6 years ago
|
||
Note, that also with the just landed fix on bug 1509906 Firefox directly loads the target URL in the correct process type.
Reporter | ||
Comment 17•5 years ago
|
||
All issues as noticed for multiple processes so far were navigation related. With all the fixes I landed in the recent weeks I tried the attached testcase again, and wasn't able to let it fail even once for 500 iterations. Also the referenced test in test_navigation.py (bug 1422741) seems to work fine now.
I think that we can finally close this bug. There is nothing more which I can do for now, but will reopen it in case something else is left over.
Comment 18•5 years ago
|
||
I'm seeing a Marionette failure with multiple windows that has a related error trace.
Would changing dom.ipc.processCount
to 1
help debug the issue?
Reporter | ||
Comment 19•5 years ago
|
||
Feel free to attach the trace output of that failing test. I would be interested to see it. Also if you can reproduce it a testcase would be fantastic.
Updated•1 year ago
|
Description
•