Closed
Bug 1357634
Opened 7 years ago
Closed 7 years ago
[e10s] Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (because beforeunload is not taken into account)
Categories
(Testing :: Marionette Client and Harness, defect)
Tracking
(firefox-esr52 unaffected, firefox53 unaffected, firefox54 fixed, firefox55 fixed)
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | fixed |
firefox55 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: whimboo)
References
Details
(Keywords: intermittent-failure, Whiteboard: [spec][17/04][stockwell fixed])
Attachments
(1 file, 1 obsolete file)
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=92549182&repo=autoland https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64-debug/autoland_yosemite_r7-debug_test-marionette-e10s-bm108-tests1-macosx-build432.txt.gz
Assignee | ||
Comment 1•7 years ago
|
||
The steps look fine: 1492570419436 Marionette TRACE 374 -> [0,3,"findElement",{"using":"link text","value":"333333"}] 1492570419450 Marionette TRACE 374 <- [1,3,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"fc8cf1dc-6f3e-5d40-a642-95b16574241d","ELEMENT":"fc8cf1dc-6f3e-5d40-a642-95b16574241d"}}] 1492570419452 Marionette TRACE 374 -> [0,4,"clickElement",{"id":"fc8cf1dc-6f3e-5d40-a642-95b16574241d"}] 1492570420145 Marionette TRACE 374 <- [1,4,null,{}] ++DOMWINDOW == 20 (0x1106cb000) [pid = 2675] [serial = 32] [outer = 0x112891000] 1492570420160 Marionette TRACE 374 -> [0,5,"findElement",{"using":"id","value":"username"}] But sadly due to bug 1357407 we don't see the page load events. Usually `clickElement` should have waited for the page being loaded. It's a regression in a unit test caused by my changes from bug 1335778. The volume is kinda low for now. So I would like to wait for some more examples of this failure before continuing to investigate.
Blocks: 1335778
Keywords: regression
Comment hidden (Intermittent Failures Robot) |
Comment 3•7 years ago
|
||
:whimboo, this occurred 20 times yesterday, I think the volume is high enough to work on this :)
Whiteboard: [stockwell needswork]
Assignee | ||
Comment 4•7 years ago
|
||
The patch on bug 1335778 was backed out yesterday. So this should have stopped this failure from happening.
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox54:
--- → unaffected
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed]
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
status-firefox53:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Assignee | ||
Comment 6•7 years ago
|
||
Looks like that it is still around but with a low frequency now: https://treeherder.mozilla.org/#/jobs?repo=autoland&bugfiler&selectedJob=94750326
Status: RESOLVED → REOPENED
Keywords: regression
Resolution: FIXED → ---
Whiteboard: [stockwell fixed] → [stockwell needswork]
Assignee | ||
Comment 7•7 years ago
|
||
It's very strange that the click doesn't cause a navigation: 1493299007208 Marionette TRACE 374 -> [0,4,"clickElement",{"id":"8f744058-5aa7-4c32-86cd-43d0f09893f1"}] 1493299008420 Marionette DEBUG Canceled page load listener because no navigation has been detected 1493299008638 Marionette TRACE 374 <- [1,4,null,{}] Which means after 1.4s there is still no sight of a navigation visible. Looks like this is only happening for debug builds.
Assignee: nobody → hskupin
Status: REOPENED → ASSIGNED
Summary: Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username → [e10s] Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username
Comment hidden (mozreview-request) |
Comment 9•7 years ago
|
||
mozreview-review |
Comment on attachment 8862735 [details] Bug 1357634 - Temporarily skip test_click_number_link and test_clicking_on_a_multiline_link for intermittent failures. https://reviewboard.mozilla.org/r/134604/#review137590
Attachment #8862735 -
Flags: review+
Comment 10•7 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s 66cbe191d7e6 -d 42e64e0c808f: rebasing 392321:66cbe191d7e6 "Bug 1357634 - Temporarily skip test_click_number_link and test_clicking_on_a_multiline_link for intermittent failures. r=Tomcat" (tip) merging testing/marionette/harness/marionette_harness/tests/unit/test_click.py warning: conflicts while merging testing/marionette/harness/marionette_harness/tests/unit/test_click.py! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment 11•7 years ago
|
||
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s 66cbe191d7e6 -d 42e64e0c808f: rebasing 392321:66cbe191d7e6 "Bug 1357634 - Temporarily skip test_click_number_link and test_clicking_on_a_multiline_link for intermittent failures. r=Tomcat" (tip) merging testing/marionette/harness/marionette_harness/tests/unit/test_click.py warning: conflicts while merging testing/marionette/harness/marionette_harness/tests/unit/test_click.py! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Comment hidden (mozreview-request) |
Comment 13•7 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f9e4d7e4253e Temporarily skip test_click_number_link and test_clicking_on_a_multiline_link for intermittent failures. r=Tomcat
Assignee | ||
Comment 14•7 years ago
|
||
I got this failure triggered locally now. To investigate the amount of time it needs for a pagehide/unload to happen, I commented out the call to loadListener.stop() and sendOk(). The result is the following: 1493368900377 Marionette TRACE 40 -> [0,4,"clickElement",{"id":"3cc7fe3d-2b72-394e-bcb6-fe607dd2346b"}] 1493368900573 Marionette DEBUG Flushed event loop. 1493368900594 Marionette DEBUG Flushed event loop. 1493368900777 Marionette DEBUG Canceled page load listener because no navigation has been detected 1493368901026 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:50065/clicks.html" 1493368901027 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:50065/clicks.html" It means that after returning from the `flushEventLoop` call, it takes up to 450ms until the first event is fired.
Assignee | ||
Comment 15•7 years ago
|
||
I think we have to bring back the `beforeunload` listener: 1493369406323 Marionette TRACE 110 -> [0,4,"clickElement",{"id":"93df523e-749d-0d4d-a2da-9695029c07fc"}] 1493369406532 Marionette DEBUG Flushed event loop. 1493369407348 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:50349/clicks.html" 1493369407348 Marionette DEBUG Flushed event loop. 1493369407356 Marionette DEBUG Canceled page load listener because no navigation has been detected 1493369407362 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:50349/clicks.html" 1493369407362 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:50349/clicks.html" But we should not take is as measure that the page load is actually happening. Instead we might want to increase the timeout for the unload timer, so it can wait a bit longer for the `unload` event. With such a change we wouldn't cause any delay in test execution for normal clicks which do not cause a page load, but only delay those cases when a `beforeunload` handler is setup, and which cancels the page load.
Assignee | ||
Comment 16•7 years ago
|
||
And that attempt works: 1493370050924 Marionette TRACE 110 -> [0,4,"clickElement",{"id":"52559715-f64e-fe4c-8d00-b5162edc2741"}] 1493370051152 Marionette DEBUG Flushed event loop. 1493370051194 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:50650/clicks.html" 1493370051194 Marionette DEBUG Flushed event loop. 1493370051507 Marionette DEBUG Received DOM event "pagehide" for "http://127.0.0.1:50650/clicks.html" 1493370051508 Marionette DEBUG Received DOM event "unload" for "http://127.0.0.1:50650/clicks.html" `beforeunload` got fired 42ms after `flushEventLoop` returned. It extended the timeout for PageUnload by another 5000ms, which I have chosen because its the default value for the implicit wait duration. Then `pagehide` got fired 355ms after `flushEventLoop` returned. I think that we should leave the timeout value of 200ms for waiting of the `beforeunload` event. Lower values like 100ms could be a risk. Also by having changed `flushEventLoop` from `unload` to `beforeunload`, we basically return early from this call in every case. So overall we should even have reduced the time to send the final response to the driver.
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 19•7 years ago
|
||
mozreview-review |
Comment on attachment 8862735 [details] Bug 1357634 - Temporarily skip test_click_number_link and test_clicking_on_a_multiline_link for intermittent failures. https://reviewboard.mozilla.org/r/134604/#review137632 ::: testing/marionette/harness/marionette_harness/tests/unit/test_click.py:10 (Diff revision 3) > import urllib > > from marionette_driver.by import By > from marionette_driver import errors > > -from marionette_harness import MarionetteTestCase, run_if_e10s > +from marionette_harness import MarionetteTestCase, run_if_e10s, skip This commit has to be removed from this series before landing.
Assignee | ||
Comment 20•7 years ago
|
||
I triggered a try build with those changes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a56348712961c4160bb3a2832446303798f9ad4d
Assignee | ||
Updated•7 years ago
|
Attachment #8862792 -
Flags: review?(ato)
Assignee | ||
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [spec][17/04][stockwell needswork]
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 ::: commit-message-9a2c8:3 (Diff revision 1) > +Using only the 'unload' event to detect a page load is unreliable because > +of a possible slowness of the application. By also using 'beforeunload' a Why is it unreliable? ::: commit-message-9a2c8:8 (Diff revision 1) > +There is an expected additional delay for those cases when the unload gets > +canceled due to a registered 'beforeunload' listener on the page itself. What are you implying? That the document navigation gets cancelled in beforeunload, or that beforeunload takes a long time? In the first case, we would expect never to see an unload at all because navigation would not happen. In the latter, it’s not obvious to me what we should do. ::: commit-message-9a2c8:10 (Diff revision 1) > +Given that this will happen very rarely, reliability should be considered > +more important, and so a delay of 5s should make that happen. Why will it only happen rarely? Generally, I’m lacking “the story” in this commit message. By reading it, I don’t really understand what the issue is. ::: testing/marionette/listener.js:214 (Diff revision 1) > + case "beforeunload": > + // An upcoming page load has been detected. Give the 'unload' event a bit > + // more time due to a possible slowness of the application. In the case > + // when the page load gets aborted due to a registered 'beforeunload' > + // handler, the load listener will wait until the unload timer times out. > + this.timerPageUnload.cancel(); > + this.timerPageUnload.initWithCallback(this, 5000, Ci.nsITimer.TYPE_ONE_SHOT) > + break; The way I read this, the notify function gets called again after five seconds. What if the timerPageUnload that has already been registered fires before this? Does that cancel this? Also, what if the document’s beforeunload cancels the navigation? Does your implementation handle that?
Attachment #8862792 -
Flags: review?(ato) → review+
Assignee | ||
Comment 22•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > Why is it unreliable? Because we run into timeouts for the page unload timer. The unload events are getting fired way too late. beforeunload is fired right away. > What are you implying? That the document navigation gets cancelled in beforeunload, or that beforeunload takes a long time? > > In the first case, we would expect never to see an unload at all because navigation would not happen. > > In the latter, it’s not obvious to me what we should do. This paragraph is for the first case, means a registered handler for beforeunload is called and rejects the unload. In such a case the unload timer will fire after 5s, and we assume there is no page load. If beforeunload takes too long we simply have no chance in detecting a page load. But that should really not be possible as all the test results so far have been shown. Even in the worst case with the unload event taking more than 1s, beforeunload is fired immediately after we return from flushEventLoop. Does this answer your question? > Why will it only happen rarely? Generally, I’m lacking “the story” in this commit message. By reading it, I don’t really understand what the issue is. So what I mean here is that tests which specifically test the path to abort a page load with a registered beforeunload handler should be rare. If it's confusing I can remove this paragraph. > The way I read this, the notify function gets called again after five seconds. > > What if the timerPageUnload that has already been registered fires before this? Does that cancel this? > > Also, what if the document’s beforeunload cancels the navigation? Does your implementation handle that? > What if the timerPageUnload that has already been registered fires before this? Does that cancel this? Please see line 219 which cancels the currently running page unload timer. > Also, what if the document’s beforeunload cancels the navigation? Does your implementation handle that? As said earlier (which was unclear to you) the page unload timer will fire again after 5s, and if no unload event has been seen by that time, we assume there is no page load, and we call sendOk().
Assignee | ||
Comment 23•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > > What if the timerPageUnload that has already been registered fires before this? Does that cancel this? > > Please see line 219 which cancels the currently running page unload timer. > > > Also, what if the document’s beforeunload cancels the navigation? Does your implementation handle that? > > As said earlier (which was unclear to you) the page unload timer will fire again after 5s, and if no unload event has been seen by that time, we assume there is no page load, and we call sendOk(). Something which I just noticed is that we do not check if the unload timer should be used or not. It's still optional for navigation requests. So it means that with the above change we would also fire a unload timer with 5s for get/backward/forward/refresh. It may not cause issues, but I should fix that.
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f9e4d7e4253e
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 26•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > Something which I just noticed is that we do not check if the unload timer should be used or not. It's still optional for navigation requests. So it means that with the above change we would also fire a unload timer with 5s for get/backward/forward/refresh. It may not cause issues, but I should fix that. Thinking more about my last comment showed me that the current behavior is correct in ALL the cases, and that it also fixes a hang in the page load code for the other navigation commands like `get` itself. It will happen in those cases when `beforeunload` is raised, but the actual page navigation gets canceled. I think that I will add additional testcases for this.
Assignee | ||
Comment 27•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > Thinking more about my last comment showed me that the current behavior is correct in ALL the cases, and that it also fixes a hang in the page load code for the other navigation commands like `get` itself. It will happen in those cases when `beforeunload` is raised, but the actual page navigation gets canceled. I think that I will add additional testcases for this. Let me revoke my last comment. It's not part of the spec to return early for eg. a `get` request which cancels the page load via a registered `beforeunload` handler. I will restrict this code to only get run when the command opted-in.
Assignee | ||
Comment 28•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > Let me revoke my last comment. It's not part of the spec to return early for eg. a `get` request which cancels the page load via a registered `beforeunload` handler. I will restrict this code to only get run when the command opted-in. So what I just noticed while creating a test for `onbeforeunload` any command is returning immediately, so that the opened alert can be handled. As such the comment in the code needs an update, and I will still add two more tests for `get` and `click`.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8862735 -
Attachment is obsolete: true
Assignee | ||
Comment 30•7 years ago
|
||
Andreas, please check the latest changes, and remaining issues. I think that we can close them all. Thanks.
Flags: needinfo?(ato)
Comment hidden (mozreview-request) |
Comment 32•7 years ago
|
||
mozreview-review |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review138836
Comment 33•7 years ago
|
||
mozreview-review |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review138838 Please clarify commit message. Otherwise, everything is good.
Comment 34•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > This paragraph is for the first case, means a registered handler for beforeunload is called and rejects the unload. In such a case the unload timer will fire after 5s, and we assume there is no page load. > > If beforeunload takes too long we simply have no chance in detecting a page load. But that should really not be possible as all the test results so far have been shown. Even in the worst case with the unload event taking more than 1s, beforeunload is fired immediately after we return from flushEventLoop. > > Does this answer your question? Thanks for clarifying. I would love for the commit message to be clearer on this. Especially “for those cases when the unload gets canceled due to a registered 'beforeunload' listener on the page itself” is unclear. An document does not get cancelled _because_ of the beforeunload listener, but because the beforeunload listener _aborts the navigation_, as you said in your clarification. > So what I mean here is that tests which specifically test the path to abort a page load with a registered beforeunload handler should be rare. > > If it's confusing I can remove this paragraph. I get what you’re saying now. Maybe it would make sense to rephrase this: > In the case when a navigation attempt is aborted in the document’s > beforeunload, the timeout waiting for the document to start navigating > is increased by 5000 ms to ensure a possible load event is not missed. > In the common case such an event should occur pretty soon after > interaction, and we optimise for this. > So what I just noticed while creating a test for `onbeforeunload` any command is returning immediately, so that the opened alert can be handled. As such the comment in the code needs an update, and I will still add two more tests for `get` and `click`. This all sounds good.
Updated•7 years ago
|
Flags: needinfo?(ato)
Assignee | ||
Comment 35•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8862792 [details] Bug 1357634 - Use 'beforeunload' to detect a possible page load. https://reviewboard.mozilla.org/r/134684/#review137654 > Thanks for clarifying. I would love for the commit message to be clearer on this. Especially “for those cases when the unload gets canceled due to a registered 'beforeunload' listener on the page itself” is unclear. An document does not get cancelled _because_ of the beforeunload listener, but because the beforeunload listener _aborts the navigation_, as you said in your clarification. Yeah, I missed to update this part of the commit message. I will rephrase it, so it's better understandable. > I get what you’re saying now. Maybe it would make sense to rephrase this: > > > In the case when a navigation attempt is aborted in the document’s > > beforeunload, the timeout waiting for the document to start navigating > > is increased by 5000 ms to ensure a possible load event is not missed. > > In the common case such an event should occur pretty soon after > > interaction, and we optimise for this. Your message is not completely correct. We will only extend the unload timer when the page load is NOT aborted. Otherwise it looks fine, and I will update. My proposal is: // In the case when a document has a beforeunload handler registered, // the currently active command will return immediately due to the // modal dialog observer in proxy.js. // Otherwise the timeout waiting for the document to start navigating // is increased by 5000 ms to ensure a possible load event is not missed. // In the common case such an event should occur pretty soon after // beforeunload, and we optimise for this.
Comment hidden (mozreview-request) |
Comment 37•7 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/29d676d23989 Use 'beforeunload' to detect a possible page load. r=ato
Comment 38•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/29d676d23989
Assignee | ||
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Assignee | ||
Comment 39•7 years ago
|
||
As it looks like the issue is still not completely fixed: 1493878218361 Marionette TRACE 374 -> [0,4,"clickElement",{"id":"e31a38ae-c0ef-4939-bf22-3ca7b52c247b"}] 1493878219108 Marionette DEBUG Received DOM event "beforeunload" for "http://127.0.0.1:38163/clicks.html" 1493878219318 Marionette DEBUG Canceled page load listener because no navigation has been detected 1493878219396 Marionette TRACE 374 <- [1,4,null,{}] Somehow we still cancel the page unload timer. I will file a follow-up bug for it.
Summary: [e10s] Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username → [e10s] Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (because beforeunload is not taken into account)
Updated•7 years ago
|
Whiteboard: [spec][17/04][stockwell needswork] → [spec][17/04][stockwell fixed]
Comment 40•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/d880b3f5539a https://hg.mozilla.org/releases/mozilla-beta/rev/b99a30853f53
Assignee | ||
Updated•7 years ago
|
Updated•1 year ago
|
Product: Testing → Remote Protocol
Comment 41•1 year ago
|
||
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in
before you can comment on or make changes to this bug.
Description
•