Closed Bug 1447449 Opened 6 years ago Closed 6 years ago

Repeated pause action hangs after a few seconds

Categories

(Remote Protocol :: Marionette, defect, P1)

Version 3
defect

Tracking

(firefox-esr60 fixed, firefox62 fixed, firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr60 --- fixed
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: impossibus, Assigned: whimboo)

References

(Blocks 1 open bug, )

Details

(Keywords: hang)

Attachments

(3 files, 1 obsolete file)

Summary: Pause action hangs on linux opt → Repeated pause action hangs after a few seconds
Priority: -- → P3
We're hitting this same problem transitioning from using WebDriver 3 "legacy" mode with Firefox to non-legacy (geckodriver).  As part of that effort, we need to call pause() during a drag like so:

actions.clickAndHold(element).moveByOffset(200,0).pause(1000).release().perform();

(The pause() is needed to avoid the drag short-stroking when interacting with our JS Framework widgets, but we so far haven't been able to reproduce it in a standalone test case.  We plan to file a separate bug for that.)

We find that pause values > 100 have a high likelihood of causing the release() to be dropped, so that the browser thinks the mouse button is stuck down.  When this happens, not only does the current test fail, but all communication between WebDriver and the browser freezes.  Attempting to re-release has no effect, and moving the cursor into the browser causes the element to move with the mouse - since the browser thinks the button is stuck down.

You can see the report I made against geckodriver for this here:
https://github.com/mozilla/geckodriver/issues/1352
for which I was redirected to this bug report.

We would really appreciate this getting fixed before FF52ESR is EOL on 9/5/2018, since FF60ESR will only run non-legacy mode.  Is there any chance this can be bumped to a P2?  Seems to me that a hang of the entire browser driver is pretty serious, especially since the threshold of pause where it occurs is fuzzy.
I cannot promise yet when someone from us has the time to work on this bug. So far it's not that wide-spread and only you have that problem. There are other way more important bugs to fix. Sorry.

If you have time and interest I could guide you, so that we may investigate the problem together. This might speed-up fixing this bug.
Severity: normal → major
Keywords: hang
They're not actually the only one with this issue.

I'm on FF nightly 62.0b17, Selenium 3.14, and Geckodriver 0.21.0.

The web application I'm testing relies on pauses in the action chain to account for transitions which made the code compatible with Chrome/IE because they had more problems with the transitions than Firefox. We ended up solely focusing on Firefox as the testing browser so dropping the pauses as suggested in javascriptjedi's bug report wasn't a big issue.

My pauses aren't as long as 500, but it's enough to make it hang:

action = ActionChains(driver)
action.move_to_element(start_elem)
action.pause(1)
# Typical number of points in a line is three
for point in points:
    action.move_by_offset(*point)
    action.pause(1)
    action.click()
    action.pause(1)
# This secondary click will ensure the line ends
action.click()
action.perform()

All I'm doing is creating a line in the UI where each click drops a node to create connected line segments with the final node needing to be clicked twice to prevent adding more. Sometimes it appears to never get past the the first move offset + click, and sometimes it appears to never even accomplish the first click. And communication between Marionette and Geckodriver basically ends (my trace logs are very similar to the one javascriptjedi provided in the bug report). So it may not necessarily be just the mouse click and hold never releasing. 

I don't know if this is any help, but they're definitely not alone in the frustration.
Depends on: 1484161
So bug 1484161 accidentally fixed this hang as it looks like, at least when I test it with the attached testcase.

mnemyx, can you please also check yourself with your test by just downloading the latest Firefox Nightly from:

https://www.mozilla.org/en-US/firefox/channel/desktop/#nightly.

Please let us know if it also works for you. We are considering to uplift it to Firefox 62.
Flags: needinfo?(mnemyx)
With some local debugging the problem seems to be here:

https://hg.mozilla.org/mozilla-central/file/b84213ec5a4d/testing/marionette/action.js#l1428

> return new Promise(resolve =>
>   timer.initWithCallback(resolve, duration, Ci.nsITimer.TYPE_ONE_SHOT)
> );

In the case of a hang `resolve` is never being called. But I was able to to get working when setting a custom notifier as what `Sleep()` does like:

>  return new Promise(resolve => {
>    function handler(timer) {
>      resolve();
>    }
>
>    timer.initWithCallback(handler, duration, Ci.nsITimer.TYPE_ONE_SHOT);
>  });

I have no idea why this sometimes fails without that handler. It might even be a bug in Firefox itself.
Thanks.

I can confirm that the issue is resolved if I use the Firefox build from comment #5 above, as verified by my sample code from https://github.com/mozilla/geckodriver/issues/1352, configured with a pause of 2000ms and 1000 iterations.

Is there any chance we can get the fix ported back to FF60ESR as a stability improvement?  Otherwise our customers using ESR will be in a world of pain until mid 2019 when FF68ESR is released.  Presumably, we don't need the full change that "accidentally" fixed this issue, but just what was suggested in comment #6.

There's still time for this to be resolved in FF60ESR before customers currently using FF52ESR (EOL 9/5/2018) ever see it, which would probably be the ideal situation, and perhaps the bar is lower (for porting) before that transition point?
That is great to hear. Thank you for the feedback. In regards of uplifts we will try to get it at least on beta, but personally I would also prefer esr60. All this will be tracked by bug 1484161.
Assignee: nobody → ato
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Turns out the approach from bug 1484161 doesn't work here due to multi-level merge conflicts. As such we should indeed just go with the simplified case.

Before marking this bug as fixed, lets also re-enable the wdspec test!
Assignee: ato → hskupin
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Priority: P3 → P1
Attachment #9002569 - Flags: review?(ato)
Attachment #9002569 - Flags: review?(ato) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b14f6fa58a30
[wdspec] Re-enable mouse_pause_dblclick.py. r=ato
https://hg.mozilla.org/mozilla-central/rev/b14f6fa58a30
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Under some unknown circumstances an instance of nsITimer doesn't
call resolve(), and as such the pause action hangs forever.

Injecting a custom handler which itself calls resolve, prevents
this hang.

This is a workaround for older branches while bug 1484161 fixed
it for Firefox 63.
Comment on attachment 9003051 [details] [diff] [review]
[marionette] Prevent hang in pause action [workaround patch for older branches]

Once landed together with the other patch on beta, it works all fine for me.
Attachment #9003051 - Attachment description: [marionette] Prevent hang in pause action → [marionette] Prevent hang in pause action [workaround patch for older branches]
Flags: needinfo?(mnemyx)
Attachment #9003051 - Flags: review?(ato)
Comment on attachment 9003051 [details] [diff] [review]
[marionette] Prevent hang in pause action [workaround patch for older branches]

Review of attachment 9003051 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/marionette/action.js
@@ +1425,4 @@
>    const timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
>    let duration = typeof a.duration == "undefined" ? tickDuration : a.duration;
>    return new Promise(resolve =>
> +      timer.initWithCallback(timer => resolve(), duration, Ci.nsITimer.TYPE_ONE_SHOT)

The first argument here is actually a nsITimerCallback, but it can
be a callback function.  Since "timer" isn’t used you could shorten
it to this:

> initWithCallback(() => resolve(), duration, …)
Attachment #9003051 - Flags: review?(ato) → review+
Under some unknown circumstances an instance of nsITimer doesn't
call resolve(), and as such the pause action hangs forever.

Injecting a custom handler which itself calls resolve, prevents
this hang.

This is a workaround for older branches while bug 1484161 fixed
it for Firefox 63.
Attachment #9003051 - Attachment is obsolete: true
Comment on attachment 9003110 [details] [diff] [review]
[marionette] Prevent hang in pause action

(In reply to Andreas Tolfsen ﹝:ato﹞ from comment #16)
> The first argument here is actually a nsITimerCallback, but it can
> be a callback function.  Since "timer" isn’t used you could shorten
> it to this:
> 
> > initWithCallback(() => resolve(), duration, …)

Correct. Fixed that now. Also taking over r+.
Attachment #9003110 - Flags: review+
For uplifting this teste-only patch set to beta please land the changesets in the following order:

* attachment 9003110 [details] [diff] [review] (workaround fix for Marionette)
* attachment 9002569 [details] [diff] [review] (re-enable wdspec test)
Whiteboard: [checkin-needed-beta]
I checked on esr60, and both commits should apply cleanly. Please don't miss to fold 0ede263f6310 into 55fc535ff4ce before landing. Thanks.
Whiteboard: [checkin-needed-esr60]
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: