Firefox does not exit when geckodriver is terminated
Categories
(Testing :: geckodriver, defect, P3)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
References
()
Details
Attachments
(5 files)
In case a Selenium test doesn't explicitly call "driver.quit()" at the end of the test, geckodriver will be terminated by Selenium. Sadly this results in a situation that Firefox continues to run, even without geckodriver running. It also means that the profile geckodriver own continues to be around. As Andreas mentioned on IRC this could be because of issues with signal handling in Rust.
Comment 1•6 years ago
|
||
Selenium advice has always been to call `quit()` at the end of a test. If people are doing this then they are getting themselves into weird edge cases that go against all advice. Dropping this to P5 and if a contributor wants to fix it then great
Comment 2•6 years ago
|
||
I would be interested in helping to get this resolved. There are certain cases where disaster prevents an application from properly calling "quit()", like a failure in Selenium Grid. In such a failure, you might be left with tens of browsers to cleanup. There are two crates I've earmarked that can handle interrupt or term signals: chan-signal and ctrlc. Ctrlc provides handling for interrupt and term, with the latter being a configurable option in Cargo.toml. With ctrlc, all signals must be handled with the same handler, and the process is rather opaque. Chan-signal provides finer-grained control of signal handling, with the option for separate signal handlers for separate signals, but currently only supports Unix. It looks like Windows support is in the scope, but there's been no forward development towards it. It's also based on the chan crate, which provides a more transparent mechanism for handling the signals. Chan-signal seems like the better option moving forward, but the lack of current Windows support is a miss. I would be interested in getting feedback on the decision before I move forward with a test patch.
Reporter | ||
Comment 3•6 years ago
|
||
I think Andreas can give some helpful feedback here given that he AFAIR had a quick look at signal handling lately. Maybe there is even another bug open.
Comment 4•6 years ago
|
||
We ran into another case where signal handling is necessary when discussing the patch for preserving minidump files when Firefox crashes. jgraham has summarised the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1433495#c4. It is also my impression that chan-signal is the right path forward, even if it currently does not give us Windows. Greg: Can you actually confirm that handling SIGINT and stopping Firefox solves this problem? whimboo recently recently changed geckodriver to more gracefully deal with stopping Firefox.
Comment 5•6 years ago
|
||
Sure! I'll get a first-pass patch going to see what impact it has on the browser being left over. I'll also reach out to the chan-signal dev to see how close Windows support is on the horizon.
Comment 6•6 years ago
|
||
Even the simplest examples of chan-signal don't seem to work when used in conjunction with hyper, like the following modification of main.rs in GeckoDriver: ``` let listening = webdriver::server::start(addr, handler, &extension_routes()[..]) .map_err(|err| (ExitCode::Unavailable, err.to_string()))?; info!("Listening on {}", listening.socket); let signal_channel = chan_signal::notify(&[Signal::INT, Signal::TERM]); let signal = signal_channel.recv().unwrap(); println!("Received signal {:?}", signal); ``` Sending a Ctrl+C or TERM doesn't result in the received-signal-message being printed out. Spinning up a Hyper 0.10 server on a test project results in the same behavior. I'll check into the ctrlc crate next, but the chan-signal crate has me scratching my in regards to why it won't work.
Comment 7•6 years ago
|
||
I believe that is what jgraham referred to in https://bugzilla.mozilla.org/show_bug.cgi?id=1433495#c4. See https://github.com/hyperium/hyper/issues/338 for more details.
Comment 8•6 years ago
|
||
So one idea I just had is to handle the signal, get Firefox to shut down and then just call `std::process::exit()` to exit without worrying about terminating the hyper event loop.
Comment 9•6 years ago
|
||
OK, so I just pushed my bits to mozreview and would appreciate feedback (I marked r=ato, but it's more like f?ato). I didn't extensively test this, but it seems to work at least a little. Also it's basically just what was in my git stash, so it's not a super-clean patch at this point.
Comment hidden (mozreview-request) |
Comment 11•6 years ago
|
||
Oh and, Greg, apologies for hijacking this bug; I just happened to have been looking at a related issue a few weeks ago and so had a nearly working patch. If you can find a way to make this work on Windows, that would be amazing.
Comment 12•6 years ago
|
||
No prob! I've been reading the chan-signal code/Microsoft docs to get an idea of how Windows support would play out. I don't know what timeframe it would take for me to come up with a solution, but I'll pay attention to this bug as a test point.
Comment 13•6 years ago
|
||
Windows support can be added pretty easily to chan-signal. I'm going to work with the author to get the functionality merged back into his version. I'd recommend sticking with the crate on crates.io, but if you want to test windows functionality in a pinch, my fork lives at https://github.com/gsfraley/chan-signal.git.
Comment 14•6 years ago
|
||
mozreview-review |
Comment on attachment 8958770 [details] Bug 1430064 - Use chan_signal to handle signals sent to geckodriver, https://reviewboard.mozilla.org/r/227680/#review234104 This is a pretty clever patch! I just have some minor comments and things you need to fix up. ::: commit-message-d6957:5 (Diff revision 1) > +Bug 1430064 - Use chan_signal to handle signals sent to geckodriver, r=ato > + > +This requires a refactor of the WebDriver main loop so that we are able to send > +a message from the geckodriver process in. Hyper doesn't allow us to > +exit it's handler loop cleanly so we take the approach of doing the its ::: testing/geckodriver/Cargo.toml:13 (Diff revision 1) > +chan = "0.1.12" > +chan-signal = "0.3.1" Let’s depend on chan "0.1" and chan-signal "0.3" so that it is easier to share the vendored crates with other Rust code in the tree. ::: testing/geckodriver/Cargo.toml:13 (Diff revision 1) > +chan = "0.1.12" > +chan-signal = "0.3.1" You also need to run "./mach vendor rust" because these are not currently vendored. ::: testing/geckodriver/src/main.rs:1 (Diff revision 1) > extern crate chrono; > +extern crate chan; > +extern crate chan_signal; Surely chan* should come before chrono? ::: testing/geckodriver/src/main.rs:213 (Diff revision 1) > + let server = webdriver::server::WebDriverServer::new(); > + > + let builder = thread::Builder::new().name("signal handler".to_string()); > + let chan = server.send_chan(); > + builder.spawn(move || { > + signal_notify(signal, chan) Missing semicolon. I don’t think this is supposed to return anything? ::: testing/geckodriver/src/main.rs:232 (Diff revision 1) > +{ > + > + // Blocks until this process is sent an INT or TERM signal. > + signal.recv().unwrap(); > + > + info!("Got a signal, quitting"); s/info/debug/ ::: testing/webdriver/src/server.rs:64 (Diff revision 1) > } > > fn run(&mut self, msg_chan: Receiver<DispatchMessage<U>>) { > loop { > - match msg_chan.recv() { > + let msg = msg_chan.recv(); > + info!("Got msg"); Drop this. ::: testing/webdriver/src/server.rs:92 (Diff revision 1) > if resp_chan.send(resp).is_err() { > error!("Sending response to the main thread failed"); > }; > } > - Ok(DispatchMessage::Quit) => break, > + Ok(DispatchMessage::Quit) => { > + info!("Got quit"); Drop or lower to debug. ::: testing/webdriver/src/server.rs:259 (Diff revision 1) > + let (msg_send, msg_recv) = channel(); > + WebDriverServer { > + send_chan: msg_send, > + recv_chan: msg_recv > + } If you name the return values from channel() send_chan and recv_chan, you can avoid the field assignments when constructing WebDriverServer: > let (send_chan, recv_chan) = channel(); > WebDriverServer { > send_chan, > recv_chan, > }
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 17•6 years ago
|
||
mozreview-review |
Comment on attachment 8960180 [details] Bug 1430064 - Vendor chan_signal crate, https://reviewboard.mozilla.org/r/228948/#review234652
Comment 18•6 years ago
|
||
mozreview-review |
Comment on attachment 8958770 [details] Bug 1430064 - Use chan_signal to handle signals sent to geckodriver, https://reviewboard.mozilla.org/r/227680/#review234654
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 19•6 years ago
|
||
Resetting ni? flag because it was unintentionally removed.
Comment 20•6 years ago
|
||
I’m going to pick this up and rebase jgraham’s patch. We now use a hyper version which _does_ support shutting down.
Updated•6 years ago
|
Reporter | ||
Comment 21•5 years ago
|
||
(In reply to Andreas Tolfsen ⦗:ato⦘ from comment #20)
I’m going to pick this up and rebase jgraham’s patch. We now use
a hyper version which does support shutting down.
Looks like you missed to follow-up here. Will you be able to do it soon? Otherwise we should move this bug to the next 0.25 release (bug 1520585).
Comment 22•5 years ago
|
||
No if you want to do it, please go ahead!
Reporter | ||
Updated•5 years ago
|
Comment 23•5 years ago
|
||
(In reply to Andreas Tolfsen ⦗:ato⦘ from comment #20)
I’m going to pick this up and rebase jgraham’s patch. We now use
a hyper version which does support shutting down.
Hello. I'm interested in helping resolve this issue. I have a Python script that uses Selenium+geckodriver+Firefox, and I'd like to gracefully handle ctrl+c events, but I run into this issue where the Firefox windows don't close. Is the current status still that the attached patch needs to be rebased?
Reporter | ||
Comment 24•5 years ago
|
||
James, could you please help Daniel given that were your patches?
Comment 25•5 years ago
|
||
chan-signal
/chan
have since been deprecated in favour of
signal-hook
/crossbeam-channel
, so the attached patches can’t
simply be rebased.
The webdriver
crate has since also moved to use a Tokio-based
hyper
and no longer does its own HTTP request routing but instead
relies on warp
.
The new APIs are however strikingly similar and the same fundamental
approach should still be applicable. You will need to spin up a
thread in geckodriver that blocks on receiving the signal and hook
that up to the webdriver::server::Dispatcher
thread and warp
in some way using a channel so they know to exit when the interrupt
arrives.
Unfortunately I don’t know the warp
API very well, but hyper
can be stopped with hyper::server::Server::with_graceful_shutdown()
.
Comment 26•5 years ago
|
||
Thanks! I'll start working on it.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 27•5 years ago
|
||
Daniel, do you have any further questions? Otherwise do you have an updated status?
Comment 28•5 years ago
|
||
No further questions at the moment. I have not had a significant amount of time to work on this yet due to my other responsibilities, but I am committed to fixing this bug. I, unexpectedly, will be on medical leave for 3-6 weeks starting on the 10th, so I may not have a patch before January. If someone else comes along and wants to fix it while I am out, they are welcome to, but, as I said, I am committed to fixing this since it blocks some of my own work and it is an interesting problem to me.
Reporter | ||
Comment 29•5 years ago
|
||
Good to hear, and until then best luck for you. I don't think that during that time anyone else would need it. As such the bug should be waiting for you. :)
Comment 30•4 years ago
|
||
Hello, I have hit a wall on this. I am able to spin up a thread that handles the signal and send a message to the webdriver::server::Dispatcher
thread, but I have yet to find a way to shutdown the warp
server that compiles. The compilation fails saying that I cannot await a future because we're not using Rust 2018. Is that true or have I made a simple mistake somewhere? I have a patch file (along with one for the vendoring of signal-hook
) that I can attach if you'd like to look at where I am.
Comment 31•4 years ago
|
||
The compilation fails saying that I cannot await a future because we're not using Rust 2018
This is unfortunately true. I don't know how hard it would be to update to the new edition. I"m guessing not that hard.
I have a patch file (along with one for the vendoring of signal-hook) that I can attach if you'd like to look at where I am.
Please! Or push to phabricator if you are able to do that.
Comment 32•4 years ago
|
||
This is my first time working in moz-central, so I don't know how to push to phabricator yet. I'll attach the patches now while I read up on how to do that.
Comment 33•4 years ago
|
||
Comment 34•4 years ago
|
||
And perhaps I am doing the attachment part wrong too because the vendoring patch is too big.
Comment 35•4 years ago
|
||
I don't know if the situation has improved recently but the docs for this were previously horrible. The tl;dr version is:
- Ensure you have Python 3 and pip 3 installed
pip3 install mozphab
(with--user
if you don't want a system-wide install)moz-phab submit <commits>
Don't worry about the vendoring patch.
Reporter | ||
Comment 36•4 years ago
|
||
Our code should compile for 2018. I fixed all what was broken via bug 1508670 a long time ago. Maybe something new sneaked in. Just test it out.
Comment 37•4 years ago
|
||
So features requiring Rust 2018 should now work on central. The patch you attached looks like full files rather than a diff; lets try to make the Phabricator route work.
Comment 38•4 years ago
|
||
Changing the edition did work though it uncovered new compilation errors. I have fixed those and am running the webdriver tests now. Then I'll test my script that reproduces the original problem, and see if I've actually got everything working. If I do, I'll push to Phabricator.
Comment 39•4 years ago
|
||
Updated•4 years ago
|
Comment 40•4 years ago
|
||
Okay. I have something that I think works. In my repro script, the Firefox window now closes when it didn't before. However, I cannot tell if the warp
-driven webdriver server thread is properly joining. If I put a print at the end of the webdriver server thread body, I don't see the output.
Reporter | ||
Comment 41•4 years ago
|
||
James, mind following up on Daniel's question from about 2 weeks ago? Thanks.
Comment 42•4 years ago
|
||
So with a diff like
diff --git a/testing/geckodriver/src/main.rs b/testing/geckodriver/src/main.rs
index cf844bfe2e8ac..bc4eae6a3e4e1 100644
--- a/testing/geckodriver/src/main.rs
+++ b/testing/geckodriver/src/main.rs
@@ -236,7 +236,7 @@ fn inner_main(app: &mut App) -> ProgramResult<()> {
})?;
}
}
-
+ println!("SHUTDOWN inner_main");
Ok(())
}
diff --git a/testing/webdriver/src/server.rs b/testing/webdriver/src/server.rs
index 2146969ab20fe..63b2981b43199 100644
--- a/testing/webdriver/src/server.rs
+++ b/testing/webdriver/src/server.rs
@@ -183,12 +183,14 @@ where
let builder = thread::Builder::new().name("webdriver server".to_string());
let handle = builder.spawn(move || {
tokio::run(server);
+ println!("SHUTDOWN webdriver server");
})?;
let builder = thread::Builder::new().name("webdriver dispatcher".to_string());
builder.spawn(move || {
let mut dispatcher = Dispatcher::new(handler);
dispatcher.run(&msg_recv);
+ println!("SHUTDOWN dispatcher");
})?;
Ok(Listener {
ctrl-c with a Firefox session started shuts down Firefox, and gives the following output:
^CSHUTDOWN webdriver server
SHUTDOWN inner_main
jgraham@flitwick:~/develop/gecko$ Exiting due to channel error.
`
Exiting due to channel error.
So I don't see the SHUTDOWN dispatcher
line, and something, either Firefox or geckodriver is producing the Exiting due to channel error.
output. So I think maybe something is wrong in the order of shutdown in this case, but basically the patch here works. I'm pretty happy to take it if we solve the excessive CPU usage of the busy loop.
Reporter | ||
Comment 43•4 years ago
|
||
How does it triggering Firefox to quit? Is it sending the Marionette:Quit
message? That is how it should be done because that causes a safe shutdown. Everything else will just be killing the process which is not really wanted.
Comment 44•4 years ago
|
||
Hello. I've had to put working on this on hold until a week or two from now to push another project out the door. James has been providing feedback on Phabricator and I do have next steps to solve the last remaining issues. The busy loop can be solved with a sleep in the thread handling the signal, but I would like to see if I can phrase it as a future so it plays along with the event loop. That said, I haven't seen that channel error message. When I get back to this, I'll see if I can reproduce it on my end.
Henrik, the signal handler sends a webdriver::server::DispatchMessage::Quit
message to the webdriver dispatcher which causes the dispatcher to call its delete_session()
function, which calls the MarionetteHandler
's delete_session()
function, which passes a WebDriverCommand::DeleteSession
message to the MarionetteHandler
's handle_command()
function, which sends that message to the MarionetteConnection
through MarionetteConnection
's send_command()
function which encodes that message into a Marionette:Quit
message. Or at least that's the execution path I traced by reading. I can't confirm that's actually how it works, so it should be confirmed by someone more familiar with the codebase.
Comment 45•4 years ago
|
||
Hello again. The current status on this is that shutdown of the dispatcher hangs while waiting for the MarionetteConnection
to close (via MarionetteConnection.close()
within MarionetteHandler.delete_session()
. The close function is an empty function, and I can't figure out what is actually happening with that function.
I was scheduled to go on family leave for a new child next week, but because of COVID-19, those plans have moved up a week, and tomorrow is my last work day for 3 months.
If the source of the hang can be identified, I think (hope) that is the last hurdle. Adding a 1s sleep to the signal handling thread makes CPU usage go back to the level it was before.
Reporter | ||
Comment 46•4 years ago
|
||
Thank you Daniel for the feedback! I haven't looked into that detail myself for several months so hard to say what's going on. Would you mind pushing your latest changes to phabricator so that we could test on our own? Maybe James has some additional feedback here...
Otherwise all best for you and your family. Especially for the upcoming new family member!
Comment 47•4 years ago
|
||
Okay, latest changes are pushed (just the addition of the sleep to address CPU usage). Thanks for the well wishes!
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 48•4 years ago
|
||
James, would you have the time to check the latest status of the patch? Not sure what's remaining here, and if it's not too much if it could be a potential fix for 0.28.0.
Updated•4 years ago
|
Comment 49•4 years ago
|
||
Comment 50•4 years ago
|
||
I rebased/partially rewrote this patch (the webdriver part; the geckodriver part is largely unchanged). I don't see a hang so afaict we can land this subject to review etc.
Reporter | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 51•4 years ago
|
||
So what I didn't notice earlier is that with this patch we only handle SIGINT
, which only handles the Ctrl+C case. But we don't setup signal handlers for those cases when the geckodriver process gets killed via SIGTERM
and/or SIGKILL
. Both signals are limited to non-Windows platforms, but I feel that we should still add them both or only one to handle a graceful shutdown of Firefox when the geckodriver process gets killed.
Also there seems to be a race condition. In most of the cases we are going to quit the session before sendnig the Marionette:Quit
command. As such Firefox will still be not closed.
In a working case it looks like:
1601629124353 geckodriver DEBUG signal received
1601629124353 webdriver::server DEBUG The dispatcher got the Quit message, deleting session
1601629124353 webdriver::server DEBUG Deleting session
1601629124353 webdriver::server DEBUG shutdown!
1601629124355 Marionette DEBUG 0 -> [0,3,"Marionette:Quit",{"flags":["eForceQuit"]}]
1601629124355 Marionette INFO Stopped listening on port 61295
1601629124377 Marionette DEBUG Closed connection 0
And here a failing case:
1601629293053 geckodriver DEBUG signal received
1601629293053 webdriver::server DEBUG The dispatcher got the Quit message, deleting session
1601629293053 webdriver::server DEBUG Deleting session
1601629293054 webdriver::server DEBUG shutdown!
1601629293057 Marionette DEBUG Closed connection 0
^CDEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
James, what do you think?
Reporter | ||
Comment 52•4 years ago
|
||
It's too late for 0.28 and as such has to be moved to 0.29.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 53•2 years ago
|
||
Too late for the 0.31.0 release and James doesn't work on it at the moment.
Reporter | ||
Comment 54•2 years ago
|
||
We haven't had the time to continue on this particular issue and most likely we won't be able anytime soon. As such lets drop the bug from any milestone / release tracking.
Reporter | ||
Comment 55•2 months ago
|
||
Note that with bug 336193 fixed Firefox should handle SIGTERM/SIGINT/SIGHUP on all platforms now.
Description
•