Open Bug 1449482 Opened 6 years ago Updated 1 year ago

Rewrite serve.py to use multiprocessing.Queue instead of multiprocessing.Pipe

Categories

(Testing :: Marionette Client and Harness, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

Details

The hang can be seen in this log:

https://treeherder.mozilla.org/logviewer.html#?job_id=170701741&repo=mozilla-inbound&lineNumber=750-774

Whenever an exception gets raised in the FixtureServer code, we hang forever. I can easily reproduce it locally. 

This is another reason why we see bug 1391545 in automation.
So the problem here happens in `get_url()`:

https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/marionette_harness/runner/serve.py#159

The parent process sends the request to the child, which did not start up correctly because of the exception in `init_func`, so Python's multiprocessing module had already called `os._exit()` on it. As a result, the parent's call to `recv` hangs and never returns:

https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/marionette_harness/runner/serve.py#54

Going by the docs, I would expect an `EOFError` exception to be raised, but that doesn't happen and we just hang:
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Connection.recv
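
One possible explanation, not confirmed in this bug, is that `recv` only raises `EOFError` once every handle to the other end of the pipe has been closed, and a parent that keeps its own copy of the child's end never sees end-of-file. The sketch below illustrates this under stated assumptions: it uses Python 3 and the POSIX "fork" start method, and `failing_child` and `probe` are hypothetical names, not code from serve.py.

```python
import multiprocessing
import sys

# Assumption: POSIX "fork" start method, where the child inherits the
# parent's handles. It is not available on Windows.
ctx = multiprocessing.get_context("fork")


def failing_child(conn):
    # Hypothetical stand-in for a child that dies during start-up.
    sys.exit(1)


def probe(close_child_end_in_parent, timeout=1.0):
    """Return "eof", "data" or "timeout" for a read from a dead child."""
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=failing_child, args=(child_conn,))
    proc.start()
    if close_child_end_in_parent:
        # Drop the parent's copy so the child holds the only handle.
        child_conn.close()
    proc.join()
    try:
        # poll() is used instead of a bare recv() so the sketch cannot hang.
        if not parent_conn.poll(timeout):
            return "timeout"
        parent_conn.recv()
        return "data"
    except EOFError:
        return "eof"
    finally:
        parent_conn.close()
        child_conn.close()
```

With the parent's copy closed, the read ends with `EOFError`. With it left open, the read times out, which is where a bare `recv` would block forever.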

Andreas, do you have an idea what's going on here? Maybe it is a bug in the multiprocessing module?

If we called `init_func` earlier, to ensure that the child is properly running a server instance before any request is sent to it, that would fix the hang and cause an immediate abort instead.
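
A minimal sketch of that idea, assuming Python 3 and the POSIX "fork" start method: the child reports whether its init function succeeded, and the parent waits for that report before sending any request. The names `serve` and `start_checked` and the `("ready", ...)` / `("error", ...)` messages are hypothetical and do not reflect the actual serve.py code or the patch in bug 1321517.

```python
import multiprocessing

# Assumption: POSIX "fork" start method, so init_func needs no pickling.
ctx = multiprocessing.get_context("fork")


def serve(conn, init_func):
    # Report the outcome of start-up before entering the request loop.
    try:
        state = init_func()
    except Exception as exc:
        conn.send(("error", repr(exc)))
        return
    conn.send(("ready", None))
    # Hypothetical request loop: answer until the None sentinel arrives.
    for request in iter(conn.recv, None):
        conn.send((request, state))


def start_checked(init_func, timeout=5.0):
    """Start the child and fail fast if it does not report "ready"."""
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=serve, args=(child_conn, init_func))
    proc.start()
    # Close the parent's copy so a dead child shows up as EOFError.
    child_conn.close()
    try:
        if parent_conn.poll(timeout):
            status, detail = parent_conn.recv()
        else:
            status, detail = "error", "no start-up report within timeout"
    except EOFError:
        status, detail = "error", "child exited without reporting"
    if status != "ready":
        proc.terminate()
        proc.join()
        parent_conn.close()
        raise RuntimeError("server failed to start: " + detail)
    return proc, parent_conn
```

A failing init function then surfaces as a `RuntimeError` in the parent instead of a later hang in `recv`.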
Flags: needinfo?(ato)
Btw I cannot dig into `Connection.recv` because it is part of the compiled module: lib-dynload/_multiprocessing.so
See my patch for `init_func` in
https://bugzilla.mozilla.org/show_bug.cgi?id=1321517 and in particular
comment https://bugzilla.mozilla.org/show_bug.cgi?id=1321517#c22.
The patch doesn’t fix an exception occurring at an arbitrary
place in ServerProxy, but it does address the immediate startup problem.

I think in order to solve this in a good way, serve.py should
be made not to depend on rolling its own IPC system but instead
make use of a multiprocessing.Queue and other best practices for
multiprocessing in Python.
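
As a rough illustration of that direction, and not a design for serve.py, a request/response pair of `multiprocessing.Queue` objects lets the parent wait with a timeout and check whether the child is still alive instead of blocking indefinitely. The sketch assumes Python 3 and the POSIX "fork" start method. The `worker` and `ask` functions and the upper-casing "protocol" are hypothetical.

```python
import multiprocessing
import queue
import time

# Assumption: POSIX "fork" start method.
ctx = multiprocessing.get_context("fork")


def worker(requests, responses):
    # Hypothetical request loop: echo each request back in upper case
    # until the None sentinel arrives.
    for request in iter(requests.get, None):
        responses.put(request.upper())


def ask(proc, requests, responses, request, timeout=5.0, poll_interval=0.1):
    """Send one request and wait for the reply without hanging forever."""
    requests.put(request)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Wait in short slices so a dead worker is noticed quickly.
            return responses.get(timeout=poll_interval)
        except queue.Empty:
            pass
        if not proc.is_alive():
            raise RuntimeError("worker exited with code %s" % proc.exitcode)
        if time.monotonic() >= deadline:
            raise TimeoutError("no response within %.1fs" % timeout)
```

The caller creates both queues with `ctx.Queue()`, passes them to `ctx.Process(target=worker, ...)`, and shuts the worker down by putting `None` on the request queue.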
Flags: needinfo?(ato)
Ok, that fix looks fine. I will update this bug's summary to reflect the ultimate goal of using multiprocessing.Queue and put it into the backlog. Thanks.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Summary: Infinite hang of Marionette when FixtureServer code raises an exception → Rewrite serve.py to use multiprocessing.Queue instead of multiprocessing.Pipe
Severity: normal → S3
Product: Testing → Remote Protocol
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing