Closed Bug 1321517 Opened 8 years ago Closed 6 years ago

Marionette hangs forever if test.cert or test.key do not exist

Categories

(Remote Protocol :: Marionette, defect, P3)

defect

Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: whimboo, Assigned: ato, Mentored)

References

Details

(Keywords: good-first-bug, Whiteboard: [lang=py][wptsync upstream])

Attachments

(2 files, 3 obsolete files)

As implemented in bug 1103196 we have default values for the test certificate and key now. It works fine when both exist. But once one of those files does not exist, Marionette hangs forever with the following trace:

Process ServerProxy-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Volumes/data/code/gecko/testing/marionette/harness/marionette_harness/runner/serve.py", line 73, in run
    server = self.init_func(*self.init_args, **self.init_kwargs)
  File "/Volumes/data/code/gecko/testing/marionette/harness/marionette_harness/runner/serve.py", line 149, in https_server
    ssl_cert=ssl_config["cert_path"])
  File "/Volumes/data/code/gecko/testing/marionette/harness/marionette_harness/runner/httpd.py", line 77, in __init__
    key_file=ssl_key)
  File "/Volumes/data/Users/henrik/.venvs/marionette/lib/python2.7/site-packages/wptserve-1.4.0-py2.7.egg/wptserve/server.py", line 403, in __init__
    assert os.path.exists(key_file)
AssertionError

It looks like we do not correctly handle assertions like that in the spawned server threads.
Flags: needinfo?(ato)
This sounds like a low-priority bug if you have to delete a file to reproduce it – although still a bug.  We are probably not stopping the process when the exception occurs and at some point it hangs, maybe when calling get_url.
Flags: needinfo?(ato)
Priority: -- → P3
See Also: → 1345274
We want to add a file stat check that these files exist and exit
early instead of raising an error.
Mentor: ato
Whiteboard: [lang=py]
Version: Version 3 → Trunk
Keywords: good-first-bug
Can I work on this bug?
Flags: needinfo?(ato)
Of course!  Please submit your patches and flag me as reviewer.
If you have any questions you can reach out to me in #ateam on IRC
or leave a comment here.
Flags: needinfo?(ato)
--- a/testing/marionette/harness/marionette_harness/runner/httpd.py
+++ b/testing/marionette/harness/marionette_harness/runner/httpd.py
@@ -101,6 +101,11 @@ class FixtureServer(object):
         if port is None:
             port = 0
 
+        if not os.path.exists(ssl_cert):
+            raise ValueError("Certificate file is not found at the given path.")
+        if not os.path.exists(ssl_key):
+            raise ValueError("Key file is not found at the given path.")
+
         routes = [("POST", "/file_upload", upload_handler),
                   ("GET", "/http_auth", http_auth_handler),
                   ("GET", "/slow", slow_loading_handler),


From what I understood reading the comments so far in this, I am thinking this patch can do the job.  Not sure if the global declarations of default_ssl_cert and default_ssl_key also need to be surrounded with checks/exceptions.
Flags: needinfo?(ato)
That looks correct, Venkatesh.  I haven’t tried it out locally, but
I assume it exists when the ValueError is raised?

If you submit the patch, please change the error messages to be more
similar to this, which is considered more “Pythonic”:

> raise ValueError("Key file not found: {}".format(ssl_key))
Flags: needinfo?(ato)
(In reply to Andreas Tolfsen ‹:ato› from comment #6)
Thank you for the quick feedback, Andreas.

> That looks correct, Venkatesh.  I haven’t tried it out locally, but
> I assume it exists when the ValueError is raised?
Sorry, I lost you here.  Can you explain please?

> 
> If you submit the patch, please change the error messages to be more
> similar to this, which is considered more “Pythonic”:
This being my first ever patch to Mozilla, how do I submit?  I have seen there is a try server with a wider range of OS and hardware configurations to test the submissions.  Once you okay the patch here, how can I submit to the try server?

> 
> > raise ValueError("Key file not found: {}".format(ssl_key))

diff --git a/testing/marionette/harness/marionette_harness/runner/httpd.py b/testing/marionette/harness/marionette_harness/runner/httpd.py
--- a/testing/marionette/harness/marionette_harness/runner/httpd.py
+++ b/testing/marionette/harness/marionette_harness/runner/httpd.py
@@ -101,6 +101,11 @@ class FixtureServer(object):
         if port is None:
             port = 0
 
+        if not os.path.exists(ssl_cert):
+            raise ValueError("SSL Certificate file not found: {}".format(ssl_cert))
+        if not os.path.exists(ssl_key):
+            raise ValueError("SSL Key file not found: {}".format(ssl_key))
+
         routes = [("POST", "/file_upload", upload_handler),
                   ("GET", "/http_auth", http_auth_handler),
                   ("GET", "/slow", slow_loading_handler),

Is this better?
Flags: needinfo?(ato)
(In reply to Venkatesh from comment #7)

> (In reply to Andreas Tolfsen ‹:ato› from comment #6)
> 
> > That looks correct, Venkatesh.  I haven’t tried it out
> > locally, but I assume it exists when the ValueError is raised?
> 
> Sorry, I lost you here.  Can you explain please?

I was just trying to check that the Python script actually exits
when these exceptions are raised, and that there is no loop that
prevents it from exiting.  I don’t think this is the case, so you
should be fine.

> > If you submit the patch, please change the error messages to be
> > more similar to this, which is considered more “Pythonic”:
>
> This being my first ever patch to Mozilla, how do I submit?  I
> have seen there is a try server with a wider range of OS and
> hardware configurations to test the submissions.  Once you okay
> the patch here, how can I submit to the try server?

You can upload a .patch file to this bug (there are various tools
to do this automatically from git and hg, but you can also do it
manually), or you can push it to mozreview.

mozreview is documented on
https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview.html.
moz-git-tools for uploading from git are found at
https://github.com/mozilla/moz-git-tools.

See these resources for new contributors:

	https://firefox-source-docs.mozilla.org/testing/marionette/marionette/NewContributors.html

I will grant you access to the try server once you’ve submitted a
patch or two to Mozilla.  In the mean time I will do a try run for
you.

> diff --git a/testing/marionette/harness/marionette_harness/runner/httpd.py
> b/testing/marionette/harness/marionette_harness/runner/httpd.py
> --- a/testing/marionette/harness/marionette_harness/runner/httpd.py
> +++ b/testing/marionette/harness/marionette_harness/runner/httpd.py
> @@ -101,6 +101,11 @@ class FixtureServer(object):
>          if port is None:
>              port = 0
>  
> +        if not os.path.exists(ssl_cert):
> +            raise ValueError("SSL Certificate file not found:
> {}".format(ssl_cert))
> +        if not os.path.exists(ssl_key):
> +            raise ValueError("SSL Key file not found: {}".format(ssl_key))
> +
>          routes = [("POST", "/file_upload", upload_handler),
>                    ("GET", "/http_auth", http_auth_handler),
>                    ("GET", "/slow", slow_loading_handler),
> 
> Is this better?

s/Certificate/certificate
s/Key/key
Flags: needinfo?(ato)
Attached patch Fix for the bug 1321517 (obsolete) — Splinter Review
Flags: needinfo?(ato)
Attachment #8962426 - Flags: review+
Attachment #8962426 - Flags: feedback+
(In reply to Andreas Tolfsen ‹:ato› from comment #8)
> (In reply to Venkatesh from comment #7)
> 
> > (In reply to Andreas Tolfsen ‹:ato› from comment #6)
> > 
> > > That looks correct, Venkatesh.  I haven’t tried it out
> > > locally, but I assume it exists when the ValueError is raised?
> > 
> > Sorry, I lost you here.  Can you explain please?
> 
> I was just trying to check that the Python script actually exits
> when these exceptions are raised, and that there is no loop that
> prevents it from exiting.  I don’t think this is the case, so you
> should be fine.
Ah, the https_server method in serve.py that calls httpd.FixtureServer method did not have exception handling.  I have not edited it too.

> 
> > > If you submit the patch, please change the error messages to be
> > > more similar to this, which is considered more “Pythonic”:
> >
> > This being my first ever patch to Mozilla, how do I submit?  I
> > have seen there is a try server with a wider range of OS and
> > hardware configurations to test the submissions.  Once you okay
> > the patch here, how can I submit to the try server?
> 
> You can upload a .patch file to this bug (there are various tools
> to do this automatically from git and hg, but you can also do it
> manually), or you can push it to mozreview.
Thank you Andreas, I have uploaded manually.

> 
> mozreview is documented on
> https://mozilla-version-control-tools.readthedocs.io/en/latest/mozreview.
> html.
> moz-git-tools for uploading from git are found at
> https://github.com/mozilla/moz-git-tools.
> 
> See these resources for new contributors:
> 
> 	https://firefox-source-docs.mozilla.org/testing/marionette/marionette/
> NewContributors.html
> 
> I will grant you access to the try server once you’ve submitted a
> patch or two to Mozilla.  In the mean time I will do a try run for
> you.
> 
> > diff --git a/testing/marionette/harness/marionette_harness/runner/httpd.py
> > b/testing/marionette/harness/marionette_harness/runner/httpd.py
> > --- a/testing/marionette/harness/marionette_harness/runner/httpd.py
> > +++ b/testing/marionette/harness/marionette_harness/runner/httpd.py
> > @@ -101,6 +101,11 @@ class FixtureServer(object):
> >          if port is None:
> >              port = 0
> >  
> > +        if not os.path.exists(ssl_cert):
> > +            raise ValueError("SSL Certificate file not found:
> > {}".format(ssl_cert))
> > +        if not os.path.exists(ssl_key):
> > +            raise ValueError("SSL Key file not found: {}".format(ssl_key))
> > +
> >          routes = [("POST", "/file_upload", upload_handler),
> >                    ("GET", "/http_auth", http_auth_handler),
> >                    ("GET", "/slow", slow_loading_handler),
> > 
> > Is this better?
> 
> s/Certificate/certificate
> s/Key/key
Flags: needinfo?(ato)
Attachment #8962426 - Flags: review?(ato)
Attachment #8962426 - Flags: review+
Attachment #8962426 - Flags: feedback+
Assignee: nobody → venkateshpitta
Status: NEW → ASSIGNED
Comment on attachment 8962426 [details] [diff] [review]
Fix for the bug 1321517

Review of attachment 8962426 [details] [diff] [review]:
-----------------------------------------------------------------

This change looks good, but it appears you have attached a .diff
file instead of a .patch file.  I could remedy this for you, but I
think it is a good exercise for you to learn how to export patches
yourself.

The next time you upload you can set the review? (r?) flag to my
email.
Attachment #8962426 - Flags: review?(ato) → review-
Attached patch fixing bug #1321517, patch file. (obsolete) — Splinter Review
Is this better?
Attachment #8962519 - Flags: review?(ato)
Attachment #8962519 - Flags: feedback+
Comment on attachment 8962519 [details] [diff] [review]
fixing bug #1321517, patch file.

Review of attachment 8962519 [details] [diff] [review]:
-----------------------------------------------------------------

This is very nearly done now, but first
we need to fix the commit message in
https://bug1321517.bmoattachments.org/attachment.cgi?id=8962519.

It should have this form:

> Bug 1321517 - Error when unable to find certificate or key. r=ato
Attachment #8962519 - Flags: review?(ato) → review-
Is there anything else to fix?  What is next?
Attachment #8962557 - Flags: review?(ato)
Attachment #8962557 - Flags: feedback?(ato)
Attachment #8962557 - Flags: review?(ato)
Attachment #8962557 - Flags: review+
Attachment #8962557 - Flags: feedback?(ato)
Attachment #8962519 - Attachment is obsolete: true
Attachment #8962426 - Attachment is obsolete: true
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3919fa7286d5
Error when unable to find certificate or key. r=ato
Backed out changeset 3919fa7286d5 (bug 1321517) for MnH and en-US failures on a CLOSED TREE
Problematic push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3919fa7286d5e52bdf453ada871c0dd89100f915&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&filter-resultStatus=pending&filter-resultStatus=running&filter-searchStr=085c9a552f6e73e451bee8cd4548abf540f20929
Failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&filter-resultStatus=pending&filter-resultStatus=running&fromchange=7fc5f960fe377f9f2f3189a18dd8eb54cd1d2677&filter-searchStr=085c9a552f6e73e451bee8cd4548abf540f20929&selectedJob=170701741
Backout: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=c2968b0879602b07d6b1f03b852ffe4815700fc9&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified
Log: https://treeherder.mozilla.org/logviewer.html#?job_id=170701741&repo=mozilla-inbound&lineNumber=761

17:44:54     INFO -  TypeError: coercing to Unicode: need string or buffer, NoneType found
17:44:59     INFO -  --DOCSHELL 0x12a2cf000 == 4 [pid = 637] [id = {f616bb3a-767b-7a46-8fef-daac775caba9}]
17:45:04     INFO -  --DOMWINDOW == 11 (0x11eb52800) [pid = 637] [serial = 10] [outer = 0x0] [url = about:blank]
17:45:04     INFO -  --DOMWINDOW == 10 (0x11eb52c00) [pid = 637] [serial = 11] [outer = 0x0] [url = about:blank]
17:45:06     INFO -  --DOMWINDOW == 9 (0x129aa9c00) [pid = 637] [serial = 6] [outer = 0x0] [url = about:blank]
17:45:08     INFO -  --DOMWINDOW == 2 (0x11498bc00) [pid = 640] [serial = 2] [outer = 0x0] [url = about:blank]
17:45:11     INFO -  --DOMWINDOW == 2 (0x12139d000) [pid = 641] [serial = 2] [outer = 0x0] [url = about:blank]
17:45:13     INFO -  [Parent 637, StreamTrans #52] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp, line 428
17:45:13     INFO -  --DOMWINDOW == 8 (0x12a183400) [pid = 637] [serial = 7] [outer = 0x0] [url = about:blank]
18:01:53     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/Users/cltbld/tasks/task_1522192694/build/venv/bin/python', '-u', '/Users/cltbld/tasks/task_1522192694/build/tests/marionette/harness/marionette_harness/runtests.py', '--headless', '--gecko-log=-', '--log-raw=-', '-vv', '--log-raw=/Users/cltbld/tasks/task_1522192694/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/Users/cltbld/tasks/task_1522192694/build/blobber_upload_dir/marionette_errorsummary.log', '--log-html=/Users/cltbld/tasks/task_1522192694/build/blobber_upload_dir/report.html', '--binary=/Users/cltbld/tasks/task_1522192694/build/application/Firefox NightlyDebug.app/Contents/MacOS/firefox', '--address=localhost:2828', '--symbols-path=/Users/cltbld/tasks/task_1522192694/build/symbols', '/Users/cltbld/tasks/task_1522192694/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit-tests.ini']
18:01:53    ERROR - timed out after 1000 seconds of no output
18:01:53    ERROR - Return code: -15
18:01:53    ERROR - No checks run.
18:01:53    ERROR - No suite end message was emitted by this harness.
18:01:53     INFO - TinderboxPrint: marionette<br/><em class="testfail">T-FAIL</em>
18:01:53     INFO - gecko.log not found
18:01:53     INFO - TinderboxPrint: marionette<br/>0/0/0
18:01:53     INFO - Marionette exited with return code -15: FAILURE
18:01:53    ERROR - # TBPL FAILURE #
Flags: needinfo?(dburns)
(In reply to Stefan Hindli [:stefan_hindli] from comment #16)
> Backed out changeset 3919fa7286d5 (bug 1321517) for MnH and en-US failures
> on a CLOSED TREE
> Problematic push:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=3919fa7286d5e52bdf453ada871c0dd89100f915&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-resultStatus=success&filter-
> resultStatus=pending&filter-resultStatus=running&filter-
> searchStr=085c9a552f6e73e451bee8cd4548abf540f20929
> Failure:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-resultStatus=success&filter-
> resultStatus=pending&filter-
> resultStatus=running&fromchange=7fc5f960fe377f9f2f3189a18dd8eb54cd1d2677&filt
> er-searchStr=085c9a552f6e73e451bee8cd4548abf540f20929&selectedJob=170701741
> Backout:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=c2968b0879602b07d6b1f03b852ffe4815700fc9&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-classifiedState=unclassified
> Log:
> https://treeherder.mozilla.org/logviewer.html#?job_id=170701741&repo=mozilla-
> inbound&lineNumber=761
> 
> 17:44:54     INFO -  TypeError: coercing to Unicode: need string or buffer,
> NoneType found
> 17:44:59     INFO -  --DOCSHELL 0x12a2cf000 == 4 [pid = 637] [id =
> {f616bb3a-767b-7a46-8fef-daac775caba9}]
> 17:45:04     INFO -  --DOMWINDOW == 11 (0x11eb52800) [pid = 637] [serial =
> 10] [outer = 0x0] [url = about:blank]
> 17:45:04     INFO -  --DOMWINDOW == 10 (0x11eb52c00) [pid = 637] [serial =
> 11] [outer = 0x0] [url = about:blank]
> 17:45:06     INFO -  --DOMWINDOW == 9 (0x129aa9c00) [pid = 637] [serial = 6]
> [outer = 0x0] [url = about:blank]
> 17:45:08     INFO -  --DOMWINDOW == 2 (0x11498bc00) [pid = 640] [serial = 2]
> [outer = 0x0] [url = about:blank]
> 17:45:11     INFO -  --DOMWINDOW == 2 (0x12139d000) [pid = 641] [serial = 2]
> [outer = 0x0] [url = about:blank]
> 17:45:13     INFO -  [Parent 637, StreamTrans #52] WARNING: 'NS_FAILED(rv)',
> file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp,
> line 428
> 17:45:13     INFO -  --DOMWINDOW == 8 (0x12a183400) [pid = 637] [serial = 7]
> [outer = 0x0] [url = about:blank]
> 18:01:53     INFO - Automation Error: mozprocess timed out after 1000
> seconds running
> ['/Users/cltbld/tasks/task_1522192694/build/venv/bin/python', '-u',
> '/Users/cltbld/tasks/task_1522192694/build/tests/marionette/harness/
> marionette_harness/runtests.py', '--headless', '--gecko-log=-',
> '--log-raw=-', '-vv',
> '--log-raw=/Users/cltbld/tasks/task_1522192694/build/blobber_upload_dir/
> marionette_raw.log',
> '--log-errorsummary=/Users/cltbld/tasks/task_1522192694/build/
> blobber_upload_dir/marionette_errorsummary.log',
> '--log-html=/Users/cltbld/tasks/task_1522192694/build/blobber_upload_dir/
> report.html',
> '--binary=/Users/cltbld/tasks/task_1522192694/build/application/Firefox
> NightlyDebug.app/Contents/MacOS/firefox', '--address=localhost:2828',
> '--symbols-path=/Users/cltbld/tasks/task_1522192694/build/symbols',
> '/Users/cltbld/tasks/task_1522192694/build/tests/marionette/tests/testing/
> marionette/harness/marionette_harness/tests/unit-tests.ini']
> 18:01:53    ERROR - timed out after 1000 seconds of no output
> 18:01:53    ERROR - Return code: -15
> 18:01:53    ERROR - No checks run.
> 18:01:53    ERROR - No suite end message was emitted by this harness.
> 18:01:53     INFO - TinderboxPrint: marionette<br/><em
> class="testfail">T-FAIL</em>
> 18:01:53     INFO - gecko.log not found
> 18:01:53     INFO - TinderboxPrint: marionette<br/>0/0/0
> 18:01:53     INFO - Marionette exited with return code -15: FAILURE
> 18:01:53    ERROR - # TBPL FAILURE #

What does this mean?  Does the caller need to pass the exception to its caller(s) and so on?  timed out after 1000 seconds of no output is the only line that stands out to me.
What I really find interesting in this case is that this triggers the 1000s timeout failure of mozprocess permanently. For me it means that if there is an exception around the changed code in httpd.py we always hang.

(In reply to Venkatesh from comment #17)
> What does this mean?  Does the caller need to pass the exception to its
> caller(s) and so on?  timed out after 1000 seconds of no output is the only
> line that stands out to me.

So the failure is in `os.path.exists(ssl_cert)`, which means it uses `None` as argument for the call. How can this be possible? It happens because for a HTTP server there is no SSL cert and key:

https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/marionette_harness/runner/serve.py#143

It means we can only do this check when there has been a cert and key specified.
Flags: needinfo?(dburns)
(In reply to Henrik Skupin (:whimboo) from comment #18)
> What I really find interesting in this case is that this triggers the 1000s
> timeout failure of mozprocess permanently. For me it means that if there is
> an exception around the changed code in httpd.py we always hang.

I moved this out to bug 1449482.
(In reply to Henrik Skupin (:whimboo) from comment #18)
> What I really find interesting in this case is that this triggers the 1000s
> timeout failure of mozprocess permanently. For me it means that if there is
> an exception around the changed code in httpd.py we always hang.
> 
> (In reply to Venkatesh from comment #17)
> > What does this mean?  Does the caller need to pass the exception to its
> > caller(s) and so on?  timed out after 1000 seconds of no output is the only
> > line that stands out to me.
> 
> So the failure is in `os.path.exists(ssl_cert)`, which means it uses `None`
> as argument for the call. How can this be possible? It happens because for a
> HTTP server there is no SSL cert and key:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/
> marionette_harness/runner/serve.py#143
> 
> It means we can only do this check when there has been a cert and key
> specified.

For this bit, will the diff below work?  I missed the not None check.  Or should I be posting all this in bug 144982 instead?  Also, how do I run this test locally?

diff --git a/testing/marionette/harness/marionette_harness/runner/httpd.py b/testing/marionette/harness/marionette_harness/runner/httpd.py
--- a/testing/marionette/harness/marionette_harness/runner/httpd.py
+++ b/testing/marionette/harness/marionette_harness/runner/httpd.py
@@ -101,9 +101,9 @@ class FixtureServer(object):
         if port is None:
             port = 0
 
-        if not os.path.exists(ssl_cert):
+        if ssl_cert is not None and not os.path.exists(ssl_cert):
             raise ValueError("SSL certificate file not found: {}".format(ssl_cert))
-        if not os.path.exists(ssl_key):
+        if ssl_key is not None and not os.path.exists(ssl_key):
             raise ValueError("SSL key file not found: {}".format(ssl_key))
 
         routes = [("POST", "/file_upload", upload_handler),
Flags: needinfo?(hskupin)
It should but I just noticed that wptserve which we hand-over those cert and key files actually checks for the existence itself:

https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/web-platform/tests/tools/wptserve/wptserve/server.py#401-404

Maybe we should better improve the failure message as thrown over there instead of in a customer like Marionette only. Andreas, what is your take?
Flags: needinfo?(hskupin) → needinfo?(ato)
So whimboo is right that the message should in fact be improved in
wptserve.  The more fundamental problem here is that we still don’t
exit serve.py when the ValueError (or, currently, AssertionError)
is raised.

When the subprocess for the HTTPS server errors on not finding the
certificate or key file, the exception is logged to stdout but because
we don’t communicate it back to the parent process we don’t take
any appropriate action as a consequence.  What we need to do is make
the parent process aware there was a problem when init_func is called
so that it can re-raise the exception and cause the parent process
to exit.  This will also cause the HTTP server to shut down as a result.

A digression here is that serve.py is making use of all the
wrong primitives.  Instead of using a multiprocessing.Queue
as it should be doing, it is trying to build these primitives
over a multiprocessing.Pipe, requiring it to make use of a lock.
I realise this code was written by me many years ago, so I can only
blame myself.  This really ought to be rewritten to be more in line
with good Python multiprocessing use.

In any case, I have come up with a patch that re-uses Venkatesh
original work, but fixes the hang.
Flags: needinfo?(ato)
Attachment #8962557 - Attachment is obsolete: true
Comment on attachment 8963161 [details]
Bug 1321517 - Improve error when unable to find SSL key or cert.

https://reviewboard.mozilla.org/r/232008/#review237558
Attachment #8963161 - Flags: review?(ato) → review+
Comment on attachment 8963162 [details]
Bug 1321517 - Propagate exception from server subprocess and re-raise.

https://reviewboard.mozilla.org/r/232010/#review237834

It's great to see that this prevents the hang. Thank you Andreas! 

Can you also please continue to work with Venkatesh so we can get proper assertion messages added to wptserve? Right now only an "AssertionError" is visible in the log output without any traceback. So it's pretty useless. Maybe we should move that to a different bug?

::: testing/marionette/harness/marionette_harness/runner/serve.py:122
(Diff revision 1)
>          self.proc.daemon = True
>          self.proc.start()
>  
> +        res, exc = self.recv()
> +        if res == "stop":
> +            raise exc

I don't see any exception details being available here. So the thrown exception is pretty useless. 

Also you might want to us reraise to keep the original traceback. See marionette.py in how to do it.
Attachment #8963162 - Flags: review?(hskupin) → review+
Comment on attachment 8963162 [details]
Bug 1321517 - Propagate exception from server subprocess and re-raise.

https://reviewboard.mozilla.org/r/232010/#review237834

I repurposed Venkatesh’s patch in the first commit.  Did you not
apply this when you tested it?

> I don't see any exception details being available here. So the thrown exception is pretty useless. 
> 
> Also you might want to us reraise to keep the original traceback. See marionette.py in how to do it.

What of marionette.py precisely?  Are you sure that will work across
processes?
Comment on attachment 8963161 [details]
Bug 1321517 - Improve error when unable to find SSL key or cert.

https://reviewboard.mozilla.org/r/232008/#review237874

::: testing/web-platform/tests/tools/wptserve/wptserve/server.py:406
(Diff revision 1)
>          if use_ssl:
>              if key_file is not None:
>                  assert os.path.exists(key_file)
>              assert certificate is not None and os.path.exists(certificate)
>  
> +        if use_ssl:

You missed to remove the old code.
Attachment #8963161 - Flags: review-
Comment on attachment 8963162 [details]
Bug 1321517 - Propagate exception from server subprocess and re-raise.

https://reviewboard.mozilla.org/r/232010/#review237834

Oh, I missed that because I only checked my review request at this time. I applied now and it didn't help because you missed to remove the old code. With having it removed it should all work.

> What of marionette.py precisely?  Are you sure that will work across
> processes?

Hm, this is an issue I definitely cleared before submitting my review comments. Strange.

With your changes to wptserve in the other commit the exception message will make it to the parent process and the logged exception will be fine. Thanks. It's only sad that the traceback won't be visible because pickle cannot serialize/deserialize it across processes.
Re-assigning to me so I don’t forget to fixup the patches.
Assignee: venkateshpitta → ato
Pushed by atolfsen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c7abc1d5b6ec
Improve error when unable to find SSL key or cert.  r=ato
https://hg.mozilla.org/integration/autoland/rev/1f158432b801
Propagate exception from server subprocess and re-raise. r=whimboo
https://hg.mozilla.org/mozilla-central/rev/c7abc1d5b6ec
https://hg.mozilla.org/mozilla-central/rev/1f158432b801
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Blocks: 1391545
Venkatech, thank you for your contribution. In case you want to learn more in Python please let us know and we can find another interesting bug for you.
Watching you two fix this bug was educational.  Thank you, Andreas and Henrik.  I am happy to work on any bug you suggest or recommend.
Whiteboard: [lang=py] → [lang=py][wptsync upstream error]
Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/10416 for changes under testing/web-platform/tests
Whiteboard: [lang=py][wptsync upstream error] → [lang=py][wptsync upstream]
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: