Bugzilla

Updated

•

7 years ago

Depends on: 1409049

Comment 4

•

7 years ago

The orange factor report [1] seems to indicate that this is due to
a timeout, and because it mostly occurs on pgo and debug builds it
is probable to think this is because the timeout of the test is too
short.

I will try increasing it by marking it as a ‘long’ test in
https://bugzil.la/1409049.

  [1] https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&entireHistory=true&tree=all

Flags: needinfo?(ato)

Comment 5

•

7 years ago

(In reply to Andreas Tolfsen ‹:ato› from comment #4)

> The orange factor report [1] seems to indicate that this is due to
> a timeout, and because it mostly occurs on pgo and debug builds it
> is probable to think this is because the timeout of the test is
> too short.

Scratch that.  This is an overall 1000 seconds (16.6 minutes)
timeout from mozprocess, meaning wptrunner or something else stops
responding.  I would expect wptrunner to abort the test much sooner
and return with a TEST-TIMEOUT status.

Henrietta Maior [:henrietta_maior]

Updated

•

7 years ago

No longer depends on: 1409049

Comment hidden (Intermittent Failures Robot)

91 failures in 864 pushes (0.105 failures/push) were associated with this bug in the last 7 days. 

This is the #9 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 43
* mozilla-inbound: 22
* mozilla-central: 10
* try: 9
* mozilla-beta: 7

Platform breakdown:
* linux64: 85
* linux64-nightly: 2
* linux64-devedition: 2
* linux64-jsdcov: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-10-16&endday=2017-10-22&tree=all

Comment 7

•

7 years ago

:ato Will you be able to look into this?

Flags: needinfo?(ato)

Comment 8

•

7 years ago

(In reply to Henrietta Maior [:henrietta_maior] from comment #7)

> :ato Will you be able to look into this?

I already replied to your needinfo request.  I believe this is
something more systemic to wptrunner or an infrastructure issue.

Cc’ing jgraham to see if he can shed some light on this, or at
least move the bug to the correct component.

Flags: needinfo?(ato)

Comment hidden (Intermittent Failures Robot)

15 failures in 196 pushes (0.077 failures/push) were associated with this bug yesterday.    

** This test has failed more than 200 times in the last 30 days. It should be disabled until it can be fixed. ** 

Repository breakdown:
* autoland: 9
* mozilla-inbound: 5
* try: 1

Platform breakdown:
* linux64: 15

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-10-26&endday=2017-10-26&tree=all

Assignee

Updated

•

7 years ago

Flags: needinfo?(james)

Comment hidden (Intermittent Failures Robot)

22 failures in 151 pushes (0.146 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 12
* mozilla-inbound: 5
* mozilla-central: 3
* try: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 21
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-10-27&endday=2017-10-27&tree=all

Comment hidden (Intermittent Failures Robot)

97 failures in 912 pushes (0.106 failures/push) were associated with this bug in the last 7 days. 

This is the #11 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 53
* mozilla-inbound: 21
* mozilla-central: 13
* mozilla-beta: 6
* try: 4

Platform breakdown:
* linux64: 87
* linux64-nightly: 5
* linux64-devedition: 2
* linux32-stylo-disabled: 2
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-10-23&endday=2017-10-29&tree=all

Pulsebot

Comment 12

•

7 years ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/879f2ee477b4
Disable /webdriver/tests/actions/key.py on linux for frequent failures. r=me, a=testonly

Comment 13

•

7 years ago

I disabled this test for having >200 failures in the last 30 days, please remember to enable this when testing a fix

Keywords: leave-open

Whiteboard: [stockwell disable-recommended] → [stockwell disabled]

Comment 14

•

7 years ago

maja_zf: Just to make you aware, the WPT key tests are now disabled
due to some problem in wptrunner.

Comment 15

•

7 years ago

To be clear, the diagnosis that the problem's in wptrunner doesn't strike me as very certain. wptrunner tries to run a timer thread that will stop waiting for the pytest results if they exceed the timeout. And presumably that works at least sometimes since we get TIMEOUT results. But in this case we apparently don't get that far, which probably is a bug in wptrunner in that it's not handling some behaviour of pytest, but I expect that something bad is happening when the test runs so that the python interpreter never gets to run the main thread and so we never report an EXTERNAL-TIMEOUT for this and so kill the TestRunner process and continue.

Without someone managing to debug the problem I don't think it's clear what the underlying fault is.

Flags: needinfo?(james)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Assignee

Updated

•

7 years ago

status-firefox58: --- → disabled

Keywords: test-disabled

Comment 16

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/879f2ee477b4

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Assignee

Comment 17

•

7 years ago

Please also see bug 1391545 for a similar output affecting Marionette tests.

Comment hidden (Intermittent Failures Robot)

16 failures in 180 pushes (0.089 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 8
* try: 6
* mozilla-inbound: 1
* mozilla-central: 1

Platform breakdown:
* linux64: 13
* linux64-nightly: 2
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-01&endday=2017-11-01&tree=all

Comment 19

•

7 years ago

I don't think this failure has anything to do with the test that got disabled (key.py). It just happened to be ordered as the first test to run. While investigating another intermittent, I'm still seeing this failure a lot despite key.py being disabled.

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Comment 20

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d545d2c66879a61a6d6fc1008604772e257375d8

Comment hidden (Intermittent Failures Robot)

16 failures in 143 pushes (0.112 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-beta: 2
* mozilla-release: 1
* mozilla-inbound: 1
* mozilla-central: 1

Platform breakdown:
* linux64: 14
* linux64-nightly: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-02&endday=2017-11-02&tree=all

Arthur Iakab [arthur_iakab]

Comment 22

•

7 years ago

now these are failing on:
TEST-START | /webdriver/tests/actions/modifier_click.py

So comment 19 is spot on that we are not executing any tests, we are just hanging.

Comment hidden (Intermittent Failures Robot)

86 failures in 857 pushes (0.1 failures/push) were associated with this bug in the last 7 days. 

This is the #18 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 43
* mozilla-inbound: 17
* mozilla-central: 13
* try: 9
* mozilla-beta: 3
* mozilla-release: 1

Platform breakdown:
* linux64: 72
* linux64-ccov: 6
* linux64-nightly: 5
* linux64-devedition: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-10-30&endday=2017-11-05&tree=all

Comment hidden (Intermittent Failures Robot)

17 failures in 138 pushes (0.123 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-inbound: 3
* try: 2
* mozilla-central: 1

Platform breakdown:
* linux64: 17

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-06&endday=2017-11-06&tree=all

Comment 26

•

7 years ago

This failure has occurred 100 times in the last 7 days. It has failed only on Linux x64 and on opt,pgo,debug build types.
Failing test suites: tests-wdspec-e10s, marionette-e10s, opt-mochitest-3, platform-tests-e10s.

Link to a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=144012458&lineNumber=1229 

Part of the log: 

task 2017-11-11T19:21:33.560Z] 19:21:33     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
1228
[task 2017-11-11T19:21:33.560Z] 19:21:33     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
1229
[task 2017-11-11T19:38:13.591Z] 19:38:13     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/daJK4fMkTQC-LumF6CRNug/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
1230
[task 2017-11-11T19:38:13.625Z] 19:38:13    ERROR - timed out after 1000 seconds of no output
1231
[task 2017-11-11T19:38:13.626Z] 19:38:13    ERROR - Return code: -15
1232
[task 2017-11-11T19:38:13.626Z] 19:38:13    ERROR - No suite end message was emitted by this harness.
1233
[task 2017-11-11T19:38:13.627Z] 19:38:13    ERROR - # TBPL FAILURE #
1234
[task 2017-11-11T19:38:13.627Z] 19:38:13  WARNING - setting return code to 2
1235
[task 2017-11-11T19:38:13.628Z] 19:38:13     INFO - Running post-action listener: _package_coverage_data
1236
[task 2017-11-11T19:38:13.628Z] 19:38:13     INFO - Running post-action listener: _resource_record_post_action
1237
[task 2017-11-11T19:38:13.628Z] 19:38:13     INFO - [mozharness: 2017-11-11 19:38:13.628489Z] Finished run-tests step (success)
1238
[task 2017-11-11T19:38:13.628Z] 19:38:13     INFO - Running post-run listener: _resource_record_post_run
1239
[task 2017-11-11T19:38:13.728Z] 19:38:13     INFO - Total resource usage - Wall time: 1022s; CPU: 1.0%; Read bytes: 1077248; Write bytes: 1633046528; Read time: 60; Write time: 1293480
1240
[task 2017-11-11T19:38:13.728Z] 19:38:13     INFO - TinderboxPrint: CPU usage<br/>0.9%
1241
[task 2017-11-11T19:38:13.728Z] 19:38:13     INFO - TinderboxPrint: I/O read bytes / time<br/>1,077,248 / 60
1242
[task 2017-11-11T19:38:13.729Z] 19:38:13     INFO - TinderboxPrint: I/O write bytes / time<br/>1,633,046,528 / 1,293,480
1243
[task 2017-11-11T19:38:13.729Z] 19:38:13     INFO - TinderboxPrint: CPU idle<br/>4,038.3 (99.1%)
1244
[task 2017-11-11T19:38:13.729Z] 19:38:13     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
1245
[task 2017-11-11T19:38:13.730Z] 19:38:13     INFO - pull - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
1246
[task 2017-11-11T19:38:13.732Z] 19:38:13     INFO - install - Wall time: 18s; CPU: 28.0%; Read bytes: 139264; Write bytes: 33554432; Read time: 28; Write time: 110332
1247
[task 2017-11-11T19:38:13.743Z] 19:38:13     INFO - run-tests - Wall time: 1004s; CPU: 0.0%; Read bytes: 937984; Write bytes: 1595269120; Read time: 32; Write time: 1182940
1248
[task 2017-11-11T19:38:14.070Z] 19:38:14     INFO - Running post-run listener: _upload_blobber_files
1249
[task 2017-11-11T19:38:14.070Z] 19:38:14  WARNING - Blob upload gear skipped. Missing cmdline options.
1250
[task 2017-11-11T19:38:14.070Z] 19:38:14     INFO - Running post-run listener: copy_logs_to_upload_dir
1251
[task 2017-11-11T19:38:14.071Z] 19:38:14     INFO - Copying logs to upload dir...
1252
[task 2017-11-11T19:38:14.071Z] 19:38:14     INFO - mkdir: /builds/worker/workspace/build/upload/logs
1253
[task 2017-11-11T19:38:14.073Z] 19:38:14     INFO - Copying logs to upload dir...
1254
[task 2017-11-11T19:38:14.075Z] 19:38:14  WARNING - returning nonzero exit status 2
1255
[task 2017-11-11T19:38:14.099Z] cleanup
1256
[task 2017-11-11T19:38:14.099Z] + cleanup
1257
[task 2017-11-11T19:38:14.099Z] + local rv=2
1258
[task 2017-11-11T19:38:14.099Z] + [[ -s /builds/worker/.xsession-errors ]]
1259
[task 2017-11-11T19:38:14.099Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
1260
[task 2017-11-11T19:38:14.101Z] + true
1261
[task 2017-11-11T19:38:14.101Z] + cleanup_xvfb
1262
[task 2017-11-11T19:38:14.102Z] pidof Xvfb
1263
[task 2017-11-11T19:38:14.102Z] ++ pidof Xvfb
1264
[task 2017-11-11T19:38:14.105Z] + local xvfb_pid=25
1265
[task 2017-11-11T19:38:14.105Z] + local vnc=false
1266
[task 2017-11-11T19:38:14.105Z] + local interactive=false
1267
[task 2017-11-11T19:38:14.106Z] + '[' -n 25 ']'
1268
[task 2017-11-11T19:38:14.106Z] + [[ false == false ]]
1269
[task 2017-11-11T19:38:14.106Z] + [[ false == false ]]
1270
[task 2017-11-11T19:38:14.106Z] + kill 25
1271
[task 2017-11-11T19:38:14.107Z] + screen -XS xvfb quit
1272
[task 2017-11-11T19:38:14.107Z] XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
1273
[task 2017-11-11T19:38:14.107Z]       after 783 requests (783 known processed) with 0 events remaining.
1274
[task 2017-11-11T19:38:14.111Z] compizconfig - Info: Backend     : ini
1275
[task 2017-11-11T19:38:14.112Z] compizconfig - Info: Integration : true
1276
[task 2017-11-11T19:38:14.112Z] compizconfig - Info: Profile     : default
1277
[task 2017-11-11T19:38:14.157Z] No screen session found.
1278
[task 2017-11-11T19:38:14.157Z] + true
1279
[task 2017-11-11T19:38:14.157Z] + exit 2
1280
[taskcluster 2017-11-11 19:38:14.457Z] === Task Finished ===
1281
[taskcluster 2017-11-11 19:38:19.671Z] Unsuccessful task run with exit code: 2 completed in 2514.307 seconds

:whimboo
Do you have any updates?

Flags: needinfo?(hskupin)

Assignee

Comment 27

•

7 years ago

(In reply to OrangeFactor Robot from comment #24)
> For more details, see:
> https://brasstacks.mozilla.com/orangefactor/
> ?display=Bug&bugid=1407383&startday=2017-10-30&endday=2017-11-05&tree=all

If that is becoming a blocking issue James, Andreas, or Maja should have a look at this bug. They worked most on wptrunner. But I'm not sure whom of them have the time right now.

Flags: needinfo?(mjzffr)

Flags: needinfo?(james)

Flags: needinfo?(ato)

Comment 28

•

7 years ago

Something that's happening on mochitest, marionette and wpt clearly isn't a single bug in the wpt test harness. There are either multiple unrelated issues being bucketed together here, or it's in a lower layer like mozharness or marionette.

Assignee

Comment 29

•

7 years ago

Correct, but someone would have to investigate that first. wptrunner handles the process itself, and as such it cannot be related to the process handling in Marionette. Although we have a similar issue there (bug 1391545).

Flags: needinfo?(hskupin)

Comment hidden (Intermittent Failures Robot)

104 failures in 849 pushes (0.122 failures/push) were associated with this bug in the last 7 days. 

This is the #11 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 53
* mozilla-inbound: 21
* mozilla-central: 18
* try: 8
* mozilla-release: 2
* mozilla-beta: 2

Platform breakdown:
* linux64: 96
* linux64-ccov: 3
* linux64-jsdcov: 2
* linux32: 2
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-06&endday=2017-11-12&tree=all

Comment 31

•

7 years ago

Has anyone found a reliable way to reproduce this locally?

Comment 32

•

7 years ago

looking at this bug, Firefox is not starting up. There is 0 output from firefox or from Marionette after geckdriver starts up for wdspec tests. For mochitest we dont use geckodriver so thats not the cause there

Assignee

Comment 33

•

7 years ago

If this is happening fairly consistent the best option is really to just request a one click loaner and observe the tests via VNC. Given that those failures are on Linux, it is very easy to handle via Treeherder/Taskcluster.

Assignee

Updated

•

7 years ago

Depends on: 1418227

Comment hidden (Intermittent Failures Robot)

91 failures in 762 pushes (0.119 failures/push) were associated with this bug in the last 7 days. 

This is the #3 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 34
* mozilla-inbound: 31
* mozilla-central: 14
* mozilla-beta: 7
* try: 5

Platform breakdown:
* linux64: 52
* linux32: 30
* linux64-devedition: 3
* linux64-ccov: 3
* windows7-32: 1
* linux64-jsdcov: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-13&endday=2017-11-19&tree=all

Comment hidden (Intermittent Failures Robot)

33 failures in 146 pushes (0.226 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 20
* mozilla-inbound: 10
* mozilla-central: 3

Platform breakdown:
* linux64: 20
* linux32: 12
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-23&endday=2017-11-23&tree=all

Andrei Ciure[:aciure]

Comment 36

•

7 years ago

There were 144 failures related to this bug in the last 7 days on Linux.

:whimboo any updates for this, it has spiked pretty high lately?

Andrei Ciure[:aciure]

Updated

•

7 years ago

Whiteboard: [stockwell disabled] → [stockwell needswork]

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Assignee

Comment 37

•

7 years ago

(In reply to Andrei Ciure[:andrei_ciure_] from comment #36)
> :whimboo any updates for this, it has spiked pretty high lately?

I'm not the only one who has to work on all the intermittent failures. That doesn't scale. As such I put ni? for James, Andreas, and Maja a while ago. I hope one of those can find the time to check specifically this bug.

Comment hidden (Intermittent Failures Robot)

7 failures in 44 pushes (0.159 failures/push) were associated with this bug yesterday.    

** This test has failed more than 200 times in the last 30 days. It should be disabled until it can be fixed. ** 

Repository breakdown:
* autoland: 4
* mozilla-inbound: 1
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 5
* linux64-nightly: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-25&endday=2017-11-25&tree=all

Comment hidden (Intermittent Failures Robot)

138 failures in 744 pushes (0.185 failures/push) were associated with this bug in the last 7 days. 

This is the #5 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 83
* mozilla-inbound: 35
* mozilla-central: 12
* mozilla-beta: 5
* try: 3

Platform breakdown:
* linux64: 81
* linux32: 46
* linux64-nightly: 4
* linux64-jsdcov: 3
* linux64-devedition: 2
* linux64-stylo-disabled: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-20&endday=2017-11-26&tree=all

Updated

•

7 years ago

Flags: needinfo?(mjzffr)

Comment 40

•

7 years ago

James, Andreas,

Looking at the log it looks like we never get to the point where we call Firefox to startup (if I understand the flow).

# An Example of a test running

[task 2017-11-20T10:46:49.696Z] 10:46:49     INFO - Starting http server on 127.0.0.1:8000
[task 2017-11-20T10:46:49.697Z] 10:46:49     INFO - Starting http server on 127.0.0.1:8443
[task 2017-11-20T10:46:49.706Z] 10:46:49     INFO - Starting http server on 127.0.0.1:8001
[task 2017-11-20T10:46:49.717Z] 10:46:49     INFO - PID 852 | 1511174809713	geckodriver	INFO	geckodriver 0.19.1 ( '589acfe789d8'  '2017-11-20 13:59 +0800')
[task 2017-11-20T10:46:49.721Z] 10:46:49     INFO - PID 852 | 1511174809716	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2017-11-20T10:46:50.220Z] 10:46:50     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
[task 2017-11-20T10:46:50.221Z] 10:46:50     INFO - TEST-START | /webdriver/tests/actions/modifier_click.py
[task 2017-11-20T10:46:50.540Z] 10:46:50     INFO - STDOUT: ============================= test session starts ==============================
[task 2017-11-20T10:46:50.540Z] 10:46:50     INFO - STDOUT: platform linux2 -- Python 2.7.12, pytest-2.9.1, py-1.4.31, pluggy-0.3.1 -- /builds/worker/workspace/build/venv/bin/python
[task 2017-11-20T10:46:50.541Z] 10:46:50     INFO - STDOUT: rootdir: /builds/worker/workspace/build/tests/web-platform/tests/webdriver/tests/actions, inifile: 
[task 2017-11-20T10:46:50.541Z] 10:46:50     INFO - STDOUT: collecting ... 
[task 2017-11-20T10:46:50.558Z] 10:46:50     INFO - STDOUT: collected 9 items
[task 2017-11-20T10:46:50.558Z] 10:46:50     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/modifier_click.py::test_modifier_click[modifier0-ctrlKey] 
[task 2017-11-20T10:46:50.558Z] 10:46:50     INFO - PID 852 | 1511174810555	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.IRSxkIla0Kpi"
[task 2017-11-20T10:46:52.629Z] 10:46:52     INFO - PID 852 | [865, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2872
[task 2017-11-20T10:47:19.603Z] 10:47:19     INFO - PID 852 | 1511174839597	Marionette	INFO	Enabled via --marionette

# An example of a failure

[task 2017-11-20T10:44:56.874Z] 10:44:56     INFO - TEST-SKIP | /webdriver/tests/contexts/maximize_window.py | took 0ms
[task 2017-11-20T10:44:56.891Z] 10:44:56     INFO - Starting runner
[task 2017-11-20T10:44:56.899Z] 10:44:56     INFO - Starting http server on 127.0.0.1:8443
[task 2017-11-20T10:44:56.900Z] 10:44:56     INFO - Starting http server on 127.0.0.1:8000
[task 2017-11-20T10:44:56.917Z] 10:44:56     INFO - PID 852 | 1511174696914	geckodriver	INFO	geckodriver 0.19.1 ( '589acfe789d8'  '2017-11-20 13:59 +0800')
[task 2017-11-20T10:44:56.921Z] 10:44:56     INFO - PID 852 | 1511174696918	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2017-11-20T10:44:57.421Z] 10:44:57     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
[task 2017-11-20T10:44:57.422Z] 10:44:57     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
[task 2017-11-20T11:01:37.471Z] 11:01:37     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/O13YgUo-SwOE8BF5SXh-kw/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
[task 2017-11-20T11:01:37.505Z] 11:01:37    ERROR - timed out after 1000 seconds of no output
[task 2017-11-20T11:01:37.506Z] 11:01:37    ERROR - Return code: -15
[task 2017-11-20T11:01:37.508Z] 11:01:37    ERROR - No suite end message was emitted by this harness.

The only obvious this for me is that Firefox doesnt start up but also looking we don't see any logging from mozrunner being emitted either from GeckoDriver.

E.g. mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.IRSxkIla0Kpi" is not shown anywhere.

Comment 41

•

7 years ago

(In reply to David Burns :automatedtester from comment #40)

> Looking at the log it looks like we never get to the point where
> we call Firefox to startup (if I understand the flow).

So based on the log you pasted, geckodriver starts but never accepts
a new connection.  It would be extremely useful here, I think, with
a trace-level log from geckodriver.

I wonder if doing a try build with lots of Wd retriggers and a patch
to pass --webdriver-arg='-vv' by default would show us anything
interesting.

Flags: needinfo?(ato)

Comment hidden (Intermittent Failures Robot)

16 failures in 137 pushes (0.117 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 6
* mozilla-inbound: 4
* mozilla-central: 2
* mozilla-beta: 2
* try: 1
* mozilla-release: 1

Platform breakdown:
* linux64: 8
* linux32: 6
* linux64-nightly: 1
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-27&endday=2017-11-27&tree=all

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/203478/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/203478/

Comment 44

•

7 years ago

try run with verbose logging https://treeherder.mozilla.org/#/jobs?repo=try&revision=2fd843112adc&selectedJob=148141221

Updated

•

7 years ago

Flags: needinfo?(james)

Comment 45

•

7 years ago

looking at the log with more verbose logging we don't get pytest starting up and collecting tests

Good run

[task 2017-11-28T12:25:29.916Z] 12:25:29     INFO - PID 847 | 1511871929913	geckodriver	INFO	geckodriver 0.19.1 ( '2fd843112adc'  '2017-11-28 12:01 +0000')
[task 2017-11-28T12:25:29.916Z] 12:25:29     INFO - Starting http server on 127.0.0.1:8443
[task 2017-11-28T12:25:29.919Z] 12:25:29     INFO - PID 847 | 1511871929913	webdriver::httpapi	DEBUG	Creating routes
[task 2017-11-28T12:25:29.926Z] 12:25:29     INFO - PID 847 | 1511871929918	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2017-11-28T12:25:30.418Z] 12:25:30     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
[task 2017-11-28T12:25:30.418Z] 12:25:30     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
[task 2017-11-28T12:25:30.762Z] 12:25:30     INFO - STDOUT: ============================= test session starts ==============================
[task 2017-11-28T12:25:30.763Z] 12:25:30     INFO - STDOUT: platform linux2 -- Python 2.7.12, pytest-2.9.1, py-1.4.31, pluggy-0.3.1 -- /builds/worker/workspace/build/venv/bin/python
[task 2017-11-28T12:25:30.763Z] 12:25:30     INFO - STDOUT: rootdir: /builds/worker/workspace/build/tests/web-platform/tests/webdriver/tests/actions, inifile: 
[task 2017-11-28T12:25:30.764Z] 12:25:30     INFO - STDOUT: collecting ... 
[task 2017-11-28T12:25:30.780Z] 12:25:30     INFO - STDOUT: collected 3 items
[task 2017-11-28T12:25:30.780Z] 12:25:30     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/key_shortcuts.py::test_mod_a_and_backspace_deletes_all_text 
[task 2017-11-28T12:25:30.781Z] 12:25:30     INFO - PID 847 | 1511871930773	webdriver::server	DEBUG	-> POST /session {"capabilities": {"alwaysMatch": {"moz:firefoxOptions": {"binary": "/builds/worker/workspace/build/application/firefox/firefox", "prefs": {"network.dns.localDomains": "web-platform.test,www.web-platform.test,www1.web-platform.test,www2.web-platform.test,xn--n8j6ds53lwwkrqhv28a.web-platform.test,xn--lve-6lad.web-platform.test"}}}}}
[task 2017-11-28T12:25:30.784Z] 12:25:30     INFO - PID 847 | 1511871930777	mozrunner::runner	INFO	Running command: "/builds/worker/workspace/build/application/firefox/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.hpPAvne6jv1N"
[task 2017-11-28T12:25:30.785Z] 12:25:30     INFO - PID 847 | 1511871930780	geckodriver::marionette	TRACE	  connection attempt 0/600
[task 2017-11-28T12:25:30.886Z] 12:25:30     INFO - PID 847 | 1511871930880	geckodriver::marionette	TRACE	  connection attempt 1/600
[task 2017-11-28T12:25:30.988Z] 12:25:30     INFO - PID 847 | 1511871930981	geckodriver::marionette	TRACE	  connection attempt 2/600
[task 2017-11-28T12:25:31.089Z] 12:25:31     INFO - PID 847 | 1511871931081	geckodriver::marionette	TRACE	  connection attempt 3/600


bad run

[task 2017-11-28T13:27:29.259Z] 13:27:29     INFO - TEST-SKIP | /webdriver/tests/element_click/select.py | took 0ms
[task 2017-11-28T13:27:29.282Z] 13:27:29     INFO - Starting http server on 127.0.0.1:8001
[task 2017-11-28T13:27:29.282Z] 13:27:29     INFO - Starting runner
[task 2017-11-28T13:27:29.285Z] 13:27:29     INFO - Starting http server on 127.0.0.1:8000
[task 2017-11-28T13:27:29.294Z] 13:27:29     INFO - Starting http server on 127.0.0.1:8443
[task 2017-11-28T13:27:29.306Z] 13:27:29     INFO - PID 849 | 1511875649302	geckodriver	INFO	geckodriver 0.19.1 ( '2fd843112adc'  '2017-11-28 12:01 +0000')
[task 2017-11-28T13:27:29.307Z] 13:27:29     INFO - PID 849 | 1511875649302	webdriver::httpapi	DEBUG	Creating routes
[task 2017-11-28T13:27:29.310Z] 13:27:29     INFO - PID 849 | 1511875649306	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2017-11-28T13:27:29.809Z] 13:27:29     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
[task 2017-11-28T13:27:29.810Z] 13:27:29     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
[task 2017-11-28T13:44:09.832Z] 13:44:09     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/UBMTC4VvSEOM_TnO_q7h0A/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
[task 2017-11-28T13:44:09.883Z] 13:44:09    ERROR - timed out after 1000 seconds of no output
[task 2017-11-28T13:44:09.883Z] 13:44:09    ERROR - Return code: -15
[task 2017-11-28T13:44:09.883Z] 13:44:09    ERROR - No suite end message was emitted by this harness.
[task 2017-11-28T13:44:09.884Z] 13:44:09    ERROR - # TBPL FAILURE #
[task 2017-11-28T13:44:09.884Z] 13:44:09  WARNING - setting return code to 2
[task 2017-11-28T13:44:09.885Z] 13:44:09     INFO - Running post-action listener: _package_coverage_data

because we dont get oytest starting we then never get a session request which then times out


Dave, can you suggest how we can delve deeper into this?

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 46

•

7 years ago

Maybe this is also all related to bug 1421289, which is causing various 1000s timeouts for Mochitests, and Marionette tests.

Depends on: 1421289

Comment 47

•

7 years ago

> Dave, can you suggest how we can delve deeper into this?

I don't think I have enough context to add value here. The output in comment 60 suggests a TEST-START is being reported before the pytest session starts, which seems odd. Could you direct me to where pytest is invoked? I would suggest that either the issue is occurring before this, or perhaps pytest is throwing an exception that's being ignored.

Flags: needinfo?(dave.hunt)

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/2-3/

Comment hidden (Intermittent Failures Robot)

40 failures in 163 pushes (0.245 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 19
* mozilla-inbound: 17
* mozilla-central: 3
* mozilla-beta: 1

Platform breakdown:
* linux32: 24
* linux64: 14
* linux64-nightly: 1
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-28&endday=2017-11-28&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/3-4/

Comment 52

•

7 years ago

So it looks like we are hitting some external timeout if we look at https://treeherder.mozilla.org/logviewer.html#?job_id=148432942&repo=try&lineNumber=1223

More debug statements going in.

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/4-5/

Comment hidden (Intermittent Failures Robot)

18 failures in 155 pushes (0.116 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-inbound: 4
* mozilla-central: 2
* mozilla-beta: 1

Platform breakdown:
* linux64: 9
* linux32: 8
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-29&endday=2017-11-29&tree=all

Comment 55

•

7 years ago

Ok, I have managed to track it down to the invocaction of Pytest and then it hangs so it looks like my initial gut of Pytest being the cause is likely. Now... to understand why...

We get to https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py#54-62 before we hang.


See log with my statements in https://treeherder.mozilla.org/logviewer.html#?job_id=148724638&repo=try&lineNumber=1251

Comment 56

•

7 years ago

Not knowing the best way to get through all of this I don't really know what are the next steps.

I definitely know that _something_ in pytest is not returning. We pass in a file and other arguments, see comment above for link to code, and then wait for pytest. We then timeout, somehow, and realise its a timeout but mozprocess comes a long and kills us for taking too long.

Dave, if you could give me pointers on where to start looking in pytest, I would be grateful. We are using the vendored version in third_party/python/pytest in mozilla central.

James, should we, when hitting EXTERNAL-TIMEOUT do `result = Stop` in wptrunner as there seems to be an error we can't recover from?

Flags: needinfo?(james)

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 57

•

7 years ago

pytest in the following folder is kinda old:
https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/tools/pytest

Maybe it should be upgraded to a newer release?

Comment 58

•

7 years ago

It looks to me like pytest isn't even starting, as we'd expect to see some output in the console. You could try the the pytest-instafail plugin (https://pypi.python.org/pypi/pytest-instafail/), which shows errors/failures immediately rather than wait until the end of the suite. Alternatively, pytest-faulthandler (https://pypi.python.org/pypi/pytest-faulthandler) might shed some light on what's happening. You could also try adding some debug in WdspecRun.run(), where the thread managing the run is created and ultimately timed out.

I would also suggest trying a later pytest release, as 3.x has been available for more than a year.

Flags: needinfo?(dave.hunt)

Comment hidden (Intermittent Failures Robot)

32 failures in 154 pushes (0.208 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 14
* mozilla-inbound: 9
* mozilla-central: 6
* try: 1
* mozilla-release: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 18
* linux32: 11
* linux64-jsdcov: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-30&endday=2017-11-30&tree=all

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/204614/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/204614/

Comment hidden (mozreview-request)

Comment on attachment 8933670 [details]
Bug 1407383: Update Vendored libraries used for wptrunner

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/204614/diff/1-2/

Comment hidden (Intermittent Failures Robot)

22 failures in 105 pushes (0.21 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 16
* mozilla-inbound: 5
* mozilla-beta: 1

Platform breakdown:
* linux64: 12
* linux32: 10

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-01&endday=2017-12-01&tree=all

Comment hidden (Intermittent Failures Robot)

161 failures in 792 pushes (0.203 failures/push) were associated with this bug in the last 7 days. 

This is the #1 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 83
* mozilla-inbound: 44
* mozilla-central: 20
* mozilla-beta: 9
* try: 3
* mozilla-release: 2

Platform breakdown:
* linux64: 82
* linux32: 66
* linux64-nightly: 5
* linux64-jsdcov: 5
* linux64-devedition: 2
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-11-27&endday=2017-12-03&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8933670 [details]
Bug 1407383: Update Vendored libraries used for wptrunner

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/204614/diff/2-3/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/205100/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/205100/

Comment hidden (Intermittent Failures Robot)

29 failures in 139 pushes (0.209 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 18
* mozilla-inbound: 4
* mozilla-central: 3
* try: 2
* mozilla-beta: 2

Platform breakdown:
* linux32: 14
* linux64: 12
* linux64-devedition: 2
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-04&endday=2017-12-04&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8934146 [details]
More debugging and changing nodeid for comparisons

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/205100/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8933670 [details]
Bug 1407383: Update Vendored libraries used for wptrunner

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/204614/diff/3-4/

Comment 71

•

7 years ago

looks like updating pytest solves this.

Flags: needinfo?(james)

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/7-8/

Updated

•

7 years ago

Attachment #8934146 - Attachment is obsolete: true

Updated

•

7 years ago

Attachment #8933670 - Attachment is obsolete: true

Comment hidden (Intermittent Failures Robot)

32 failures in 178 pushes (0.18 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 19
* mozilla-inbound: 10
* mozilla-central: 3

Platform breakdown:
* linux32: 18
* linux64: 14

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-05&endday=2017-12-05&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/8-9/

Comment hidden (Intermittent Failures Robot)

15 failures in 157 pushes (0.096 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 8
* autoland: 6
* mozilla-central: 1

Platform breakdown:
* linux32: 9
* linux64: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-06&endday=2017-12-06&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/9-10/

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/10-11/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/206312/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/206312/

Comment hidden (mozreview-request)

Comment on attachment 8935420 [details]
Bug 1407383 - Update  metadata for WebDriver

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206312/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8935420 [details]
Bug 1407383 - Update  metadata for WebDriver

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206312/diff/2-3/

Comment 81

•

7 years ago

so from try runs the vendor update drops the chances of it happening but doesnt outright solve it

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/206624/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/206624/

Comment hidden (mozreview-request)

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/203478/diff/11-12/

Comment hidden (mozreview-request)

Comment on attachment 8935420 [details]
Bug 1407383 - Update  metadata for WebDriver

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206312/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8935728 [details]
Bug 1407383: Wait for the URL to change in test before asserting

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206624/diff/1-2/

Comment 86

•

7 years ago

(In reply to David Burns :automatedtester from comment #81)

> so from try runs the vendor update drops the chances of it
> happening but doesnt outright solve it

Do you have an indication of the improvement?  Do you think there
are more bugs in pytest downstream that causes it to occasionally
freeze up?

Comment 87

•

7 years ago

mozreview-review

Comment on attachment 8935728 [details]
Bug 1407383: Wait for the URL to change in test before asserting

https://reviewboard.mozilla.org/r/206624/#review212366

::: testing/web-platform/tests/webdriver/tests/navigation/current_url.py:82
(Diff revision 2)
>  
>  def test_get_current_url_after_modified_location(session):
> +    start = session.transport.send("GET", "session/%s/url" % session.session_id)
>      session.execute_script("window.location.href = 'about:blank#wd_test_modification'")
> -
> +    wait(session,
> +         lambda d: session.transport.send("GET", "session/%s/url" % session.session_id) != start.body["value"],

d is unused.

Attachment #8935728 - Flags: review?(ato) → review+

Comment 88

•

7 years ago

(In reply to Andreas Tolfsen ‹:ato› from comment #86)
> (In reply to David Burns :automatedtester from comment #81)
> 
> > so from try runs the vendor update drops the chances of it
> > happening but doesnt outright solve it
> 
> Do you have an indication of the improvement?  Do you think there
> are more bugs in pytest downstream that causes it to occasionally
> freeze up?

we go from ~15% intermittent rate to ~2.3%. I feel like it would be worth watching after this lands to know if its the next top prioritty issue

As for pytest, I am not sure. It could be due to the other mozprocess intermittents we have on file.

Comment hidden (mozreview-request)

Comment on attachment 8935728 [details]
Bug 1407383: Wait for the URL to change in test before asserting

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206624/diff/2-3/

Comment hidden (Intermittent Failures Robot)

31 failures in 164 pushes (0.189 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 19
* mozilla-inbound: 9
* try: 1
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 19
* linux32: 11
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-08&endday=2017-12-08&tree=all

Comment hidden (Intermittent Failures Robot)

143 failures in 889 pushes (0.161 failures/push) were associated with this bug in the last 7 days. 

This is the #5 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 73
* mozilla-inbound: 46
* mozilla-central: 13
* try: 5
* mozilla-beta: 5
* mozilla-release: 1

Platform breakdown:
* linux32: 70
* linux64: 66
* linux64-devedition: 3
* linux64-nightly: 2
* linux64-jsdcov: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-04&endday=2017-12-10&tree=all

Comment 92

•

6 years ago

mozreview-review

Comment on attachment 8935420 [details]
Bug 1407383 - Update  metadata for WebDriver

https://reviewboard.mozilla.org/r/206312/#review212780

I'l l apply this during the next wpt udpate.

Attachment #8935420 - Flags: review?(james) → review+

Comment 93

•

6 years ago

mozreview-review

Comment on attachment 8932433 [details]
Bug 1407383 - Update Vendored packages for Web-Platform-Tests

https://reviewboard.mozilla.org/r/203478/#review212782

I think it's easier to start a wpt update rather than pre-apply this.

Attachment #8932433 - Flags: review?(james)

Comment hidden (mozreview-request)

Comment on attachment 8935728 [details]
Bug 1407383: Wait for the URL to change in test before asserting

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/206624/diff/3-4/

Updated

•

6 years ago

Attachment #8932433 - Attachment is obsolete: true

Updated

•

6 years ago

Attachment #8935420 - Attachment is obsolete: true

Pulsebot

Comment 95

•

6 years ago

Pushed by dburns@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/81d47aa03334
Wait for the URL to change in test before asserting r=ato

Dorel Luca [:dluca]

Comment 96

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/81d47aa03334

Comment hidden (Intermittent Failures Robot)

18 failures in 92 pushes (0.196 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 5
* mozilla-central: 4

Platform breakdown:
* linux64: 8
* linux32: 8
* linux64-nightly: 1
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-13&endday=2017-12-13&tree=all

Comment hidden (Intermittent Failures Robot)

17 failures in 91 pushes (0.187 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 6
* mozilla-central: 2

Platform breakdown:
* linux64: 10
* linux32: 6
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-14&endday=2017-12-14&tree=all

Comment hidden (Intermittent Failures Robot)

15 failures in 85 pushes (0.176 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 3
* mozilla-central: 3

Platform breakdown:
* linux32: 10
* linux64: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-15&endday=2017-12-15&tree=all

Comment hidden (Intermittent Failures Robot)

77 failures in 423 pushes (0.182 failures/push) were associated with this bug in the last 7 days. 

This is the #6 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 37
* mozilla-inbound: 19
* mozilla-central: 16
* try: 3
* mozilla-beta: 2

Platform breakdown:
* linux64: 41
* linux32: 30
* linux64-jsdcov: 3
* linux64-nightly: 1
* linux64-devedition: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-11&endday=2017-12-17&tree=all

Comment 101

•

6 years ago

I can’t say I see any improvement from the OrangeFactor bot.

Comment hidden (Intermittent Failures Robot)

16 failures in 116 pushes (0.138 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-inbound: 5

Platform breakdown:
* linux32: 9
* linux64: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-19&endday=2017-12-19&tree=all

Comment hidden (Intermittent Failures Robot)

15 failures in 108 pushes (0.139 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 6
* autoland: 6
* mozilla-central: 3

Platform breakdown:
* linux64: 7
* linux32: 6
* linux64-jsdcov: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-20&endday=2017-12-20&tree=all

Comment hidden (Intermittent Failures Robot)

38 failures in 170 pushes (0.224 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 15
* mozilla-inbound: 14
* mozilla-central: 7
* try: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 16
* linux32: 15
* linux64-jsdcov: 6
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-21&endday=2017-12-21&tree=all

Comment hidden (Intermittent Failures Robot)

103 failures in 590 pushes (0.175 failures/push) were associated with this bug in the last 7 days. 

This is the #4 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 41
* mozilla-inbound: 36
* mozilla-central: 20
* try: 3
* mozilla-beta: 3

Platform breakdown:
* linux64: 49
* linux32: 38
* linux64-jsdcov: 10
* linux64-ccov: 3
* linux64-nightly: 2
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-18&endday=2017-12-24&tree=all

Eliza Balazs [:ebalazs_]

Comment 106

•

6 years ago

There are 71 failures in the last 7 days.
They occur on Linux32, Linux x64 (ccov, jsdcov, nightly) nd the affected builds type are debug, opt, pgo.

Log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=153448184&lineNumber=5348

[task 2017-12-27T20:18:52.540Z] 20:18:52     INFO - TEST-START | toolkit/mozapps/extensions/test/xpinstall/browser_navigateaway3.js
5340
[task 2017-12-27T20:18:52.622Z] 20:18:52     INFO - GECKO(4930) | Sending InstallTriggered
5341
[task 2017-12-27T20:18:52.666Z] 20:18:52     INFO - GECKO(4930) | 1514405932656	addons.xpi	DEBUG	Download started for http://example.com/browser/toolkit/mozapps/extensions/test/xpinstall/amosigned.xpi to file /tmp/tmp-bvk.xpi
5342
[task 2017-12-27T20:18:52.666Z] 20:18:52     INFO - GECKO(4930) | 1514405932662	addons.xpi	DEBUG	Download of http://example.com/browser/toolkit/mozapps/extensions/test/xpinstall/amosigned.xpi completed.
5343
[task 2017-12-27T20:18:52.708Z] 20:18:52     INFO - GECKO(4930) | 1514405932704	addons.repository	DEBUG	cacheAddons: enabled false IDs ["unsigned-xpi@tests.mozilla.org"]
5344
[task 2017-12-27T20:18:52.733Z] 20:18:52     INFO - GECKO(4930) | Sending PageLoaded
5345
[task 2017-12-27T20:26:21.477Z] 20:26:21     INFO - GECKO(4930) | 1514406381473	addons.manager	INFO	Skipping background update check
5346
[task 2017-12-27T20:28:21.479Z] 20:28:21     INFO - GECKO(4930) | *** Blocklist::notify: Requesting http://127.0.0.1:8888/extensions-dummy/blocklistURL
5347
[task 2017-12-27T20:28:21.495Z] 20:28:21     INFO - GECKO(4930) | *** Blocklist::onXMLLoad: there was an error during load
5348
[task 2017-12-27T20:45:01.531Z] 20:45:01     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/mochitest/runtests.py', '--disable-e10s', '--total-chunks', '7', '--this-chunk', '3', '--jscov-dir-prefix=/builds/worker/workspace/build/blobber_upload_dir', '--appname=/builds/worker/workspace/build/application/firefox/firefox', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/a7vaVafyQsW-qx46KzZi2w/artifacts/public/build/target.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--setpref=webgl.force-enabled=true', '--quiet', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-coverage_raw.log', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/browser-chrome-coverage_errorsummary.log', '--use-test-media-devices', '--screenshot-on-fail', '--cleanup-crashes', '--marionette-startup-timeout=180', '--sandbox-read-whitelist=/builds/worker/workspace/build', '--log-raw=-', '--flavor=browser', '--chunk-by-runtime', '--timeout=1200']
5349
[task 2017-12-27T20:45:01.556Z] 20:45:01    ERROR - timed out after 1000 seconds of no output
5350
[task 2017-12-27T20:45:01.556Z] 20:45:01    ERROR - Return code: -15
5351
[task 2017-12-27T20:45:01.557Z] 20:45:01    ERROR - No suite end message was emitted by this harness.
5352
[task 2017-12-27T20:45:01.557Z] 20:45:01     INFO - TinderboxPrint: mochitest-browser-chrome-coverage<br/>213/0/0
5353
[task 2017-12-27T20:45:01.557Z] 20:45:01    ERROR - # TBPL FAILURE #
5354
[task 2017-12-27T20:45:01.558Z] 20:45:01  WARNING - setting return code to 2
5355
[task 2017-12-27T20:45:01.558Z] 20:45:01    ERROR - The mochitest suite: browser-chrome-coverage ran with return status: FAILURE
5356
[task 2017-12-27T20:45:01.559Z] 20:45:01     INFO - Running post-action listener: _package_coverage_data
5357
[task 2017-12-27T20:45:01.559Z] 20:45:01     INFO - Beginning compression of JSDCov artifacts...
5358
[task 2017-12-27T20:45:01.559Z] 20:45:01     INFO - Running command: ['zip', '-r', '/builds/worker/workspace/build/blobber_upload_dir/jsdcov_artifacts.zip', '.', '-i', 'jscov*.json'] in /builds/worker/workspace/build/blobber_upload_dir
5359
[task 2017-12-27T20:45:01.559Z] 20:45:01     INFO - Copy/paste: zip -r /builds/worker/workspace/build/blobber_upload_dir/jsdcov_artifacts.zip . -i jscov*.json

Flags: needinfo?(ato)

Comment hidden (Intermittent Failures Robot)

31 failures in 147 pushes (0.211 failures/push) were associated with this bug in the last 7 days. 

This is the #5 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* mozilla-central: 14
* autoland: 9
* mozilla-inbound: 7
* mozilla-beta: 1

Platform breakdown:
* linux64: 15
* linux32: 11
* linux64-nightly: 2
* linux64-jsdcov: 2
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2017-12-25&endday=2017-12-31&tree=all

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 108

•

6 years ago

It doesn’t look like the pytest/py vendor upgrade made any
difference, but I don’t have any better ideas what this could be
and I don’t currently have time allocated to dig in.

Flags: needinfo?(ato)

Comment 109

•

6 years ago

The latest version of pytest captures logging, and includes it in the output when a test fails. It's possible that this could reveal more information when the process is appearing to hang. To enable live logging you require pytest 3.3.1 or later, and the command line options: -s and --log-cli-level with the appropriate log level. See https://docs.pytest.org/en/3.3.1/logging.html for more information.

Comment 110

•

6 years ago

:automatedtester, can you find someone to look into this high frequency failure that we have had for 3+ months?

Flags: needinfo?(dburns)

Comment 111

•

6 years ago

(In reply to Andreas Tolfsen ‹:ato› from comment #108)
> It doesn’t look like the pytest/py vendor upgrade made any
> difference, but I don’t have any better ideas what this could be
> and I don’t currently have time allocated to dig in.

the changes I have done for vendored code has not been downstreamed from the w3c. I will get James to schedule a sync

Flags: needinfo?(dburns)

Comment hidden (Intermittent Failures Robot)

17 failures in 111 pushes (0.153 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 6
* mozilla-central: 5
* try: 3
* mozilla-inbound: 2
* mozilla-release: 1

Platform breakdown:
* linux64: 8
* linux32: 7
* linux64-nightly: 1
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-03&endday=2018-01-03&tree=all

Comment hidden (Intermittent Failures Robot)

22 failures in 130 pushes (0.169 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-inbound: 4
* mozilla-central: 4
* try: 3

Platform breakdown:
* linux32: 11
* linux64: 8
* linux64-devedition: 2
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-04&endday=2018-01-04&tree=all

Comment 114

•

6 years ago

I still see failures here, I assume a sync took place sometime in the last day or so, :automatedtester, can you follow up here?

Flags: needinfo?(dburns)

Assignee

Comment 115

•

6 years ago

No, AFAIK the sync has caused a lot of failures to appear, so it might not have been done yet. At least I don't see an updated pytest: https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/tools/pytest

James, can you supply details regarding the sync?

Flags: needinfo?(dburns) → needinfo?(james)

Comment hidden (Intermittent Failures Robot)

26 failures in 119 pushes (0.218 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 12
* autoland: 9
* mozilla-central: 4
* try: 1

Platform breakdown:
* linux64: 16
* linux32: 7
* linux64-jsdcov: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-05&endday=2018-01-05&tree=all

Comment hidden (Intermittent Failures Robot)

15 failures in 46 pushes (0.326 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 7
* try: 4
* mozilla-central: 3
* mozilla-inbound: 1

Platform breakdown:
* linux64: 6
* linux32: 4
* linux64-jsdcov: 2
* linux64-ccov: 2
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-06&endday=2018-01-06&tree=all

Comment hidden (Intermittent Failures Robot)

99 failures in 462 pushes (0.214 failures/push) were associated with this bug in the last 7 days. 

This is the #4 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 34
* mozilla-central: 30
* mozilla-inbound: 21
* try: 13
* mozilla-release: 1

Platform breakdown:
* linux64: 45
* linux32: 32
* linux64-jsdcov: 12
* linux64-ccov: 5
* linux64-nightly: 3
* linux64-devedition: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-01&endday=2018-01-07&tree=all

Updated

•

6 years ago

Component: geckodriver → web-platform-tests

Comment hidden (Intermittent Failures Robot)

21 failures in 111 pushes (0.189 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 6
* mozilla-central: 5

Platform breakdown:
* linux64: 11
* linux32: 6
* linux64-jsdcov: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-08&endday=2018-01-08&tree=all

Comment 120

•

6 years ago

:jgraham, this still shows a lot of failures- can you help prioritize this?

Comment 121

•

6 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #120)
> :jgraham, this still shows a lot of failures- can you help prioritize this?

the wpt sync should finish today or first thing tomorrow.

Flags: needinfo?(james)

Comment 122

•

6 years ago

It looks like this isn't fixed by the sync: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=83bfdce958e7de595edd901f1c60686127ca22e9

I think someone needs to get a loaner, run the test in a loop until it fails and then use gdb to figure out what's going on.

Comment 123

•

6 years ago

(In reply to James Graham [:jgraham] from comment #122)
> It looks like this isn't fixed by the sync:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&revision=83bfdce958e7de595edd901f1c60686127ca22e9
> 
> I think someone needs to get a loaner, run the test in a loop until it fails
> and then use gdb to figure out what's going on.

I can't see this error on that push

My original work saw it move from >15% down to 2%(comment 88). I would like to see the numbers over the next few days and then see what our next steps are.

Assignee

Comment 124

•

6 years ago

Please note that the affected test now seems to be key_shortcuts.py and not key.py. At least in those two failure cases on inbound.

Comment hidden (Intermittent Failures Robot)

18 failures in 149 pushes (0.121 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 4
* try: 3
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 11
* linux32: 4
* linux64-nightly: 2
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-09&endday=2018-01-09&tree=all

Comment 126

•

6 years ago

I can reproduce a hang on TC, but not on my machine (so far) by running something like:

mach wpt --test-type wdspec  --repeat 1000

Unfortunately I can't figure out how to debug it on TC:

* gdb -p (to attach to the existing processes) seems to be disabled on taskcluster since it would allow interacting with processes outside the container
* gdb python --args <command line above> doesen't provide me with a way to look at the child processes (afaik)
* rr doesn't work on AWS

I'm pretty sure there's some kind of deadlock here. It could be something to do with the log output, but that's only a guess. Debugging this blind is hard, so any suggestions would be appreciated.

Assignee

Comment 127

•

6 years ago

Similar to David I see this on and off on MacOS. gdb shouldn't be possible to use anymore. So any instructions in how to best debug this with lldb? Just in case I see it again.

Comment hidden (Intermittent Failures Robot)

44 failures in 152 pushes (0.289 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 17
* autoland: 16
* mozilla-central: 7
* try: 4

Platform breakdown:
* linux64: 25
* linux32: 12
* linux64-jsdcov: 5
* linux64-nightly: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-10&endday=2018-01-10&tree=all

Comment 129

•

6 years ago

I ran these tests all night on an OSX debug build and didn't see a hang, so if you can reproduce that's very interesting.

In after starting lldb it looks like you do something like

process attach --pid <pid>

I don't know if there's lldb extensions for python like the gdb ones that allow py-bt, but in any case backtraces from each running python processs seem interesting.

Assignee

Comment 130

•

6 years ago

So I hit this today myself again, and I tried to debug it via lldb. But the stack looks pretty useless because it is not symbolicated. I also tried with pyrasite to connect to the hanging processes but it was also not working.

At least the last spawned process looks a bit different for the first thread's last frames:

    + 2646 connection_recv_obj  (in _multiprocessing.so) + 90  [0x10e52545a]
    +   2646 conn_recv_string  (in _multiprocessing.so) + 68  [0x10e5259c5]
    +     2646 _conn_recvall  (in _multiprocessing.so) + 49  [0x10e525a9c]
    +       2646 read  (in libsystem_kernel.dylib) + 10  

So it seems to hang in communicating with the other processes?

What I can say that it stops in between those two lines. Means when the issue occurs pytest does not print it's info line:

> 0:04.67 TEST_START: /webdriver/tests/actions/key_shortcuts.py
> ============================================================== test session starts ==============================================================

Assignee

Comment 131

•

6 years ago

(In reply to James Graham [:jgraham] from comment #129)
> I ran these tests all night on an OSX debug build and didn't see a hang, so
> if you can reproduce that's very interesting.

Btw how did you run those? With `--repeat-until-unexpected`? If yes, that should not work because if we really hang that early, you should always execute the full command and run the test only once. And that until it hangs.

Comment hidden (Intermittent Failures Robot)

21 failures in 149 pushes (0.141 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 5
* mozilla-central: 3
* try: 2
* mozilla-beta: 2

Platform breakdown:
* linux64: 11
* linux32: 5
* linux64-jsdcov: 2
* linux64-nightly: 1
* linux64-devedition: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-11&endday=2018-01-11&tree=all

Comment hidden (Intermittent Failures Robot)

41 failures in 148 pushes (0.277 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 25
* mozilla-inbound: 8
* mozilla-central: 5
* try: 1
* mozilla-release: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 26
* linux32: 9
* linux64-nightly: 2
* linux64-jsdcov: 2
* linux64-stylo-disabled: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-12&endday=2018-01-12&tree=all

Comment hidden (Intermittent Failures Robot)

165 failures in 788 pushes (0.209 failures/push) were associated with this bug in the last 7 days. 

This is the #3 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 78
* mozilla-inbound: 44
* mozilla-central: 26
* try: 12
* mozilla-beta: 4
* mozilla-release: 1

Platform breakdown:
* linux64: 93
* linux32: 44
* linux64-jsdcov: 13
* linux64-nightly: 8
* linux64-devedition: 2
* linux64-ccov: 2
* linux32-devedition: 2
* linux64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-08&endday=2018-01-14&tree=all

Comment hidden (Intermittent Failures Robot)

26 failures in 143 pushes (0.182 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 14
* mozilla-inbound: 7
* mozilla-central: 4
* mozilla-beta: 1

Platform breakdown:
* linux64: 14
* linux32: 10
* linux64-jsdcov: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-17&endday=2018-01-17&tree=all

Comment 137

•

6 years ago

:jgraham this failure is showing up at a very high rate still, can we disable all tests that affect this on linux until this is resolved? do you have other ideas?

Flags: needinfo?(james)

Assignee

Comment 138

•

6 years ago

Please note that there are still a lot of falsely starred entries in the list of OF.

Also this is not only related to this specific test but it happens for the very first test as run in the job. So disabling just the first test doesn't help. We would have to completely turn of wdspec tests.

Comment hidden (Intermittent Failures Robot)

23 failures in 141 pushes (0.163 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 8
* mozilla-central: 5

Platform breakdown:
* linux64: 12
* linux32: 6
* linux64-jsdcov: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-18&endday=2018-01-18&tree=all

Comment 140

•

6 years ago

(In reply to OrangeFactor Robot from comment #139)
> 23 failures in 141 pushes (0.163 failures/push) were associated with this
> bug yesterday.    
> 

so its 18 failures in 141 pushes(0.127 failures/push)

Comment 141

•

6 years ago

a 12%+ failure rate- should we hide this test or restrict it to a single config until this is fixed?

Flags: needinfo?(dburns)

Comment 142

•

6 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #141)
> a 12%+ failure rate- should we hide this test or restrict it to a single
> config until this is fixed?

As mentioned in comment 138 that would turn off all wdspec tests as its not related to a test. From a quick search of Bugzilla[1] I think there might be an underlying issue in mozprocess issue or there might be a resource issue from TC as it appears, anecdotally, that we are hitting a deadlock probably due to I/O

[1] https://bugzilla.mozilla.org/buglist.cgi?quicksearch=mozprocess%20timed%20out&list_id=13974869

Flags: needinfo?(dburns)

Comment 143

•

6 years ago

I asked if we should restrict this to a single config until this is fixed, for example linux64/debug only.  Who is working on the mozprocess issue?  Is there a bug we can track instead of this?

Flags: needinfo?(dburns)

Assignee

Comment 144

•

6 years ago

David, you said that the underlying issue might be in pytest. How safe is this assumption? I mean in relation to a possible issue with mozprocess, which both would be mutual exclusive.

Flags: needinfo?(james)

Comment 145

•

6 years ago

Sorry if I’m missing some context here, but what uses mozprocess here?

Comment 146

•

6 years ago

(In reply to Andreas Tolfsen ‹:ato› from comment #145)
> Sorry if I’m missing some context here, but what uses mozprocess here?

Mozprocess starts everything and makes sure that they don't just hang by killing jobs that don't output for 1000s.

Flags: needinfo?(dburns)

Comment 147

•

6 years ago

(In reply to Henrik Skupin (:whimboo) from comment #144)
> David, you said that the underlying issue might be in pytest. How safe is
> this assumption? I mean in relation to a possible issue with mozprocess,
> which both would be mutual exclusive.

I know that pytest is not returning. I am 100% confident in that. Why it's not returning I have no idea and I know James hasnt been able to debug either. 

James' theory, which makes the most sense, is there is an I/O deadlock somewhere. If that is the case then there might be a bigger issue which is why I searched for mozprocess intermittents today which shocked me

Comment 148

•

6 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #143)
> I asked if we should restrict this to a single config until this is fixed,
> for example linux64/debug only.  Who is working on the mozprocess issue?  Is
> there a bug we can track instead of this?

No one as far as I know, I thought that I would look it up today to see if there is a trend and clearly there is. I will raise one and put all the mozprocess bugs there as deps that we can then clear up as we know that its not something more sinister underlying

Comment 149

•

6 years ago

I have put some time into this e.g. I set up the docker image from Taskcluster locally and tried to reproduce. But so far it seems to be easy to reproduce in CI and hard to debug there, and hard to reproduce outside CI where it would be eas[y|ier] to debug. AFAIK only whimboo has reported seeing the hang outside the CI environment, and it's not 100% clear if that's the same issue.

I agree that the situation in CI is currently untenable. I don't really object to reducing the number of Wd jobs in the short term, but ultimately someone needs to dedicate some time to actually resolving the issue, and I haven't been able to do that recently.

Assignee

Comment 150

•

6 years ago

David could you please give me some links in where I would have to add some debug print lines to actually check if I see the identical problem with pytest? I don't know where to start, and your input could shorten the time drastically.

Flags: needinfo?(dburns)

Assignee

Comment 151

•

6 years ago

And one note... we actually use mozprocess to start the wpt tests via mozharness. So keep in mind that it has its own copy of mozprocess in place. To reproduce this output, you should use mozharness. Also this 1000s timeout means that no output has been given in that time, which underpins the comment from David about the deadlock above.

Yes, I was able to see the deadlock one and off locally, and if I know where to get started with debugging, I can jump into that.

Comment 152

•

6 years ago

(In reply to Henrik Skupin (:whimboo) from comment #151)
> And one note... we actually use mozprocess to start the wpt tests via
> mozharness. So keep in mind that it has its own copy of mozprocess in place.
> To reproduce this output, you should use mozharness. Also this 1000s timeout
> means that no output has been given in that time, which underpins the
> comment from David about the deadlock above.
> 
> Yes, I was able to see the deadlock one and off locally, and if I know where
> to get started with debugging, I can jump into that.

If its pytest as the cause then https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py#54-63 is what will hang.

Flags: needinfo?(dburns)

Comment hidden (Intermittent Failures Robot)

17 failures in 65 pushes (0.262 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 6
* autoland: 6
* try: 2
* mozilla-central: 2
* oak: 1

Platform breakdown:
* linux64: 14
* linux32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-19&endday=2018-01-19&tree=all

Comment hidden (Intermittent Failures Robot)

96 failures in 657 pushes (0.146 failures/push) were associated with this bug in the last 7 days. 

This is the #7 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 46
* mozilla-inbound: 26
* mozilla-central: 17
* try: 3
* mozilla-beta: 3
* oak: 1

Platform breakdown:
* linux64: 56
* linux32: 26
* linux64-jsdcov: 12
* linux64-nightly: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-15&endday=2018-01-21&tree=all

Comment hidden (Intermittent Failures Robot)

17 failures in 101 pushes (0.168 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 5
* try: 2

Platform breakdown:
* linux64: 11
* linux32: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-22&endday=2018-01-22&tree=all

Comment 156

•

6 years ago

Fun discovery from the weekend: add enough logging and this stops happening.

Comment hidden (Intermittent Failures Robot)

34 failures in 148 pushes (0.23 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 16
* mozilla-inbound: 9
* mozilla-central: 7
* try: 1
* mozilla-release: 1

Platform breakdown:
* linux64: 20
* linux32: 10
* linux64-ccov: 2
* windows10-64: 1
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-23&endday=2018-01-23&tree=all

Comment hidden (Intermittent Failures Robot)

21 failures in 126 pushes (0.167 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 11
* mozilla-inbound: 7
* mozilla-beta: 2
* mozilla-central: 1

Platform breakdown:
* linux64: 14
* linux32: 5
* linux64-devedition: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-24&endday=2018-01-24&tree=all

Comment hidden (Intermittent Failures Robot)

18 failures in 129 pushes (0.14 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 9
* autoland: 6
* try: 1
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 11
* linux32: 5
* linux64-devedition: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-25&endday=2018-01-25&tree=all

Comment hidden (Intermittent Failures Robot)

24 failures in 119 pushes (0.202 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 17
* mozilla-inbound: 4
* mozilla-central: 3

Platform breakdown:
* linux32: 12
* linux64: 11
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-26&endday=2018-01-26&tree=all

Comment hidden (Intermittent Failures Robot)

130 failures in 701 pushes (0.185 failures/push) were associated with this bug in the last 7 days. 

This is the #3 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 69
* mozilla-inbound: 37
* mozilla-central: 13
* try: 7
* mozilla-beta: 3
* mozilla-release: 1

Platform breakdown:
* linux64: 78
* linux32: 42
* linux64-devedition: 3
* linux64-ccov: 2
* windows10-64: 1
* linux64-nightly: 1
* linux64-jsdcov: 1
* linux32-nightly: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-22&endday=2018-01-28&tree=all

Assignee

Comment 162

•

6 years ago

Using a specific test in the summary is misleading as mentioned a couple of times earlier here. So we should really get rid of it. Instead lets prefix it with the test job symbol for better starring.

Summary: Intermittent /webdriver/tests/actions/key.py Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', → Intermittent Wd failure: Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-',

Assignee

Updated

•

6 years ago

Summary: Intermittent Wd failure: Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', → Intermittent Wd Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-',

Updated

•

6 years ago

Blocks: 1433861

Comment hidden (Intermittent Failures Robot)

16 failures in 111 pushes (0.144 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 7
* mozilla-inbound: 6
* try: 1
* mozilla-release: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 9
* linux32: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-29&endday=2018-01-29&tree=all

Comment hidden (Intermittent Failures Robot)

16 failures in 148 pushes (0.108 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 8
* mozilla-central: 4
* autoland: 4

Platform breakdown:
* linux64: 8
* linux32: 7
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-30&endday=2018-01-30&tree=all

Comment hidden (Intermittent Failures Robot)

23 failures in 129 pushes (0.178 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 7
* mozilla-central: 3
* try: 2
* mozilla-beta: 1

Platform breakdown:
* linux64: 13
* linux32: 9
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-01&endday=2018-02-01&tree=all

Comment hidden (Intermittent Failures Robot)

94 failures in 735 pushes (0.128 failures/push) were associated with this bug in the last 7 days. 

This is the #3 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 34
* mozilla-inbound: 32
* mozilla-central: 15
* mozilla-beta: 7
* try: 5
* mozilla-release: 1

Platform breakdown:
* linux64: 48
* linux32: 37
* linux64-nightly: 5
* linux64-jsdcov: 2
* linux64-ccov: 1
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-01-29&endday=2018-02-04&tree=all

Arthur Iakab [arthur_iakab]

Assignee

Updated

•

6 years ago

Blocks: 1429391

Noemi Erli[:noemi_erli]

Comment 167

•

6 years ago

There are 87 failures in the past 7 days.
Platforms: Linux opt/ debug, Linux x64 opt/ debug/ pgo, linux32-nightly opt, linux32-devedition opt, linux64-ccov opt, 	linux64-jsdcov opt, linux64-nightly opt.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=160524801

Relevant part of the log:
[task 2018-02-06T01:40:01.065Z] 01:40:01     INFO - SUITE-START | Running 54 tests
[task 2018-02-06T01:40:01.065Z] 01:40:01     INFO - Running wdspec tests
[task 2018-02-06T01:40:01.065Z] 01:40:01     INFO - TEST-START | /webdriver/tests/actions/key.py
[task 2018-02-06T01:40:01.065Z] 01:40:01     INFO - TEST-SKIP | /webdriver/tests/actions/key.py | took 0ms
[task 2018-02-06T01:40:01.066Z] 01:40:01     INFO - TEST-START | /webdriver/tests/contexts/maximize_window.py
[task 2018-02-06T01:40:01.066Z] 01:40:01     INFO - TEST-SKIP | /webdriver/tests/contexts/maximize_window.py | took 0ms
[task 2018-02-06T01:40:01.066Z] 01:40:01     INFO - Starting http server on web-platform.test:8001
[task 2018-02-06T01:40:01.070Z] 01:40:01     INFO - Starting runner
[task 2018-02-06T01:40:01.070Z] 01:40:01     INFO - Starting http server on web-platform.test:8443
[task 2018-02-06T01:40:01.097Z] 01:40:01     INFO - PID 840 | 1517881201093	geckodriver	INFO	geckodriver 0.19.1 ( 'e6138707451a'  '2018-02-05 20:17 -0500')
[task 2018-02-06T01:40:01.101Z] 01:40:01     INFO - PID 840 | 1517881201096	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2018-02-06T01:40:01.602Z] 01:40:01     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
[task 2018-02-06T01:40:01.602Z] 01:40:01     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
[task 2018-02-06T01:56:41.662Z] 01:56:41     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/XXPgFAK3QL-xbZQq71hzZA/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--no-pause-after-test', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
[task 2018-02-06T01:56:41.663Z] 01:56:41    ERROR - timed out after 1000 seconds of no output
[task 2018-02-06T01:56:41.663Z] 01:56:41    ERROR - Return code: -15
[task 2018-02-06T01:56:41.664Z] 01:56:41    ERROR - No suite end message was emitted by this harness.
[task 2018-02-06T01:56:41.664Z] 01:56:41    ERROR - # TBPL FAILURE #
[task 2018-02-06T01:56:41.665Z] 01:56:41  WARNING - setting return code to 2
[task 2018-02-06T01:56:41.665Z] 01:56:41     INFO - Running post-action listener: _package_coverage_data
[task 2018-02-06T01:56:41.665Z] 01:56:41     INFO - Running post-action listener: _resource_record_post_action
[task 2018-02-06T01:56:41.665Z] 01:56:41     INFO - [mozharness: 2018-02-06 01:56:41.665687Z] Finished run-tests step (success)
[task 2018-02-06T01:56:41.666Z] 01:56:41     INFO - Running post-run listener: _resource_record_post_run
[task 2018-02-06T01:56:41.738Z] 01:56:41     INFO - Total resource usage - Wall time: 1021s; CPU: 1.0%; Read bytes: 442368; Write bytes: 444174336; Read time: 24; Write time: 24936
[task 2018-02-06T01:56:41.738Z] 01:56:41     INFO - TinderboxPrint: CPU usage<br/>0.6%
[task 2018-02-06T01:56:41.738Z] 01:56:41     INFO - TinderboxPrint: I/O read bytes / time<br/>442,368 / 24
[task 2018-02-06T01:56:41.739Z] 01:56:41     INFO - TinderboxPrint: I/O write bytes / time<br/>444,174,336 / 24,936
[task 2018-02-06T01:56:41.739Z] 01:56:41     INFO - TinderboxPrint: CPU idle<br/>4,038.5 (99.4%)
[task 2018-02-06T01:56:41.739Z] 01:56:41     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2018-02-06T01:56:41.740Z] 01:56:41     INFO - pull - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2018-02-06T01:56:41.741Z] 01:56:41     INFO - install - Wall time: 17s; CPU: 25.0%; Read bytes: 0; Write bytes: 9338880; Read time: 0; Write time: 484
[task 2018-02-06T01:56:41.751Z] 01:56:41     INFO - run-tests - Wall time: 1004s; CPU: 0.0%; Read bytes: 204800; Write bytes: 434835456; Read time: 16; Write time: 24452
[task 2018-02-06T01:56:42.030Z] 01:56:42     INFO - Running post-run listener: _upload_blobber_files
[task 2018-02-06T01:56:42.030Z] 01:56:42  WARNING - Blob upload gear skipped. Missing cmdline options.
[task 2018-02-06T01:56:42.030Z] 01:56:42     INFO - Running post-run listener: copy_logs_to_upload_dir
[task 2018-02-06T01:56:42.031Z] 01:56:42     INFO - Copying logs to upload dir...
[task 2018-02-06T01:56:42.031Z] 01:56:42     INFO - mkdir: /builds/worker/workspace/build/upload/logs
[task 2018-02-06T01:56:42.032Z] 01:56:42     INFO - Copying logs to upload dir...
[task 2018-02-06T01:56:42.034Z] 01:56:42  WARNING - returning nonzero exit status 2

Comment hidden (Intermittent Failures Robot)

17 failures in 125 pushes (0.136 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 9
* mozilla-inbound: 4
* mozilla-central: 2
* mozilla-beta: 2

Platform breakdown:
* linux64: 10
* linux32: 5
* linux64-stylo-disabled: 1
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-08&endday=2018-02-08&tree=all

Comment hidden (Intermittent Failures Robot)

19 failures in 141 pushes (0.135 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 8
* mozilla-central: 5
* mozilla-inbound: 4
* mozilla-beta: 2

Platform breakdown:
* linux64: 9
* linux32: 7
* linux64-nightly: 2
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-09&endday=2018-02-09&tree=all

Comment hidden (Intermittent Failures Robot)

81 failures in 702 pushes (0.115 failures/push) were associated with this bug in the last 7 days. 

This is the #10 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 36
* mozilla-inbound: 24
* mozilla-central: 11
* mozilla-beta: 9
* try: 1

Platform breakdown:
* linux64: 40
* linux32: 32
* linux64-nightly: 4
* linux32-nightly: 2
* linux64-stylo-disabled: 1
* linux64-devedition: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-05&endday=2018-02-11&tree=all

Comment hidden (Intermittent Failures Robot)

17 failures in 120 pushes (0.142 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 9
* autoland: 6
* mozilla-central: 2

Platform breakdown:
* linux64: 8
* linux32: 7
* linux64-ccov: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-12&endday=2018-02-12&tree=all

Comment hidden (Intermittent Failures Robot)

23 failures in 89 pushes (0.258 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 16
* mozilla-beta: 3
* mozilla-inbound: 2
* try: 1
* mozilla-central: 1

Platform breakdown:
* linux64: 10
* linux32: 10
* linux64-nightly: 2
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-13&endday=2018-02-13&tree=all

Comment 173

•

6 years ago

This bug has failed 85 times in the last 7 days. It failed only on Linux affecting debug, pgo and opt build types.

Failing tests: debug-web-platform-tests-wdspec-e10, opt-web-platform-tests-wdspec-e10, opt-web-platform-tests-wdspec-e10, opt-test-verify-wpt-e10.

Link to a recent log:https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=162310358&lineNumber=1222

Part of that log:  01:30:07     INFO - TEST-START | /webdriver/tests/actions/key_shortcuts.py
[task 2018-02-15T01:46:47.542Z] 01:46:47     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/YmxWjjgmTZy5PdEjMJy1FQ/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--no-pause-after-test', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
[task 2018-02-15T01:46:47.559Z] 01:46:47    ERROR - timed out after 1000 seconds of no output
[task 2018-02-15T01:46:47.560Z] 01:46:47    ERROR - Return code: -15
[task 2018-02-15T01:46:47.560Z] 01:46:47    ERROR - No suite end message was emitted by this harness.
[task 2018-02-15T01:46:47.561Z] 01:46:47    ERROR - # TBPL FAILURE #
[task 2018-02-15T01:46:47.561Z] 01:46:47  WARNING - setting return code to 2
[task 2018-02-15T01:46:47.562Z] 01:46:47     INFO - Running post-action listener: _package_coverage_data
[task 2018-02-15T01:46:47.562Z] 01:46:47     INFO - Running post-action listener: _resource_record_post_action
[task 2018-02-15T01:46:47.562Z] 01:46:47     INFO - [mozharness: 2018-02-15 01:46:47.561867Z] Finished run-tests step (success)
[task 2018-02-15T01:46:47.562Z] 01:46:47     INFO - Running post-run listener: _resource_record_post_run
[task 2018-02-15T01:46:47.646Z] 01:46:47     INFO - Total resource usage - Wall time: 1022s; CPU: 1.0%; Read bytes: 241664; Write bytes: 524836864; Read time: 4; Write time: 93468

The WPT key tests are disabled due to some problem in wptrunner.

new file mode 100644
--- /dev/null
+++ b/testing/web-platform/meta/webdriver/tests/actions/key.py.ini
@@ -0,0 +1,4 @@
+[key.py]
+  type: wdspec
+  disabled:
+    if (os == "linux"): https://bugzilla.mozilla.org/show_bug.cgi?id=1407383

 Bug 1421289 needs to be fixed in order to make progress on the actual bug.

Assignee

Comment 174

•

6 years ago

I really propose that we re-enable actions/key.py given that the test module itself is not the problem here. We shouldn't see an increase of failures. The test was just inappropriately disabled a long time ago.

Comment 175

•

6 years ago

(In reply to Henrik Skupin (:whimboo) from comment #174)
> I really propose that we re-enable actions/key.py given that the test module
> itself is not the problem here. We shouldn't see an increase of failures.
> The test was just inappropriately disabled a long time ago.

Sounds reasonable to me.

Comment 176

•

6 years ago

will we re-enable that in this bug or somewhere else?

Assignee

Comment 177

•

6 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #176)
> will we re-enable that in this bug or somewhere else?

You skipped the test without a patch attached to this bug, so I assume we can do the same to unskip?

Flags: needinfo?(jmaher)

Assignee

Updated

•

6 years ago

status-firefox59: --- → disabled

status-firefox60: --- → disabled

Comment 178

•

6 years ago

can you link to the bug where we will enable it?

Flags: needinfo?(jmaher)

Assignee

Comment 179

•

6 years ago

We should do it here. See your comment 12.

Flags: needinfo?(jmaher)

Comment 180

•

6 years ago

for disabling tests we often just disable the test, not add a patch; if you want to revert that patch, go ahead!  if there are other factors other than removing the .ini file, it probably is worth a review.

Flags: needinfo?(jmaher)

Comment hidden (Intermittent Failures Robot)

25 failures in 149 pushes (0.168 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 12
* mozilla-inbound: 9
* try: 3
* mozilla-central: 1

Platform breakdown:
* linux64: 16
* linux32: 9

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-15&endday=2018-02-15&tree=all

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/220862/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/220862/

Tiberius Oros[:tiberius_oros]

Assignee

Comment 183

•

6 years ago

Ok, we cannot re-enable key.py yet due to perma timeout failures. There are no restarts of Firefox in-between each sub-test but the very first start of Firefox seems to take extra-ordinary long:

[task 2018-02-16T12:24:42.511Z] 12:24:42     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/key.py::test_lone_keyup_sends_no_events 
[task 2018-02-16T12:25:09.626Z] 12:25:09     INFO - PID 843 | GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
[task 2018-02-16T12:25:10.320Z] 12:25:10     INFO - PID 843 | 1518783910314	Marionette	INFO	Listening on port 2828
[task 2018-02-16T12:25:11.003Z] 12:25:11     INFO - STDOUT: PASSED

Here exactly 28 seconds until Firefox was ready. It is actually a different problem.

Comment hidden (Intermittent Failures Robot)

18 failures in 117 pushes (0.154 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 9
* autoland: 7
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 12
* linux32: 4
* linux64-ccov: 1
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-16&endday=2018-02-16&tree=all

Comment hidden (Intermittent Failures Robot)

96 failures in 685 pushes (0.14 failures/push) were associated with this bug in the last 7 days. 

This is the #6 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 43
* mozilla-inbound: 35
* mozilla-central: 8
* try: 5
* mozilla-beta: 5

Platform breakdown:
* linux64: 51
* linux32: 37
* linux64-nightly: 3
* linux64-ccov: 2
* linux32-nightly: 2
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-12&endday=2018-02-18&tree=all

Comment hidden (Intermittent Failures Robot)

22 failures in 124 pushes (0.177 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 6
* mozilla-central: 5
* mozilla-beta: 1

Platform breakdown:
* linux64: 13
* linux32: 7
* linux64-jsdcov: 1
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-21&endday=2018-02-21&tree=all

Comment hidden (Intermittent Failures Robot)

24 failures in 135 pushes (0.178 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 10
* mozilla-inbound: 7
* mozilla-central: 5
* try: 2

Platform breakdown:
* linux64: 11
* linux32: 11
* linux64-ccov: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-23&endday=2018-02-23&tree=all

Comment 188

•

6 years ago

This bug
- Depends on:	1421289, 1418227
- Blocks:	1429391, 1433861

I can see there is a lot work on this bug, as an update I want to add the following:

There are 81 failures in the last 7 days and 242 failures in the last 21 days.
- Linux x64: 50
- Linux: 23
- linux32-nightly: 2
- linux64-ccov: 3
- linux64-jsdcov: 1
- linux64-nightly: 1
- OS X 10.10: 1

- opt: 41
- debug: 27
- pgo: 13

Here is a recent log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=164092702&lineNumber=2952

[task 2018-02-24T05:37:25.838Z] 05:37:25     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/Eq0ruhAHRN6GlG2-xuqcqQ/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--no-pause-after-test', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
[task 2018-02-24T05:37:25.846Z] 05:37:25    ERROR - timed out after 1000 seconds of no output
[task 2018-02-24T05:37:25.846Z] 05:37:25    ERROR - Return code: -15
[task 2018-02-24T05:37:25.846Z] 05:37:25    ERROR - No suite end message was emitted by this harness.
[task 2018-02-24T05:37:25.846Z] 05:37:25    ERROR - # TBPL FAILURE #
[task 2018-02-24T05:37:25.846Z] 05:37:25  WARNING - setting return code to 2

Flags: needinfo?(james)

Comment hidden (Intermittent Failures Robot)

89 failures in 675 pushes (0.132 failures/push) were associated with this bug in the last 7 days. 

This is the #5 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 41
* mozilla-inbound: 23
* mozilla-central: 17
* try: 5
* mozilla-beta: 3

Platform breakdown:
* linux64: 51
* linux32: 28
* linux64-ccov: 3
* linux32-nightly: 3
* linux64-nightly: 2
* osx-10-10: 1
* linux64-jsdcov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-19&endday=2018-02-25&tree=all

Comment hidden (Intermittent Failures Robot)

23 failures in 145 pushes (0.159 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 12
* mozilla-inbound: 7
* try: 2
* mozilla-central: 2

Platform breakdown:
* linux64: 14
* linux32: 8
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-26&endday=2018-02-26&tree=all

Comment hidden (Intermittent Failures Robot)

30 failures in 169 pushes (0.178 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 17
* mozilla-inbound: 11
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 23
* linux32: 6
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-28&endday=2018-02-28&tree=all

Comment hidden (Intermittent Failures Robot)

34 failures in 179 pushes (0.19 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* autoland: 21
* mozilla-inbound: 11
* try: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 24
* linux32: 8
* osx-10-10: 1
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-03-01&endday=2018-03-01&tree=all

Comment 193

•

6 years ago

:jgraham, can we disable wdspec on linux?

Comment hidden (Intermittent Failures Robot)

17 failures in 150 pushes (0.113 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 6
* autoland: 5
* mozilla-central: 3
* mozilla-beta: 2
* try: 1

Platform breakdown:
* linux64: 8
* linux32: 5
* linux64-nightly: 1
* linux64-jsdcov: 1
* linux64-ccov: 1
* linux32-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-03-02&endday=2018-03-02&tree=all

Comment hidden (Intermittent Failures Robot)

137 failures in 831 pushes (0.165 failures/push) were associated with this bug in the last 7 days. 

This is the #3 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **   

Repository breakdown:
* autoland: 69
* mozilla-inbound: 39
* mozilla-central: 14
* try: 9
* mozilla-beta: 4
* mozilla-release: 2

Platform breakdown:
* linux64: 91
* linux32: 32
* linux64-jsdcov: 4
* linux64-nightly: 3
* osx-10-10: 2
* linux64-ccov: 2
* linux32-nightly: 2
* linux64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-02-26&endday=2018-03-04&tree=all

Assignee

Comment 196

•

6 years ago

I was trying to reproduce this hang with a 1-click loaner and got it fairly quickly. Sadly I cannot use GDB to get stacktraces of each of the processes because ptrace is not allowed:

> # sudo gdb --pid=3586
> GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
> [..]
> Attaching to process 3586
> Could not attach to process.  If your uid matches the uid of the target
> process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
> again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
> ptrace: Operation not permitted.

I also cannot turn off the restriction:

> # echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> tee: /proc/sys/kernel/yama/ptrace_scope: Read-only file system
> 0

At least by inspecting the log I noticed that the hang happens after `Executor setup done`, which is NOT printed to the public logs as uploaded to taskcluster:

{"source": "web-platform-tests", "thread": "Thread-TestrunnerManager-1", "time": 1520242899539, "action": "log", "message": "Executor setup done",
 "level": "DEBUG", "pid": 3586}
^C{"source": "web-platform-tests", "thread": "Thread-TestrunnerManager-1", "time": 1520243299983, "action": "log", "message": "TestRunnerManager main loop terminating, starting cleanup", "level": "DEBUG", "pid": 3586}

Assignee

Comment 197

•

6 years ago

Attached file bash script for reproduction — Details

Here the instructions in how to reproduce the hang on a one-click loaner:

1) Create a one-click loaner based on https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=e0a1ba39c48913e4e88efe4fa289c23454ff5181&selectedJob=165824714
2) Run the execution step and stop it when tests are started
3) Save this attachment as run.sh and make it executable
4) Run ./run.sh and wait until it hangs

Comment 198

•

6 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #193)
> :jgraham, can we disable wdspec on linux?

Redirecting this to maja_zf.

Flags: needinfo?(james) → needinfo?(mjzffr)

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Assignee

Comment 199

•

6 years ago

The hang is actually caused for `__import__(modname)` where `modname` is `tests.conftest`:

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/third_party/py/py/_path/local.py#668
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/webdriver/tests/conftest.py

I'm going to reduce the content of this file now to see what causes it. I suspect it is the import of pytest.

Comment 200

•

6 years ago

Rather than disabling the entire suite, I like the idea of setting it to Tier-2 for a couple of months until we get the most broad, infrastructure-related intermittents figured out. The individual tests, for the most part, are fine and provide useful information to developers working on the features being tested.

Flags: needinfo?(mjzffr)

Assignee

Comment 201

•

6 years ago

Reducing conftest.py showed that the hang comes actually with `fixtures.py`:

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/webdriver/tests/support/fixtures.py

In that file I can remove everything except the following line:

> import webdriver

So something in that package keeps us hanging. More updates follow...

Assignee

Comment 202

•

6 years ago

I finally traced it down to StructuredLog and here specifically the `__init__` method:

https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structuredlog.py#149

The hang happens in `with self._lock`. James given that you know mozlog better than me, any advice in what would be good as next steps?

Flags: needinfo?(james)

Comment 203

•

6 years ago

So, I would really like to see what happens if you apply the following patch to log all activity on that lock:

diff --git a/testing/mozbase/mozlog/mozlog/structuredlog.py b/testing/mozbase/mozlog/mozlog/structuredlog.py
index 74a7ef55e3f4..712560a0eeda 100644
--- a/testing/mozbase/mozlog/mozlog/structuredlog.py
+++ b/testing/mozbase/mozlog/mozlog/structuredlog.py
@@ -7,6 +7,7 @@ from __future__ import absolute_import, print_function, unicode_literals
 from multiprocessing import current_process
 from threading import current_thread, Lock
 import json
+import os
 import sys
 import time
 import traceback
@@ -69,6 +70,27 @@ Subfields for all messages:
 _default_logger_name = None
 
 
+class LoggedLock(object):
+    def __init__(self):
+        self._lock = Lock()
+
+    def acquire(self):
+        os.write(2, "Lock requested %s %s\n" % (current_process().pid, current_thread().name))
+        rv = self._lock.acquire()
+        os.write(2, "Lock acquired %s %s\n" % (current_process().pid, current_thread().name))
+        return rv
+
+    def release(self):
+        os.write(2, "Lock released by %s %s\n" % (current_process().pid, current_thread().name))
+        return self._lock.release()
+
+    def __enter__(self):
+        return self.acquire()
+
+    def __exit__(self, *args, **kwargs):
+        return self.release()
+
+
 def get_default_logger(component=None):
     """Gets the default logger if available, optionally tagged with component
     name. Will return None if not yet set
@@ -134,7 +156,7 @@ class ComponentState(object):
 
 
 class StructuredLogger(object):
-    _lock = Lock()
+    _lock = LoggedLock()
     _logger_states = {}
     """Create a structured logger with the given name
 
diff --git a/testing/web-platform/tests/tools/webdriver/webdriver/client.py b/testing/web-platform/tests/tools/webdriver/webdriver/client.py
index 622c5fce76ac..86209a366691 100644
--- a/testing/web-platform/tests/tools/webdriver/webdriver/client.py
+++ b/testing/web-platform/tests/tools/webdriver/webdriver/client.py
@@ -6,7 +6,10 @@ import transport
 
 from six import string_types
 
+import os
+os.write(2, "Before get_default_logger\n")
 from mozlog import get_default_logger
+os.write(2, "After get_default_logger\n")
 
 logger = get_default_logger()
 



That's against m-c so you may need to edit the paths to get it to apply.


I suspect what's going on is that we are randomly inheriting a locked lock into the subprocess due to the Incredible Brokenness of Python Multithreading. But I'd like to confirm that's what's going on.


Irrespective of that, we don't actually seem to use the logger anywhere? So we could delete that line and stop using mozlog in the conftest file and the problem will go away. Apologies for not spotting this earlier. In retrospect it seems like exactly the kind of thing that coculd cause exactly this kind of problem

Flags: needinfo?(james)

Comment 204

•

6 years ago

*Multiprocessing. Multithreading is less broken, just not fit for (many) purpose(s).

Assignee

Updated

•

6 years ago

Assignee: nobody → hskupin

Status: NEW → ASSIGNED

Priority: P5 → P1

Assignee

Comment 205

•

6 years ago

Attached file mozlog LoggedLock output — Details

Assignee

Comment 206

•

6 years ago

Also I don't see why we are hanging here in trying to aquire a lock. As it looks like the same lock instance gets released right before:

Lock requested 14850 Thread-TestrunnerManager-1 for <mozlog.structuredlog.LoggedLock object at 0x7f3bacd64bd0>
Lock acquired 14850 Thread-TestrunnerManager-1 for <mozlog.structuredlog.LoggedLock object at 0x7f3bacd64bd0>
{"source": "web-platform-tests", "thread": "Thread-TestrunnerManager-1", "time": 1520281029594, "action": "log", "message": "Executor setup done
", "level": "DEBUG", "pid": 14850}
Lock released by 14850 Thread-TestrunnerManager-1 for <mozlog.structuredlog.LoggedLock object at 0x7f3bacd64bd0>
*** before self._lockLock requested 14890 Thread-2 for <mozlog.structuredlog.LoggedLock object at 0x7f3bacd64bd0>

Assignee

Comment 207

•

6 years ago

(In reply to James Graham [:jgraham] from comment #203)
> Irrespective of that, we don't actually seem to use the logger anywhere? So
> we could delete that line and stop using mozlog in the conftest file and the
> problem will go away. Apologies for not spotting this earlier. In retrospect

This doesn't help. We still hang even with the lines removed in `fixture.py`.

Assignee

Comment 208

•

6 years ago

The thread which actually hangs here is the one which gets created by `WdspecRun.run()`:

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py#405

Comment hidden (mozreview-request)

Due to misbehavior of the threading module in how threads are forked,
the mozlog module and specifically the Lock instance in StructuredLogger
can cause a hang of random duration when trying to require the lock.

Given that mozlog isn't used anywhere workaround this problem by
removing all instances of it.

Review commit: https://reviewboard.mozilla.org/r/225132/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/225132/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/225134/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/225134/

Comment hidden (mozreview-request)

Comment on attachment 8951549 [details]
Bug 1407383 - [wdspec] Re-enable key.py on Linux64.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/220862/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/225132/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8956222 [details]
Bug 1407383 - [wdspec] Remove unused imports from fixtures module.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/225134/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8951549 [details]
Bug 1407383 - [wdspec] Re-enable key.py on Linux64.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/220862/diff/2-3/

Comment hidden (Intermittent Failures Robot)

17 failures in 122 pushes (0.139 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* mozilla-inbound: 7
* autoland: 6
* mozilla-beta: 2
* try: 1
* mozilla-central: 1

Platform breakdown:
* linux32: 9
* linux64: 5
* linux64-devedition: 1
* linux32-nightly: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-03-05&endday=2018-03-05&tree=all

Comment hidden (mozreview-request)

Comment on attachment 8956222 [details]
Bug 1407383 - [wdspec] Remove unused imports from fixtures module.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/225134/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/225132/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8951549 [details]
Bug 1407383 - [wdspec] Re-enable key.py on Linux64.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/220862/diff/3-4/

Assignee

Comment 219

•

6 years ago

mozreview-review

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

https://reviewboard.mozilla.org/r/225132/#review231182

::: testing/web-platform/tests/webdriver/tests/support/fixtures.py:20
(Diff revision 3)
>  def ignore_exceptions(f):
>      def inner(*args, **kwargs):
>          try:
>              return f(*args, **kwargs)
>          except webdriver.error.WebDriverException as e:
> -            logger.warning("Ignored exception %s" % e)
> +            print >> sys.stderr, "Ignored exception %s" % e

Not sure what I should make out of this lint failure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=14d058dbd80c9cb2b5645f55012c61e5b3f8f454&selectedJob=166153169

Does it mean I have to use a print function or not? The wording is kinda strange. Also i don't see this failure when running the linter locally with the exact same arguments.

Assignee

Updated

•

6 years ago

Attachment #8956222 - Flags: review?(james)

Attachment #8956221 - Flags: review?(james)

Attachment #8951549 - Flags: review?(james)

Comment 220

•

6 years ago

mozreview-review

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

https://reviewboard.mozilla.org/r/225132/#review231224

Attachment #8956221 - Flags: review?(james) → review+

Comment 221

•

6 years ago

mozreview-review

Comment on attachment 8956222 [details]
Bug 1407383 - [wdspec] Remove unused imports from fixtures module.

https://reviewboard.mozilla.org/r/225134/#review231228

Attachment #8956222 - Flags: review?(james) → review+

Comment 222

•

6 years ago

mozreview-review

Comment on attachment 8951549 [details]
Bug 1407383 - [wdspec] Re-enable key.py on Linux64.

https://reviewboard.mozilla.org/r/220862/#review231226

Attachment #8951549 - Flags: review?(james) → review+

Comment hidden (mozreview-request)

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/225132/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8951549 [details]
Bug 1407383 - [wdspec] Re-enable key.py on Linux64.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/220862/diff/4-5/

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Assignee

Comment 225

•

6 years ago

mozreview-review-reply

Comment on attachment 8956221 [details]
Bug 1407383 - [wdspec] Remove usage of mozlog to prevent process lock.

https://reviewboard.mozilla.org/r/225132/#review231182

> Not sure what I should make out of this lint failure:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=14d058dbd80c9cb2b5645f55012c61e5b3f8f454&selectedJob=166153169
> 
> Does it mean I have to use a print function or not? The wording is kinda strange. Also i don't see this failure when running the linter locally with the exact same arguments.

The problem was that the file needed to be added to the whitelist for the linter to allow the usage of `print` at all.

Pulsebot

Comment 226

•

6 years ago

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/880cc555591a
[wdspec] Remove unused imports from fixtures module. r=jgraham
https://hg.mozilla.org/integration/autoland/rev/c26cb6d150d5
[wdspec] Remove usage of mozlog to prevent process lock. r=jgraham
https://hg.mozilla.org/integration/autoland/rev/bf1d69a11d6f
[wdspec] Re-enable key.py on Linux64. r=jgraham

Comment 227

•

6 years ago

Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/9872 for changes under testing/web-platform/tests

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Updated

•

6 years ago

Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended][wptsync upstream]

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Comment 228

•

6 years ago

Can't merge web-platform-tests PR due to failing upstream checks:
Github PR https://github.com/w3c/web-platform-tests/pull/9872
* continuous-integration/travis-ci/pr (https://travis-ci.org/w3c/web-platform-tests/builds/349768613?utm_source=github_status&utm_medium=notification)

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Updated

•

6 years ago

Whiteboard: [stockwell disable-recommended][wptsync upstream] → [stockwell disable-recommended][wptsync upstream error]

Dorel Luca [:dluca]

Comment 229

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/880cc555591a
https://hg.mozilla.org/mozilla-central/rev/c26cb6d150d5
https://hg.mozilla.org/mozilla-central/rev/bf1d69a11d6f

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Updated

•

6 years ago

Whiteboard: [stockwell disable-recommended][wptsync upstream error] → [stockwell disable-recommended][wptsync upstream]

Assignee

Updated

•

6 years ago

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox60: disabled → fixed

Keywords: leave-open, test-disabled

Resolution: --- → FIXED

Whiteboard: [stockwell disable-recommended][wptsync upstream] → [stockwell fixed][wptsync upstream]

Target Milestone: --- → mozilla60

Assignee

Updated

•

6 years ago

Blocks: 1420189

Andreea Pavel [:apavel]

Comment 230

•

6 years ago

Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=167034813&repo=autoland&lineNumber=377485

Flags: needinfo?(hskupin)

Comment hidden (Intermittent Failures Robot)

31 failures in 747 pushes (0.041 failures/push) were associated with this bug in the last 7 days. 

This is the #44 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* mozilla-inbound: 10
* autoland: 10
* mozilla-release: 5
* mozilla-beta: 3
* mozilla-central: 2
* try: 1

Platform breakdown:
* linux32: 14
* linux64: 11
* linux32-nightly: 4
* linux64-devedition: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1407383&startday=2018-03-05&endday=2018-03-11&tree=all

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Assignee

Comment 232

•

6 years ago

(In reply to Andreea Pavel [:apavel] from comment #230)
> Recent failure log:
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=167034813&repo=autoland&lineNumber=377485

This has nothing to do with this bug, and is something completely different. I will file a new bug.

Flags: needinfo?(hskupin)

Summary: Intermittent Wd Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', → Intermittent Wd Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-'] due to hang in mozlog

Comment 233

•

6 years ago

Upstream PR merged

Comment hidden (Intermittent Failures Robot)

3 failures in 765 pushes (0.004 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-beta: 3

Platform breakdown:
* linux32-devedition: 3

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1407383&startday=2018-04-02&endday=2018-04-08&tree=trunk