Closed Bug 1414144 Opened 7 years ago Closed 6 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/cookies/delete_cookie.py | expected OK

Categories

(Remote Protocol :: Marionette, defect, P3)

Version 3
defect

Tracking

(firefox59 wontfix, firefox60 wontfix, firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Here an excerpt from the failure:

[task 2017-11-03T03:25:42.976Z] 03:25:42     INFO - PID 1453 | 1509679542967	Marionette	DEBUG	Received DOM event "DOMContentLoaded" for "about:neterror?e=dnsNotFound&u=http%3A//web-platform.test%3A8000/webdriver/tests/support/inline.py%3Fdoc%3D%253Ctitle%253EWD+doc+title%253C%252Ftitle%253E%26content-type%3Dtext%252Fhtml%253Bcharset%253Dutf-8&c=UTF-8&f=regular&d=Firefox%20can%E2%80%99t%20find%20the%20server%20at%20web-platform.test."
[task 2017-11-03T03:25:43.919Z] 03:25:43     INFO - STDOUT: FAILED
[task 2017-11-03T03:25:44.061Z] 03:25:44     INFO - PID 1453 | 1509679544051	Marionette	INFO	New connections will no longer be accepted
[task 2017-11-03T03:25:44.279Z] 03:25:44     INFO - PID 1453 | [Child 1673] WARNING: pipe error (17): Connection reset by peer: file /build/firefox-l1g98U/firefox-56.0+build6/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[task 2017-11-03T03:25:44.280Z] 03:25:44     INFO - PID 1453 | [Child 1673] WARNING: pipe error (3): Connection reset by peer: file /build/firefox-l1g98U/firefox-56.0+build6/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[task 2017-11-03T03:25:44.337Z] 03:25:44     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_missing_value 

As it looks like we hit a network issue here, and as result the requested page could not be loaded. I don't know how DNS is handled by wptrunner and wptserve, so I would like to defer to James here.

Otherwise why do the pipe errors contain "/build/firefox-l1g98U/firefox-56.0+build6/" for a build on autoland which is Firefox 58?

For me the above sounds like a operating system failure, maybe a complete network stack shutdown?
Flags: needinfo?(ted)
Flags: needinfo?(james)
I suspect that's a symptom rather than the cause. I would guess that the test is failing in some way, we close the connection and then try to communicate over it for some reason. I think you'd need to find a way to reproduce the failure to see exactly what's happening though.
Flags: needinfo?(james)
I don't have any specific knowledge here, and I don't have time to dig into this. Sorry.
Flags: needinfo?(ted)
(In reply to Henrik Skupin (:whimboo) from comment #1)
> [task 2017-11-03T03:25:44.279Z] 03:25:44     INFO - PID 1453 | [Child 1673]
> WARNING: pipe error (17): Connection reset by peer: file
> /build/firefox-l1g98U/firefox-56.0+build6/ipc/chromium/src/chrome/common/
> ipc_channel_posix.cc, line 353
>
> Otherwise why do the pipe errors contain
> "/build/firefox-l1g98U/firefox-56.0+build6/" for a build on autoland which
> is Firefox 58?

Kan-Ru, do you have an explanation for the above question? Why do we show 56.0 for a current build from autoland?
Flags: needinfo?(kchen)
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Henrik Skupin (:whimboo) from comment #1)
> > [task 2017-11-03T03:25:44.279Z] 03:25:44     INFO - PID 1453 | [Child 1673]
> > WARNING: pipe error (17): Connection reset by peer: file
> > /build/firefox-l1g98U/firefox-56.0+build6/ipc/chromium/src/chrome/common/
> > ipc_channel_posix.cc, line 353
> >
> > Otherwise why do the pipe errors contain
> > "/build/firefox-l1g98U/firefox-56.0+build6/" for a build on autoland which
> > is Firefox 58?
> 
> Kan-Ru, do you have an explanation for the above question? Why do we show
> 56.0 for a current build from autoland?

I have no idea tbh. The only explanation I can think of is that maybe the test runner is using an older version of firefox to run something.
Flags: needinfo?(kchen)
Hm, the build has been downloaded from:
https://tools.taskcluster.net/groups/Lk2knvElS4KwOENmItI0Tw/tasks/J704BOBjRJ-Rho7TJmICcQ/runs/0/artifacts

Which is:

20171103021641
https://hg.mozilla.org/integration/autoland/rev/e6d86b7284bae701700b9d300ee1476ebe5f3eed

So it's nothing related to 56 beta at all.

AFAIK we always create a new docker instance for each job on Linux, so there shouldn't be any remaining Firefox process running.
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
This still happens:
https://treeherder.mozilla.org/logviewer.html#?job_id=155191260&repo=autoland&lineNumber=1983

The test exceeds the 25s default timeout because there seem to be a hang in `test_handle_prompt_accept` or during shutdown. Due to less logging output this is hard to say:

[task 2018-01-10T01:00:13.888Z] 01:00:13     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_accept 
[task 2018-01-10T01:00:43.571Z] 01:00:43     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/cookies/delete_cookie.py | expected OK
[task 2018-01-10T01:00:43.571Z] 01:00:43     INFO - TEST-INFO took 30002ms
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Debug builds show the following here:

[task 2018-01-10T03:21:31.525Z] 03:21:31     INFO - PID 2965 | [Parent 2977, IPDL Background] WARNING: Some storage operations are taking longer than expected during shutdown and will be aborted!: file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 5824
[task 2018-01-10T03:21:46.397Z] 03:21:46     INFO - PID 2965 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2018-01-10T03:21:50.327Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:21:50.327Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:21:50.328Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:21:50.329Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:21:50.330Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:21:50.330Z] 03:21:50     INFO - PID 2965 | [Parent 2977, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/xhr/XMLHttpRequestMainThread.cpp, line 2772
[task 2018-01-10T03:22:03.712Z] 03:22:03     INFO - PID 2965 | Hit MOZ_CRASH(Workers Hanging - A:3|S:0|Q:0-BC:0|WorkerHolderToken-BC:0|WorkerHolderToken-BC:0|WorkerHolderToken) at /builds/worker/workspace/build/src/dom/workers/RuntimeService.cpp:2187
[task 2018-01-10T03:22:03.712Z] 03:22:03     INFO - PID 2965 | ExceptionHandler::GenerateDump cloned child 3127
[task 2018-01-10T03:22:03.713Z] 03:22:03     INFO - PID 2965 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2018-01-10T03:22:03.714Z] 03:22:03     INFO - PID 2965 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...

So it also looks like that we are having a hang during shutdown caused by workers. Andrea, I assume that this might also be related to bug 1425559?
Flags: needinfo?(amarchesini)
This bug has failed 34 times in the last 7 days on Linux affecting pgo build types.
Failing test: opt-web-platform-tests-wdspec-e10

Link to a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=160000339&lineNumber=1868

Part of that log:
[task 2018-02-02T01:44:12.084Z] 01:44:12     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_accept 
[task 2018-02-02T01:44:12.269Z] 01:44:12     INFO - PID 2836 | [Parent 2848, Gecko_IOThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709
[task 2018-02-02T01:44:41.768Z] 01:44:41     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/cookies/delete_cookie.py | expected OK
[task 2018-02-02T01:44:41.769Z] 01:44:41     INFO - TEST-INFO took 30001ms
[task 2018-02-02T01:44:41.799Z] 01:44:41  WARNING - u'runner_teardown': ()
[task 2018-02-02T01:44:41.800Z] 01:44:41     INFO - Starting runner
[task 2018-02-02T01:44:41.823Z] 01:44:41     INFO - PID 2992 | 1517535881815	geckodriver	INFO	geckodriver 0.19.1 ( '0690bf69410a'  '2018-02-01 12:29 -0800')
[task 2018-02-02T01:44:41.823Z] 01:44:41     INFO - PID 2992 | 1517535881818	geckodriver	INFO	Listening on 127.0.0.1:4447
[task 2018-02-02T01:44:42.322Z] 01:44:42     INFO - WebDriver HTTP server listening at http://127.0.0.1:4447/
[task 2018-02-02T01:44:42.323Z] 01:44:42     INFO - TEST-START | /webdriver/tests/cookies/get_named_cookie.py
[task 2018-02-02T01:44:42.500Z] 01:44:42     INFO - STDOUT: ============================= test session starts ==============================
[task 2018-02-02T01:44:42.501Z] 01:44:42     INFO - STDOUT: platform linux2 -- Python 2.7.12, pytest-unknown, py-1.5.2, pluggy-0.5.3.dev -- /builds/worker/workspace/build/venv/bin/python
[task 2018-02-02T01:44:42.502Z] 01:44:42     INFO - STDOUT: rootdir: /builds/worker/workspace/build, inifile:
[task 2018-02-02T01:44:42.502Z] 01:44:42     INFO - STDOUT: collecting ... 
[task 2018-02-02T01:44:42.538Z] 01:44:42     INFO - STDOUT: collected 3 items

:whimboo do you have any update on this?
Flags: needinfo?(hskupin)
Whiteboard: [stockwell needswork]
Can you check if this is fixed after bug 1422036?
Flags: needinfo?(amarchesini)
The intermittent failure is still present also with the backout of the patch from bug 1434618 which caused way more issues.

https://treeherder.mozilla.org/logviewer.html#?job_id=160525168&repo=mozilla-inbound&lineNumber=2005

[task 2018-02-06T01:44:05.900Z] 01:44:05     INFO - TEST-START | /webdriver/tests/cookies/delete_cookie.py
[task 2018-02-06T01:44:05.907Z] 01:44:05     INFO - STDOUT: ============================= test session starts ==============================
[task 2018-02-06T01:44:05.909Z] 01:44:05     INFO - STDOUT: platform linux2 -- Python 2.7.12, pytest-unknown, py-1.5.2, pluggy-0.5.3.dev -- /builds/worker/workspace/build/venv/bin/python
[task 2018-02-06T01:44:05.909Z] 01:44:05     INFO - STDOUT: rootdir: /builds/worker/workspace/build, inifile:
[task 2018-02-06T01:44:05.910Z] 01:44:05     INFO - STDOUT: collecting ... 
[task 2018-02-06T01:44:05.926Z] 01:44:05     INFO - STDOUT: collected 7 items
[task 2018-02-06T01:44:05.927Z] 01:44:05     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_no_browsing_context 
[task 2018-02-06T01:44:06.048Z] 01:44:06     INFO - STDOUT: PASSED
[task 2018-02-06T01:44:06.191Z] 01:44:06     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_dismiss_and_notify 
[task 2018-02-06T01:44:06.193Z] 01:44:06     INFO - STDOUT: PASSED
[task 2018-02-06T01:44:06.193Z] 01:44:06     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_accept_and_notify 
[task 2018-02-06T01:44:06.194Z] 01:44:06     INFO - STDOUT: PASSED
[task 2018-02-06T01:44:06.198Z] 01:44:06     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_ignore 
[task 2018-02-06T01:44:06.199Z] 01:44:06     INFO - STDOUT: PASSED
[task 2018-02-06T01:44:06.199Z] 01:44:06     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/cookies/delete_cookie.py::test_handle_prompt_accept 
[task 2018-02-06T01:44:35.862Z] 01:44:35     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/cookies/delete_cookie.py | expected OK

All the tests work fine, but then we hang during shutdown of Firefox in test_handle_prompt_accept. Maybe it's somewhat related to bug 1425559? Sadly those are all opt builds where it happens so not that much logging is available.
Flags: needinfo?(hskupin) → needinfo?(amarchesini)
Priority: P5 → P3
Priority: P3 → P5
In the last 7 days, there have been 31 failures.

The failures are on Linux x64 and Linux platform and it affects opt and pgo build types.

Recent log file:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=165410711&lineNumber=2000

:whimboo, :baku Do you have any updates here?
Flags: needinfo?(hskupin)
Whiteboard: [stockwell unknown] → [stockwell needswork]
This is clearly a shutdown hang:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=166463179&lineNumber=17056

Andrea, do you think it could improve when we move to `will-xpcom-shutdown`? Sadly this hang only happens for opt/pgo builds, so no further information available.
Flags: needinfo?(hskupin)
FWIW I moved Marionette to use xpcom-will-shutdown in
https://bugzil.la/1428795, so if we think that will have an impact
here we should see a decline in the OF score pretty soon.
Oh, Andreas filed bug 1443524 which this should be dependent on.
Depends on: 1443524
(In reply to Andreas Tolfsen ‹:ato› from comment #26)
> FWIW I moved Marionette to use xpcom-will-shutdown in
> https://bugzil.la/1428795, so if we think that will have an impact
> here we should see a decline in the OF score pretty soon.

No, I was talking about bug 1434618.
Priority: P5 → P3
In most cases when this failures is happening the hang occurs in `test_handle_prompt_accept`, which is using a new session. So this failure should depend on a solution for bug 1449538.

But there are also a low number of cases when Firefox restarts quickly but then no "sessionstore-windows-restored" notification is received by Marionette:

https://treeherder.mozilla.org/logviewer.html#?job_id=170140630&repo=mozilla-inbound&lineNumber=20448-20464

This should basically be bug 1382162.
Depends on: 1434618
No longer depends on: 1434618
Fixed by bug 1449538.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Flags: needinfo?(amarchesini)
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Moving bug to Remote Protocol::Marionette component per bug 1815831.
Component: geckodriver → Marionette
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.