Closed Bug 1114895 Opened 10 years ago Closed 8 years ago

Intermittent test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | TimeoutException: TimeoutException: Connection timed out| application crashed

Categories

(Testing :: Marionette Client and Harness, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox36 unaffected, firefox37 wontfix, firefox38 wontfix, firefox39 wontfix, firefox40 wontfix, firefox41 disabled, firefox-esr31 unaffected, firefox-esr38 wontfix)

RESOLVED WORKSFORME
Tracking Status
firefox36 --- unaffected
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- disabled
firefox-esr31 --- unaffected
firefox-esr38 --- wontfix

People

(Reporter: cbook, Unassigned)

References

()

Details

(Keywords: intermittent-failure, pi-marionette-intermittent)

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound debug test marionette

https://treeherder.mozilla.org/logviewer.html#?job_id=4903403&repo=mozilla-inbound

21:40:53 ERROR - TEST-UNEXPECTED-ERROR | test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | TimeoutException: TimeoutException: Connection timed out
Dave, do you have spare cycles to look into this frequent 10.8 Mn failure?
Component: JavaScript Engine → Marionette
Flags: needinfo?(dburns)
Product: Core → Testing
Looking at this, I wonder if the test has found a real issue since the browser is crashing in a11y code
Flags: needinfo?(dburns)
Is this something that you can help with. It looks like Marionette is causing a crash in a11y code intermittently.
Flags: needinfo?(surkov.alexander)
Is the test running in multiprocess Firefox? Out of curiosity do you know how a11y gets enabled here?
Flags: needinfo?(surkov.alexander)
No, we are not running this in e10s, that is in a separate job on treeherder.

The test[1] is creating a OOP frame and when we check what process is not the main one[2] then we appear to hit the crash that is mentioned in above. 


[1] https://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/tests/unit/test_switch_remote_frame.py#91
[2] https://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/tests/unit/test_switch_remote_frame.py#106
(In reply to David Burns :automatedtester from comment #101)
> No, we are not running this in e10s, that is in a separate job on treeherder.
> 
> The test[1] is creating a OOP frame and when we check what process is not
> the main one[2] then we appear to hit the crash that is mentioned in above. 

sorry for my ignorance but what's difference between e10s and dealing with OOP frames? Does it crash in main process?
I can't see anywhere were we are explicitly switching on a11y stuff. When would someone in your team be able to look at it because from the marionette side we are just running some JS to check if we are in the main process. We are recreating OOP frames like b2g which, as far as I understand it were the precursor to e10s and have their own special snowflake sandboxing. This is making sure frames can't speak by moving them OOP instead of making tabs OOP.

We check if we are in the main thread because we shouldnt but running this of the main thread is causing the crash, I think.
Flags: needinfo?(surkov.alexander)
Trevor, do you have ideas what may happen here?
Flags: needinfo?(surkov.alexander) → needinfo?(tbsaunde+mozbugs)
(In reply to alexander :surkov from comment #147)
> Trevor, do you have ideas what may happen here?

not off hand, why doesn't someone try and debug it?
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Trevor Saunders (:tbsaunde) from comment #174)
> (In reply to alexander :surkov from comment #147)
> > Trevor, do you have ideas what may happen here?
> 
> not off hand, why doesn't someone try and debug it?

Sure if no good ideas so far. Do you want to give it a try?
Has anyone had chance to look at this?
Flags: needinfo?(surkov.alexander)
Trevor, since you are focused currently on a11y multiprocessing stuff then can you add this bug to your queue please?
Flags: needinfo?(surkov.alexander) → needinfo?(tbsaunde+mozbugs)
(In reply to alexander :surkov from comment #385)
> Trevor, since you are focused currently on a11y multiprocessing stuff then
> can you add this bug to your queue please?

I don't think so.  This is showing up on beta which isn't effected by anything I'm doing.
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to David Burns :automatedtester from comment #145)
> I can't see anywhere were we are explicitly switching on a11y stuff.

would it be possible to add some logging into nsAccessibilityService constructor to dump a call stack? that would give us a hint what triggers a11y in that thread.
Summary: Intermittent test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | TimeoutException: TimeoutException: Connection timed out → Intermittent test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | TimeoutException: TimeoutException: Connection timed out| application crashed
Something has caused this to spike recently. Any ideas?
Flags: needinfo?(tbsaunde+mozbugs)
David please find someone else to deal with this.
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(dbolter)
Ryan how do I get the source line number for the top of the crash stack from treeherder crashes?
Flags: needinfo?(ryanvm)
From the most-recent log (comment 468):
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-aurora&job_id=941199

18:50:27 INFO - TEST-START | test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index
18:56:29 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/LG2XX1SDRv6vlVocxc8kcQ/artifacts/public/build/firefox-41.0a2.en-US.mac64.crashreporter-symbols.zip
18:56:58 INFO - mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/FEA3BB25-8048-48A8-A807-FFD59B5433F7.dmp
18:56:58 INFO - mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/FEA3BB25-8048-48A8-A807-FFD59B5433F7.extra
18:56:58 ERROR - PROCESS-CRASH | test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | application crashed [@ mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*)]
18:56:58 INFO - Crash dump filename: /var/folders/_8/2lqr7c5x071f9gmmkyvxp58w00000w/T/tmpRKq75f.mozrunner/minidumps/FEA3BB25-8048-48A8-A807-FFD59B5433F7.dmp
18:56:58 INFO - Operating system: Mac OS X
18:56:58 INFO - 10.10.2 14C109
18:56:58 INFO - CPU: amd64
18:56:58 INFO - family 6 model 42 stepping 7
18:56:58 INFO - 8 CPUs
18:56:58 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
18:56:58 INFO - Crash address: 0x130
18:56:58 INFO - Thread 0 (crashed)
18:56:58 INFO - 0 XUL!mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*) [nsCOMPtr.h:e796c66eb475 : 693 + 0x0]
18:56:58 INFO - rbx = 0x0000000000000000 r12 = 0x000000011b83ac40
18:56:58 INFO - r13 = 0x00000001070f35e0 r14 = 0x00000001070f35e0
18:56:58 INFO - r15 = 0x000000011b83ad60 rip = 0x000000010317e392
18:56:58 INFO - rsp = 0x00007fff5fbfb930 rbp = 0x00007fff5fbfb9c0
18:56:58 INFO - Found by: given as instruction pointer in context
Flags: needinfo?(ryanvm)
Yeah. Was hoping for line in ProcessFocusEvent... line in nsCOMPtr not going to help this guy ;)
Was chatting with Ryan. Since this is appearing on Aurora, this is likely 'a11y is not ready for e10s' fallout. We need to disable e10s for a11y users on Aurora. Filed as bug 1182097.

Aside: it would still be good to know exactly why Nightly is not crashing like Aurora; if there is some instability that e10s+a11y is uncovering...
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(jmathies)
(These need-info's are more like pings)
Flags: needinfo?(dbolter)
(In reply to David Bolter [:davidb] from comment #473)
> Was chatting with Ryan. Since this is appearing on Aurora, this is likely
> 'a11y is not ready for e10s' fallout. We need to disable e10s for a11y users
> on Aurora. Filed as bug 1182097.
> 
> Aside: it would still be good to know exactly why Nightly is not crashing
> like Aurora; if there is some instability that e10s+a11y is uncovering...

I really doubt this has anything to do with e10s.  That said I do agree there's no reason to deal with accessibility and e10s together in 41.
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Trevor Saunders (:tbsaunde) from comment #475)
> (In reply to David Bolter [:davidb] from comment #473)

> I really doubt this has anything to do with e10s.

Yeah, and since e10s is on in nightly :)

Probably a timing thing.
1) AFAICT non-e10s specific, I see this happening on esr, beta, and release builds
2) a significant regression landed around 7-7, which regressed aurora (merge related?)
3) reproducible on try with a little work
4) does not happen in the wild, there are no crashes in:
 mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*)
5) Looks like all the failures are on:
 MacOSX Yosemite 10.10 mozilla-aurora debug test marionette in test_switch_remote_frame.py

https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=e796c66eb475


00:21:42 INFO - TEST-START | test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index
00:27:42 ERROR - TEST-UNEXPECTED-ERROR | test_switch_remote_frame.py TestSwitchRemoteFrame.test_we_can_switch_to_a_remote_frame_by_index | TimeoutException: TimeoutException: Connection timed out
00:27:42 INFO - Traceback (most recent call last):
00:27:42 INFO - File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 290, in run
00:27:42 INFO - testMethod()
00:27:42 INFO - File "/builds/slave/test/build/tests/marionette/tests/testing/marionette/client/marionette/tests/unit/test_switch_remote_frame.py", line 105, in test_we_can_switch_to_a_remote_frame_by_index
00:27:42 INFO - self.marionette.switch_to_frame(0)
00:27:42 INFO - File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1203, in switch_to_frame
00:27:42 INFO - response = self._send_message('switchToFrame', 'ok', id=frame, focus=focus)
00:27:42 INFO - File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 36, in _
00:27:42 INFO - return func(*args, **kwargs)
00:27:42 INFO - File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 697, in _send_message
00:27:42 INFO - "Connection timed out", status=errors.ErrorCodes.TIMEOUT)
00:27:42 INFO - TEST-INFO took 360491ms
Flags: needinfo?(jmathies)
merge day commits have overwritten the one RyanVM did in comment 536. This hasnt been seen for a while so closing
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → WORKSFORME
Product: Testing → Remote Protocol
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in before you can comment on or make changes to this bug.