Closed Bug 1271103 Opened 8 years ago Closed 8 years ago

Crashes inside mozilla::gl::CreateSurfaceForWindow, starting 2016-05-06

Categories

(Firefox for Android Graveyard :: Toolbar, defect)

Unspecified
Android
defect
Not set
critical

Tracking

(firefox49blocking fixed, fennec49+, firefox50+ fixed)

RESOLVED FIXED
Tracking Status
firefox49 blocking fixed
fennec 49+ ---
firefox50 + fixed

People

(Reporter: dbaron, Assigned: droeh)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(1 file, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-677b86b4-3d94-4432-95bc-3a06d2160506.
=============================================================

A new topcrash appeared in Fennec nightly starting 2016-05-06.

The one day regression window is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=369a5ee3a2880a4a98df3a00bf3db8d8f36b181b&tochange=25d777f7efb357fc5478251913548521986abaa0
which means the obvious candidate appears to be:
https://hg.mozilla.org/mozilla-central/rev/78731bdec697

Query for all the crashes on nightly at:
https://crash-stats.mozilla.com/signature/?product=FennecAndroid&release_channel=nightly&platform=Android&date=%3E%3D2016-04-15&signature=mozilla%3A%3Agl%3A%3ACreateSurfaceForWindow
Flags: needinfo?(snorp)
Flags: needinfo?(droeh)
Several distinct crash signatures all match this one, I think -- they all go through mozilla::gl::CreateSurfaceForWindow, and they all crash at address 0x58.
Crash Signature: [@ mozilla::gl::CreateSurfaceForWindow] → [@ mozilla::gl::CreateSurfaceForWindow] [@ eglCreateWindowSurface] [@ libEGL_adreno.so@0x2c052] [@ libEGL_adreno.so@0x11856]
By my count there are 35 crashes with this signature in Nightly 20160506052037. That would normally easily be #1 topcrash on Fennec, though in this case it's #2 because bug 1271020 is even worse.
I also see a bunch more crash reports without stack traces that crash at address 0x58, and so may have the same root cause. The signatures involved include:

@0xb6749320
@0xb6754cb0
@0xb67b9320
@0xb6700cb0
@0xb6d49284
@0xf7034cb0
@0xf72e82b1
Crash Signature: [@ mozilla::gl::CreateSurfaceForWindow] [@ eglCreateWindowSurface] [@ libEGL_adreno.so@0x2c052] [@ libEGL_adreno.so@0x11856] → [@ mozilla::gl::CreateSurfaceForWindow] [@ eglCreateWindowSurface] [@ libEGL_adreno.so@0x2c052] [@ libEGL_adreno.so@0x11856] [@ libgsl.so@0x153bb] [@ libgsl.so@0x15607]
Crash Signature: [@ mozilla::gl::CreateSurfaceForWindow] [@ eglCreateWindowSurface] [@ libEGL_adreno.so@0x2c052] [@ libEGL_adreno.so@0x11856] [@ libgsl.so@0x153bb] [@ libgsl.so@0x15607] → [@ mozilla::gl::CreateSurfaceForWindow] [@ eglCreateWindowSurface] [@ libEGL_adreno.so@0x2c052] [@ libEGL_adreno.so@0x11856] [@ libgsl.so@0x153bb] [@ libgsl.so@0x15607] [@ libwebviewchromium.so@0x11be00f] [@ mozilla::layers::CompositorBridgeParent:…
Crash Signature: mozilla::layers::CompositorBridgeParent::InitializeLayerManager] → mozilla::layers::CompositorBridgeParent::InitializeLayerManager] [@ libEGL_adreno.so@0x1285e]
Crash Signature: mozilla::layers::CompositorBridgeParent::InitializeLayerManager] [@ libEGL_adreno.so@0x1285e] → mozilla::layers::CompositorBridgeParent::InitializeLayerManager] [@ libEGL_adreno.so@0x1285e] [@ std::rethrow_exception ]
More crashes appeared under the signature [@ std::rethrow_exception]
Attached patch Proposed patch (obsolete) — Splinter Review
A possible fix, though since I can't reproduce this I'm not sure this is the cause.
Flags: needinfo?(snorp)
Flags: needinfo?(droeh)
Attachment #8751493 - Flags: review?(snorp)
Bug 1271773 has STR that produces a crash report that's very similar to this bug. The STR is to install the ZXing Barcode Scanner and click the barcode icon in the Fennec URL bar to launch the scanner.
Alright, thanks Nick. Back to the drawing board I guess.
One problem with the patch in bug 1136364 is that it changed the type id in nsWindow::GetNativeData from NS_NATIVE_NEW_EGL_SURFACE to NS_NATIVE_WINDOW. NS_NATIVE_WINDOW is used by a lot of unrelated code, and can be called from the main thread, compositor thread, etc. Using NS_NATIVE_WINDOW is inefficient because we're getting a surface unnecessarily. It can also easily cause race conditions when multiple threads try to call GetNativeData(NS_NATIVE_WINDOW) at the same time.
Attachment #8751493 - Flags: review?(snorp) → review+
Attached patch Proposed patch (updated) (obsolete) — Splinter Review
Some minor changes to give us more info about the crash. Carrying over snorp's r+.
Attachment #8751493 - Attachment is obsolete: true
Attachment #8752301 - Flags: review+
Keywords: leave-open
Assignee: nobody → cpearce
Status: NEW → ASSIGNED
Attachment #8752638 - Attachment is obsolete: true
Sorry, I made a mistake with bzexport and exported the wrong changeset to the wrong bug.
Assignee: cpearce → nobody
Status: ASSIGNED → NEW
Assignee: nobody → droeh
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Still the #2 topcrash in fennec, with 147 crashes from 92 installs in the last week.
I have STR:

1) Close firefox
2) Open link from different app.
3) Just after firefox opens press home button to minimise app
4) The crash reporter should appear after a couple of seconds.

I'd guess that when the app is minimised the surface we get from AcquireNativeWindow() is no longer valid, causing eglCreateWindowSurface to crash?
I've just recently been able to somewhat reliably reproduce by running in an emulator on a Linux VM (at least, whenever I'm able to attach the debugger without getting an ElfLoader crash, I eventually hit this crash), and it doesn't require minimizing. I do think we're getting a stale surface somehow, but I'm still not able to figure out exactly what's going awry.
Crash Signature: mozilla::layers::CompositorBridgeParent::InitializeLayerManager] [@ libEGL_adreno.so@0x1285e] [@ std::rethrow_exception ] → mozilla::layers::CompositorBridgeParent::InitializeLayerManager] [@ libEGL_adreno.so@0x1285e] [@ std::rethrow_exception ] [@ libart.so@0x2f1b62]
This is the #1 Fennec crash on Nightly by such a long way that we might as well ignore all other crashes until this one is fixed.

For example, in the crashes for Nightly 20160527030220:

https://crash-stats.mozilla.com/search/?product=FennecAndroid&build_id=20160527030220&platform=Android&date=%3E%3D2016-05-27&release_channel=nightly&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

I see 6 separate crash signatures that definitely map to this bug, 2 that definitely don't, and among a sampling of the remaining 42 (which are all just addresses like @0xb6805ce0) the majority look to be this bug too. (You can tell because the crash address is 0x58).

Margaret, I'm needinfo'ing you in case this isn't already on your radar.
Flags: needinfo?(margaret.leibovic)
This is a platform issue, so snorp's team should handle it. It looks like Dylan already has a patch here, hopefully he can make more progress on that and land it soon.
tracking-fennec: --- → ?
Flags: needinfo?(margaret.leibovic) → needinfo?(snorp)
Attached patch Second proposed patch (obsolete) — Splinter Review
This prevents LayerSurfaceView.onLayout from calling LayerView.surfaceChanged when we do not yet have a valid surface.
Attachment #8758816 - Flags: review?(snorp)
Attachment #8758816 - Flags: review?(snorp) → review+
tracking-fennec: ? → 49+
Flags: needinfo?(snorp)
My bug 1277481 has been closed as a duplicate of this. On my HTC 9 this is happening every time I open settings and every time I pause and resume the app. It's basically unusable.
Is this ready to land?
Flags: needinfo?(droeh)
(ni? snorp as well in case droeh is away or something.)
Flags: needinfo?(snorp)
Unfortunately no, snorp was able to reproduce this crash with the latest patch.
Flags: needinfo?(snorp)
Flags: needinfo?(droeh)
Is it still viable to try backing out bug 1136364, which based on comment 0 seems to be the regressing bug?
Yes. If we can't come up with a last minute fix, that's pretty much the plan.
Attached patch Backout patchSplinter Review
This backs out the patch for bug 1136364 and related patches, and should certainly fix the crash.
Attachment #8752301 - Attachment is obsolete: true
Attachment #8758816 - Attachment is obsolete: true
Attachment #8760762 - Flags: review?(snorp)
Attachment #8760762 - Flags: review?(snorp) → review+
Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dee200e969ee
Backs out the patch for Bug 1136364 and related patches. r=snorp
Comment on attachment 8760762 [details] [diff] [review]
Backout patch

Approval Request Comment
[Feature/regressing bug #]: 1136364
[User impact if declined]: Frequent crashes
[Describe test coverage new/current, TreeHerder]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9462e7f60687&selectedJob=22116066
[Risks and why]: Low-risk, this reverts changes that caused a major crash.
[String/UUID change made/needed]: N/A
Attachment #8760762 - Flags: approval-mozilla-aurora?
Thank you for backing out. This was a bad one :(
snorp, want to also do the back out on 49 (now aurora)? The merge was on June 6th so this missed the cut off.
Flags: needinfo?(snorp)
Never mind! I just saw the approval request.
Flags: needinfo?(snorp)
Comment on attachment 8760762 [details] [diff] [review]
Backout patch

Backouts to fix a topcrash. Please uplift to aurora 49.
Attachment #8760762 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Wes can you land this? Just want to make sure as I try to catch up with the backlog from the all hands meeting.
Flags: needinfo?(wkocher)
Presumably we want to confirm that the crash is gone first. I did go ahead and re-open bug 1136364 though.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #41)
> Presumably we want to confirm that the crash is gone first. I did go ahead
> and re-open bug 1136364 though.

At least for me, since the backout landed on nightly, no problem anymore
Yeah, I think we can safely call this resolved. On Aurora/49 we've backed out the changes that were causing the crash, and on Nightly/50 we backed them out and replaced them with updated patches that should fix the crashes (see bug 1136364, bug 1280369, and bug 1280594).
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: