Closed Bug 1140978 Opened 9 years ago Closed 9 years ago

b2g crash at IPC::Message::EnsureFileDescriptorSet()

Categories

(Core :: Graphics, defect)

Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: major

Tracking

RESOLVED DUPLICATE of bug 1133865
Tracking Status
b2g-master --- affected

People

(Reporter: lixia, Unassigned)

Details

(Keywords: qablocker, regression; Whiteboard: [3.0-nexus-5-l])

Attachments

(11 files, 1 obsolete file)

[1.Description]:
[Flame v3.0][Nexus 5 v3.0][First Time Experience] Sometimes, the prompt "Something just crashed" pops up when you open some apps and perform random operations after the FTU.
Found time: 14:45.
Attached: crashed.MP4, something_just_crashed.png and logcat_1445.txt.

Title:
B2G 39.0a1 Crash Report [@ IPC::Message::EnsureFileDescriptorSet() ]
Crash report:
https://crash-stats.mozilla.com/report/index/f8c4b6ae-b9ff-43ad-872c-768e52150309

[2.Testing Steps]: 
1.Flash build (20150308160204).
2.Skip FTU.
3.Launch some apps on the homescreen and perform random operations, such as importing from the SD card or exporting contacts via Bluetooth in Contacts, ending a call in Phone, connecting to Wi-Fi in Settings, or closing an app in the card view. (No precise steps)

[3.Expected Result]: 
3.The device should not crash.

[4.Actual Result]: 
3.Sometimes, the prompt "Something just crashed" pops up.

[5.Reproduction build]: 
Flame 3.0 build:
Build ID               20150308160204
Gaia Revision          fea83511df9ccba64259346bc02ebf2c417a12c2
Gaia Date              2015-03-08 06:36:28
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/eab4a81e4457
Gecko Version          39.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150308.192120
Firmware Date          Sun Mar  8 19:21:31 EDT 2015
Bootloader             L1TC000118D0

N5 3.0:
Build ID               20150308160204
Gaia Revision          fea83511df9ccba64259346bc02ebf2c417a12c2
Gaia Date              2015-03-08 06:36:28
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/eab4a81e4457
Gecko Version          39.0a1
Device Name            hammerhead
Firmware(Release)      5.0
Firmware(Incremental)  eng.cltbld.20150308.192431
Firmware Date          Sun Mar  8 19:24:47 EDT 2015
Bootloader             HHZ12d

[6.Reproduction Frequency]: 
Seldom recurrence, 5/30

[7.TCID]: 
Free Test
Attached file logcat_1445.txt
Attached video crashed.MP4
This issue also occurs on the latest Flame 3.0 nightly.

Crash report pages and banners appear intermittently within multiple apps (Gallery, Dialer, FTU, etc.).

Environmental Variables:
Device: Flame 3.0 (319mb)(Kitkat)(Full Flash)
Build ID: 20150309010232
Gaia: fea83511df9ccba64259346bc02ebf2c417a12c2
Gecko: eab4a81e4457
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: regression
[Blocking Requested - why for this release]:
Functional regression across multiple apps that fail smoke tests.

Requesting a window.
blocking-b2g: --- → 3.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
QA Contact: dharris
See Also: → 1123762
I see lots of crash-reports in the logcat, but none are submitted to crash stats. Can we please submit those next time this occurs? Thanks!
Attached file Crash 3
I can reproduce pretty reliably only after a clean flash. STR:

1) Do a full flash
2) Go to Settings, Wi-Fi
3) Add hidden network
4) Something crashes.

Gecko git - 6e5e93073a9e11ff531570b3571f1136fde04255
Gaia - 1279c9ca5a489aa7fc9a7a173ba1dbae0f71b8f2
QA Contact: jmercado
This is a link to the crash encountered when reproducing this issue after speaking with Derek.

https://crash-stats.mozilla.com/report/index/64ea84aa-f1be-47a1-bdef-6ccc72150309

Proceeding to find a window now.
Can still reproduce comment 9 with bug 1123762 disabled.

Might be able to pin it down by setting gfx.vsync.refreshdriver to false.
(In reply to Jayme Mercado [:JMercado] from comment #10)
> This is a link to the crash encountered when reproducing this issue after
> speaking with Derek.
> 
> https://crash-stats.mozilla.com/report/index/64ea84aa-f1be-47a1-bdef-6ccc72150309
> 
> Proceeding to find a window now.

Hmm, is there any way to get symbols?
From comment 9, I can still reproduce this on Friday's build, from before bug 1123762 landed. Gecko 7f9a12e9199f37ce2a708dd19f71abcf38ce4668.
Do you also have the link to the crash-stats report? I just want to double-check that it is not the same as bug 1137653, since it has similar STR.
Oops, I didn't see comment 10. Sorry.
Flags: needinfo?(nhirata.bugzilla)
Mason's steps do not work for me, but by using automation scripts the way Derek did, I found this central regression window. We're now going deeper into the inbound builds to verify. Is it possible that there are multiple issues here?

Central Regression Window:

Last Working 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150307032729
Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
Gecko: d9b06c673f80
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

First Broken 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150307191929
Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
Gecko: ae68dca2cda6
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
Gecko: ae68dca2cda6

First Broken gaia / Last Working gecko - 
Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
Gecko: d9b06c673f80

Gecko Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7d85ac833cff&tochange=ae68dca2cda6
(In reply to Jayme Mercado [:JMercado] from comment #16)
> Mason's steps do not work for me but using automation scripts the way that
> Derek did and I found this as the central window.  We're going deeper now
> into the inbounds to verify.  Is it possible that there are multiple issues
> here?
> 
If you use automation scripts, you will definitely hit bug 1137653, which has been happening for more than a week. As nhirata says, though, I think this is a separate issue.
(In reply to Jayme Mercado [:JMercado] from comment #16)
> Mason's steps do not work for me but using automation scripts the way that
> Derek did and I found this as the central window.  We're going deeper now
> into the inbounds to verify.  Is it possible that there are multiple issues
> here?
> 
> Central Regression Window:
> 
> Last Working 
> Environmental Variables:
> Device: Flame 3.0
> BuildID: 20150307032729
> Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
> Gecko: d9b06c673f80
> Gonk: e7c90613521145db090dd24147afd5ceb5703190
> Version: 39.0a1 (3.0) 
> Firmware Version: v18D-1
> User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
> 
> First Broken 
> Environmental Variables:
> Device: Flame 3.0
> BuildID: 20150307191929
> Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
> Gecko: ae68dca2cda6
> Gonk: e7c90613521145db090dd24147afd5ceb5703190
> Version: 39.0a1 (3.0) 
> Firmware Version: v18D-1
> User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
> 
> Last Working gaia / First Broken gecko - Issue DOES occur
> Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
> Gecko: ae68dca2cda6
> 
> First Broken gaia / Last Working gecko - 
> Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
> Gecko: d9b06c673f80
> 
> Gaia Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7d85ac833cff&tochange=ae68dca2cda6

I probably should have specified that after "Add Hidden Network" comes up, you start typing in a network name. When the keyboard comes up, something crashes.
Attached file omni.ja (obsolete)
new omni.ja file:

1. Download omni.ja from the attachments
2. adb remount
3. adb push omni.ja /system/b2g/omni.ja
4. adb reboot
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(nhirata.bugzilla)
A lot of our automated UI tests crashed out today, this seems like the probable cause. Marking this qablocker.
Keywords: qablocker
Attached file omni.ja
Attachment #8574822 - Attachment is obsolete: true
Flags: needinfo?(nhirata.bugzilla)
We tested this with both automated and manual testing and got the same window. Bug 1123762 seems to be the likely cause of this issue.

B2g-inbound Regression Window

Last Working 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150306132631
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: 716b424d27c0
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

First Broken 
Environmental Variables:
Device: Flame 3.0
BuildID: 20150306134530
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: afd91b997c2e
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: afd91b997c2e

First Broken gaia / Last Working gecko - Issue does NOT occur
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: 716b424d27c0

Gecko Pushlog: http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=716b424d27c0&tochange=afd91b997c2e
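The two "swap" builds in these windows are what pin the regression to one repository: if the last-working Gaia with the first-broken Gecko crashes, but the first-broken Gaia with the last-working Gecko does not, the regression lies in the Gecko range. A minimal sketch of that decision, where the names Culprit and Classify are hypothetical and not part of any Mozilla QA tooling:

```cpp
// Hypothetical helper illustrating the swap-build reasoning; not a real QA tool.
enum class Culprit { Gecko, Gaia, Inconclusive };

// crashesWithGoodGaiaBadGecko: last-working Gaia + first-broken Gecko.
// crashesWithBadGaiaGoodGecko: first-broken Gaia + last-working Gecko.
constexpr Culprit Classify(bool crashesWithGoodGaiaBadGecko,
                           bool crashesWithBadGaiaGoodGecko) {
    return (crashesWithGoodGaiaBadGecko && !crashesWithBadGaiaGoodGecko) ? Culprit::Gecko
         : (!crashesWithGoodGaiaBadGecko && crashesWithBadGaiaGoodGecko) ? Culprit::Gaia
         : Culprit::Inconclusive;  // both or neither crash: the window needs more work
}

// The b2g-inbound window: only the first-broken Gecko reproduces the crash.
static_assert(Classify(true, false) == Culprit::Gecko,
              "regression is in the Gecko range");
```

When both or neither swap build crashes, the sketch deliberately refuses to pick a side, since that usually means the window itself needs to be re-run.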
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Mason, can we get the landing for bug 1123762 backed out? It appears to be causing this crash.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(mchang)
Sure, it looks like it. Can you please describe how you're testing, both via automation and locally? I've been unable to reproduce anything. Thanks!
Flags: needinfo?(mchang) → needinfo?(ktucker)
See Also: → 1139090
Backout of 1123762 is in b2g-inbound.
For the automation, we used the test_browser_navigation.py script; for manual testing, we added Wi-Fi networks in Settings and used the Browser, Camera, etc.
Flags: needinfo?(ktucker)
Attached file crash_dump.dmp
Dump requested by Mason.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
The crash reason from comment 0 is a SIGSEGV when dereferencing address 0x5a5a5a6e. It seems we access the message object's member "file_descriptor_set_"[1] after the message object has been freed: B2G poisons freed memory with 0x5a5a5a5a, and we checked that the offset of "file_descriptor_set_" is exactly 0x14, so the faulting address is 0x5a5a5a5a + 0x14 = 0x5a5a5a6e.

The message is created in IPDL-generated code, and we delete it only when the channel is closed or the message has already been processed[2]. Neither creation nor destruction is controlled by the user, so it is strange that we have this use-after-free problem.

[1]
https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/ipc/chromium/src/chrome/common/ipc_message.cc#l161
[2]
https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#l762
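The address arithmetic in the analysis above can be checked at compile time. This is a minimal sketch assuming only the two values stated there (the 0x5a5a5a5a poison pattern and the 0x14 offset of file_descriptor_set_ on this 32-bit build) and assuming the object pointer used for the member access held the poison value; FaultAddress and LooksLikePoisonDeref are hypothetical triage helpers, not Gecko functions.

```cpp
#include <cstdint>

// Assumed values, taken from the analysis above (not read from Gecko headers).
constexpr std::uint32_t kPoison = 0x5a5a5a5a;             // pattern written over freed memory
constexpr std::uint32_t kFileDescriptorSetOffset = 0x14;  // offset of file_descriptor_set_

// Hypothetical helper: a member access through a pointer dereferences
// (pointer value + member offset).
constexpr std::uint32_t FaultAddress(std::uint32_t objectPtr, std::uint32_t memberOffset) {
    return objectPtr + memberOffset;
}

// Hypothetical triage heuristic: a fault address a small distance above the
// poison value suggests a member access through a freed (poisoned) pointer.
constexpr bool LooksLikePoisonDeref(std::uint32_t faultAddr, std::uint32_t maxMemberOffset) {
    return faultAddr >= kPoison && faultAddr - kPoison <= maxMemberOffset;
}

// The SIGSEGV address from comment 0 is exactly poison + 0x14.
static_assert(FaultAddress(kPoison, kFileDescriptorSetOffset) == 0x5a5a5a6e,
              "matches the crash address");
static_assert(LooksLikePoisonDeref(0x5a5a5a6e, 0x100), "flagged as a poisoned access");
```

The 0x100 bound in the heuristic is an arbitrary illustrative limit on member offsets, not a value from the crash data.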
I tried this patch to check for the use-after-free problem, but I can't reproduce the crash on my Flame device.
I will discuss the STR with Mason.
This produces a large amount of logging, and I think the logcat buffer is not big enough, so I redirect the logcat output into a file.
I can't be sure that we call ChooseTimer()[1] before Nuwa cloning. This patch creates the timer in the RefreshDriver constructor, which is called before Nuwa cloning.
I think the original path still works: we still create the vsync-based timer at [2] the first time we call ChooseTimer().

[1]
https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/layout/base/nsRefreshDriver.cpp#l982
[2]
https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/layout/base/nsRefreshDriver.cpp#l879
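The eager-versus-lazy timer creation discussed above can be illustrated with a small standalone sketch. RefreshDriverSketch and Timer are hypothetical stand-ins, not the real nsRefreshDriver API, and the sketch does not model Nuwa itself; it only shows that creating the shared timer in the constructor guarantees it exists before any later step (such as cloning), while the lazy ChooseTimer() path still creates it on first use.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the vsync-based refresh timer.
struct Timer {};

// Hypothetical stand-in for nsRefreshDriver; not the real Gecko class.
class RefreshDriverSketch {
public:
    // eagerTimer = true models the patch: create the shared timer in the
    // constructor, i.e. before the template process would be cloned.
    explicit RefreshDriverSketch(bool eagerTimer) {
        if (eagerTimer) {
            EnsureTimer();
        }
    }

    // Models the original lazy path: the first ChooseTimer() call creates
    // the timer, whenever that call happens.
    Timer* ChooseTimer() { return EnsureTimer(); }

    static bool TimerExists() { return sTimer != nullptr; }
    static void ResetForTesting() { sTimer.reset(); }

private:
    static Timer* EnsureTimer() {
        if (!sTimer) {
            sTimer = std::make_unique<Timer>();
        }
        return sTimer.get();
    }

    static std::unique_ptr<Timer> sTimer;  // shared by all drivers in the process
};

std::unique_ptr<Timer> RefreshDriverSketch::sTimer;

// Usage: when does the shared timer exist under each strategy?
inline void DemoEagerVsLazy() {
    RefreshDriverSketch::ResetForTesting();
    RefreshDriverSketch lazy(/* eagerTimer = */ false);
    assert(!RefreshDriverSketch::TimerExists());  // nothing created yet
    lazy.ChooseTimer();
    assert(RefreshDriverSketch::TimerExists());   // created on first use

    RefreshDriverSketch::ResetForTesting();
    RefreshDriverSketch eager(/* eagerTimer = */ true);
    assert(RefreshDriverSketch::TimerExists());   // created by the constructor
}
```

Keeping EnsureTimer() as the single creation point means both paths share one timer, which matches the comment's point that the original ChooseTimer() path remains valid after the patch.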
This is clearly not a First Time Experience / Gaia issue - although that is where we'll first encounter the bug. Can we get this moved to the correct component?
Flags: needinfo?(hshih)
We have the same crash signature as bug 1133865.
Status: NEW → RESOLVED
Closed: 9 years ago
Component: Gaia::System → Graphics
Flags: needinfo?(hshih)
Product: Firefox OS → Core
Resolution: --- → DUPLICATE
Summary: [Flame][First Time Experience]Sometimes,the prompt "Something just crashed" will pop up. → b2g crash at IPC::Message::EnsureFileDescriptorSet()
Duplicate of another smoketest blocker.
blocking-b2g: 2.5? → ---
Keywords: smoketest