Closed Bug 1799823 Opened 2 years ago Closed 1 year ago

Crash in [@ AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data]

Categories

(Toolkit Graveyard :: OS.File, defect)

All
Windows
defect

Tracking

Status: VERIFIED FIXED
Target Milestone: 109 Branch

Tracking Status
firefox-esr102 --- wontfix
firefox106 --- unaffected
firefox107 --- wontfix
firefox108 + verified
firefox109 --- verified

People

(Reporter: bmaris, Assigned: nalexander)

Details

(Keywords: crash, topcrash)

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/e3dc9b15-2dd2-44c5-9968-ee93f0221109

MOZ_CRASH Reason: [Parent 12616, Main Thread] ###!!! ABORT: file resource://gre/modules/JSONFile.sys.mjs:124

Top 10 frames of crashing thread:

0  xul.dll  NS_DebugBreak  xpcom/base/nsDebugImpl.cpp:496
1  xul.dll  nsDebugImpl::Abort  xpcom/base/nsDebugImpl.cpp:129
2  xul.dll  XPTC__InvokebyIndex  
3  xul.dll  NS_InvokeByIndex  xpcom/reflect/xptcall/md/win32/xptcinvoke_x86_64.cpp:57
3  xul.dll  CallMethodHelper::Invoke  js/xpconnect/src/XPCWrappedNative.cpp:1626
3  xul.dll  CallMethodHelper::Call  js/xpconnect/src/XPCWrappedNative.cpp:1179
3  xul.dll  XPCWrappedNative::CallMethod  js/xpconnect/src/XPCWrappedNative.cpp:1125
4  xul.dll  XPC_WN_CallMethod  js/xpconnect/src/XPCWrappedNativeJSOps.cpp:965
5  xul.dll  CallJSNative  js/src/vm/Interpreter.cpp:459
5  xul.dll  js::InternalCallOrConstruct  js/src/vm/Interpreter.cpp:547

Found in

  • Firefox 107.0 RC

Affected versions

  • Firefox 107.0 RC
  • Latest Nightly 108.0a1

Unaffected versions

  • Firefox 106.0 RC

Tested platforms

  • Affected platforms: Windows 10

Steps to reproduce:

  1. Open Task Manager.
  2. Create a new profile.
  3. Add the attached user.js to the profile.
  4. Open Firefox.
  5. Exit Firefox about 2 seconds after it starts.
  6. Check Task Manager.

Actual result

  • The Firefox process is still listed in Task Manager, and its memory use keeps increasing until it crashes.

Notes:

  • I think this is a Remote Settings issue.
  • Essentially, it crashes when I interrupt the switch of the Remote Settings server from Prod to Stage.
  • If I install the extension from here and wait for the switch/sync process for Stage to finish, exiting no longer causes a crash.
  • Please change the component if this is not the correct one. I took the component from bug 1782990.
Flags: needinfo?(nalexander)

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release
  • Top 10 desktop browser crashes on nightly

:serg, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sgalich)
Keywords: topcrash

The bug is marked as tracked for firefox108 (nightly). We have limited time to fix this; the soft freeze is today. However, the bug still isn't assigned and has low severity.

:serg, could you please find an assignee and increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

Flags: needinfo?(sgalich)

Hi Nick:
I guess this issue is a duplicate of Bug 1782924? It looks like the crash still exists.

Component: Form Manager → OS.File
Flags: needinfo?(sgalich)

(In reply to Dimi Lee [:dimi][:dlee] from comment #4)

Hi Nick:
I guess this issue is a duplicate of Bug 1782924? It looks like the crash still exists.

It's possible. The JSONFile apparatus is used by multiple features/files; is there any indication that we're seeing issues with targeting.state.json vs. any other consumer? I don't see anything in the crash report, unfortunately. Keeping NI since the action here might be to annotate the crash reports more effectively.

Has STR: --- → yes

:nalexander is there still a chance to fix this for fx108?

(In reply to Dianna Smith [:diannaS] from comment #6)

:nalexander is there still a chance to fix this for fx108?

For 108? I doubt it; we don't have a lot of time. I'll try to investigate this now. I think the first order of business is to get the actual JSONFile instance into the crash report in some way, and then to figure out what we see on Nightly. Looks like that's Bug 1782990, which I've just attached a patch for. We could uplift to 108 pretty aggressively, I think, and hope to get data on which file is actually causing these crashes; if it is targeting.state.json, it's likely the set of targeting attributes has grown and one of the new attributes is causing issues. We shall see.

Flags: needinfo?(nalexander)
Depends on: 1782990

This fixes a missing await and tries to avoid updating snapshot data
during shutdown.

Neither seems particularly likely to impact shutdown crashes, but we
have no better theory for the underlying cause of these crashes. It's
possible that the amount of data in the targeting snapshot is
sufficiently large that we see sufficiently many sufficiently slow
writes to cause the shutdown mechanism to time out.
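
For readers unfamiliar with the two changes described above, here is a minimal sketch of the general pattern: await the write, and skip new writes once shutdown has started. This is not the actual patch. Every name in it (shuttingDown, writeSnapshot, updateSnapshot, demo) is hypothetical, and the slow write is simulated with a timer.

```javascript
// Hypothetical stand-ins; none of these names come from the Firefox source.
let shuttingDown = false;
const writes = [];

// Simulates a slow asynchronous write of snapshot data.
function writeSnapshot(data) {
  return new Promise((resolve) => {
    setTimeout(() => {
      writes.push(data);
      resolve();
    }, 10);
  });
}

// Skips new work once shutdown has begun, and awaits the write so that
// callers only continue after the write has actually finished.
async function updateSnapshot(data) {
  if (shuttingDown) {
    return false; // Do not start new I/O during shutdown.
  }
  // Without this await, the returned promise would settle before the write
  // completes, and any write error would go unobserved.
  await writeSnapshot(data);
  return true;
}

// Runs one update before shutdown and one after the flag is set.
async function demo() {
  const first = await updateSnapshot({ attr: 1 });
  shuttingDown = true;
  const second = await updateSnapshot({ attr: 2 });
  return { first, second, writeCount: writes.length };
}
```

In this sketch the second update is skipped, so only one write is recorded. Whether that kind of guard is enough to avoid the shutdown timeout is the open question the commit message itself raises.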

Depends on D162891

Assignee: nobody → nalexander
Status: NEW → ASSIGNED
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/de83175de08f
Try to avoid shutdown crashes writing targeting snapshot. r=barret,bhearsum

This was backed out due to a weirdness: the GeckoView autofill storage layer isn't really a JSONFile, it just uses the bones to manage serialization. I've worked around that by exposing sanitizedBasename to the constructor. Try build percolating at https://treeherder.mozilla.org/jobs?repo=try&revision=027c20e43762c7de0e2038bc57f440392480b54d.
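
As an illustration of why a basename is useful here, the sketch below derives a short label from a file path and embeds it in a shutdown-blocker name, which is the shape of the later signature "JSON store: writing data for 'targeting.snapshot'". The helper bodies, the sanitizing rules, and the example file name targeting.snapshot.json are assumptions for illustration; the thread does not show how JSONFile actually computes sanitizedBasename.

```javascript
// Hypothetical helper: reduce a file path to a short label that carries no
// directory (and therefore no user name) information.
function sanitizedBasename(path) {
  const base = path.split(/[\\/]/).pop(); // Drop directories.
  const stem = base.replace(/\.[^.]*$/, ""); // Drop the final extension.
  return stem.replace(/[^A-Za-z0-9._-]/g, "_"); // Keep a conservative charset.
}

// Hypothetical: build a blocker name that identifies the file being written.
function blockerName(path) {
  return `JSON store: writing data for '${sanitizedBasename(path)}'`;
}
```

A caller that is not backed by a real file, like the GeckoView storage layer mentioned above, has no path to derive a label from, which is one reason to let the constructor accept the label directly.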

Flags: needinfo?(nalexander)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch

The patch landed in nightly and beta is affected.
:nalexander, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox108 to wontfix.

Flags: needinfo?(nalexander)

Comment on attachment 9305003 [details]
Bug 1799823 - Try to avoid shutdown crashes writing targeting snapshot. r?barret

Beta/Release Uplift Approval Request

  • User impact if declined: Almost none: this is expected to reduce some crashes.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1782990
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It should reduce a certain type of crash and do little else.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(nalexander)
Attachment #9305003 - Flags: approval-mozilla-beta?

Comment on attachment 9305003 [details]
Bug 1799823 - Try to avoid shutdown crashes writing targeting snapshot. r?barret

Approved for 108.0b9

Attachment #9305003 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Using the steps from comment 0 I'm still getting a crash but with two different signatures:

Every time I test this I get a different crash signature, though. Should we mark this bug as verified fixed, since I'm unable to get this particular crash signature?

Flags: needinfo?(nalexander)

(In reply to Bogdan Maris [:bogdan_maris], Release Desktop QA from comment #17)

Using the steps from comment 0 I'm still getting a crash but with two different signatures:

Every time I test this I get a different crash signature, though. Should we mark this bug as verified fixed, since I'm unable to get this particular crash signature?

Yes, I think so, particularly because Bug 1782990 specifically annotates this crash signature. Could you attach a screen capture of this? My guess is that switching the Remote Settings service endpoint invalidates a bunch of data and starts fetching new data. Something in that long chain is likely not shutdown-aware, causing eventual shutdown hangs. Would you mind filing a new ticket for these crashes so that we can keep this ticket for this specific JSONFile instance blocking shutdown? Thanks!

Flags: needinfo?(nalexander) → needinfo?(bogdan.maris)

This landed (the autoland commit is missing here) and the signature changed to AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data for 'targeting.snapshot' which is the third-most frequent crash for Nightly.

Should this bug be reopened or a new one created?

Crash Signature: [@ AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data] → [@ AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data] [@ AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data for 'targeting.snapshot']
Flags: needinfo?(nalexander)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #19)

This landed (the autoland commit is missing here) and the signature changed to AsyncShutdownTimeout | IOUtils: waiting for profileBeforeChange IO to complete | JSON store: writing data for 'targeting.snapshot' which is the third-most frequent crash for Nightly.

Should this bug be reopened or a new one created?

New ticket please. This is a pretty general "we're doing lots of things and sometimes we shut down in the middle, which is hard to handle" situation, but we might be able to annotate crashes with the actual task to hand, or we could restrict the set of tasks, sacrificing some future flexibility for current stability.
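
One way to picture "annotating crashes with the actual task to hand" is a small registry that remembers the names of tasks that have not settled yet, so that a timeout report could list them. The sketch below is only that picture. PendingTasks is a hypothetical class, not an existing Firefox API, and it says nothing about how AsyncShutdown records blocker state.

```javascript
// Hypothetical helper; not an existing Firefox API.
class PendingTasks {
  constructor() {
    this.pending = new Map(); // id -> human-readable task name
    this.nextId = 0;
  }

  // Registers a promise under a name and forgets it once it settles,
  // whether it resolves or rejects.
  track(name, promise) {
    const id = this.nextId++;
    this.pending.set(id, name);
    const forget = () => this.pending.delete(id);
    promise.then(forget, forget);
    return promise;
  }

  // Names of tasks that have not settled yet, in registration order.
  names() {
    return [...this.pending.values()];
  }
}
```

If shutdown timed out while a registry like this was in use, the result of names() could be attached to the report to show which task was still outstanding.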

Flags: needinfo?(nalexander)
Flags: needinfo?(bogdan.maris)

(In reply to Nick Alexander :nalexander [he/him] from comment #18)

I'm not sure what screen capture I should make, but I added one showing all the steps to bug 1804859, which I logged for all the other crashes I get following the same steps as here. It feels strange marking this bug as verified fixed after Sebastian's comment, but since there is a new bug logged for that signature, I'll go ahead and change the status.

Status: RESOLVED → VERIFIED
No longer regressions: 1804757
See Also: → 1804757
Product: Toolkit → Toolkit Graveyard