Closed Bug 1520076 Opened 5 years ago Closed 5 years ago

Hit MOZ_CRASH(Zone not scheduled) at js/src/gc/GC.cpp:7150

Categories

(Core :: JavaScript Engine, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: gkw, Assigned: pbone)

References

Details

(4 keywords, Whiteboard: [jsbugmon:])

Attachments

(3 files, 2 obsolete files)

The following testcase crashes on mozilla-beta revision ee0890efbe44 (build with --enable-debug, run with --fuzzing-safe --no-threads --baseline-eager --no-ion):

See attachment.

Backtrace:

#0 CheckZoneIsScheduled (zone=<optimized out>, reason=2171326683, trigger=<optimized out>) at js/src/gc/GC.cpp:7150
#1 0x000055b780d96708 in js::gc::GCRuntime::budgetIncrementalGC (this=0x7fb3fdd1c680, nonincrementalByAPI=<optimized out>, reason=JS::gcreason::ALLOC_TRIGGER, budget=...) at js/src/gc/GC.cpp:7205
#2 0x000055b780d97650 in js::gc::GCRuntime::gcCycle (this=0x7fb3fdd1c680, nonincrementalByAPI=<optimized out>, budget=..., reason=<optimized out>) at js/src/gc/GC.cpp:7336
#3 0x000055b780d98e71 in js::gc::GCRuntime::collect (this=0x7fb3fdd1c680, nonincrementalByAPI=false, budget=..., reason=JS::gcreason::ALLOC_TRIGGER) at js/src/gc/GC.cpp:7543
#4 0x000055b780d78bbf in js::gc::GCRuntime::startGC (this=0x7fb3fdd1c680, gckind=GC_NORMAL, reason=JS::gcreason::ALLOC_TRIGGER, millis=0) at js/src/gc/GC.cpp:7621
#5 0x000055b780d58a26 in js::gc::GCRuntime::gcIfRequested (this=0x7fb3fdd1c680) at js/src/gc/GC.cpp:7806
#6 0x000055b780965749 in HandleInterrupt (cx=<optimized out>, invokeCallback=<optimized out>) at js/src/vm/Runtime.cpp:408
/snip

For detailed crash information, see attachment.

This seems mozilla-beta only, see bisection result below. Since it involves GC and bisection result also seems s-s and GC-related, setting s-s.

The testcase is hard-to-reduce. Please remember to set the top of the file to your mozilla tree.

autobisectjs shows this is probably related to the following changeset:

The first bad revision is:
changeset: 506177:b1216967cdd0
tag: DEVEDITION_65_0b4_BUILD1
tag: DEVEDITION_65_0b4_RELEASE
tag: FENNEC_65_0b4_BUILD1
tag: FENNEC_65_0b4_RELEASE
tag: FIREFOX_65_0b4_BUILD1
tag: FIREFOX_65_0b4_RELEASE
user: Jon Coppeard
date: Mon Dec 03 17:17:34 2018 -0500
summary: Bug 1510145 - Refactor GC resets and ensure the store buffer is always empty when we start sweeping. r=pbone, a=dveditz

Jon, is bug 1510145 a likely regressor?

Blocks: 1510145
Flags: needinfo?(jcoppeard)
Whiteboard: [jsbugmon:update] → [jsbugmon:]

crash in debug-only check, maybe worst case it's leaking memory?

(In reply to Daniel Veditz [:dveditz] from comment #4)

crash in debug-only check, maybe worst case it's leaking memory?

That's right.

Hi Jon,

I hope you don't mind if I take this, I don't know if you've already started on it. I wanted to start working on it since it's in beta and the train will leave soon.

I agree with Jon, it's not a security bug. I'll let Garry remove the security flags though in case his fuzzing stuff also has secrets.

I'm able to reproduce it.

At the beginning of a GC we check which zones should be collected based on a threshold. The zone in question was below this threshold so it was not scheduled for collection.

A GC callback was registered via builtin testing functions. This callback performs a nursery collection. This was executed next and evicted a lot of memory from the nursery into this zone. The nursery collection that executes before a GC usually executes after the following step.

The GC proper started and the zone was checked for the gcBytes threshold. This time the zone's memory exceeded the threshold. This executed the assertion which found that the zone hadn't been scheduled.

I don't know why it's happening since Bug 1510145, maybe it's just easier to happen.

This bug won't contribute to a memory leak, since the callback to do a minor collection won't normally be registered. IMHO, the only reason to fix this on the beta channel is to unblock further fuzzing.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(jcoppeard)
Priority: -- → P1
Priority: P1 → P3

Opening this up after your review that it's benign, is fine.

Group: javascript-core-security
Attachment #9037442 - Flags: review?(jcoppeard)
Comment on attachment 9037442 [details] [diff] [review]
Bug 1520076 - Remove an assertion that isn't always true

Review of attachment 9037442 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking this, and thanks for the analysis.

IIRC these assertions are there because there was a bug where we repeatedly triggered a GC for the atoms zone but didn't end up collecting that zone.  Therefore I'd like to keep the assertions if possible.

Can we split AutoScheduleZonesForGC in two and do the scheduling part after calling the GC callbacks in gcCycle() and the unscheduling part before we return in collect()?
Attachment #9037442 - Flags: review?(jcoppeard)

(In reply to Jon Coppeard (:jonco) from comment #9)

Can we split AutoScheduleZonesForGC in two and do the scheduling part after
calling the GC callbacks in gcCycle() and the unscheduling part before we
return in collect()?

I don't see a problem with that. But is there anything fancy that callbacks can/often do that could be a problem? Are callbacks allowed to call non-public APIs?

It also seems odd to me that we schedule zone at the beginning of each slice, rather than at the first slice and then collect the same set of zones until we're done.

My main concern is how zone scheduling interacts with the loop in collect() and GCs that are aborted or need to repeat. I have an idea but I'll be easier to explain with a patch, but it maybe there's no problem in which case the patch will be more complex than it needs to be.

Attachment #9037442 - Attachment is obsolete: true
Attachment #9037873 - Flags: review?(jcoppeard)

(In reply to Paul Bone [:pbone] from comment #10)

But is there anything fancy that callbacks can/often do that could be a problem? Are callbacks allowed to call non-public APIs?

Hopefully not but I guess callbacks will do whatever they need to do. This is the main use of the callback at the moment:

https://searchfox.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#1368

I don't think this ends up doing anything too controversial.

It also seems odd to me that we schedule zone at the beginning of each slice, rather than at the first slice and then collect the same set of zones until we're done.

The point of this is that we don't want a long-running incremental GC to block GC of other zones if they pass their their trigger thresholds. What should happen is that this schedules an extra zone and when we notice that we reset the original GC and start a new one including the new zone. This should be rare in practice, hopefully.

Comment on attachment 9037873 [details] [diff] [review]
Bug 1520076 - Delay scheduling in AutoScheduleZonesForGC

Review of attachment 9037873 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/GC.cpp
@@ +6833,5 @@
> +  void scheduleZones() {
> +    // XXX: We could remove this to allow each gcCycle to schedule new zones.
> +    if (scheduled_) {
> +      return;
> +    }

I think we should do this every time.

@@ +7337,4 @@
>    // Note that GC callbacks are allowed to re-enter GC.
>    AutoCallGCCallbacks callCallbacks(*this);
>  
> +  asz.scheduleZones();

Rather than passing the Auto object, can we split it into two functions and call them in the appropriate places?  Maybe we can use mozilla::ScopeExit to call the unscheduling part.

RAII Auto classes work really well in a lot of situations, but it requires the setup and teardown actions to be in some way symmetrical, and i don't think they really are here.
Attachment #9037873 - Flags: review?(jcoppeard) → feedback+

(In reply to Jon Coppeard (:jonco) from comment #12)

(In reply to Paul Bone [:pbone] from comment #10)

But is there anything fancy that callbacks can/often do that could be a problem? Are callbacks allowed to call non-public APIs?

Hopefully not but I guess callbacks will do whatever they need to do. This is the main use of the callback at the moment:

https://searchfox.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#1368

I don't think this ends up doing anything too controversial.

Looks like it just does cycle collector things and wont bother us.

It also seems odd to me that we schedule zone at the beginning of each slice, rather than at the first slice and then collect the same set of zones until we're done.

The point of this is that we don't want a long-running incremental GC to block GC of other zones if they pass their their trigger thresholds. What should happen is that this schedules an extra zone and when we notice that we reset the original GC and start a new one including the new zone. This should be rare in practice, hopefully.

Yeah, it makes sense that we'd want to collect the new zone also. Would it be possible to somehow add the collection of the new zone? probably in particular phases of the collection only to avoid resetting a GC.

(In reply to Jon Coppeard (:jonco) from comment #13)

Comment on attachment 9037873 [details] [diff] [review]
Bug 1520076 - Delay scheduling in AutoScheduleZonesForGC

Review of attachment 9037873 [details] [diff] [review]:

::: js/src/gc/GC.cpp
@@ +6833,5 @@

  • void scheduleZones() {
  • // XXX: We could remove this to allow each gcCycle to schedule new zones.
  • if (scheduled_) {
  •  return;
    
  • }

I think we should do this every time.

Okay.

@@ +7337,4 @@

// Note that GC callbacks are allowed to re-enter GC.
AutoCallGCCallbacks callCallbacks(*this);

  • asz.scheduleZones();

Rather than passing the Auto object, can we split it into two functions and
call them in the appropriate places? Maybe we can use mozilla::ScopeExit to
call the unscheduling part.

RAII Auto classes work really well in a lot of situations, but it requires
the setup and teardown actions to be in some way symmetrical, and i don't
think they really are here.

I'll remove it, I wasn't sure about it anyway but hoped to keep the patch minimal. Turns out it was a smaller change to remove the RAII class.

Attachment #9037873 - Attachment is obsolete: true
Attachment #9038136 - Flags: review?(jcoppeard)
Comment on attachment 9038136 [details] [diff] [review]
Bug 1520076 - Schedule zones for GC after running callbacks

Review of attachment 9038136 [details] [diff] [review]:
-----------------------------------------------------------------

Great, cheers for the updates.

::: js/src/gc/GC.cpp
@@ +7226,4 @@
>    return IncrementalResult::Ok;
>  }
>  
> +static void scheduleZones(GCRuntime* gc) {

nit: The style is to have static function names start with a capital letter, here and below.
Attachment #9038136 - Flags: review?(jcoppeard) → review+

(In reply to Paul Bone [:pbone] from comment #14)

Would it be possible to somehow add the collection of the new zone? probably in particular phases of the collection only to avoid resetting a GC.

Maybe, if we were still marking. I'm not sure what this does to the snapshot at the beginning invariant though if we start marking some things at different times. It might still be OK. It would make things more complicated though and as long as these resets are rare (and telemetry indicates they are) I'd like to avoid it.

(In reply to Jon Coppeard (:jonco) from comment #18)

(In reply to Paul Bone [:pbone] from comment #14)

Would it be possible to somehow add the collection of the new zone? probably in particular phases of the collection only to avoid resetting a GC.

Maybe, if we were still marking. I'm not sure what this does to the snapshot at the beginning invariant though if we start marking some things at different times. It might still be OK. It would make things more complicated though and as long as these resets are rare (and telemetry indicates they are) I'd like to avoid it.

Good point, if it was easy I'd like to avoid it. But I don't know enough about how inter-zone references are managed to know if this is still snapshot at the beginning okay. Not a priority now though..

Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb5546692ed6
Schedule zones for GC after running callbacks r=jonco
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: