Closed Bug 1406041 Opened 7 years ago Closed 7 years ago

Crash in js::wasm::Code::tiers when profiling Tank's demo

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

Unspecified
Linux
defect

Tracking


RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 5 obsolete files)

This bug was filed from the Socorro interface and is report bp-b0a102ab-32dc-4522-8074-0bdb20171005.
=============================================================

A wasm::Code isn't found in wasm::ProfilingFrameIterator::initFromExitFP. Needs to be investigated a bit.
It'd be good to see if it repros before bug 1360211 (which changed how we do lookupCode()).
Oh, I see you already noted that in 'Depends on' :)
Attached patch wip.patch (obsolete) — Splinter Review
WIP patch, with a process-wide (sometimes lossy) lookup.

The only issue is that if we fail the lookup in RedirectJitCodeToInterruptCheck, we never get back to it, which can happen when we time out. So all the timeout test cases now fail.
Attached patch wip-readonly.patch (obsolete) — Splinter Review
An experiment with having two CodeSegment vectors swapped to do a lock-free lookup; while it seems to work locally, I don't think it solves the problem in all cases.
For comment 3, why is the lock being held when RedirectJitCodeToInterruptCheck() happens?  It seems like either (1) we'd be in JIT code with the lock not held, or (2) we'd be in C++ holding the lock, but also checking cx->interrupt at some point.
I guess my question in comment 5 is more: why does this happen *frequently*; it makes sense that, with low probability, tier-2 would complete at the same time as the interrupt.

So the lock-free swapping seems good, but the worry is that the cost of adding a CodeSegment is O(n).  I had an idea that seems to be amortized O(1) though: you maintain two Vectors that, in the steady state, have the same contents.  When updating, you first lock one vector, insert the CodeSegment, then unlock it, then lock the other, insert the CodeSegment, then unlock.  At any time, one Vector is coherent and unlocked so LookupCodeSegment() can always succeed and update is amortized O(2) ;)
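A rough standalone sketch of that two-vector idea, using std::mutex/std::vector in place of SpiderMonkey's Mutex and Vector (all names here are made up for illustration, not the landed code):

    #include <algorithm>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    // Hypothetical stand-in for wasm::CodeSegment; only base/length matter here.
    struct SegmentStub { const unsigned char* base; size_t length; };

    // Two sorted vectors with identical contents in the steady state. An update
    // mutates them one at a time under that vector's own lock, so at any moment
    // at least one vector is unlocked and coherent for a reader to search.
    class TwoVectorMap {
        std::mutex locks_[2];
        std::vector<const SegmentStub*> vecs_[2];

      public:
        void add(const SegmentStub* cs) {
            for (int i = 0; i < 2; i++) {
                std::lock_guard<std::mutex> guard(locks_[i]);
                auto& v = vecs_[i];
                auto pos = std::upper_bound(v.begin(), v.end(), cs,
                    [](const SegmentStub* a, const SegmentStub* b) { return a->base < b->base; });
                v.insert(pos, cs);   // each segment ends up inserted exactly twice
            }
        }

        const SegmentStub* lookup(const void* pc) {
            auto p = static_cast<const unsigned char*>(pc);
            for (int i = 0; i < 2; i++) {
                std::unique_lock<std::mutex> guard(locks_[i], std::try_to_lock);
                if (!guard.owns_lock())
                    continue;   // this vector may be mid-update; the other one is coherent
                for (const SegmentStub* cs : vecs_[i]) {
                    if (p >= cs->base && p < cs->base + cs->length)
                        return cs;
                }
                return nullptr;
            }
            // Taking even a try-lock is what the following comments worry about for
            // the async-interrupt case; the landed patch replaces this with an
            // atomic pointer plus an observer count.
            return nullptr;
        }
    };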
(In reply to Luke Wagner [:luke] from comment #6)
> I guess my question in comment 5 is more: why does this happen *frequently*;
> it makes sense that, with low probability, tier-2 would complete at the same
> time as the interrupt.

Same reason we introduced the code segment lookup in the first place: in the ARM simulator, every single memory access goes through a code segment lookup to ask whether we're in wasm or not. So when running in the ARM sim, we spend most of our time under code lookup.

> So the lock-free swapping seems good, but the worry is that the cost of
> adding a CodeSegment is O(n).  I had an idea that seems to be amortized O(1)
> though: you maintain two Vectors that, in the steady state, have the same
> contents.  When updating, you first lock one vector, insert the CodeSegment,
> then unlock it, then lock the other, insert the CodeSegment, then unlock. 
> At any time, one Vector is coherent and unlocked so LookupCodeSegment() can
> always succeed and update is amortized O(2) ;)

Interesting, this sounds almost like the patch implemented in comment 4 (without proper attention paid to atomic swapping). The major issue I had with that attempt was removals: when we start removing a CodeSegment, and until it has been removed from both vectors, a soon-to-be-stale value can be returned by a lookup. If the code segment vectors could be append-only, that would solve it, but it would be a bad memory leak. I think the solution you describe has this same issue, right?
CodeSegments can be removed at any time (regardless of this issue); the caller has to know that the pc they are looking up is live on a paused stack (which is true for interrupt, profiling) so that the CodeSegment won't go away.
Attached patch wip-unlockedstate.patch (obsolete) — Splinter Review
A snapshot patch implementing the idea suggested in comment 6, but there's a catch: there can't be a good ordering of the two vector mutexes, since they can be taken in either order, and even both at the same time (when an insertion is interrupted and a lookup happens).

One idea that came up was that each mutex could have an optional owning group (just an unsigned id is enough), which would allow the ordering checks to pass when two mutexes are part of the same group.

That being said, I think the code swapping implemented in the previous patch follows the same idea, has the same O(2) behavior, and doesn't break the mutex invariant.
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attached patch process.patch (obsolete) — Splinter Review
Not entirely finished (OOM handling is incorrect), but having a first look would help. Thanks!
Attachment #8916075 - Attachment is obsolete: true
Attachment #8916077 - Attachment is obsolete: true
Attachment #8916707 - Flags: feedback?(luke)
Comment on attachment 8916707 [details] [diff] [review]
process.patch

Review of attachment 8916707 [details] [diff] [review]:
-----------------------------------------------------------------

Generally, looks great!  Some nits and I think a few technical data races.

::: js/src/wasm/WasmCode.cpp
@@ +293,5 @@
>  }
>  
> +CodeSegment::~CodeSegment()
> +{
> +    UnregisterCodeSegment(this);

Because initialization can fail before getting to RegisterCodeSegment(), I think this call should be conditionalized on something; I'm thinking a new bool 'initialized_' set right after RegisterCodeSegment().

::: js/src/wasm/WasmCode.h
@@ +105,5 @@
>          outOfBoundsCode_(nullptr),
>          unalignedAccessCode_(nullptr)
>      {}
>  
> +    virtual ~CodeSegment();

CodeSegment doesn't have any derived classes, nor is it intended to be polymorphic, so can you make it non-virtual?

::: js/src/wasm/WasmFrameIter.cpp
@@ +1121,5 @@
>      JSContext* cx = TlsContext.get();
>      if (!cx)
>          return false;
>  
>      MOZ_RELEASE_ASSERT(!cx->handlingSegFault);

This function is not inherently tied to handling seg faults or TlsContext; those were just incidental details at the time, so can you just remove the above lines so that the function is more true to its name?

::: js/src/wasm/WasmFrameIter.h
@@ +213,5 @@
>  LookupFaultingInstance(const CodeSegment& codeSegment, void* pc, void* fp);
>  
> +// Return whether the given PC is in wasm code. In loosy mode, the code segment
> +// lookup will fail if the process-wide there is already taken
> +// (see WasmProcess.h).

I think this comment can return to the original form now.

::: js/src/wasm/WasmInstance.h
@@ +21,5 @@
>  
>  #include "gc/Barrier.h"
>  #include "wasm/WasmCode.h"
>  #include "wasm/WasmDebug.h"
> +#include "wasm/WasmProcess.h"

nit: I think maybe this #include is not necessary anymore?

::: js/src/wasm/WasmProcess.cpp
@@ +19,5 @@
> +#include "wasm/WasmProcess.h"
> +
> +#include "mozilla/BinarySearch.h"
> +
> +#include "threading/ExclusiveData.h"

nit: I think ExclusiveData is no longer needed

@@ +41,5 @@
> +class ProcessCodeSegmentMap
> +{
> +    // Since insertions can happen on a background thread while another thread
> +    // lookups up segments, we need a lock.
> +    Mutex mutex_;

nit: I think the real need for the lock here is two concurrent mutators; the reader doesn't take the lock.  I'd also rename it mutatorsMutex_.

@@ +43,5 @@
> +    // Since insertions can happen on a background thread while another thread
> +    // lookups up segments, we need a lock.
> +    Mutex mutex_;
> +
> +    Atomic<unsigned, mozilla::Relaxed> observers_;

I think Relaxed is actually too weak here: we want observing observers_ == 0 to imply that writes performed afterwards don't flow before it, and that reads performed before observers_ == 0 don't happen after it.  Since this isn't ultra perf-critical, let's just go with the default of seqcst.

Also nit: let's do size_t so we don't even have to think about overflow on 64-bit

@@ +120,5 @@
> +        MOZ_ASSERT(index == otherIndex);
> +#endif
> +
> +        if (!codeSegments_->insert(codeSegments_->begin() + index, cs)) {
> +            // Wait until observers are done using the read-only vector.

It'd be good to expand the comment explaining why there could be dangling readers and why observers_==0 is the only way to flush them out.  Also good to explain why we're spin-locking here and why it is OK.

@@ +122,5 @@
> +
> +        if (!codeSegments_->insert(codeSegments_->begin() + index, cs)) {
> +            // Wait until observers are done using the read-only vector.
> +            while (observers_) {}
> +            Swap(readonlyCodeSegments_, codeSegments_);

Any field read by lookup() needs to be Atomic, since otherwise it is by definition a data race and therefore undefined behavior, so in this case both pointers.  With Atomic ops, you want to be really clear about what the precise loads and stores are, and in what order, so I think we shouldn't use a utility like Swap() here (e.g., what if it used the XOR swap trick (https://en.wikipedia.org/wiki/XOR_swap_algorithm) or otherwise left codeSegments_ in a temporarily-invalid state for lookup() to observe?).
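A rough sketch of the explicit-publication idea, using std::atomic in place of mozilla::Atomic (member names loosely follow the patch; this is illustrative, not the landed code):

    #include <atomic>
    #include <cstddef>
    #include <vector>

    struct CodeSegment;                                    // opaque for this sketch
    using CodeSegmentVector = std::vector<const CodeSegment*>;

    class SwapSketch {
        CodeSegmentVector vec1_, vec2_;                    // owned by value, never null
        CodeSegmentVector* mutableCodeSegments_ = &vec1_;  // only touched under the mutators' lock
        std::atomic<CodeSegmentVector*> readonlyCodeSegments_{&vec2_};
        std::atomic<size_t> observers_{0};

      public:
        void swapAndWait() {
            // One explicit atomic exchange publishes the freshly-mutated vector
            // and hands back the previously read-only one; lookup() can only ever
            // observe one pointer or the other, never a half-swapped state.
            mutableCodeSegments_ = readonlyCodeSegments_.exchange(mutableCodeSegments_);

            // Drain readers that loaded the old pointer before the exchange; new
            // readers can only pick up the vector we just published. Spinning is
            // acceptable because registration is rare and the read-side critical
            // section is just a short binary search.
            while (observers_.load() != 0) {
            }
        }
    };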

@@ +147,5 @@
> +            return;
> +
> +        codeSegments_->erase(codeSegments_->begin() + index);
> +
> +        // Wait until observers are done using the read-only vector.

Maybe reference the comment in insert(), since it should be beefy and you don't want to repeat it.

@@ +161,5 @@
> +
> +        codeSegments_->erase(codeSegments_->begin() + index);
> +    }
> +
> +    const CodeSegment* lookupCodeSegment(const void* pc) {

How about naming it lookup()?

@@ +163,5 @@
> +    }
> +
> +    const CodeSegment* lookupCodeSegment(const void* pc) {
> +        if (!initialized_)
> +            return nullptr;

I think this unsynchronized use of initialized_ is a race.  Rather than making it Atomic, let's kill initialized_ altogether by:
 - storing the two CodeSegmentVectors by value (no pointer)
 - having the two (Atomic) fields point at these Vectors

@@ +168,5 @@
> +
> +        auto decObserver = mozilla::MakeScopeExit([&] {
> +            observers_--;
> +        });
> +        observers_++;

It'd be good to explain our motivation/constraints for using ref-counting here instead of taking a lock.  Also good to explain:
 1. why it's ok to return a bare CodeSegment* (our assumption about pc being live code on some stack keeping the CodeSegment alive)
 2. why it's ok if we end up looking up in a "stale" CodeSegmentVector (pc is assumed to be executing and thus the CodeSegment would have to have already been registered)
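
A rough sketch of the read side under those assumptions, using standard library types in place of the SpiderMonkey ones (the real code uses BinarySearchIf and MakeScopeExit; everything here is illustrative):

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct SegmentStub { const uint8_t* base; size_t length; };  // hypothetical CodeSegment stand-in
    using SegmentVector = std::vector<const SegmentStub*>;

    std::atomic<size_t> observers{0};
    std::atomic<const SegmentVector*> readonlySegments{nullptr};

    const SegmentStub* Lookup(const void* pc) {
        // Announce ourselves before loading the pointer so that a writer's
        // "wait until observers == 0" cannot complete while we are still
        // iterating whichever vector we end up reading.
        observers.fetch_add(1);
        const SegmentStub* found = nullptr;
        if (const SegmentVector* vec = readonlySegments.load()) {
            // A "stale" vector is fine: pc is assumed to be live, executing code
            // on some paused stack, so its segment was registered (in both
            // vectors) before that code could run, and the segment cannot be
            // freed out from under us for the same reason.
            auto p = static_cast<const uint8_t*>(pc);
            for (const SegmentStub* cs : *vec) {            // linear scan for brevity
                if (p >= cs->base && p < cs->base + cs->length) {
                    found = cs;
                    break;
                }
            }
        }
        observers.fetch_sub(1);   // the writer may now reuse the other vector
        return found;
    }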

::: js/src/wasm/WasmProcess.h
@@ +29,5 @@
> +
> +const CodeSegment*
> +LookupCodeSegment(const void* pc);
> +const Code*
> +LookupCode(const void* pc);

nit: could you put a \n between these two declarations (and below)
Attachment #8916707 - Flags: feedback?(luke) → feedback+
Thanks! I've written what I think are explicit comments, and local testing works fine (even by always enabling tiering in C++).
Attachment #8916641 - Attachment is obsolete: true
Attachment #8916707 - Attachment is obsolete: true
Attachment #8917048 - Flags: review?(luke)
Priority: -- → P3
(I did a full debug+optimized browser build and I could run the Tanks demo with profiling on without any issues)
Comment on attachment 8917048 [details] [diff] [review]
processwide.patch

Review of attachment 8917048 [details] [diff] [review]:
-----------------------------------------------------------------

Nice comments and nice work!  Just some nits:

::: js/src/wasm/WasmCode.cpp
@@ +291,3 @@
>      SendCodeRangesToProfiler(*this, bytecode.bytes, metadata);
>  
> +    initialized_ = true;

nit: how about naming it 'registered_' and setting it right after RegisterCodeSegment(), so it's unambiguous that this bool must be set iff RegisterCodeSegment() succeeded (preventing a future refactoring from adding an 'if (!...) return false' between RegisterCodeSegment() and the setting of the bool).
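
A compact sketch of the pattern being asked for, against the RegisterCodeSegment/UnregisterCodeSegment declarations from WasmProcess.h (the initialize() body is illustrative, not the real one):

    class CodeSegment;
    bool RegisterCodeSegment(const CodeSegment* cs);   // declared in WasmProcess.h; bodies elided here
    void UnregisterCodeSegment(const CodeSegment* cs);

    class CodeSegment {
        bool registered_ = false;
      public:
        bool initialize(/* ... */) {
            // ... earlier allocation/protection steps that can fail come first ...
            if (!RegisterCodeSegment(this))
                return false;
            // Set immediately after, and only after, a successful registration,
            // so no future early return can sneak in between the two lines.
            registered_ = true;
            return true;
        }

        ~CodeSegment() {
            // initialize() can fail before registering, so only unregister if we did register.
            if (registered_)
                UnregisterCodeSegment(this);
        }
    };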

::: js/src/wasm/WasmProcess.cpp
@@ +52,5 @@
> +    // thread running wasm might need to know to which CodeSegment the
> +    // current PC belongs, during a call to lookup(). A lookup is a
> +    // read-only operation, and we don't want to take a lock then
> +    // (otherwise, we could have a deadlock situation if an async lookup
> +    // happened on a given thread that was writing before getting

nit: s/writing before/holding mutatorsMutex_ while/

@@ +59,5 @@
> +    // there are any observers (i.e. calls to lookup()) of the atomic data.
> +
> +    Atomic<size_t, mozilla::SequentiallyConsistent> observers_;
> +
> +    CodeSegmentVector* mutableCodeSegments_;

I think the key inductive invariant that holds everything together is: except during swapAndWait(), there are no lookup() observers of the vector pointed to by mutableCodeSegments_.  The correctness of everything else assumes and preserves this, so could you comment that here?

@@ +60,5 @@
> +
> +    Atomic<size_t, mozilla::SequentiallyConsistent> observers_;
> +
> +    CodeSegmentVector* mutableCodeSegments_;
> +    Atomic<CodeSegmentVector*, mozilla::SequentiallyConsistent> readonlyCodeSegments_;

nit: SequentiallyConsistent is the default (and used as default in many places) so I'd just rely on the default. 
nit: Also, I know it's a type error when swapping, but could you make readonlyCodeSegments_ a const* and use a const_cast in swapAndWait()?  This seems to match the above invariant well.

@@ +76,5 @@
> +        }
> +    };
> +
> +    void swapAndWait() {
> +        // Both vectors are correct for look up at this point, although their

nit: s/correct/coherent/

@@ +79,5 @@
> +    void swapAndWait() {
> +        // Both vectors are correct for look up at this point, although their
> +        // contents are different: there is no way for the looked up PC to be
> +        // in the code segment that is getting registered, because the code is
> +        // not even attached to an instance yet.

nit: s/the code is not even attached to an instance yet/the code segment is not even fully created yet/ (and below)

@@ +123,5 @@
> +        LockGuard<Mutex> lock(mutatorsMutex_);
> +
> +        size_t index;
> +        BinarySearchIf(*mutableCodeSegments_, 0, mutableCodeSegments_->length(),
> +                       CodeSegmentPC(cs->base()), &index);

nit: could this and below be MOZ_ALWAYS_FALSE (symmetric with remove())?

@@ +138,5 @@
> +        MOZ_ASSERT(index == otherIndex);
> +#endif
> +
> +        // Although we could simply revert the insertion in the read-only
> +        // vector, it is simpler to just crash.

Could you add "and given that each CodeSegment consumes multiple pages, it is unlikely this insert() would OOM in practice"?

@@ +152,5 @@
> +
> +        size_t index;
> +        MOZ_ALWAYS_TRUE(BinarySearchIf(*mutableCodeSegments_, 0, mutableCodeSegments_->length(),
> +                                       CodeSegmentPC(cs->base()), &index));
> +        mutableCodeSegments_->erase(mutableCodeSegments_->begin() + index);

nit: could you put a \n before erase()?

@@ +175,5 @@
> +
> +        size_t index;
> +
> +        // Once atomically-read, the readonly vector is always a valid one (see
> +        // swapAndWait()).

nit: Well, not "always", rather just "while observers_ has been incremented".

@@ +177,5 @@
> +
> +        // Once atomically-read, the readonly vector is always a valid one (see
> +        // swapAndWait()).
> +
> +        CodeSegmentVector* readonly = readonlyCodeSegments_;

nits: could you:
 - make it a const CodeSegmentVector?
 - take out the \n between comment and 'readonly' and instead have a \n after 'readonly'
 - move 'size_t index' to the line before BinarySearchIf()?

::: js/src/wasm/WasmProcess.h
@@ +25,5 @@
> +class CodeSegment;
> +class Code;
> +
> +// These methods return the wasm::CodeSegment (resp. wasm::Code) containing
> +// the given pc, if any exist in the process. These methods take a lock.

I think you mean "do _not_ take a lock", and could you then add "and thus are safe to use in a profiling or async interrupt context".

@@ +38,5 @@
> +// via pc in the methods described above.
> +
> +bool
> +RegisterCodeSegment(const CodeSegment* cs);
> +void

nit: \n between these two
Attachment #8917048 - Flags: review?(luke) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab5dcb5b8f1f
Implement process-wide wasm code lookup; r=luke
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/257ce836b882
Make testTypedArrayInit temporarily faster to run in ARM sim to fix bustage on a CLOSED TREE; rs=luke
Attached patch fastpath.patch (obsolete) — Splinter Review
Attachment #8917451 - Flags: review?(luke)
Comment on attachment 8917451 [details] [diff] [review]
fastpath.patch

Review of attachment 8917451 [details] [diff] [review]:
-----------------------------------------------------------------

Might be worth being able to do this all from an inline fast path that avoids even the function call overhead: have a namespace wasm { extern Atomic<bool> CodeExists; } in WasmProcess.h that Simulator-arm.cpp probes before even calling handleWasmFault()!  This is wartier, so perhaps you could measure if it significantly changes the runtime of testTypedArrayInit.js.
Attachment #8917451 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/ab5dcb5b8f1f
https://hg.mozilla.org/mozilla-central/rev/257ce836b882
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Do we want to nominate this for 57 uplift?
Flags: needinfo?(bbouvier)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)
> Do we want to nominate this for 57 uplift?

No need: this particular crash is caused by the refactorings made in bug 1360211, which landed in 58. The crashes with this signature that we're observing on 57 may be caused by bug 1406897. Thanks for asking.
Flags: needinfo?(bbouvier)
(In reply to Luke Wagner [:luke] from comment #20)
> Comment on attachment 8917451 [details] [diff] [review]
> fastpath.patch
> 
> Review of attachment 8917451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Might be worth being able to do this all from an inline fast path that
> avoids even the function call overhead: have a namespace wasm { extern
> Atomic<bool> CodeExists; } in WasmProcess.h that Simulator-arm.cpp probes
> before even calling handleWasmFault()!  This is wartier, so perhaps you
> could measure if it significantly changes the runtime of
> testTypedArrayInit.js.

Wow, with this I could halve the runtime of this particular test by putting the fast-path check at the top of handleWasmFault. Uploading the updated patch and carrying forward the r+; try is still running all the ARM tests a few times.
Attached patch fastpath.patchSplinter Review
Attachment #8917451 - Attachment is obsolete: true
Attachment #8917810 - Flags: review+
Comment on attachment 8917810 [details] [diff] [review]
fastpath.patch

Review of attachment 8917810 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!

::: js/src/wasm/WasmProcess.cpp
@@ +135,5 @@
>  
>          if (!mutableCodeSegments_->insert(mutableCodeSegments_->begin() + index, cs))
>              return false;
>  
> +        CodeExists = true;

Could you also clear it in the erase() path when the vector becomes empty?
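
A rough end-to-end sketch of how the flag could fit together, with std::atomic in place of mozilla::Atomic (only CodeExists and handleWasmFault come from this bug, per comment 20; the surrounding names are hypothetical, and locking around the vector is elided since the real map already handles concurrency):

    #include <algorithm>
    #include <atomic>
    #include <vector>

    std::atomic<bool> CodeExists{false};             // the extern flag suggested in comment 20
    std::vector<const void*> registeredSegments;     // stand-in for the process-wide map

    void OnRegister(const void* cs) {
        registeredSegments.push_back(cs);
        CodeExists = true;                           // some wasm code now exists in the process
    }

    void OnUnregister(const void* cs) {
        auto it = std::find(registeredSegments.begin(), registeredSegments.end(), cs);
        if (it != registeredSegments.end())
            registeredSegments.erase(it);
        if (registeredSegments.empty())
            CodeExists = false;                      // the review request: clear once the vector empties
    }

    // At the top of the (simulator / signal-handler) fault path:
    bool maybeWasmFault(const void* pc) {
        if (!CodeExists)
            return false;                            // cheap early-out: no wasm code, so not our fault
        // ... the full LookupCodeSegment(pc)-based handling would go here ...
        (void)pc;
        return false;
    }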
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1863f88528cd
Add a fast-path for wasm code segment lookup; r=luke
Blocks: 1360211
No longer depends on: 1360211
Depends on: 1480998