Closed Bug 1406041 Opened 7 years ago Closed 7 years ago

Crash in js::wasm::Code::tiers when profiling Tank's demo

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

Unspecified
Linux
defect

Tracking


RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 5 obsolete files)

This bug was filed from the Socorro interface and is report bp-b0a102ab-32dc-4522-8074-0bdb20171005.
=============================================================

A wasm::Code isn't found in wasm::ProfilingFrameIterator::initFromExitFP. Needs to be investigated a bit.
It'd be good to see if it repros before bug 1360211 (which changed how we do lookupCode()).
Oh, I see you already noted that in 'Depends on' :)
Attached patch wip.patch (obsolete) — Splinter Review
WIP patch, with a process-wide (sometimes lossy) lookup.

The only issue is that if we fail the lookup in RedirectJitCodeToInterruptCheck, we never get back to it, which can happen when we time out. So all the timeout test cases now fail.
Attached patch wip-readonly.patch (obsolete) — Splinter Review
An experiment with having two CodeSegment vectors swapped to do a lock-free lookup; while it seems to work locally, I don't think it solves the problem in all cases.
For comment 3, why is the lock being held when RedirectJitCodeToInterruptCheck() happens?  It seems like either (1) we'd be in JIT code with the lock not held, or (2) we'd be in C++ holding the lock, but also checking cx->interrupt at some point.
I guess my question in comment 5 is more: why does this happen *frequently*; it makes sense that, with low probability, tier-2 would complete at the same time as the interrupt.

So the lock-free swapping seems good, but the worry is that the cost of adding a CodeSegment is O(n).  I had an idea that seems to be amortized O(1) though: you maintain two Vectors that, in the steady state, have the same contents.  When updating, you first lock one vector, insert the CodeSegment, then unlock it, then lock the other, insert the CodeSegment, then unlock.  At any time, one Vector is coherent and unlocked so LookupCodeSegment() can always succeed and update is amortized O(2) ;)
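A rough standalone sketch of that two-vector idea, using std::mutex/std::vector in place of SpiderMonkey's Mutex and Vector (all names here are made up for illustration, not the landed code):

    #include <algorithm>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    // Hypothetical stand-in for wasm::CodeSegment; only base/length matter here.
    struct SegmentStub { const unsigned char* base; size_t length; };

    // Two sorted vectors with identical contents in the steady state. An update
    // mutates them one at a time under that vector's own lock, so at any moment
    // at least one vector is unlocked and coherent for a reader to search.
    class TwoVectorMap {
        std::mutex locks_[2];
        std::vector<const SegmentStub*> vecs_[2];

      public:
        void add(const SegmentStub* cs) {
            for (int i = 0; i < 2; i++) {
                std::lock_guard<std::mutex> guard(locks_[i]);
                auto& v = vecs_[i];
                auto pos = std::upper_bound(v.begin(), v.end(), cs,
                    [](const SegmentStub* a, const SegmentStub* b) { return a->base < b->base; });
                v.insert(pos, cs);   // each segment ends up inserted exactly twice
            }
        }

        const SegmentStub* lookup(const void* pc) {
            auto p = static_cast<const unsigned char*>(pc);
            for (int i = 0; i < 2; i++) {
                std::unique_lock<std::mutex> guard(locks_[i], std::try_to_lock);
                if (!guard.owns_lock())
                    continue;   // this vector may be mid-update; the other one is coherent
                for (const SegmentStub* cs : vecs_[i]) {
                    if (p >= cs->base && p < cs->base + cs->length)
                        return cs;
                }
                return nullptr;
            }
            // Taking even a try-lock is what the following comments worry about for
            // the async-interrupt case; the landed patch replaces this with an
            // atomic pointer plus an observer count.
            return nullptr;
        }
    };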
(In reply to Luke Wagner [:luke] from comment #6)
> I guess my question in comment 5 is more: why does this happen *frequently*;
> it makes sense that, with low probability, tier-2 would complete at the same
> time as the interrupt.

Same reason we introduced the code segment lookup in the first place: in the ARM simulator, every single memory access goes through a code segment lookup to ask whether we're in wasm or not. So when running in the ARM sim, we spend most of our time under code lookup.

> So the lock-free swapping seems good, but the worry is that the cost of
> adding a CodeSegment is O(n).  I had an idea that seems to be amortized O(1)
> though: you maintain two Vectors that, in the steady state, have the same
> contents.  When updating, you first lock one vector, insert the CodeSegment,
> then unlock it, then lock the other, insert the CodeSegment, then unlock. 
> At any time, one Vector is coherent and unlocked so LookupCodeSegment() can
> always succeed and update is amortized O(2) ;)

Interesting, this sounds almost like the patch implemented in comment 4 (without proper attention paid to atomic swapping). The major issue I had with that attempt was removals: when we start removing a CodeSegment, and until it has been removed from both vectors, a soon-to-be-stale value can be returned by a lookup. If the code segment vectors could be append-only, that would solve it, but it would be a bad memory leak. I think the solution you describe has this same issue, right?
CodeSegments can be removed at any time (regardless of this issue); the caller has to know that the pc they are looking up is live on a paused stack (which is true for interrupt, profiling) so that the CodeSegment won't go away.
Attached patch wip-unlockedstate.patch (obsolete) — Splinter Review
A snapshot patch implementing the idea suggested in comment 6, but there's a catch: there can't be a good ordering of the two vector mutexes, since they can be taken in either order, and even both at the same time (when an insertion is interrupted and a lookup happens).

One idea that came up was that each mutex could have an optional owning group (just an unsigned id is enough), which would allow the ordering checks to pass when two mutexes are part of the same group.

That being said, I think the code swapping implemented in the previous patch follows the same idea, has the same O(2) behavior, and doesn't break the mutex invariant.
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attached patch process.patch (obsolete) — Splinter Review
Not entirely finished (OOM handling is incorrect), but having a first look would help. Thanks!
Attachment #8916075 - Attachment is obsolete: true
Attachment #8916077 - Attachment is obsolete: true
Attachment #8916707 - Flags: feedback?(luke)
Comment on attachment 8916707 [details] [diff] [review]
process.patch

Review of attachment 8916707 [details] [diff] [review]:
-----------------------------------------------------------------

Generally, looks great!  Some nits and I think a few technical data races.

::: js/src/wasm/WasmCode.cpp
@@ +293,5 @@
>  }
>  
> +CodeSegment::~CodeSegment()
> +{
> +    UnregisterCodeSegment(this);

Because initialization can fail before getting to RegisterCodeSegment(), I think this call should be conditionalized on something; I'm thinking a new bool 'initialized_' set right after RegisterCodeSegment().

::: js/src/wasm/WasmCode.h
@@ +105,5 @@
>          outOfBoundsCode_(nullptr),
>          unalignedAccessCode_(nullptr)
>      {}
>  
> +    virtual ~CodeSegment();

CodeSegment doesn't have any derived classes, nor is it intended to be polymorphic, so can you make it non-virtual?

::: js/src/wasm/WasmFrameIter.cpp
@@ +1121,5 @@
>      JSContext* cx = TlsContext.get();
>      if (!cx)
>          return false;
>  
>      MOZ_RELEASE_ASSERT(!cx->handlingSegFault);

This function is not inherently tied to handling seg faults or TlsContext; those were just incidental details at the time, so can you just remove the above lines so that the function is more true to its name?

::: js/src/wasm/WasmFrameIter.h
@@ +213,5 @@
>  LookupFaultingInstance(const CodeSegment& codeSegment, void* pc, void* fp);
>  
> +// Return whether the given PC is in wasm code. In loosy mode, the code segment
> +// lookup will fail if the process-wide there is already taken
> +// (see WasmProcess.h).

I think this comment can return to the original form now.

::: js/src/wasm/WasmInstance.h
@@ +21,5 @@
>  
>  #include "gc/Barrier.h"
>  #include "wasm/WasmCode.h"
>  #include "wasm/WasmDebug.h"
> +#include "wasm/WasmProcess.h"

nit: I think maybe this #include is not necessary anymore?

::: js/src/wasm/WasmProcess.cpp
@@ +19,5 @@
> +#include "wasm/WasmProcess.h"
> +
> +#include "mozilla/BinarySearch.h"
> +
> +#include "threading/ExclusiveData.h"

nit: I think ExclusiveData is no longer needed

@@ +41,5 @@
> +class ProcessCodeSegmentMap
> +{
> +    // Since insertions can happen on a background thread while another thread
> +    // lookups up segments, we need a lock.
> +    Mutex mutex_;

nit: I think the real need for the lock here is two concurrent mutators; the reader doesn't take the lock.  I'd also rename it mutatorsMutex_.

@@ +43,5 @@
> +    // Since insertions can happen on a background thread while another thread
> +    // lookups up segments, we need a lock.
> +    Mutex mutex_;
> +
> +    Atomic<unsigned, mozilla::Relaxed> observers_;

I think Relaxed is actually too weak here: we want observing observers_ == 0 to imply that writes performed afterwards don't flow before it, and that reads performed before observers_ == 0 don't happen after it.  Since this isn't ultra perf-critical, let's just go with the default of seqcst.

Also nit: let's do size_t so we don't even have to think about overflow on 64-bit

@@ +120,5 @@
> +        MOZ_ASSERT(index == otherIndex);
> +#endif
> +
> +        if (!codeSegments_->insert(codeSegments_->begin() + index, cs)) {
> +            // Wait until observers are done using the read-only vector.

It'd be good to expand the comment explaining why there could be dangling readers and why observers_==0 is the only way to flush them out.  Also good to explain why we're spin-locking here and why it is OK.

@@ +122,5 @@
> +
> +        if (!codeSegments_->insert(codeSegments_->begin() + index, cs)) {
> +            // Wait until observers are done using the read-only vector.
> +            while (observers_) {}
> +            Swap(readonlyCodeSegments_, codeSegments_);

Any field read by lookup() needs to be Atomic, since otherwise it is by definition a data race and therefore undefined behavior, so in this case both pointers.  With Atomic ops, you want to be really clear about what the precise loads and stores are, and in what order, so I think we shouldn't use a utility like Swap() here (e.g., what if it used the XOR swap trick (https://en.wikipedia.org/wiki/XOR_swap_algorithm) or otherwise left codeSegments_ in a temporarily-invalid state for lookup() to observe?).
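A rough sketch of the explicit-publication idea, using std::atomic in place of mozilla::Atomic (member names loosely follow the patch; this is illustrative, not the landed code):

    #include <atomic>
    #include <cstddef>
    #include <vector>

    struct CodeSegment;                                    // opaque for this sketch
    using CodeSegmentVector = std::vector<const CodeSegment*>;

    class SwapSketch {
        CodeSegmentVector vec1_, vec2_;                    // owned by value, never null
        CodeSegmentVector* mutableCodeSegments_ = &vec1_;  // only touched under the mutators' lock
        std::atomic<CodeSegmentVector*> readonlyCodeSegments_{&vec2_};
        std::atomic<size_t> observers_{0};

      public:
        void swapAndWait() {
            // One explicit atomic exchange publishes the freshly-mutated vector
            // and hands back the previously read-only one; lookup() can only ever
            // observe one pointer or the other, never a half-swapped state.
            mutableCodeSegments_ = readonlyCodeSegments_.exchange(mutableCodeSegments_);

            // Drain readers that loaded the old pointer before the exchange; new
            // readers can only pick up the vector we just published. Spinning is
            // acceptable because registration is rare and the read-side critical
            // section is just a short binary search.
            while (observers_.load() != 0) {
            }
        }
    };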

@@ +147,5 @@
> +            return;
> +
> +        codeSegments_->erase(codeSegments_->begin() + index);
> +
> +        // Wait until observers are done using the read-only vector.

Maybe reference the comment in insert(), since it should be beefy and you don't want to repeat it.

@@ +161,5 @@
> +
> +        codeSegments_->erase(codeSegments_->begin() + index);
> +    }
> +
> +    const CodeSegment* lookupCodeSegment(const void* pc) {

How about naming it lookup()?

@@ +163,5 @@
> +    }
> +
> +    const CodeSegment* lookupCodeSegment(const void* pc) {
> +        if (!initialized_)
> +            return nullptr;

I think this unsynchronized use of initialized_ is a race.  Rather than making it Atomic, let's kill initialized_ altogether by:
 - storing the two CodeSegmentVectors by value (no pointer)
 - having the two (Atomic) fields point at these Vectors

@@ +168,5 @@
> +
> +        auto decObserver = mozilla::MakeScopeExit([&] {
> +            observers_--;
> +        });
> +        observers_++;

It'd be good to explain our motivation/constraints for using ref-counting here instead of taking a lock.  Also good to explain:
 1. why it's ok to return a bare CodeSegment* (our assumption about pc being live code on some stack keeping the CodeSegment alive)
 2. why it's ok if we end up looking up in a "stale" CodeSegmentVector (pc is assumed to be executing and thus the CodeSegment would have to have already been registered)
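
A rough sketch of the read side under those assumptions, using standard library types in place of the SpiderMonkey ones (the real code uses BinarySearchIf and MakeScopeExit; everything here is illustrative):

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct SegmentStub { const uint8_t* base; size_t length; };  // hypothetical CodeSegment stand-in
    using SegmentVector = std::vector<const SegmentStub*>;

    std::atomic<size_t> observers{0};
    std::atomic<const SegmentVector*> readonlySegments{nullptr};

    const SegmentStub* Lookup(const void* pc) {
        // Announce ourselves before loading the pointer so that a writer's
        // "wait until observers == 0" cannot complete while we are still
        // iterating whichever vector we end up reading.
        observers.fetch_add(1);
        const SegmentStub* found = nullptr;
        if (const SegmentVector* vec = readonlySegments.load()) {
            // A "stale" vector is fine: pc is assumed to be live, executing code
            // on some paused stack, so its segment was registered (in both
            // vectors) before that code could run, and the segment cannot be
            // freed out from under us for the same reason.
            auto p = static_cast<const uint8_t*>(pc);
            for (const SegmentStub* cs : *vec) {            // linear scan for brevity
                if (p >= cs->base && p < cs->base + cs->length) {
                    found = cs;
                    break;
                }
            }
        }
        observers.fetch_sub(1);   // the writer may now reuse the other vector
        return found;
    }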

::: js/src/wasm/WasmProcess.h
@@ +29,5 @@
> +
> +const CodeSegment*
> +LookupCodeSegment(const void* pc);
> +const Code*
> +LookupCode(const void* pc);

nit: could you put a \n between these two declarations (and below)
Attachment #8916707 - Flags: feedback?(luke) → feedback+
Thanks! I've written what I think are explicit comments, and local testing works fine (even by always enabling tiering in C++).
Attachment #8916641 - Attachment is obsolete: true
Attachment #8916707 - Attachment is obsolete: true
Attachment #8917048 - Flags: review?(luke)
Priority: -- → P3
(I did a full debug+optimized browser build and I could run the Tanks demo with profiling on without any issues)
Comment on attachment 8917048 [details] [diff] [review]
processwide.patch

Review of attachment 8917048 [details] [diff] [review]:
-----------------------------------------------------------------

Nice comments and nice work!  Just some nits:

::: js/src/wasm/WasmCode.cpp
@@ +291,3 @@
>      SendCodeRangesToProfiler(*this, bytecode.bytes, metadata);
>  
> +    initialized_ = true;

nit: how about naming it 'registered_' and setting it right after RegisterCodeSegment(), so it's unambiguous that this bool must be set iff RegisterCodeSegment() succeeded (preventing a future refactoring from adding an 'if (!...) return false' between RegisterCodeSegment() and the setting of the bool).
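
A compact sketch of the pattern being asked for, against the RegisterCodeSegment/UnregisterCodeSegment declarations from WasmProcess.h (the initialize() body is illustrative, not the real one):

    class CodeSegment;
    bool RegisterCodeSegment(const CodeSegment* cs);   // declared in WasmProcess.h; bodies elided here
    void UnregisterCodeSegment(const CodeSegment* cs);

    class CodeSegment {
        bool registered_ = false;
      public:
        bool initialize(/* ... */) {
            // ... earlier allocation/protection steps that can fail come first ...
            if (!RegisterCodeSegment(this))
                return false;
            // Set immediately after, and only after, a successful registration,
            // so no future early return can sneak in between the two lines.
            registered_ = true;
            return true;
        }

        ~CodeSegment() {
            // initialize() can fail before registering, so only unregister if we did register.
            if (registered_)
                UnregisterCodeSegment(this);
        }
    };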

::: js/src/wasm/WasmProcess.cpp
@@ +52,5 @@
> +    // thread running wasm might need to know to which CodeSegment the
> +    // current PC belongs, during a call to lookup(). A lookup is a
> +    // read-only operation, and we don't want to take a lock then
> +    // (otherwise, we could have a deadlock situation if an async lookup
> +    // happened on a given thread that was writing before getting

nit: s/writing before/holding mutatorsMutex_ while/

@@ +59,5 @@
> +    // there are any observers (i.e. calls to lookup()) of the atomic data.
> +
> +    Atomic<size_t, mozilla::SequentiallyConsistent> observers_;
> +
> +    CodeSegmentVector* mutableCodeSegments_;

I think the key inductive invariant that holds everything together is: except during swapAndWait(), there are no lookup() observers of the vector pointed to by mutableCodeSegments_.  The correctness of everything else assumes and preserves this, so could you comment that here?

@@ +60,5 @@
> +
> +    Atomic<size_t, mozilla::SequentiallyConsistent> observers_;
> +
> +    CodeSegmentVector* mutableCodeSegments_;
> +    Atomic<CodeSegmentVector*, mozilla::SequentiallyConsistent> readonlyCodeSegments_;

nit: SequentiallyConsistent is the default (and used as default in many places) so I'd just rely on the default. 
nit: Also, I know it's a type error when swapping, but could you make readonlyCodeSegments_ a const* and use a const_cast in swapAndWait()?  This seems to match the above invariant well.

@@ +76,5 @@
> +        }
> +    };
> +
> +    void swapAndWait() {
> +        // Both vectors are correct for look up at this point, although their

nit: s/correct/coherent/

@@ +79,5 @@
> +    void swapAndWait() {
> +        // Both vectors are correct for look up at this point, although their
> +        // contents are different: there is no way for the looked up PC to be
> +        // in the code segment that is getting registered, because the code is
> +        // not even attached to an instance yet.

nit: s/the code is not even attached to an instance yet/the code segment is not even fully created yet/ (and below)

@@ +123,5 @@
> +        LockGuard<Mutex> lock(mutatorsMutex_);
> +
> +        size_t index;
> +        BinarySearchIf(*mutableCodeSegments_, 0, mutableCodeSegments_->length(),
> +                       CodeSegmentPC(cs->base()), &index);

nit: could this and below be MOZ_ALWAYS_FALSE (symmetric with remove())?

@@ +138,5 @@
> +        MOZ_ASSERT(index == otherIndex);
> +#endif
> +
> +        // Although we could simply revert the insertion in the read-only
> +        // vector, it is simpler to just crash.

Could you add "and given that each CodeSegment consumes multiple pages, it is unlikely this insert() would OOM in practice"?

@@ +152,5 @@
> +
> +        size_t index;
> +        MOZ_ALWAYS_TRUE(BinarySearchIf(*mutableCodeSegments_, 0, mutableCodeSegments_->length(),
> +                                       CodeSegmentPC(cs->base()), &index));
> +        mutableCodeSegments_->erase(mutableCodeSegments_->begin() + index);

nit: could you put a \n before erase()?

@@ +175,5 @@
> +
> +        size_t index;
> +
> +        // Once atomically-read, the readonly vector is always a valid one (see
> +        // swapAndWait()).

nit: Well, not "always", rather just "while observers_ has been incremented".

@@ +177,5 @@
> +
> +        // Once atomically-read, the readonly vector is always a valid one (see
> +        // swapAndWait()).
> +
> +        CodeSegmentVector* readonly = readonlyCodeSegments_;

nits: could you:
 - make it a const CodeSegmentVector?
 - take out the \n between comment and 'readonly' and instead have a \n after 'readonly'
 - move 'size_t index' to the line before BinarySearchIf()?

::: js/src/wasm/WasmProcess.h
@@ +25,5 @@
> +class CodeSegment;
> +class Code;
> +
> +// These methods return the wasm::CodeSegment (resp. wasm::Code) containing
> +// the given pc, if any exist in the process. These methods take a lock.

I think you mean "do _not_ take a lock", and could you then add "and thus are safe to use in a profiling or async interrupt context".

@@ +38,5 @@
> +// via pc in the methods described above.
> +
> +bool
> +RegisterCodeSegment(const CodeSegment* cs);
> +void

nit: \n between these two
Attachment #8917048 - Flags: review?(luke) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab5dcb5b8f1f
Implement process-wide wasm code lookup; r=luke
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/257ce836b882
Make testTypedArrayInit temporarily faster to run in ARM sim to fix bustage on a CLOSED TREE; rs=luke
Attached patch fastpath.patch (obsolete) — Splinter Review
Attachment #8917451 - Flags: review?(luke)
Comment on attachment 8917451 [details] [diff] [review]
fastpath.patch

Review of attachment 8917451 [details] [diff] [review]:
-----------------------------------------------------------------

Might be worth being able to do this all from an inline fast path that avoids even the function call overhead: have a namespace wasm { extern Atomic<bool> CodeExists; } in WasmProcess.h that Simulator-arm.cpp probes before even calling handleWasmFault()!  This is wartier, so perhaps you could measure if it significantly changes the runtime of testTypedArrayInit.js.
Attachment #8917451 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/ab5dcb5b8f1f
https://hg.mozilla.org/mozilla-central/rev/257ce836b882
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Do we want to nominate this for 57 uplift?
Flags: needinfo?(bbouvier)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)
> Do we want to nominate this for 57 uplift?

No need: this particular crash is caused by the refactorings made in bug 1360211, which landed in 58. The crashes with this signature that we're observing on 57 may be caused by bug 1406897. Thanks for asking.
Flags: needinfo?(bbouvier)
(In reply to Luke Wagner [:luke] from comment #20)
> Comment on attachment 8917451 [details] [diff] [review]
> fastpath.patch
> 
> Review of attachment 8917451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Might be worth being able to do this all from an inline fast path that
> avoids even the function call overhead: have a namespace wasm { extern
> Atomic<bool> CodeExists; } in WasmProcess.h that Simulator-arm.cpp probes
> before even calling handleWasmFault()!  This is wartier, so perhaps you
> could measure if it significantly changes the runtime of
> testTypedArrayInit.js.

Wow, with this I could halve the runtime of this particular test by putting the fast-path check at the top of handleWasmFault. Uploading the updated patch and carrying forward the r+; try is still running all the ARM tests a few times.
Attached patch fastpath.patchSplinter Review
Attachment #8917451 - Attachment is obsolete: true
Attachment #8917810 - Flags: review+
Comment on attachment 8917810 [details] [diff] [review]
fastpath.patch

Review of attachment 8917810 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!

::: js/src/wasm/WasmProcess.cpp
@@ +135,5 @@
>  
>          if (!mutableCodeSegments_->insert(mutableCodeSegments_->begin() + index, cs))
>              return false;
>  
> +        CodeExists = true;

Could you also clear it in the erase() path when the vector becomes empty?
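
A rough end-to-end sketch of how the flag could fit together, with std::atomic in place of mozilla::Atomic (only CodeExists and handleWasmFault come from this bug, per comment 20; the surrounding names are hypothetical, and locking around the vector is elided since the real map already handles concurrency):

    #include <algorithm>
    #include <atomic>
    #include <vector>

    std::atomic<bool> CodeExists{false};             // the extern flag suggested in comment 20
    std::vector<const void*> registeredSegments;     // stand-in for the process-wide map

    void OnRegister(const void* cs) {
        registeredSegments.push_back(cs);
        CodeExists = true;                           // some wasm code now exists in the process
    }

    void OnUnregister(const void* cs) {
        auto it = std::find(registeredSegments.begin(), registeredSegments.end(), cs);
        if (it != registeredSegments.end())
            registeredSegments.erase(it);
        if (registeredSegments.empty())
            CodeExists = false;                      // the review request: clear once the vector empties
    }

    // At the top of the (simulator / signal-handler) fault path:
    bool maybeWasmFault(const void* pc) {
        if (!CodeExists)
            return false;                            // cheap early-out: no wasm code, so not our fault
        // ... the full LookupCodeSegment(pc)-based handling would go here ...
        (void)pc;
        return false;
    }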
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1863f88528cd
Add a fast-path for wasm code segment lookup; r=luke
Blocks: 1360211
No longer depends on: 1360211
Depends on: 1480998