Closed Bug 1277562 Opened 8 years ago Closed 7 years ago

Wasm: tiered compiler

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(20 files, 44 obsolete files)

100.23 KB, patch
luke
: review+
Details | Diff | Splinter Review
5.82 KB, patch
lth
: review+
Details | Diff | Splinter Review
7.85 KB, patch
lth
: review+
Details | Diff | Splinter Review
12.54 KB, patch
lth
: review+
Details | Diff | Splinter Review
9.73 KB, patch
lth
: review+
Details | Diff | Splinter Review
7.67 KB, patch
lth
: review+
Details | Diff | Splinter Review
10.48 KB, patch
lth
: review+
Details | Diff | Splinter Review
19.73 KB, patch
lth
: review+
Details | Diff | Splinter Review
9.41 KB, patch
lth
: review+
Details | Diff | Splinter Review
20.21 KB, patch
lth
: review+
Details | Diff | Splinter Review
10.01 KB, patch
lth
: review+
Details | Diff | Splinter Review
8.70 KB, patch
lth
: review+
Details | Diff | Splinter Review
1.33 KB, patch
luke
: review+
Details | Diff | Splinter Review
4.13 KB, patch
lth
: review+
Details | Diff | Splinter Review
2.28 KB, patch
luke
: review+
janv
: review+
Details | Diff | Splinter Review
4.54 KB, patch
luke
: review+
Details | Diff | Splinter Review
3.37 KB, patch
billm
: review+
Details | Diff | Splinter Review
16.59 KB, patch
lth
: review+
Details | Diff | Splinter Review
37.11 KB, patch
lth
: review+
Details | Diff | Splinter Review
9.41 KB, patch
luke
: review+
Details | Diff | Splinter Review
The Wasm baseline compiler is landing in bug 1232205 (x64), bug 1277008 (x86), and bug 1277011 (ARM-32).  But it is not a tiered compiler: we select baseline or ion compilation per function based on a switch and on the ability of the baseline compiler to compile the function, and once the code has been compiled that's it.

Instead we want to create a system where we baseline-compile "all" functions first (modulo the ones the baseline compiler can't handle) and then either ion-compile everything on a slow background thread and swap code in as it becomes available, or we profile code in the baseline compiler and demand-compile in ion.

Policy and mechanisms TBD.
Priority: -- → P2
Merge has happened. Moving the bugs from P2 to P1. I think Lars will be the driver of this bug?
Assignee: nobody → lhansen
Priority: P2 → P1
Yes, tiering is probably appropriate for FF53.
Status: NEW → ASSIGNED
The simplest tiering solution is to first baseline-compile all code and start running the application.  Then, start a full ion compilation in the background.  When the ion compilation is finished, and the system is at an idle point (return to outermost runloop), just switch to the ion code.  The ion compilation should run at low priority and module-by-module.

Workers and the main thread can be handled independently, if/when workers run wasm.  One would hope that workers and the main thread share a compilation subsystem so that we don't suddenly have very many threads competing for time.

The most significant downside with the simple strategy is memory pressure: both object programs will be present in memory at the same time.
(In reply to Lars T Hansen [:lth] from comment #3)
> One would hope that workers and the main thread share a
> compilation subsystem so that we don't suddenly have very many threads
> competing for time.

The helper thread system (on which parallel-ion and parallel-WebAssembly.compile jobs are scheduled) is indeed process-wide, so your hopes are well founded.  So I think what you'd have here is a new kind of task that ran wasm::IonCompileFunction() (which itself would dispatch thousands of jobs to the same helper threads).

There is a dining philosophers hazard here due to the fixed set of worker threads (if you imagine N workers dispatching N wasm::IonCompileFunction() tasks that each grab all N helper threads and then all wait for their starved compilation jobs).  We actually already had this hazard with the background parsing tasks (when they contain asm.js) and this was solved by having GlobalHelperThreadState::maxParseThreads()=1 so that only one parse task can be executing at a time.  I think we'd do the same for the Ion compile tasks.
Recording some notes from a conversation last week:

Let's assume for now that we want to swap in the Ion-compiled code when it is all compiled (and we can serialize it), not ion-compiled functions piecemeal.  There are several questions:

- How do we do the swapping?
- How do we reap the baseline code after swapping is done?
- How do we deal with threads?

The initial thought was to just swap ion code for baseline code in all functions at the runloop and then reap the baseline code.  Threads complicate this.  (Let's just assume workers that share the compiled code, for now.)  A thread may not return to its runloop in the way the main thread would.

Luke's initial idea was that we could force the ion compilation to finish the first time we send shared memory from one agent to another.  (Without shared memory, a worker would eventually return to its runloop and could be patched.  I'm not sure that's 100% sound but it's at least 99% - any sync API would make it not true.  But assume it.)  This is OK, and if it is a documented fact then programmers will write code that do work on the main thread while Ion is compiling so as to avoid jank.  A similar idea is to force ion to finish when we create a worker.

Another idea is instead to just patch the machine code while it's running and let the baseline code leak.  All architectures define some granularity that is patchable and the kinds of instructions you can patch with, so if the baseline code contains patch points near the entry we can insert code that causes control to be diverted to the ion code.  (ABIs are compatible.)  The baseline code can leak because it only leaks on the first run; on subsequent runs we'll use cached Ion code.  There are complications here: at least W^X prevents us from writing to pages that we may execute out of.  But it would allow execution with workers and shared memory to go ahead while ion compilation is ongoing.

(From memory, some details may be slightly off.)
Yeah, I think the second option is way better.

Keeping the baseline code alive until the whole Module goes away seems like a fine way to start and I think there are multiple alternatives if we decide we want to release it more eagerly.

W^X is a good point.  IIRC, this was only actually only mandated by the kernel on iOS and some secure BSDs; on those platforms we could start by disabling the baseline.  For the rest, it's just good security hygiene so we could simply enable W+X during the small window of patching.  If we really wanted to enable, though, we could use a signal handler to park threads during the window of time the code was -X.
(In reply to Luke Wagner [:luke] from comment #6)
> W^X is a good point.  IIRC, this was only actually only mandated by the
> kernel on iOS and some secure BSDs; on those platforms we could start by
> disabling the baseline.  For the rest, it's just good security hygiene so we
> could simply enable W+X during the small window of patching.

We really shouldn't do this if there are other options. A lot of work went into W^X and I know it's very convenient to bypass it "just for this" and toggle RWX, but if we add mechanisms for it people will start to use it elsewhere too.

> If we really
> wanted to enable, though, we could use a signal handler to park threads
> during the window of time the code was -X.

Could the signal handler detect we're running baseline code and wait until patching is done? I know that also adds some complexity, but I'd appreciate considering all options.
Or maybe something like this: when we share baseline wasm code with a worker, and Ion code is not ready yet, we patch the baseline code to do a slightly slower check at each patch point (check some flag, call a little thunk, etc). Then for such shared code, we can swap in Ion code without any patching, and at the same time we don't penalize main-thread-only baseline code.
(In reply to Jan de Mooij [:jandem] from comment #7)
> Could the signal handler detect we're running baseline code and wait until
> patching is done? I know that also adds some complexity, but I'd appreciate
> considering all options.

Yes, that's what I was thinking about.  Not super-complex, but non-negligible.

(In reply to Jan de Mooij [:jandem] from comment #8)

Well, this would apply even in the main-thread case: the main-thread is running baseline code and a helper thread finishes ion compilation and immediately patches it in.

But it's a good idea to consider penalizing baseline prologue performance a bit to separate out the patchable bits into a separate r/w data section.  This also avoids worrying about the corner cases which enforce w^x.
> But it's a good idea to consider penalizing baseline prologue performance a
> bit to separate out the patchable bits into a separate r/w data section. 
> This also avoids worrying about the corner cases which enforce w^x.

In particular, we could probably do a:
  load addr, $reg
  test $reg, $reg
  jnz $reg
which I'm guessing will not be a major perf penalty and then we just write in the ion entry to `addr`.
We're doing this for FF54, so P2.
Priority: P1 → P2
Depends on: 1316814
Priority: P2 → P1
Depends on: 1335068
Depends on: 1338217
Depends on: 1351488
Depends on: 1359027
Enforces tier-specific lookup on metadata, linkdata, and code.  This replaces calls such as o->metadataTier() with calls such as o->metadata(t) where t is a tier designator.

The tier designator has five(!) values: ONE, TWO, FIRST, BEST, and TBD.  TBD is a placeholder and will disappear.  ONE and TWO are baseline and ion code respectively.  FIRST and BEST are always valid; FIRST produces the lowest tier or the only tier, while BEST produces the highest tier or the only tier.  TBD is synonymous with FIRST and is used only when I could not (yet) work out which tier to use; likely the code around most TBD instances will change with tiering.

The code that performs lookup based on those values will change later, once tiering is implemented, but that's just local adjustment in WasmCode and WasmModule.

One open question is whether the Tier enum should merge with the CompileMode enum, since they overlap: CompileMode::Baseline becomes Tier::ONE, for example.  One can argue that one either way.  If we do merge, there will be some new asserts here and there where only some of the tier values are meaningful.

Another open question is whether the names are appropriate or whether they should be "One", "Two", etc.  I like all caps myself.

We could land this now, because I don't think it'll change much; it would reduce future churn.  Let me know what you think.
Attachment #8870066 - Flags: feedback?(luke)
Comment on attachment 8870066 [details] [diff] [review]
bug1277562-enforce-tier-specific-lookup.patch

Review of attachment 8870066 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, very understandable w/ all the preceding patches landed.  A few suggestions:

::: js/src/wasm/WasmCompartment.cpp
@@ +30,5 @@
> +// With tiering, instances can have one or two code segments, and since
> +// instances are sorted by code segment we will have a richer structure here, so
> +// that we can lookup an instance by a PC in either code segment, while being
> +// able to iterate across instances visiting each only once.  For example, we
> +// can have a separate list for the second tier.  Search for Tier::TBD below.

I think comment it out-of-date w.r.t current plan of process-wide map.

::: js/src/wasm/WasmDebug.h
@@ +159,4 @@
>  
> +    uint32_t debugFuncToCodeRangeIndex(uint32_t funcIndex) const {
> +        // Any tier will do here.
> +        return metadata(Tier::FIRST).debugFuncToCodeRange[funcIndex];

Because code range indices are assigned in order of compilation job completion, I think the code range indices are not stable between baseline/ion.  However, since this is debugging, there will be only one (baseline) tier.  Might be good to have explicit Tier::Debug alias for Tier::Baseline so this is super-explicit?

::: js/src/wasm/WasmInstance.cpp
@@ +350,5 @@
>  #endif
>      tlsData()->stackLimit = *(void**)cx->stackLimitAddressForJitCode(JS::StackForUntrustedScript);
>  
> +    Tier callerTier = Tier::TBD;
> +    Tier calleeTier = Tier::TBD;

w.r.t calleeTier, I think you'd use Tier::Best.  However, you can't just do that since there are two uses of calleeTier per callee (below) and thus it'd be possible that the tier racily changes between the first and second use and thus we need to explicitly pre-resolve Best to a specific Ion/Baseline once up front.  But this highlights a pretty subtle hazard we'll have to guard against in many places.  It feels like there should be a more fundamental distinction between "a dynamic request for the best tier" and "a specific tier I want that I know exists".

What if instead:
 - rename CompileMode to Tier (killing the current Tier)
 - have a method `Tier wasm::Code::racyBestTier()` that returns Ion|Baseline, racily
 - callers are expected to do this query once and then use the returned Tier consistently, calling the current set of Tier-taking methods (which can then do a simple array-index of a two-element array, no switch)

For the other case, Tier::FIRST, there isn't the race hazard (since, until we release baseline tiers, the first tier will *always* be the first tier).  However, for symmetry, (and to prepare for releasing), I think it makes sense to have a `Tier wasm::Code::anyTier()` (using "any" in place of "first").

::: js/src/wasm/WasmJS.cpp
@@ +815,5 @@
>  
>      obj->initReservedSlot(MODULE_SLOT, PrivateValue(&module));
>      module.AddRef();
> +    // We account for the FIRST tier here; the BEST tier, if different, will be
> +    // accounted for separately when it's been compiled.

Hmm, this will be a challenge: full compilation will finish in a helper thread which can't call cx->zone()->updateJitCodeMallocBytes().  Probably that's fine b/c these counters are heuristics anyways and the only thing that's important is updating them proportionally to code allocation and if we account for baseline that should "cover" ion.

@@ +1709,4 @@
>      if (value) {
>          RootedWasmInstanceObject instanceObj(cx, ExportedFunctionToInstanceObject(value));
>          uint32_t funcIndex = ExportedFunctionToFuncIndex(value);
> +        Tier tier = Tier::TBD;  // Perhaps the tier that the function is at?

This bit of code made me realize that, in addition to updating the call-table when the Ion tier is ready, we'll also have to update all Tables pointing at the baseline versions.  I think what we'll need to do is give each WasmInstanceObject a weak set of WasmTableObject (to update) that is maintained by WasmTableObject::setImpl().  This, assuming there is some lock protecting a Module's Tier field, I think setImpl would take the lock, if Tier != TWO, add itself to the instance's weak set, and then use the tier value below.

Also all the FuncImportTls::code fields of all importing Instances, hence a weak set of dependent instances.

::: js/src/wasm/WasmModule.cpp
@@ +596,4 @@
>      for (const ElemSegment& seg : elemSegments_) {
>          Table& table = *tables[seg.tableIndex];
>          uint32_t offset = EvaluateInitExpr(globalImports, seg.offset);
> +        Tier tier = Tier::TBD;

For this, I think Module::instantiate() should call Code::racyBestTier() (as proposed above) once and pass it to these methods as a parameter.
(In reply to Luke Wagner [:luke] from comment #13)
> Comment on attachment 8870066 [details] [diff] [review]
> bug1277562-enforce-tier-specific-lookup.patch
> 
> Review of attachment 8870066 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmDebug.h
> @@ +159,4 @@
> >  
> > +    uint32_t debugFuncToCodeRangeIndex(uint32_t funcIndex) const {
> > +        // Any tier will do here.
> > +        return metadata(Tier::FIRST).debugFuncToCodeRange[funcIndex];
> 
> Because code range indices are assigned in order of compilation job
> completion, I think the code range indices are not stable between
> baseline/ion.  However, since this is debugging, there will be only one
> (baseline) tier.  Might be good to have explicit Tier::Debug alias for
> Tier::Baseline so this is super-explicit?

That's a fine idea.  (Even so, it may be desirable to stabilize the indices if we can, but I see this can be troublesome for some of the tables.)

> ::: js/src/wasm/WasmInstance.cpp
> @@ +350,5 @@
> >  #endif
> >      tlsData()->stackLimit = *(void**)cx->stackLimitAddressForJitCode(JS::StackForUntrustedScript);
> >  
> > +    Tier callerTier = Tier::TBD;
> > +    Tier calleeTier = Tier::TBD;
> 
> w.r.t calleeTier, I think you'd use Tier::Best.  However, you can't just do
> that since there are two uses of calleeTier per callee (below) and thus it'd
> be possible that the tier racily changes between the first and second use
> and thus we need to explicitly pre-resolve Best to a specific Ion/Baseline
> once up front.  

Yes indeed.  The larger patch we had earlier has this, it has a "best()" accessor on Code, in line with what you suggest.

> But this highlights a pretty subtle hazard we'll have to
> guard against in many places.  It feels like there should be a more
> fundamental distinction between "a dynamic request for the best tier" and "a
> specific tier I want that I know exists".

Yes.

> What if instead:
>  - rename CompileMode to Tier (killing the current Tier)
>  - have a method `Tier wasm::Code::racyBestTier()` that returns
> Ion|Baseline, racily
>  - callers are expected to do this query once and then use the returned Tier
> consistently, calling the current set of Tier-taking methods (which can then
> do a simple array-index of a two-element array, no switch)
> 

Right.

> For the other case, Tier::FIRST, there isn't the race hazard (since, until
> we release baseline tiers, the first tier will *always* be the first tier). 
> However, for symmetry, (and to prepare for releasing), I think it makes
> sense to have a `Tier wasm::Code::anyTier()` (using "any" in place of
> "first").

Right.  In the tiering system in this patch, Tier::FIRST was always an OK solution but Tier::BEST was a placeholder.

> 
> @@ +1709,4 @@
> >      if (value) {
> >          RootedWasmInstanceObject instanceObj(cx, ExportedFunctionToInstanceObject(value));
> >          uint32_t funcIndex = ExportedFunctionToFuncIndex(value);
> > +        Tier tier = Tier::TBD;  // Perhaps the tier that the function is at?
> 
> This bit of code made me realize that, in addition to updating the
> call-table when the Ion tier is ready, we'll also have to update all Tables
> pointing at the baseline versions.  I think what we'll need to do is give
> each WasmInstanceObject a weak set of WasmTableObject (to update) that is
> maintained by WasmTableObject::setImpl().  This, assuming there is some lock
> protecting a Module's Tier field, I think setImpl would take the lock, if
> Tier != TWO, add itself to the instance's weak set, and then use the tier
> value below.
> 
> Also all the FuncImportTls::code fields of all importing Instances, hence a
> weak set of dependent instances.

OK.  I don't know that I understand all of that yet but clearly setImpl() is one of the spots that did not yet have any kind of solution.
(This patch is quite a bit larger because I included some renaming work that was reasonable in light of translating CompileMode to Tier.)

This addresses most of the remarks on the previous patch.  Notably, CompileMode is gone and Tier is shrunk, and anyTier() is now an accessor on Code, which works fine.  In a few cases, tier determination by anyTier() has been lifted out of loops to clarify that it is done once.

I have not introduced racyBestTier() here and I have not modified most of the uses of Tier::TBD, as I feel that work belongs to later patches in this series.  Tier::TBD has reasonable semantics since there's only one tier available, and it's not used very broadly.  The uses in WasmCompartment go away once we land the patch that changes how instances are managed.
Attachment #8870066 - Attachment is obsolete: true
Attachment #8870066 - Flags: feedback?(luke)
Attachment #8870339 - Flags: review?(luke)
Depends on: 1365069
(In reply to Lars T Hansen [:lth] from comment #14)
> That's a fine idea.  (Even so, it may be desirable to stabilize the indices
> if we can, but I see this can be troublesome for some of the tables.)

If we need to fix this: the code-range-index order issues arises from the fact that we eagerly asmMergeWith each task's masm when the task completes and code ranges must be ordered (for binary lookup).  If we make the optimization discussed in bug 1338217 comment 52, then we can wait until the end of compilation and order the functions however we like before the final copy.
Comment on attachment 8870339 [details] [diff] [review]
bug1277526-enforce-tier-specific-lookup-v2.patch

Review of attachment 8870339 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!

In a future patch, when Tier::TBD is removed, I hope we can have Tier::Debug = Tier::Baseline and then we can drop all the "tier == Baseline || tier == Ion" asserts as there will only be two cases.

::: js/src/wasm/WasmCode.cpp
@@ +841,4 @@
>  
> +        size_t match;
> +        if (BinarySearch(MemoryAccessOffset(metadata(t).memoryAccesses), lowerBound, upperBound,
> +                         target, &match))

nit: can you hoist `metadata(t).memoryAccesses` for sharing with the two uses?

::: js/src/wasm/WasmInstance.cpp
@@ +455,5 @@
>  
> +    Tier tier = Tier::TBD;
> +
> +    for (unsigned i = 0; i < metadata(tier).funcImports.length(); i++) {
> +        FuncImportTls& import = funcImportTls(metadata(tier).funcImports[i]);

I think you can use anyTier() here: there's only 1 TlsData and FuncImport::tlsDataOffset() is invariant (it's defined up front based on ModuleEnvironment; in the future, it'd be good to assert all this tier-invariant stuff at the end of tier compilation).

@@ +504,5 @@
>      TraceEdge(trc, &object_, "wasm instance object");
>  
> +    for (auto t : metadata().tiers()) {
> +        for (const FuncImport& fi : metadata(t).funcImports)
> +            TraceNullableEdge(trc, &funcImportTls(fi).obj, "wasm import");

For the above reason, I don't think you need a for loop here; anyTier() will do.

@@ +816,5 @@
>  void
>  Instance::deoptimizeImportExit(uint32_t funcImportIndex)
>  {
> +    for (auto tier : code().tiers()) {
> +        const FuncImport& fi = metadata(tier).funcImports[funcImportIndex];

ditto
Attachment #8870339 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #17)
> Comment on attachment 8870339 [details] [diff] [review]
> bug1277526-enforce-tier-specific-lookup-v2.patch
> 
> Review of attachment 8870339 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Great, thanks!
> 
> In a future patch, when Tier::TBD is removed, I hope we can have Tier::Debug
> = Tier::Baseline and then we can drop all the "tier == Baseline || tier ==
> Ion" asserts as there will only be two cases.

Yes.

> @@ +816,5 @@
> >  void
> >  Instance::deoptimizeImportExit(uint32_t funcImportIndex)
> >  {
> > +    for (auto tier : code().tiers()) {
> > +        const FuncImport& fi = metadata(tier).funcImports[funcImportIndex];
> 
> ditto

ie, "I think you can use anyTier() here: there's only 1 TlsData and FuncImport::tlsDataOffset() is invariant (it's defined up front based on ModuleEnvironment; in the future, it'd be good to assert all this tier-invariant stuff at the end of tier compilation)."

For this one access, yes, but what about the access to codeBase two lines below?  I will revert this to a TBD because the story is probably a little more complicated.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4fc25ea324ca
Enforce tier-specific lookup on Code, Metadata, and LinkData, in most cases.  r=luke
(In reply to Lars T Hansen [:lth] from comment #18)
> For this one access, yes, but what about the access to codeBase two lines
> below?  I will revert this to a TBD because the story is probably a little
> more complicated.

Ah, you're right; I was just looking at the TlsData use.
Attached patch bug1277562-part1-setup.patch (obsolete) — Splinter Review
Part 1: WIP, but generally clean.  There's a mix here of the mundane (change the meaning of prefs and flags, adapt test cases) and the essential (augment data structures to carry tier-2 data).
Part 2: WIP, and not completely clean.  This implements background compilation.  For sure missing is storage management for the CompileTier2Task; I have something pending but I'm not happy with it, so it's not here.
Attached patch bug1277562-part3-patching.patch (obsolete) — Splinter Review
Part 3: WIP, and incomplete, but not in terrible shape.  This gets rid of the TBD tier, but most importantly creates the jump table and adapts baseline code so that it is aware of it.  Definitely missing here is patching of other tables, so exports and imports that are initially patched to tier-1 code stay that way, even if those functions will forward to tier-2 code once that's available.
Depends on: 1371216
Depends on: 1372883
Attachment #8874880 - Attachment is obsolete: true
Attachment #8874883 - Attachment is obsolete: true
Attachment #8874884 - Attachment is obsolete: true
What follows are 17 patches that together implement tiering.  I've opted to offer these in the form I have them locally instead of combining them; all should compile and generally pass testing without subsequent patches.  Very occasionally there will be a TODO comment that indicates a spot for a larger chunk of code provided by the next patch.

This implementation is not quite complete but worth reviewing even so, as we'll likely want to
debate some parts of it...  The missing pieces are:

- Patch points in baseline code are a little too fat, see comments in the very last patch.
  We can land as-is though.

- There is no updating of the import tables, so when wasm calls imported wasm functions the
  latter may go through a baseline prologue.  This is not fatal, and if the patch points can
  be made a little better it'll be even less of an issue.  We can land the current code as-is
  and consider this a cleanup item.

- There is a simplistic policy for tiered compilation, in two ways: when tiering is possible,
  then all code is compiled with tiering, and the resources available for background
  compilation are limited to two threads.  We've discussed other approaches (don't tier
  something that is quite small or very large; use thread priorities to compile resource
  use more dynamically) but again, we can land this as-is to get testing started, and then
  improve it.  Discuss.

- We are not yet properly hooked into the IndexedDB machinery.  Right now, serialization blocks
  the thread that is attempting to serialize on the availability of Ion code, if that code 
  will be forthcoming; it's possible this is not landable in its current form.

- Though some new testing modes are available here we probably want even more, eg, more
  chaos-y modes.
Cleanup:

Equate Tier::Debug with Tier::Baseline and remove redundant asserts.  Tier::TBD is still in the enum but is removed in a subsequent patch.
Attachment #8879179 - Flags: review?(luke)
Scaffolding:

Introduce logic to Code, LinkData, and Metadata to hold tier-2 data.  The method hasTier2() will be used to safely gate access to this data.
Attachment #8879181 - Flags: review?(luke)
We missed a spot:  Do to elemSegments what we did to Code, Metadata, and LinkData, because the code ranges in the elemSegments are tier-dependent.

I looked for ways to avoid this, but so far as I could tell it would mean re-introducing these tiered data elsewhere, for no real gain.  As it is, it's fairly well contained.
Attachment #8879184 - Flags: review?(luke)
Create logic for inserting tier2 data into the tiered data structures, and implement hasTier2().  This centers on an atomic flag in Metadata; patching code will update all data structures and set that flag, while code that needs to decide whether to look at one or two tiers of data will consult the flag.  The flag is only set once all data are in place.
Attachment #8879186 - Flags: review?(luke)
We have two switches, --wasm-always-baseline in the shell and javascript.options.wasm_baselinejit in the browser.  They both currently mean  "use Rabaldr instead of Baldr".  The new meaning is simply "enable Rabaldr".  If Ion is also enabled (as it is, barring --no-ion), and other conditions line up (see subsequent patches), then tiering will be enabled.

I think the only tricky bit here now is that some test cases need to test that debugging is supported, so there's new functionality for testing that.
Attachment #8879188 - Flags: review?(ydelendik)
Attachment #8879188 - Flags: review?(luke)
Enable Rabaldr by default in the browser, thus enabling tiering by default.
Attachment #8879189 - Flags: review?(luke)
This abstracts the code that makes decisions about compilation.  Previously it was just "if debugging then baseline otherwise ion", now we need to decide whether we're tiering as well.  This centers on a new enum, CompileMode, which complements Tier.  CompileMode is a state value: CompileMode::Unified means, compile once (with a compiler determined by some other predicate).  CompileMode::Tier1 means, compile twice, first with Baseline, then with Ion; once the second compilation succeeds, a state of Tier1 is updated to Tier2; if the second compilation fails or is canceled, a state of Tier1 is updated to Unified.

The predicates here will evolve a little in a subsequent patch, where the issue of compilation resources comes up.
Attachment #8879191 - Flags: review?(luke)
Expose HelperThreads to tiers.

In order to execute background compilations, we need a new Tier2 task type, and then we need to keep the compilation tasks for tier2 on separate available/finished queues so that they can be run at lower priorities.

Only one tier1 compilation can be in progress at a time, as before; a tier2 compilation (for an earlier tier1 compilation) can be in progress concurrently, though generally it will only receive left-over resources.
Attachment #8879194 - Flags: review?(luke)
What happens here is that when a compilation is finished, MaybeStartTier2Compile is called to kick off a tier-2 compilation if appropriate.  That creates a CompileTier2Task, which runs on a helper thread and itself starts a parallel compilation (with compile tasks at low priority).  Once that task is done, instead of obtaining a Module from the ModuleGenerator we obtain those parts that contain tier-specific data, and the CompileTier2Task hangs onto those until the helper thread that owns that task can hand the data off to the patching code.  The important function is Module::finishTier2Compile, which installs tier-2 data and sets the flag that exposes those data.

We also introduce a lock/cond pair that is used to block on tier2 compilation being complete, and to signal completeness.

This patch is not quite standalone; the cancellation code is in the next patch.
Attachment #8879197 - Flags: review?(luke)
For cancellation I have opted to let the tier-2 compilation run its course but to be short-circuited by a "cancelled" flag, this had the fewest pitfalls.  Effectively the compiler does nothing and just returns, and updates data structures as per normal.  It requires that a few data structures, notably the FuncOffset, have sane dummy values; it's a little ugly but it ought to be OK.
Attachment #8879198 - Flags: review?(luke)
Make lookupCode search the instance list linearly, since we can have two tiers of code but the instances are only ordered by the first tier.

This requires bug 1365069 to land for good performance; I'll take care of that too.
Attachment #8879202 - Flags: review?(luke)
Remove Tier::TBD, and introduce Code::bestTier() to compute the best code.

The trickiest part here is that deoptimizeImportExit must worry about code at two tiers, and must not deoptimize unconditionally.
Attachment #8879210 - Flags: review?(luke)
(I guess this could have been folded into Part 12.)

When optimizing an import exit under tiering, we need to ensure that a dependency is added to the BaselineJIT only once.
Attachment #8879216 - Flags: review?(luke)
Tests written using wasmExtractCode frequently need to worry about the tier; for example, in the IndexedDB tests only Ion code is acceptable, and when the tests extract code they should block if Tier-2 compilation is ongoing.  So add a tier parameter to wasmExtractCode.

(Test cases updated in subsequent patch.)
Attachment #8879220 - Flags: review?(luke)
Make callers of wasmExtractCode pass the tier they want.
Attachment #8879224 - Flags: review?(luke)
Debugging command line switches and environment variables.

I've find various bugs with these and they all seem useful.  I would like to add more (we've discussed various chaos modes), but these seem like a good start.
Attachment #8879225 - Flags: review?(luke)
And we patch tier2 code into tier1 code.  This does what we've discussed: there's a table of pointers, and tier1 functions jump via this table to tier2 code.  The table is updated racily.  There's a large block comment in here that explains the details, as well as some optimizations I think I'd like to implement, even if this code works.
Attachment #8879230 - Flags: review?(luke)
Blocks: 1365069
No longer depends on: 1365069
Attachment #8879179 - Flags: review?(luke) → review+
Comment on attachment 8879181 [details] [diff] [review]
Part 2: bug1277562-tier-variant-data.patch

Review of attachment 8879181 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmModule.cpp
@@ +205,5 @@
>  LinkData::initTier(Tier tier)
>  {
> +    MOZ_ASSERT(!linkData1_);
> +    linkData1_ = js::MakeUnique<LinkDataTier>(tier);
> +    return linkData1_ != nullptr;

Perhaps it should be named initTier1 now?
Attachment #8879181 - Flags: review?(luke) → review+
Comment on attachment 8879184 [details] [diff] [review]
Part 3: bug1277562-elemSegment-tiering.patch

Review of attachment 8879184 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.

::: js/src/wasm/WasmTypes.h
@@ +357,5 @@
> +enum class Tier
> +{
> +    Baseline,
> +    Debug = Baseline,
> +    Ion,

Just like we're using Tier::Debug to symbolically explain that we know we have Tier::Baseline available because we're in debug mode, could you add a Tier::Serialized = Ion and use it in this patch (and I think once or twice in the previous patch) anywhere where we use Tier::Ion in a serialization context?
Attachment #8879184 - Flags: review?(luke) → review+
Comment on attachment 8879186 [details] [diff] [review]
Part 4: bug1277562-implement-hasTier2.patch

Review of attachment 8879186 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/wasm/WasmCode.cpp
@@ +500,3 @@
>  {
> +    MOZ_RELEASE_ASSERT(metadata2_.get());
> +    hasTier2_ = true;

Could you also assert !hasTier2_?

@@ +811,4 @@
>  
>  const uint8_t*
>  Code::deserialize(const uint8_t* cursor, const SharedBytes& bytecode, const LinkData& linkData,
> +                  MutableMetadata& metadata)

Can just pass a `Metadata&` here instead?

::: js/src/wasm/WasmModule.cpp
@@ +320,4 @@
>      if (!cursor)
>          return nullptr;
>  
> +    MutableMetadata metadata = Metadata::ensureMetadata(maybeMetadata, Tier::Ion);

I think there's only one use of this static method, so I'd just inline it here so it can make more sense in context.
Attachment #8879186 - Flags: review?(luke) → review+
Comment on attachment 8879188 [details] [diff] [review]
Part 5: bug1277562-wasm-baseline-switch.patch

Review of attachment 8879188 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/debug/bug1330339.js
@@ +1,1 @@
> +// |jit-test| test-also-no-wasm-baseline; error: TestComplete

It makes sense to add test-also-no-wasm-baseline to all the directives.txt in this patch, but since the debugger enables baseline anyway, I don't really see the testing value in adding test-also-no-wasm-baseline to all the individual debugger test files in this patch.  It seems like it would test the same thing.  If all those are removed, could wasmDebuggingIsSupported() be removed as well?
Attachment #8879188 - Flags: review?(luke) → review+
Comment on attachment 8879189 [details] [diff] [review]
Part 6: bug1277562-browser-tiering-by-default.patch

Clearing this for now until it's time.
Attachment #8879189 - Flags: review?(luke)
Comment on attachment 8879191 [details] [diff] [review]
Part 7: bug1277562-tiering-logic.patch

Review of attachment 8879191 [details] [diff] [review]:
-----------------------------------------------------------------

Nice job isolating that logic, very clean.

::: js/src/wasm/WasmCompile.cpp
@@ +20,4 @@
>  
>  #include "jsprf.h"
>  
> +#include "wasm/WasmBaselineCompile.h" // For BaselineCanCompile only

nit: we don't usually add these "for X" comments (and they often go stale as more things get implicitly pulled in over time).

@@ +132,5 @@
> +    bool baselineEnabled, debugEnabled, ionEnabled;
> +    CompilerAvailability(isAsmJS, args, &baselineEnabled, &debugEnabled, &ionEnabled);
> +
> +    return (baselineEnabled && ionEnabled && !debugEnabled) ? CompileMode::Tier1
> +                                                            : CompileMode::Unified;

nit: the magic SM style for this puts the ? on the next line, lining up with "(baselineEnabled"

@@ +166,4 @@
>      if (!DecodeModuleEnvironment(d, env.get()))
>          return nullptr;
>  
> +    CompileMode compileMode = GetPrimaryCompileMode(env.get()->isAsmJS(), args);

wasm::Compile is only called for wasm

::: js/src/wasm/WasmCompile.h
@@ +71,5 @@
> +// unavailable even if selected, if Rabaldr is unavailable or the module is not
> +// compilable by Rabaldr.
> +
> +bool
> +GetDebugEnabled(bool isAsmJS, const CompileArgs& args);

nit: here and below x2, can you change 'bool isAsmJS' to 'ModuleKind kind' and make it the second arg with a default arg of ModuleKind::Wasm so that wasm doesn't have to say anything by default?

@@ +80,5 @@
> +// second compilation.  Otherwise, the tier is "Unified" and we'll compile once,
> +// with the appropriate compiler.
> +
> +CompileMode
> +GetPrimaryCompileMode(bool isAsmJS, const CompileArgs& args);

nit: could you change "Primary" to "Initial" or "First" (which also matches your comment above)?

::: js/src/wasm/WasmGenerator.h
@@ +285,5 @@
>      ~ModuleGenerator();
>  
>      MOZ_MUST_USE bool init(UniqueModuleEnvironment env, const CompileArgs& args,
> +                           Metadata* maybeAsmJSMetadata = nullptr,
> +                           CompileMode compileMode = CompileMode::Unified);

nit: can you switch the order so 'maybeAsmJSMetadata' is the last arg (so wasm doesn't have to pass it)?

::: js/src/wasm/WasmTypes.h
@@ +367,5 @@
> +// how many times we compile it).
> +
> +enum class CompileMode
> +{
> +    Unified,

nit: how about "Once" instead of "Unified"?  ("Unified" makes me think of two things coming together, like mac unified builds.)
Attachment #8879191 - Flags: review?(luke) → review+
Comment on attachment 8879191 [details] [diff] [review]
Part 7: bug1277562-tiering-logic.patch

Review of attachment 8879191 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmGenerator.h
@@ +91,4 @@
>      const SigWithId& sig() const { return *sig_; }
>      uint32_t lineOrBytecode() const { return lineOrBytecode_; }
>      const Uint32Vector& callSiteLineNums() const { return callSiteLineNums_; }
> +    bool tier1Compiled() const { return tier1Compiled_; }

Oh, also: I think the CompileMode added to CompileTask in the next patch subsumes this field, so can you remove it?
Comment on attachment 8879194 [details] [diff] [review]
Part 8: bug1277562-separate-compilation-state.patch

Review of attachment 8879194 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/HelperThreads.h
@@ +96,5 @@
> +    // happen at a time.  This avoids race conditions on
> +    // wasmWorklist/wasmFinishedList/etc, for which there is one for each tier.
> +
> +    mozilla::Atomic<bool> wasmCompilationInProgress_tier1;
> +    mozilla::Atomic<bool> wasmCompilationInProgress_tier2;

Don't we already have to lock these various fields anyways (b/c different threads can read/write to them)?  I'd really love to kill this ancient limitation; it periodically causes bug reports when people run into.  It means that if, like Unity, you use a few tiny wasm/asm.js modules and one big one that if the tiny ones' compilation kicks off first, it can "lock out" the big one permanently.
(In reply to Lars T Hansen [:lth] from comment #32)
> Created attachment 8879188 [details] [diff] [review]
> Part 5: bug1277562-wasm-baseline-switch.patch
> 
> I think the only tricky bit here now is that some test cases need to test
> that debugging is supported, so there's new functionality for testing that.

Yury makes the observation that, even though only a few test cases currently need to guard with wasmDebuggingIsSupported for the test suite to pass, really every test case that touches the Debugger ought to have the guard.  The reason the guards are not needed right now is that only some debugger functions care what kind of code we're running.  (Stepping, breakpoints care.)  But this is brittle.
(In reply to Luke Wagner [:luke] from comment #52)
> Comment on attachment 8879194 [details] [diff] [review]
> Part 8: bug1277562-separate-compilation-state.patch
> 
> Review of attachment 8879194 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/HelperThreads.h
> @@ +96,5 @@
> > +    // happen at a time.  This avoids race conditions on
> > +    // wasmWorklist/wasmFinishedList/etc, for which there is one for each tier.
> > +
> > +    mozilla::Atomic<bool> wasmCompilationInProgress_tier1;
> > +    mozilla::Atomic<bool> wasmCompilationInProgress_tier2;
> 
> Don't we already have to lock these various fields anyways (b/c different
> threads can read/write to them)?  I'd really love to kill this ancient
> limitation; it periodically causes bug reports when people run into.  It
> means that if, like Unity, you use a few tiny wasm/asm.js modules and one
> big one that if the tiny ones' compilation kicks off first, it can "lock
> out" the big one permanently.

Yes, there is a lock, but the comment is probably a little contextual.  The "race condition" is that if we do not limit ourselves to one compilation per tier at a time, the code in ModuleGenerator::finishFuncDefs that grabs a task off the tier's finished list may get a finished task for the compilation it knows about, or a finished task for some other ongoing compilation.  That's not the end of the world but we do need to do some reengineering to allow tasks to proceed in parallel like that, to avoid that confusion.

I agree that it would be sweet to allow compiles to progress in parallel.  I think we should do this later; I can file a bug if you like.
See comment 28.  Adds Tier::Serialized.  Carry Luke's r+.
Attachment #8879179 - Attachment is obsolete: true
Attachment #8882561 - Flags: review+
Attachment #8882561 - Attachment description: bug1277562-debug-tier-is-baseline-v2.patch → Part 1: bug1277562-debug-tier-is-baseline-v2.patch
See comment 29.  Carry Luke's r+.
Attachment #8879181 - Attachment is obsolete: true
Attachment #8882562 - Flags: review+
See comment 30.  Carry Luke's r+.
Attachment #8879184 - Attachment is obsolete: true
Attachment #8882563 - Flags: review+
See comment 31.  Carry Luke's r+.
Attachment #8879186 - Attachment is obsolete: true
Attachment #8882565 - Flags: review+
See comment 32.  This factors out the code from that patch, the test cases will come next.  Carry Luke's r+.
Attachment #8879188 - Attachment is obsolete: true
Attachment #8879188 - Flags: review?(ydelendik)
Attachment #8882566 - Flags: review+
See comment 32, comment 53.  These are the test cases, with more general use of wasmDebuggingIsSupported().
Attachment #8882567 - Flags: review?(ydelendik)
See comment 34.  Carry Luke's r+.
Attachment #8879191 - Attachment is obsolete: true
Attachment #8882568 - Flags: review+
See comment 35.  Expose HelperThreads to tiers.
Attachment #8879194 - Attachment is obsolete: true
Attachment #8879194 - Flags: review?(luke)
Attachment #8882569 - Flags: review?(luke)
See comment 36.  Implement tier-2 compilation (cancellation is next patch).
Attachment #8879197 - Attachment is obsolete: true
Attachment #8879197 - Flags: review?(luke)
Attachment #8882570 - Flags: review?(luke)
See comment 37.  Tier-2 cancellation.
Attachment #8879198 - Attachment is obsolete: true
Attachment #8879198 - Flags: review?(luke)
Attachment #8882571 - Flags: review?(luke)
See comment 38.  Revamp the instance search logic.
Attachment #8879202 - Attachment is obsolete: true
Attachment #8879202 - Flags: review?(luke)
Attachment #8882572 - Flags: review?(luke)
See comment 39.
Attachment #8879210 - Attachment is obsolete: true
Attachment #8879210 - Flags: review?(luke)
Attachment #8882573 - Flags: review?(luke)
See comment 40.
Attachment #8879216 - Attachment is obsolete: true
Attachment #8879216 - Flags: review?(luke)
See comment 41.
Attachment #8879220 - Attachment is obsolete: true
Attachment #8879220 - Flags: review?(luke)
Attachment #8882578 - Flags: review?(luke)
See comment 42.
Attachment #8879224 - Attachment is obsolete: true
Attachment #8879224 - Flags: review?(luke)
Attachment #8882579 - Flags: review?(luke)
Attachment #8882574 - Flags: review?(luke)
See comment 43.
Attachment #8879225 - Attachment is obsolete: true
Attachment #8879225 - Flags: review?(luke)
Attachment #8882580 - Flags: review?(luke)
See comment 44, comment 37.  This has better (smaller, probably faster) patching code.
Attachment #8879230 - Attachment is obsolete: true
Attachment #8879230 - Flags: review?(luke)
Attachment #8882581 - Flags: review?(luke)
Sorry that should have been comment 27, not 37.
Comment on attachment 8882567 [details] [diff] [review]
Part 5b: bug1277562-adapt-test-cases-v2.patch

Review of attachment 8882567 [details] [diff] [review]:
-----------------------------------------------------------------

Changes in jit-tests/tests/debug/ looks good. I'm not sure about usage of wasmDebuggingIsSupported() in jit-tests/tests/wasm/timeout/{1,2,nterrupt-several-instances,while-profiling}.js -- can you confirm that this change is needed there?
Attachment #8882567 - Flags: review?(ydelendik) → review+
Comment on attachment 8882568 [details] [diff] [review]
Part 7: bug1277562-tiering-logic-v2.patch

Review of attachment 8882568 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmCompile.cpp
@@ +149,5 @@
> +    {
> +        return Tier::Baseline;
> +    }
> +
> +    return Tier::Ion;

Random nit re-reading this while reviewing other patch: could you MOZ_ASSERT(compileMode != Tier1) here?  Alternatively, seems like you could re-orient the logic around compileMode:
  switch (compileMode) {
    case Tier1:
      MOZ_ASSERT(baselineEnabled);
      return Baseline;
    case Tier2:
      MOZ_ASSERT(ionEnabled && !debugEnabled);
      return Ion;
    case Once:
      return (debugEnabled || !ionEnabled) ? Baseline : Ion;
  }
Comment on attachment 8882569 [details] [diff] [review]
Part 8: bug1277562-separate-compilation-state-v2.patch

Review of attachment 8882569 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/HelperThreads.h
@@ +93,5 @@
> +    wasm::CompileTaskPtrVector wasmWorklist_tier2_, wasmFinishedList_tier2_;
> +
> +    // For now, only allow a single parallel wasm compilation at each tier to
> +    // happen at a time.  This avoids race conditions on
> +    // wasmWorklist/wasmFinishedList/etc, for which there is one for each tier.

Makes sense to fix this later.  For now could you add: "TODO: remove this restriction by making work/finish lists per-compilation."
Attachment #8882569 - Flags: review?(luke) → review+
Comment on attachment 8882570 [details] [diff] [review]
Part 9: bug1277562-tier2-compilation-v2.patch

Review of attachment 8882570 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, everything very clean!  Mostly nits; just a few changes and questions that I'd like to see and understand.

::: js/src/vm/HelperThreads.cpp
@@ +1043,5 @@
>  size_t
> +GlobalHelperThreadState::maxWasmTier2CompilationThreads() const
> +{
> +    // This is the main thread, we don't want to do more than one tier2
> +    // compilation at a time.

nit: "main thread" is a bit misleading; how about "A single tier-2 module generator job spawns N compilation jobs so only allow one tier-2 module generator at a time."

@@ +1098,5 @@
>  
>      if (assumeThreadAvailable)
>          return true;
>  
> +    // For Tier1 and Unified compilation, honor the maximum allowed threads to

s/Unified/Once/

@@ +1102,5 @@
> +    // For Tier1 and Unified compilation, honor the maximum allowed threads to
> +    // compile wasm jobs at once, to avoid oversaturating the machine.
> +    //
> +    // For Tier2 compilation we need to allow other things to happen too, so for
> +    // now we only allow one thread.

Could you file a bug and add a TODO here to investigate the ideal number to allocate here?  It seems like this would be some function of cpuCount.

@@ +1108,5 @@
> +    // If Tier2 is very backlogged we must give priority to it, since the Tier2
> +    // queue holds onto Tier1 tasks.  Indeed if Tier2 is backlogged we will
> +    // devote more resources to Tier2 and not start any Tier1 work at all.
> +
> +    bool tier2oversubscribed = wasmTier2Worklist(lock).length() > 20;

I think this is the fix to the 32-bit OOM problem you mentioned earlier.  IIUC, the BackgroundWorkPossible() check in WasmCompile.cpp is to handle the single-core case (where no helper threads are making progress).  That's pretty unfortunate though since, a single-core machine will benefit the most (in absolute terms) from the baseline and it's likely that it'll have some idle cycles that will eventually allow tier-2 to complete.

I was thinking of an alternative that keeps tiering for single-core: what if GetInitialCompileMode() returned CompileMode::Once if wasmTier2Worklist.length > 20?  That way we have a hard bound on how much we're keeping alive.  With that, even single-core should be fine.

@@ +1799,5 @@
> +
> +    if (success)
> +        CompileTier2Finished(task);
> +    else
> +        CompileTier2Failed(task);

Is there a reason that these two functions need to be run outside AutoUnlockHelperThreadState?  I couldn't see any helper-thread lock dependencies and I think the logic could be simpler (and CompileTier2Task could remove all the fields under "Compilation results") if their two bodies were inlined into wasm::CompileTier2.  Oooh, the new ModuleGenerator::finish() function added by this patch could just take the tier-1 Module as an argument and updated it directly.  That saves a lot of shuffling about of the Code, Metadata, etc.

Second: these two functions delete 'task' while 'task' is still pointed to by currentTask (we only currentTask.reset() below).  How about instead handleWasmTier2Workload deletes 'task' at the end.  In that case, the helper thread system assumes ownership of the CompileTier2Task, so could you change StartOffThreadWasmTier2Compile() to take a UniquePtr<CompileTier2Task> (which tidies up the caller a tiny bit).

::: js/src/vm/HelperThreads.h
@@ +50,2 @@
>    typedef Vector<CompileTask*, 0, SystemAllocPolicy> CompileTaskPtrVector;
> +  typedef Vector<CompileTier2Task*, 0, SystemAllocPolicy> CompileTier2TaskPtrVector;

The patch uses "CompileTier2", "Tier2Compilation" and sometimes just "Tier2" to all refer to these new tasks that run a tier-2 ModuleGenerator.  This has also caused me a lot of confusion with the Tier2 CompileTasks.  How about we make a clean break and call these wasm::Tier2Generator tasks (uniformly in all the names), since they run the ModuleGenerator.

::: js/src/wasm/WasmCompile.cpp
@@ +168,5 @@
> +class CompileTier2Task
> +{
> +  public:
> +    // The module that wants the results of the compilation
> +    SharedModule            parentModule_;

nit: since there's not two modules here (parent v child), could we just name this 'module_'?  It'd shorten various expressions.

@@ +171,5 @@
> +    // The module that wants the results of the compilation
> +    SharedModule            parentModule_;
> +
> +    // The arguments for the compilation
> +    SharedCompileArgs       compileArgs_;

nit: if these are all public fields, could you take off the trailing _ and switch from 'class' to 'struct'?

@@ +181,5 @@
> +    UniqueModuleEnvironment env_;
> +
> +    CompileTier2Task(Module& parentModule, const CompileArgs& compileArgs)
> +        : parentModule_(&parentModule),
> +          compileArgs_(&compileArgs)

nit: two space indent to : and four space to member variable name

@@ +202,5 @@
> +        return false;
> +
> +    Maybe<CompileMode> compileMode(compileMode_);
> +    if (compileMode_.isNothing())
> +        compileMode = Some(GetInitialCompileMode(args));

nit: could Compile() take a `CompileMode compileMode` (no Maybe) such that wasm::Compile() would pass GetInitialCompileMode() instead of Nothing()?

@@ +235,5 @@
> +
> +void
> +wasm::MaybeStartTier2Compile(Module& parentModule, const CompileArgs& compileArgs)
> +{
> +    if (parentModule.mode() != CompileMode::Tier1)

Instead of requiring N callers to call MaybeStartTier2Compile(), could you instead have ModuleGenerator::finish() do it?

@@ +254,5 @@
> +
> +    mozilla::Unused << task.release();
> +}
> +
> +// This runs on the compilation thread.

nit: s/compilation thread/helper thread/

::: js/src/wasm/WasmGenerator.cpp
@@ +1212,5 @@
> +SharedModule
> +ModuleGenerator::finish(const ShareableBytes& bytecode)
> +{
> +    MOZ_ASSERT(!activeFuncDef_);
> +    MOZ_ASSERT(finishedFuncDefs_);

Can you MOZ_ASSERT(compileMode == Once || Tier1)?

@@ +1269,5 @@
> +                        UniqueLinkDataTier* linkData, UniqueMetadataTier* metadata,
> +                        UniqueModuleEnvironment* env)
> +{
> +    MOZ_ASSERT(!activeFuncDef_);
> +    MOZ_ASSERT(finishedFuncDefs_);

Could you MOZ_ASSERT(compileMode == Tier2)?

@@ +1283,5 @@
> +
> +    assertUsefulProperties();
> +
> +    if (!finishLinkData())
> +        return false;

All the above seems duplicated, could you factor out a finishCommon?  Or, the shared function could be finish() and these two public functions could be finishModule() and finishTier2() (which sounds nice to me)

::: js/src/wasm/WasmModule.cpp
@@ +239,5 @@
>  {
> +    if (maybeBytecodeSize) {
> +        blockOnIonCompileFinished();
> +        if (!code_->hasTier(Tier::Ion)) {
> +            *maybeBytecodeSize = 0;

The bytecode must always be saved, so I think you mean to set *maybeCompiledSize = 0 here if tier-2 compilation fails.  Could you instead move this blockOnIonCompileFinished() logic into the "if (maybeCompiledSize && !debugEnabled)" branch?

Also, nit: Tier::Serialized.

@@ +271,5 @@
>  {
> +    if (maybeBytecodeBegin) {
> +        blockOnIonCompileFinished();
> +        if (!code_->hasTier(Tier::Ion))
> +            return;

Same comment as above, except, since we can assume serializedSize() has already been computed if !!maybeCompiledSize I think you can MOZ_ASSERT(code_->hasTier(Tier::Serialized)).

@@ +317,5 @@
> +
> +    metadata().setTier2(Move(metadata2));
> +    linkData().setTier2(Move(linkData2));
> +    code().setTier2(Move(code2));
> +    for ( uint32_t i = 0; i < elemSegments_.length(); i++ )

nit: no spaces after ( and before )

::: js/src/wasm/WasmModule.h
@@ +29,1 @@
>  #include "wasm/WasmCode.h"

nit: js/TypeDecls.h through wasm/*.h are all the same category of #includes (as described in https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style#.23include_ordering) so no \n between them

@@ +102,5 @@
>  
>      const LinkDataTier& linkData(Tier tier) const;
>      LinkDataTier& linkData(Tier tier);
>  
> +    LinkDataTier* takeLinkData(Tier tier) {

nit: could this and takeMetadata() return a UniquePtr?  (Then the caller would use = instead of reset.)

@@ +149,5 @@
> +    // completion of tier-2 compilation; see blockOnIonCompileFinished and
> +    // unblockOnTier2CompileFinished, below.
> +
> +    mutable Mutex                 tier2Lock;
> +    mutable ConditionVariable     tier2Cond;

nit: can you append _ to private field name?

@@ +152,5 @@
> +    mutable Mutex                 tier2Lock;
> +    mutable ConditionVariable     tier2Cond;
> +
> +    // Access mode_ only under the lock.  It will be changed from Tier1 to Tier2
> +    // once Tier2 compilation is finished, and from Tier1 to Unified if Tier2

s/Unified/Once/

@@ +155,5 @@
> +    // Access mode_ only under the lock.  It will be changed from Tier1 to Tier2
> +    // once Tier2 compilation is finished, and from Tier1 to Unified if Tier2
> +    // compilation is disabled (in testing modes) or cancelled.
> +
> +    mutable CompileMode           mode_;

Just to confirm my understanding: all 3 of these fields and the below block/unblock methods would be simply removed when I get my act together in bug 1351488?  I see one use of mode() in MaybeStartTier2Compile but my comment there would remove the need (you could use ModuleGenerator's CompileMode).

@@ +201,2 @@
>      const Code& code() const { return *code_; }
> +    const CodeSegment& code(Tier t) const { return code_->segment(t); }

nit: for regularity, could this be named codeSegment()?
Comment on attachment 8882571 [details] [diff] [review]
Part 10: bug1277562-tier2-cancellation-v2.patch

Review of attachment 8882571 [details] [diff] [review]:
-----------------------------------------------------------------

So I'm trying to understand what are the hard constraints here and why we can't just set the terminate flag and let things just sorta fail.  IIUC, one hard constraint is we need to notify the module's condvar if it's waiting on serialization, but that will all be removed eminently with bug 1351488.  I'm guessing another is that a tier-2 generator task is long-running so maybe it needs to poll the terminate flag in-between dispatching jobs?  Is there anything else?
Comment on attachment 8882572 [details] [diff] [review]
Part 11: bug1277562-instance-search.patch

Review of attachment 8882572 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmCompartment.cpp
@@ +52,5 @@
> +
> +        // Compare by the first tier, always.
> +
> +        Tier instanceTier = instance->code().anyTier();
> +        Tier targetTier = target.code().anyTier();

Just going on the name, "anyTier" sounds like it may change over time, breaking the ordering.  Could there be a firstTier() that returned the first tier the module was constructed with?

@@ +114,5 @@
>          return nullptr;
>  
> +    // Linear search because instances are only ordered by their first tiers,
> +    // but may have two.  This code should not be hot anyway, we avoid
> +    // lookupCode when we can.

So is bug 1365069 landing with this, or could you perhaps fold it into this patch?
Attachment #8882572 - Flags: review?(luke) → review+
Comment on attachment 8882573 [details] [diff] [review]
Part 12: bug1277562-no-more-TBD.patch

Review of attachment 8882573 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmInstance.cpp
@@ +820,5 @@
> +        const FuncImport& fi = metadata(t).funcImports[funcImportIndex];
> +        FuncImportTls& import = funcImportTls(fi);
> +        if (import.code == codeBase(t) + fi.jitExitCodeOffset()) {
> +            import.code = codeBase(t) + fi.interpExitCodeOffset();
> +            import.baselineScript = nullptr;

I think you should be able to unconditionally set import.code to bestTier here (the FuncImport is in the same tls data location regardless of tier and the bestTier won't ever go away).
Attachment #8882573 - Flags: review?(luke) → review+
Comment on attachment 8882574 [details] [diff] [review]
Part 13: bug1277562-no-redundant-dependency-add.patch

Review of attachment 8882574 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmInstance.cpp
@@ +247,5 @@
>      // Let's optimize it!
> +
> +    // The import may already have been optimized at a lower tier.  If so, we'll
> +    // compute a new code pointer and install that, but the dependency already
> +    // exists in the baseline compiler's tables and should not be added again.

Normally it shouldn't be possible to end up in Instance::callImport if the import was optimized to a lower tier since we wouldn't be in C++ callImport in the first place.  I think the only way this can happen is the Call() (into JS) above *reentrantly* optimized this call while we are on the stack.  This case is supposed to have been caught by the 'if' above commented "// The import may already have become optimized.", so I think the bug is that it needs to check against both tiers (perhaps with a (for t : tiers()) loop) and if either match, 'return true' as we do now.
Attachment #8882578 - Flags: review?(luke) → review+
Comment on attachment 8882580 [details] [diff] [review]
Part 16: bug1277562-debugging-code-v2.patch

Review of attachment 8882580 [details] [diff] [review]:
-----------------------------------------------------------------

These seem like useful options, especially for fuzzing.  If you want to land this, could you change from command-line args and environment variables to first-class arguments to JS shell functions, and then add some interesting JS tests in jit-tests/tests/wasm (which ends up seeding the fuzzer that mashes test cases together randomly).
Attachment #8882580 - Flags: review?(luke)
Comment on attachment 8882581 [details] [diff] [review]
Part 17: bug1277562-patching-v2.patch

Review of attachment 8882581 [details] [diff] [review]:
-----------------------------------------------------------------

You're right; with all the preceding work, the final fun patch is just a corollary :)  Very nice work on this whole stack!

::: js/src/wasm/WasmCode.h
@@ +448,5 @@
> +struct FreeJumpTable {
> +    void operator()(uintptr_t* table) { js_free(table); }
> +};
> +
> +typedef UniquePtr<uintptr_t, FreeJumpTable> UniqueJumpTable;

nit: I think you can use JS::FreePolicy instead.

@@ +463,5 @@
>      SharedMetadata                      metadata_;
>      ExclusiveData<CacheableCharsVector> profilingLabels_;
> +    UniqueJumpTable                     jumpTable_;
> +    uint32_t                            jumpNumImports_;
> +    uint32_t                            jumpNumFunctions_;

I think jumpNum* are dead fields and can be removed.  (If they are needed later, it seems like we should just make them jumpTable-agnostic fields of Metadata; I'm surprised we don't save those already...)

::: js/src/wasm/WasmFrameIterator.cpp
@@ +330,5 @@
>  
>  static void
>  GenerateCallablePrologue(MacroAssembler& masm, unsigned framePushed, ExitReason reason,
> +                         uint32_t* entry, uint32_t* tierEntry, uintptr_t* jumpTable,
> +                         uint32_t jumpTableIndex)

IIUC, the jumpTable is threaded from generator->task->baseline->here and ultimately it's just being used as a boolean below.  Could you instead remove the jumpTable along that whole path above and have these Generate*Prologue functions take Tier and funcIndex instead (both of which the CompileTask already has)?

::: js/src/wasm/WasmGenerator.cpp
@@ +1241,5 @@
> +    uint32_t tableSize = env_->numFuncImports() + env_->numFuncDefs();
> +    uintptr_t* jumpTable = &*jumpTable_;
> +    for (uint32_t i = env_->numFuncImports(); i < tableSize; i++) {
> +        MOZ_ASSERT(jumpTable[i]);
> +        jumpTable[i] += p;

... and with that, I think we can make the creation of jumpTable more functional by:
 - initializing jumpTable in one shot by iterating over CodeRanges and if cr.isFunction(), setting `jumpTable[cr.funcIndex] = codeBase + cr.funcTierEntry()`
 - renaming this function createJumpTable() and having it allocate, initialize and return a UniqueJumpTable
 - having Code's ctor take a UniqueJumpTable maybeJumpTable instead of the setJumpTable

@@ +1288,5 @@
>          if (!maybeDebuggingBytes)
>              return nullptr;
>      }
>  
> +    MutableCode code = js_new<Code>(Move(codeSegment), *metadata_);

... in which case this could stay SharedCode
Attachment #8882581 - Flags: review?(luke) → review+
Comment on attachment 8882570 [details] [diff] [review]
Part 9: bug1277562-tier2-compilation-v2.patch

Done with this round of reviews!  For clarity, clearing r? on all patches which I've reviewed with comments and just waiting to see revised patch one more time.
Attachment #8882570 - Flags: review?(luke)
Attachment #8882571 - Flags: review?(luke)
Attachment #8882574 - Flags: review?(luke)
Attachment #8882579 - Flags: review?(luke)
(In reply to Yury Delendik (:yury) from comment #73)
> Comment on attachment 8882567 [details] [diff] [review]
> Part 5b: bug1277562-adapt-test-cases-v2.patch
> 
> Review of attachment 8882567 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Changes in jit-tests/tests/debug/ looks good. I'm not sure about usage of
> wasmDebuggingIsSupported() in
> jit-tests/tests/wasm/timeout/{1,2,nterrupt-several-instances,while-
> profiling}.js -- can you confirm that this change is needed there?

Those changes are not needed, these files were just caught up in the dragnet.  Will fix.  Thanks.
Depends on: 1380033
(In reply to Luke Wagner [:luke] from comment #76)
> Comment on attachment 8882570 [details] [diff] [review]
> Part 9: bug1277562-tier2-compilation-v2.patch

(As usual, if I'm not replying to it you should expect it's happening.)

> 
> @@ +1102,5 @@
> > +    // For Tier1 and Unified compilation, honor the maximum allowed threads to
> > +    // compile wasm jobs at once, to avoid oversaturating the machine.
> > +    //
> > +    // For Tier2 compilation we need to allow other things to happen too, so for
> > +    // now we only allow one thread.
> 
> Could you file a bug and add a TODO here to investigate the ideal number to
> allocate here?  It seems like this would be some function of cpuCount.

Bug 1380033.

> @@ +1108,5 @@
> > +    // If Tier2 is very backlogged we must give priority to it, since the Tier2
> > +    // queue holds onto Tier1 tasks.  Indeed if Tier2 is backlogged we will
> > +    // devote more resources to Tier2 and not start any Tier1 work at all.
> > +
> > +    bool tier2oversubscribed = wasmTier2Worklist(lock).length() > 20;
> 
> I think this is the fix to the 32-bit OOM problem you mentioned earlier. 
> IIUC, the BackgroundWorkPossible() check in WasmCompile.cpp is to handle the
> single-core case (where no helper threads are making progress).  That's
> pretty unfortunate though since, a single-core machine will benefit the most
> (in absolute terms) from the baseline and it's likely that it'll have some
> idle cycles that will eventually allow tier-2 to complete.
> 
> I was thinking of an alternative that keeps tiering for single-core: what if
> GetInitialCompileMode() returned CompileMode::Once if
> wasmTier2Worklist.length > 20?  That way we have a hard bound on how much
> we're keeping alive.  With that, even single-core should be fine.

The definition of BackgroundWorkPossible reflects preexisting code in ModuleGenerator::startFuncDefs(), which will only background-compile under these same conditions.

I agree that disabling tiering for single-core systems is not great.  At the same time, single-core systems are now below 4% of our user population and dropping, it has fallen by half in the last year.

Instead of futzing around with this policy now I will add this remark to bug 1380033 (see above), and we can consider it as part of that work.

> 
> @@ +1799,5 @@
> > +
> > +    if (success)
> > +        CompileTier2Finished(task);
> > +    else
> > +        CompileTier2Failed(task);
> 
> Second: these two functions delete 'task' while 'task' is still pointed to
> by currentTask (we only currentTask.reset() below).  How about instead
> handleWasmTier2Workload deletes 'task' at the end.  

Ho hum, I had desired not to expose the structure Tier2GenereratorTask (as it is called now) to this code, without which no js_delete.  I'm still mulling this over.

> @@ +155,5 @@
> > +    // Access mode_ only under the lock.  It will be changed from Tier1 to Tier2
> > +    // once Tier2 compilation is finished, and from Tier1 to Unified if Tier2
> > +    // compilation is disabled (in testing modes) or cancelled.
> > +
> > +    mutable CompileMode           mode_;
> 
> Just to confirm my understanding: all 3 of these fields and the below
> block/unblock methods would be simply removed when I get my act together in
> bug 1351488?  I see one use of mode() in MaybeStartTier2Compile but my
> comment there would remove the need (you could use ModuleGenerator's
> CompileMode).

Certainly the lock and cond will be removed; the mode we'll have to see about.
(In reply to Luke Wagner [:luke] from comment #77)
> Comment on attachment 8882571 [details] [diff] [review]
> Part 10: bug1277562-tier2-cancellation-v2.patch
> 
> Review of attachment 8882571 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So I'm trying to understand what are the hard constraints here and why we
> can't just set the terminate flag and let things just sorta fail.  IIUC, one
> hard constraint is we need to notify the module's condvar if it's waiting on
> serialization, but that will all be removed eminently with bug 1351488.  I'm
> guessing another is that a tier-2 generator task is long-running so maybe it
> needs to poll the terminate flag in-between dispatching jobs?  Is there
> anything else?

As you've figured out, the motivation is to avoid "sorta fail", which is just plain brittle, and making sure long-running computations stop in a tidy way and quickly.  Just setting the terminate flag is not tidy, we really do want to wind down the active computation to ensure that loops that wait for threads to terminate and so on exit reliably.  I did try to set the flag and bail out when it was set; I had various failures from that; I could fix them but it just seemed hacky; what I have here is, I think, fairly clean and predictable, even if the dummy values in the FuncOffsets aren't particularly pretty.
Depends on: 1379814
(In reply to Luke Wagner [:luke] from comment #78)
> Comment on attachment 8882572 [details] [diff] [review]
> Part 11: bug1277562-instance-search.patch
> 
> Review of attachment 8882572 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmCompartment.cpp
> @@ +52,5 @@
> > +
> > +        // Compare by the first tier, always.
> > +
> > +        Tier instanceTier = instance->code().anyTier();
> > +        Tier targetTier = target.code().anyTier();
> 
> Just going on the name, "anyTier" sounds like it may change over time,
> breaking the ordering.  

Indeed it does.

> Could there be a firstTier() that returned the first
> tier the module was constructed with?

There was one, until comment 13 when you made me change it to "anyTier" :)

> 
> @@ +114,5 @@
> >          return nullptr;
> >  
> > +    // Linear search because instances are only ordered by their first tiers,
> > +    // but may have two.  This code should not be hot anyway, we avoid
> > +    // lookupCode when we can.
> 
> So is bug 1365069 landing with this, or could you perhaps fold it into this
> patch?

I will fold it in here.
(In reply to Luke Wagner [:luke] from comment #79)
> Comment on attachment 8882573 [details] [diff] [review]
> Part 12: bug1277562-no-more-TBD.patch
> 
> ::: js/src/wasm/WasmInstance.cpp
> @@ +820,5 @@
> > +        const FuncImport& fi = metadata(t).funcImports[funcImportIndex];
> > +        FuncImportTls& import = funcImportTls(fi);
> > +        if (import.code == codeBase(t) + fi.jitExitCodeOffset()) {
> > +            import.code = codeBase(t) + fi.interpExitCodeOffset();
> > +            import.baselineScript = nullptr;
> 
> I think you should be able to unconditionally set import.code to bestTier
> here (the FuncImport is in the same tls data location regardless of tier and
> the bestTier won't ever go away).

You're thinking eg like this?

    const FuncImport& fi = metadata(code().anyTier()).funcImports[funcImportIndex];
    FuncImportTls& import = funcImportTls(fi);
    import.code = codeBase(code().bestTier()) + fi.interpExitCodeOffset();
    import.baselineScript = nullptr;

Testing bears this out.  I've had a couple of nonreproducible failures but really nothing to suggest this should be the problem (only that I need to be testing and debugging more...)
(In reply to Lars T Hansen [:lth] from comment #87)
> > Could there be a firstTier() that returned the first
> > tier the module was constructed with?
> 
> There was one, until comment 13 when you made me change it to "anyTier" :)

Hehe, yes.  But in that case, we really were fully unconstrained (such that returning a different tier each time would have been fine).  My goal here is just to capture the precise assumption we're making when querying tier so it's easy to respect in the future.  But actually my suggestion of "firstTier" doesn't achieve that either.  Perhaps anyStableTier()?

(In reply to Lars T Hansen [:lth] from comment #89)
> You're thinking eg like this?
> 
>     const FuncImport& fi =
> metadata(code().anyTier()).funcImports[funcImportIndex];
>     FuncImportTls& import = funcImportTls(fi);
>     import.code = codeBase(code().bestTier()) + fi.interpExitCodeOffset();
>     import.baselineScript = nullptr;

Almost: I think the metadata(code().anyTier()) needs to use bestTier() instead so there is a later use of fi.interpExitCodeOffset() that is tier-variant.
(In reply to Luke Wagner [:luke] from comment #90)
> (In reply to Lars T Hansen [:lth] from comment #87)
> > > Could there be a firstTier() that returned the first
> > > tier the module was constructed with?
> > 
> > There was one, until comment 13 when you made me change it to "anyTier" :)
> 
> Hehe, yes.  But in that case, we really were fully unconstrained (such that
> returning a different tier each time would have been fine).  My goal here is
> just to capture the precise assumption we're making when querying tier so
> it's easy to respect in the future.  But actually my suggestion of
> "firstTier" doesn't achieve that either.  Perhaps anyStableTier()?

I agree, firstTier() was never completely satisfactory.  Maybe just "stableTier()" to be explicit?  The "any" still does not ring correctly in my ears, and it's an invariant that there is a stable tier present.
stableTier() sgtm!
(In reply to Luke Wagner [:luke] from comment #82)
> Comment on attachment 8882581 [details] [diff] [review]
> Part 17: bug1277562-patching-v2.patch
> 
> ::: js/src/wasm/WasmFrameIterator.cpp
> @@ +330,5 @@
> >  
> >  static void
> >  GenerateCallablePrologue(MacroAssembler& masm, unsigned framePushed, ExitReason reason,
> > +                         uint32_t* entry, uint32_t* tierEntry, uintptr_t* jumpTable,
> > +                         uint32_t jumpTableIndex)
> 
> IIUC, the jumpTable is threaded from generator->task->baseline->here and ...
> 
> ... and with that, I think we can make the creation of jumpTable more functional ...
>
> ... in which case this could stay SharedCode

A winner.
Blocks: 1388785
Rename anyTier() as stableTier().  Carrying Luke's implied r+ (comment 92).
Attachment #8895792 - Flags: review+
See comment 55.  Carry Luke's r+.
Attachment #8882561 - Attachment is obsolete: true
Attachment #8895793 - Flags: review+
See comment 56.  Carry Luke's r+.
Attachment #8882562 - Attachment is obsolete: true
See comment 57.  Carry Luke's r+.
Attachment #8882563 - Attachment is obsolete: true
Attachment #8895795 - Flags: review+
Attachment #8895794 - Flags: review+
See comment 48.  Carry Luke's r+.
Attachment #8882565 - Attachment is obsolete: true
Attachment #8895796 - Flags: review+
See comment 59.  Carry Luke's r+.
Attachment #8882566 - Attachment is obsolete: true
Attachment #8895798 - Flags: review+
See comment 60.  Carry Yury's r+.
Attachment #8882567 - Attachment is obsolete: true
Attachment #8895799 - Flags: review+
Set javascript.options.wasm_baselinejit to true by default.  Not intended to land at this time.
Attachment #8879189 - Attachment is obsolete: true
See comment 61.  Carry Luke's r+.
Attachment #8882568 - Attachment is obsolete: true
Attachment #8895804 - Flags: review+
See comment 62.  Carry Luke's r+.
Attachment #8882569 - Attachment is obsolete: true
Attachment #8895805 - Flags: review+
See comment 63, comment 76, comment 85.

I believe I have followed your suggestion on ownership and management of the Tier2GeneratorTask structure.  This required either (a) exposing the Tier2GeneratorTask very widely or (b) adding a DeleteTier2Generator function to be used for this purpose; I went with (b).
Attachment #8882570 - Attachment is obsolete: true
Attachment #8895806 - Flags: review?(luke)
See comment 64, comment 77, comment 86.

This is probably a little cleaner now, as a result of cleanup elsewhere.  It still follows the same logic, where the shutdown code kills future compilation work, and then flags ongoing work as cancelled and waits for that work to complete, being careful not to allow worker threads to be reaped until that work is actually done.
Attachment #8882571 - Attachment is obsolete: true
Attachment #8895807 - Flags: review?(luke)
See comment 65, comment 78.  As per comment 78, anyTier was renamed to stableTier by Part 0.  Carrying Luke's r+.
Attachment #8882572 - Attachment is obsolete: true
Attachment #8895808 - Flags: review+
See comment 66, comment 79.  Carrying Luke's r+.
Attachment #8882573 - Attachment is obsolete: true
Attachment #8895811 - Flags: review+
See comment 67, comment 80.  Your hypothesis in comment 80 appears to work just fine.
Attachment #8882574 - Attachment is obsolete: true
Attachment #8895819 - Flags: review?(luke)
See comment 68.  Carrying Luke's r+.
Attachment #8882578 - Attachment is obsolete: true
Attachment #8895822 - Flags: review+
The extract-wasm-code test cases should make use of the new API in patch 14.  That API allows clients to specify which of the compiled blobs in a tiered setting - baseline or ion or one of several other tokens - to receive.  The IndexedDB tests all should ask for "ion", since that's the only code they can use.

Luke, can you review the changed jit-test here?

Jan, can you review the changed indexedDB tests here?
Attachment #8882579 - Attachment is obsolete: true
Attachment #8895826 - Flags: review?(luke)
Attachment #8895826 - Flags: review?(jvarga)
Code in SpiderMonkey to allow ion to be disabled separately for wasm.  During some testing with --no-ion I found that disabling Ion for both wasm and JS really skewed my results, and I'd like to be able to control it separately for the two cases, just as we control baseline compilation separately for them.
Attachment #8882580 - Attachment is obsolete: true
Attachment #8895828 - Flags: review?(luke)
A little code to propagate the new javascript.option.wasm_ionjit flag through DOM and xpconnect, and notably, to whitelist it in dom/ipc/ContentPrefs.cpp, for which I need DOM peer review.

(See Part 16a for background if desired, but this should be pretty standard.)
Attachment #8895831 - Flags: review?(wmccloskey)
Make the tiering debugging modes (await-tier2, delay-tier2, disable-tier2) uniformly available as options to the JS shell and store their values in the context options.  I've found a lot of bugs with these so we should try to preserve them, IMO.

The next patch will provide testing functions to set them from JS.

I've factored out the code that set these values from environment variables.  That functionality is valuable for testing in the browser, but it does not need to land (yet, or at all).  See next-next patch.
Attachment #8895832 - Flags: review?(luke)
Control the debug settings from testing functions.  Also some very simple test cases to exercise those functions.

You previously remarked that test cases can be used to seed fuzzing.  I don't know if these are good enough for that, but I'll follow up on it and discuss the matter with :decoder.
Attachment #8895833 - Flags: review?(luke)
Environment variables.  Not intending to land this patch at this time.
See comment 71, comment 82, comment 93.  Carrying Luke's r+.
Attachment #8882581 - Attachment is obsolete: true
Attachment #8895835 - Flags: review+
Comment on attachment 8895826 [details] [diff] [review]
Part 15: bug1277562-wasm-extractCode-test-cases-v2.patch

Review of attachment 8895826 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8895826 - Flags: review?(jvarga) → review+
Attachment #8895831 - Flags: review?(wmccloskey) → review+
Comment on attachment 8895806 [details] [diff] [review]
Part 9: bug1277562-tier2-compilation-v3.patch

Review of attachment 8895806 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!  Just a few tiny requests:

::: js/src/vm/HelperThreads.cpp
@@ +1127,5 @@
> +    //
> +    // For Tier2 compilation we need to allow other things to happen too, so for
> +    // now we only allow one thread.
> +    //
> +    // TODO: We should investigate more intelligent strategies, see bug 1280033.

nit: bug 1380033

@@ +1822,5 @@
> +    // because all error checking was performed by the initial compilation.
> +    mozilla::Unused << success;
> +
> +    // Notify the main thread in case it's waiting.
> +    HelperThreadState().notifyAll(GlobalHelperThreadState::CONSUMER, locked);

I could be wrong but I don't *think* this notify is necessary (since we're not pushing a result into any sort of finished list).  That's the assumption I'm making in my patch in bug 1347644 (in handlePromiseHelperTaskWorkload which, similarly, doesn't stick anything into a Vector anyone is wait()ing on).  If you can think of a reason, I'd be most interested, though.

::: js/src/wasm/WasmCompile.cpp
@@ +236,5 @@
>      MOZ_ASSERT(!*error, "unreported error in decoding");
>  
> +    SharedModule module = mg.finishModule(bytecode);
> +
> +    if (mode == CompileMode::Tier1 && module) {

nit: could you have "if (!module) return nullptr" after the finishModule() call and remove this "&& module" here?

@@ +258,5 @@
>  }
> +
> +// This runs on a helper thread.
> +bool
> +wasm::Tier2Generator(Tier2GeneratorTask* task)

nit: It'd be nice if this function name was an action.  How about GenerateTier2?

::: js/src/wasm/WasmJS.cpp
@@ +387,5 @@
>  
> +    MutableCompileArgs compileArgs = js_new<CompileArgs>();
> +    if (!compileArgs)
> +        return false;
> +    if (!compileArgs->initFromContext(cx, Move(scriptedCaller)))

nit: here, and in AsmJS.cpp, can you "if (!compileArgs || !compileArgs->initFromContext(...)) return false"?

@@ +890,5 @@
>          return false;
>  
> +    MutableCompileArgs compileArgs = js_new<CompileArgs>();
> +    if (!compileArgs) {
> +        ReportOutOfMemory(cx);

nit: can you use cx->new_ (which will report for you so you can just 'return false').

@@ +2114,1 @@
>              return false;

cx->new_ so it reports

::: js/src/wasm/WasmModule.cpp
@@ +302,5 @@
>  }
>  
> +void
> +Module::finishTier2Generator(UniqueLinkDataTier linkData2, UniqueMetadataTier metadata2,
> +                           UniqueConstCodeSegment code2, UniqueModuleEnvironment env2)

nit: column alignment

@@ +322,5 @@
> +CompileMode
> +Module::mode() const
> +{
> +    LockGuard<Mutex> l(tier2Lock_);
> +    return mode_;

nit: from a quick scan, I think this method is dead now
Attachment #8895806 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #118)
> > +    HelperThreadState().notifyAll(GlobalHelperThreadState::CONSUMER, locked);
> 
> I could be wrong but I don't *think* this notify is necessary [...]

n/m, I see the notify is needed by the next patch.
Comment on attachment 8895807 [details] [diff] [review]
Part 10: bug1277562-tier2-cancellation-v3.patch

Review of attachment 8895807 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, agreed on general need to flush these out when shutting down; some questions, though:

::: js/src/vm/HelperThreads.cpp
@@ +142,5 @@
> +        wasm::CompileTaskPtrVector& worklist =
> +            HelperThreadState().wasmWorklist(lock, wasm::CompileMode::Tier2);
> +        for (size_t i = 0; i < worklist.length(); i++) {
> +            wasm::CompileTask* task = worklist[i];
> +            CancelPendingCompileTask(task);

I don't think this loop (and the whole CompileTask-cancellation path in general) is necessary, given that, at the end of this function, we know all Tier2Generator tasks are finished.  If we make tier2 cancellation work (in a separate comment), I wouldn't think cancelling compilation at the individual function level is a big win.

@@ +148,5 @@
> +    }
> +
> +    // If there is a running Tier2 generator task, shut it down in a predictable
> +    // way.  There is at most one of these.  It will be deleted by the normal
> +    // deletion logic.

Could you make a symbolic constant MaxTier2GeneratorTasks that is returned by canStartTier2GeneratorTask() and static_assert()ed here to be 1?

@@ +159,5 @@
> +        }
> +    }
> +
> +    // Wake up any sleeping Tier2Generator task.
> +    HelperThreadState().notifyAll(GlobalHelperThreadState::CONSUMER, lock);

With CompileTask cancellation removed, is this necessary?  If so, can you comment?

::: js/src/vm/HelperThreads.h
@@ +243,5 @@
> +    void incWasmTier2GeneratorsFinished() {
> +        wasmTier2GeneratorsFinished_++;
> +    }
> +
> +    uint32_t wasmTier2GeneratorsFinished() const {

Can you have these two methods take AutoLockHelperThreadState&?

::: js/src/wasm/WasmCompile.cpp
@@ +280,5 @@
> +// system.
> +void
> +wasm::CancelTier2GeneratorTask(Tier2GeneratorTask* task)
> +{
> +    task->cancelled = true;

I think this cancelled_ field is dead?  Given that CancelOffThreadWasmTier2Generator() waits for all Tier2Generator tasks to finish, I think any sort of cancellation is purely an optimization.  Since Ion compilation can take many seconds, though, I think it's a good optimization.  To make this work, I think:
 1. cancelled_ needs to be Atomic<bool>
 2. ModuleGenerator could take a 'const Atomic<bool>&' which it would poll and 'return false' if true from inside finishOutstandingTask()
Comment on attachment 8895819 [details] [diff] [review]
Part 13: bug1277562-no-redundant-dependency-add-v2.patch

Review of attachment 8895819 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmInstance.cpp
@@ +192,5 @@
>  
> +    // The import may already have become optimized, at either tier, and if so
> +    // we bail here.  If it was optimized at the Baseline tier but the Ion tier
> +    // has since become available we will not reoptimize here, only later (if at
> +    // all) after we have been through deoptimization.

nit: could you keep the comment the same as before?  "bailing" sounds like we're giving up or that this is a bad case.  Also, the second sentence, while true, seems unnecessary to mention since we're just mentioning that we're *not* doing an *additional* (very minor) optimization.
Attachment #8895819 - Flags: review?(luke) → review+
Attachment #8895826 - Flags: review?(luke) → review+
Attachment #8895828 - Flags: review?(luke) → review+
Comment on attachment 8895832 [details] [diff] [review]
Part 16c: bug1277562-debugging-switches-to-shell.patch

Review of attachment 8895832 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmCompile.cpp
@@ +268,5 @@
>  
>          mozilla::Unused << task.release();
> +
> +        if (args.awaitTier2)
> +            module->blockOnIonCompileFinished();

I see that this would be useful for testing, but I *really* want to kill this entire block/unblock system, the Mutex, CondVar and mode_, with bug 1351488.  What might be even power powerful is a shell function finishTier2Compilation() that did waits on the helper thread tier2generator job queue.  This could allow quite interesting interleavings and perhaps could be merged with the current CancelTier2GeneratorTasks() function (making "cancellation" a boolean flag).

@@ +285,5 @@
> +    if (task->compileArgs->delayTier2) {
> +        Mutex             m(js::mutexid::WasmTier2GeneratorComplete); // Any ID will do
> +        ConditionVariable c;
> +        LockGuard<Mutex>  guard(m);
> +        c.wait_for(guard, mozilla::TimeDuration::FromMilliseconds(task->compileArgs->delayTier2));

Hah, that's one way to do a portable sleep.  In theory we can use C++11 and thus std::this_thread::sleep_for; perhaps worth a try push?

::: js/src/wasm/WasmCompile.h
@@ +42,5 @@
>      bool baselineEnabled;
>      bool debugEnabled;
>      bool ionEnabled;
> +    bool awaitTier2;            // For testing
> +    bool disableTier2;          // For testing

Is disableTier2 necessary now that one can disable ion or baseline independently?
Attachment #8895832 - Flags: review?(luke)
Attachment #8895807 - Flags: review?(luke)
Comment on attachment 8895833 [details] [diff] [review]
Part 16d: bug1277562-debugging-functions-in-shell.patch

(Cancelling review when contingent on responses for clarity.)
Attachment #8895833 - Flags: review?(luke)
Updated part 9 for context.  Carrying Luke's r+.
Attachment #8895806 - Attachment is obsolete: true
Attachment #8897885 - Flags: review+
I think this is what you had in mind.  Testing has not turned up any problems so far.

A discussion could be had about the best places to put the test for cancellation.  I have added it two places: in launchBatchCompile (function-unit compilation will not start, and the counter of outstanding tasks will not be incremented, if compilation has been cancelled) and in finishTier2 (we will not create any results if compilation has been cancelled).  Together these should shut down compilation quickly and reliably.  The latter test could instead be hoisted into the only caller of finishTier2, namely, GenerateTier2() in WasmCompile.cpp, but to me it feels better where it is.  And more cancellation tests could be added, though I'm not sure they'd make a material difference in cancellation promptness.

(Leaving the old Part 10 patch visible for comparison.)
Attachment #8897889 - Flags: review?(luke)
Comment on attachment 8897889 [details] [diff] [review]
Part 10: bug1277562-tier2-cancellation-v4.patch

Review of attachment 8897889 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!

::: js/src/vm/HelperThreads.cpp
@@ +153,5 @@
> +    // Wait for the generator task to finish.  This avoids a shutdown race where
> +    // the shutdown code is trying to shut down helper threads and the ongoing
> +    // tier2 compilation is trying to finish, which requires it to have access
> +    // to helper threads.
> +    if (mustWait) {

nit: since the whole function is now under a single lock, how about removing 'mustWait' and sticking the body of this 'if' into the above for loop directly?

::: js/src/wasm/WasmCompile.cpp
@@ +189,3 @@
>      SharedCompileArgs       compileArgs;
>  
> +    // A flat that is set by the cancellation code and checked by any running

s/flat/flag/
Attachment #8897889 - Flags: review?(luke) → review+
Blocks: 1391196
Summary: Wasm: tiered compilation → Wasm: tiered compiler
Blocks: 1391197
Attachment #8895802 - Attachment is obsolete: true
Attachment #8895807 - Attachment is obsolete: true
Attachment #8895832 - Attachment is obsolete: true
Attachment #8895833 - Attachment is obsolete: true
Attachment #8895834 - Attachment is obsolete: true
No longer depends on: 1351488, 1380033
No longer blocks: 1365069
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/30d63997e262
Part 0: Rename anyTier as stableTier.  r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/4cdb90db1896
Part 1: Equate Tier::Debug with Tier::Baseline and Tier::Serialized with Tier::Ion.  r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b6c6d929cb3
Part 2: Data structure support for tier-variant data, for second tier. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e0202e06c35
Part 3: Tiering for elemSegments, because they contain CodeRange indices. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb7cc30b1c01
Part 4: Implement shared hasTier2 flag. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ea44ef0c07c
Part 5a: Change the meaning of the wasm-baseline switch. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b425ac7ad630
Part 5b: Adapt test cases. r=yury
https://hg.mozilla.org/integration/mozilla-inbound/rev/293183f088dc
Part 7: Tiering control logic. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/21129f558137
Part 8: Track wasm compilation state separately for tiers. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f4d52995594
Part 9: Add Wasm Tier 2 compilation tasks. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8b7771cce0d
Part 10: Cancel background tier2 compilation correctly. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/85576ebe9c9c
Part 11: Adapt to tiering in instance list search. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8382039eedb
Part 12: Get rid of Tier::TBD. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/4573742fb429
Part 13: Guard against re-adding an import dependency when reoptimizing for tier2. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/fbb56bd3adad
Part 14: Make wasmExtractCode take a tier parameter. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/7115b14836da
Part 15: Pass tier parameter to wasmExtractCode - update test cases. r=luke, r=janv
https://hg.mozilla.org/integration/mozilla-inbound/rev/158b333a0a89
Part 16a: Make it possible to disable ion for wasm separately from ion for JS. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/02a265554b08
Part 16b: Implement javascript.options.wasm_ionjit. r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb4c86e50ed4
Part 17: Make Baseline code patchable, and patch in Ion code when available. r=luke
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfeac44a1e77
Fix merge bug, pass isMaster=true when checking whether we can start a new tier2 generator. r=me
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: