Closed Bug 774119 Opened 12 years ago Closed 3 months ago

cpg makes DOS emulator demo very very slow

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox16 - ---
firefox17 - ---
firefox18 - ---

People

(Reporter: bzbarsky, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: regression, Whiteboard: [js:p2])

Attachments

(2 files)

See bug 773936 comment 1 and bug 773936 comment 2.

In brief, as far as I can tell this demo got 20x or more slower with cpg.  Every single setElem/getElem on a typed array, instead of being inlined by the jit, now has to go through a cross-compartment wrapper.  :(
Blocks: 773936
So per discussion with bholley on irc.

1)  We should fix bug 625199.  I can't add it as a dep because then we get a dependency
    loop.  :(
2)  We should consider adding JIT fast-paths for CrossCompartmentWrapper.  Apparently,
    this is supposed to always be completely transparent, so if we have a proxy and its
    handler is &CrossCompartmentWrapper::singleton we can just perform the same exact
    access on the wrapped object.  The only thing CrossCompartmentWrapper needs to do is
    enter the target compartment.

If adding general fast-paths for CrossCompartmentWrapper is hard, we may be able to get pretty far in this bug with just adding a special case for indexed access on a CrossCompartmentWrapper around a typed array.  In that situation, we can just do the access on the underlying typed array, I think; we may even be able to get away with not entering compartments if typed arrays don't depend on that.
A good question is what parts of the above we can do for 16...
That all sounds reasonable to me. Can some jit junkies weigh in?
So, a few things here:

- The general unwrapping thing should be doable for inline caches.  We wouldn't be able to generate caches which call getter hooks, unless someone wants to generate jitcode for AutoEnterCompartment (not raising my hand).  But normal caches would be fine; we'd unwrap the object, and then call special versions of the ic::GetProp etc. functions which compare cx->compartment to obj->compartment() and AutoEnterCompartment if necessary (and know not to generate getter hook caches).  We'd distinguish accesses which may need unwrapping from others with a profiling bit, as we already do in several cases.

- Typed arrays would need a completely different mechanism.  We don't use inline caches for accesses on them (plain JM does have a typed array IC, but it's incompatible with JM+TI regalloc and is very slow vs. typed paths anyways).  So we'd have to statically know when we're accessing a transparent wrapper for a typed array.  TI currently knows nothing at all about proxies, but we could change things so that there are special type objects for use by typed array wrappers, and infer the points those are accessed.  This would only work if the typed array wrappers can't be JSObject::swap()'ed with other objects (changing their type and other underlying properties) or otherwise have their handler, wrapped object or any other information change.  Is this the case?
(In reply to Brian Hackett (:bhackett) from comment #4)
> This would only work if the typed array wrappers can't be
> JSObject::swap()'ed with other objects (changing their type and other
> underlying properties) or otherwise have their handler, wrapped object or
> any other information change.  Is this the case?

AFAIK we don't currently have any reason to swap typed arrays (generally we only need to swap proxies and C++ reflectors). We can probably just get away with asserting against it in swap() and revisiting the issue if we ever really need to.
Brian is asking whether we would ever swap() the CrossCompartmentWrapper for a typed array.
(In reply to Boris Zbarsky (:bz) from comment #6)
> Brian is asking whether we would ever swap() the CrossCompartmentWrapper for
> a typed array.

Well, right. That's (sort of) what I was getting at. AFAIK, swap() is only used in brain transplanting, when an object moves across compartments. This means that we either replace "the real thing" with a cross-compartment wrapper (when the object is moving out of the compartment in question), or vice-versa. So as long as we don't transplant typed arrays, I don't think we'll need to swap them either.
OK, sounds good.  Can someone write a small testcase illustrating the problem?

Can the cross compartment wrapper change into another kind of wrapper, e.g. a dead object wrapper?  (Cross compartment pointers don't keep the target compartment alive, right?)
(In reply to Brian Hackett (:bhackett) from comment #8)
> Can the cross compartment wrapper change into another kind of wrapper, e.g.
> a dead object wrapper?

I believe nuking works by swapping a CCW to a dead wrapper.

> (Cross compartment pointers don't keep the target compartment alive, right?)

I expect they would.  If you do a compartmental GC, you need to find all references to a particular compartment without tracing compartments you aren't collecting.  I'd guess that CCWs must be what is used for that.
(In reply to Andrew McCreight [:mccr8] from comment #9)
> (In reply to Brian Hackett (:bhackett) from comment #8)
> > Can the cross compartment wrapper change into another kind of wrapper, e.g.
> > a dead object wrapper?
> 
> I believe nuking works by swapping a CCW to a dead wrapper.

More specifically, it changes the proxy handler. So yes, proxies can change types, unfortunately, but not often. Can we move the handler swapping code into a helper that just throws away all the relevant jit/TI stuff?

> > (Cross compartment pointers don't keep the target compartment alive, right?)

Pointers? no. Wrappers? yes.
More data: apparently the cross-global stuff in this case is auto-generated by Google Web Toolkit.  So it might be more widespread than we thought (and hence this bug might be higher priority)...
Blocks: WebJSPerf
Whiteboard: [js:t]
(In reply to Bobby Holley (:bholley) from comment #7)

> Well, right. That's (sort of) what I was getting at. AFAIK, swap() is only
> used in brain transplanting, when an object moves across compartments. This
> means that we either replace "the real thing" with a cross-compartment
> wrapper (when the object is moving out of the compartment in question), or
> vice-versa. So as long as we don't transplant typed arrays, I don't think
> we'll need to swap them either.

Actually, I was wrong here. We _do_ indeed swap typed arrays in the current world when we recompute cross-compartment wrappers between compartments (there are cases where we want to recompute _all_ wrappers entering and leaving a compartment, like document.domain). We could probably work around it, but could we instead just nuke all the jitcode if this ever happens? We already don't care about performance when we do this kind of thing.
Or sorry, I'm getting confused again. We swap wrapper<->wrapper, which is essentially a proxy swap that may change the proxy handler. But we never swap wrapper<->array unless we explicitly transplant that object between compartments.
Whiteboard: [js:t] → [js:p2]
Tracking given the fact that this appears to be a fairly major perf regression. Please ping if we do not believe this will be a user-facing issue post-release.
Sorry for the late reply here - we've decided to hold off on fixing this until IonMonkey lands (planned for Fx18), since it is much easier to address there than in the existing compiler.
Thanks for the update, removing tracking then and we can revisit when tracking is available for 18.
So we can fix this now?
Tracking this for 18 as we were waiting for IonMonkey landing to get this fixed.(Comment 15)
(In reply to David Anderson [:dvander] from comment #15)
> Sorry for the late reply here - we've decided to hold off on fixing this
> until IonMonkey lands (planned for Fx18), since it is much easier to address
> there than in the existing compiler.

David, do we have someone looking into this to get it in Fx18(Aurora now )  or please let us know if the landscape has changed on this since then ? Thanks !
If I understand the problem correctly: CPG introduced more wrappers, and JIT paths do not understand those. An emulator accessing a cross-compartment typed array is pretty much a perfect storm for creating the most pathological regression possible.

comment #11 hinted that this type of regression could be more widespread - have we seen evidence of that?

If not, fixing this particular regression sounds like it would be pretty easy. We'd just need ICs for cross-compartment typed array access, and those (I think) would not need any compartment-switching code. It wouldn't get us to perfect pre-CPG performance, but it wouldn't be 20X slower. We could do even better with help from TI, but maybe it's not necessary.

If we need a more general fix, it'd be good to understand exactly what work is required. I assume AutoEnterCompartment in ion::GetPropertyCache is not enough, because it could generate ICs that violate compartment separation? Like if |o.x| returns an object that is not in the current compartment. (Either way, we'd still need to introduce typed array ICs.)
> have we seen evidence of that?

We haven't had other reports of pathological performance.  Unfortunately, non-pathological general slowness rarely gets reported.  :(

For now, I think we should add ICs for cross-compartment typed arrays and focus on making proxys faster in the JIT (which we need to do anyway), which will make any other cases not quite as bad, incidentally.
Whoops - the URL here appears to be dead. Did the site move or do we know of another site where the typed array problem exists?
http://jsdosbox.appspot.com/ contains a demo of that project that runs DOOM.
If neither of those works out, let me know and I'll write a microbenchmarky thing... ;)
Attached file Microbenchmarky thing
I was curious about this case.
Running on x64 Linux Mint with Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz.

Firefox 16:
Same-compartment (all times in nanoseconds): 
  TypedArray.Length: 16.39
  TypedArray Element get: 1.63
  TypedArray Element set: 1.10
  DenseArray.Length: 15.57
  DenseArray Element get: 2.93
  DenseArray Element set: 26.91
Cross-compartment (all times in nanoseconds): 
  TypedArray.Length: 183.11
  TypedArray Element get: 112.64
  TypedArray Element set: 112.39
  DenseArray.Length: 134.96
  DenseArray Element get: 113.10
  DenseArray Element set: 128.00

Nightly:
Same-compartment (all times in nanoseconds): 
  TypedArray.Length: 1.50
  TypedArray Element get: 0.86
  TypedArray Element set: 0.85
  DenseArray.Length: 92.05
  DenseArray Element get: 13.79
  DenseArray Element set: 27.00
Cross-compartment (all times in nanoseconds): 
  TypedArray.Length: 264.45
  TypedArray Element get: 88.56
  TypedArray Element set: 87.96
  DenseArray.Length: 157.29
  DenseArray Element get: 79.57
  DenseArray Element set: 90.93
And what about FF14? I think CPG landed to FF15.
FF14 has no performance.now() so this kind of bad :(
You can use Date.now().  You'll just get times at lower resolution....
So you can get some sense:

FF 15
Same-compartment (all times in seconds): 
  TypedArray.Length: 0.00
  TypedArray Element get: 1.10
  TypedArray Element set: 1.10
  DenseArray.Length: 100.00
  DenseArray Element get: 10.00
  DenseArray Element set: 30.00
Cross-compartment (all times in seconds): 
  TypedArray.Length: 200.00
  TypedArray Element get: 115.20
  TypedArray Element set: 114.10
  DenseArray.Length: 100.00
  DenseArray Element get: 110.00
  DenseArray Element set: 130.00

FF14
Same-compartment (all times in seconds): 
  TypedArray.Length: 0.00
  TypedArray Element get: 1.10
  TypedArray Element set: 1.10
  DenseArray.Length: 0.00
  DenseArray Element get: 0.00
  DenseArray Element set: 10.00
Cross-compartment (all times in seconds): 
  TypedArray.Length: 0.00
  TypedArray Element get: 1.10
  TypedArray Element set: 1.20
  DenseArray.Length: 0.00
  DenseArray Element get: 10.00
  DenseArray Element set: 0.00
I added the part of the benchmark that tests cross compartment stuff to awfy assorted tests.
http://hg.mozilla.org/users/bhackett_mozilla.com/awfy-assorted/rev/6024a5bd3d0c
Is there a reason for testing CCW'd dense arrays here or on AWFY? This bug is about a typed array demo and I'm not sure where dense arrays come in.
It doesn't look like we're going to have time to get a patch for Firefox 18. This bug looks isolated to one site (that we've seen), and we're already shipping the regression, so I think this can wait another release.
(In reply to David Anderson [:dvander] from comment #35)
> It doesn't look like we're going to have time to get a patch for Firefox 18.
> This bug looks isolated to one site (that we've seen), and we're already
> shipping the regression, so I think this can wait another release.

Thanks dvander, untracking in that case.
Looking at awfy, this only got worse over time :(
Yep, that's normal for everything that's not explicitly being tracked by someone, as well as anything involving proxies.  :(
Depends on: 875452
Depends on: 907369
Assignee: general → nobody

Time spent on the "Welcome To DOSBox" screen:

Browser Time
Firefox 65 6:16
Nightly 0:26
Safari 12 0:21
Chrome 0:11

A profile shows we're now spending ~all time in JIT code, a lot of it in the gUb function. I think that function stays in Baseline for some reason, I'll see if I can figure out why (pretty annoying because the code is minimized/obfuscated).

At least we're back in the game.

Spectre mitigations also hurt us. Disabling all of them in about:config gets us closer to 20 seconds.

Severity: normal → S3

Testing this demo, we now spend < 10 seconds on the "Welcome to DOSBox" screen on my M1.

Chrome seems a little faster still, but it's pretty close and the big perf cliff from CPG has been fixed with same-compartment-realms.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: