Closed Bug 1235656 Opened 8 years ago Closed 8 years ago

GlobalObject::getSelfHostedFunction may return a function with inconsistent name.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(2 files, 5 obsolete files)

Derived from bug 1067049.

GlobalObject::getSelfHostedFunction returns a function from intrinsic holder when it's cached.
But the cached function may have different name than "name" parameter.

For example:

  String.fromCodePoint(0);
  print(Number.isNaN.name); // prints "Number_isNaN"

there, Number_isNaN is called inside String_static_fromCodePoint, and it's cached with "Number_isNaN" name.
After that, Number constructor is resolved, and cached Number_isNaN is set to Number.isNaN.
So, all self-hosted builtin functions should have following 2 information:
  * global-object's property name to access to the function (Number_isNaN), in extended slot
    this is needed to re-lazify the function (am I correct?)
  * builtin's function name (isNaN), as "name" propety
    (either in JSFunction::atom_ or in object's property)

those information should be copied to all clones.

These rules will apply also to original functions in self-hosted global, if there is no other better way to get "Number_isNaN" while cloning it without GlobalObject::getSelfHostedFunction.
if all access to isNaN is done with "Number_isNaN" name, we could use the name passed to cloneSelfHostedValue, instead of storing that information into functions in self-hosted global
(In reply to Tooru Fujisawa [:arai] from comment #2)
> if all access to isNaN is done with "Number_isNaN" name, we could use the
> name passed to cloneSelfHostedValue, instead of storing that information
> into functions in self-hosted global

Also noticed that an alias like |var std_Array_indexOf = ArrayIndexOf;| will make 2 copies of the same function while cloning.
It should be better to just replace "std_Array_indexOf" with "ArrayIndexOf".
Prepared 5 patches.

Here's summary:
  * Change all self-hosted builtins to function expression with canonical name
    (including "get" prefix)
    and assign to global variable with the same name as now
  * Remove all aliases to self-hosted builtins
  * When cloning function, copy function's name property (both atom_ and resolved name property) and self-hosted name (the name used to access the function)
  * Remove unused parameter (name and id), that was used to name the function, from APIs
  * Assert the consistency of function's name and self-hosted name in several places


[Part 1]

Changed all self-hosted builtins to following format:
  var Number_isNaN = function isNaN(num) {
    ...
  };

For functions with a name which does not fit into function-expression syntax, changed it to following format:
  var String_iterator = function() {
    ...
  };
  std_Object_defineProperty(String_iterator, "name", { value: "[Symbol.iterator]" });

Getter function's name is also defined explicitly:
  var RegExpFlagsGetter = function() {
    ...
  };
  std_Object_defineProperty(RegExpFlagsGetter, "name", { value: "get flags" });


Following aliases defined in Utilities.js are removed, because it will introduce 2 copies of same function while cloning, from GlobalObject::getSelfHostedFunction and JSRuntime::cloneSelfHostedValue.
  * std_Array_indexOf
  * std_String_substring
  * std_Map_iterator_next
All references are replaced with actual names, like ArrayIndexOf.
Assignee: nobody → arai.unmht
Attachment #8703235 - Flags: review?(till)
[Part 2]

Now we don't need `name` parameter in GlobalObject::getSelfHostedFunction, as appropriate name is used in self-hosted JS, and it could be different from property name when the function is shared between 2 properties.

Then, self-hosted builtins are cloned in following places:
  * CloneObject (called by CloneValue <- JSRuntime::cloneSelfHostedValue)
  * GlobalObject::getSelfHostedFunction
While cloning function, JSFunction::atom_ and the resolved name are copied to the cloned function separately.

Then, now self-hosted functions have different name than self-hosted name ("isNaN" vs "Number_isNaN"), so inside CloneObject we should set the value of LAZY_FUNCTION_NAME_SLOT slot to the name used to access the function, that is, 'name' parameter of JSRuntime::cloneSelfHostedValue.  This is almost same situation as current GlobalObject::getSelfHostedFunction's "selfHostedName" parameter.

Also, function's resolved name is copied for functions with std_Object_defineProperty'ed name.


Just for simplicity, `name` parameter in GlobalObject::getSelfHostedFunction is removed in Part 3, and `id` parameter in JS::GetSelfHostedFunction is also removed in part 4.
Attachment #8703236 - Flags: review?(till)
[Part 5]

Now GlobalObject::getSelfHostedFunction does not receive the name for checking consistency.  So, instead of there, added js::VerifySelfHostedFunctionName function and call it after each GlobalObject::getSelfHostedFunction or JS::GetSelfHostedFunction function call, with more context, like getter/setter.

js::VerifySelfHostedFunctionName compares expected function name for the property with actual function's name property.
Following exceptional cases are handled there
  * RegExp.prototype.toSource.name == "toString"
  * Array.prototype[Symbol.iterator].name == "values"
  * %TypedArray%.prototype[Symbol.iterator].name == "values"
  * LegacyGenerator.prototype.send.name == "next"

js::VerifySelfHostedFunctionName should catch most cases of WIP patch that adds new self-hosted builtins with function statement style.

Also, added prefix parameter to js::IdToFunctionName to generate "get FOO" name.  This parameter is currently used only in debug code, but we can use it for native getters, maybe in bug 1180290.


green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d9d5983698e3
Attachment #8703239 - Flags: review?(till)
Comment on attachment 8703236 [details] [diff] [review]
Part 2: Use actual function name for self-hosted builtins.

Review of attachment 8703236 [details] [diff] [review]:
-----------------------------------------------------------------

Can't we introduce an intrinsic for setting a function's atom instead of actually defining the "name" property? That way we wouldn't ever have to deal with two different paths for naming functions.

std_Object_defineProperty(StarGeneratorThrow, "name", { value: "throw" });

would instead become

_SetFunctionAtom(StarGeneratorThrow, "throw");

That'd substantially reduce the complexity of this patch, and would save some memory and CPU during cloning. (Not that that matters too much.)

Cancelling review because a patch with that solution would look quite different from this one.

::: js/src/vm/GlobalObject.cpp
@@ +659,5 @@
>  {
> +    if (GlobalObject::maybeGetIntrinsicValue(cx, global, selfHostedName, funVal)) {
> +#ifdef DEBUG
> +        MOZ_ASSERT(funVal.isObject());
> +        MOZ_ASSERT(funVal.toObject().is<JSFunction>());

Nit: no need to have the above two asserts: the next line will cause exactly the same asserts to be made in the casts.

@@ +662,5 @@
> +        MOZ_ASSERT(funVal.isObject());
> +        MOZ_ASSERT(funVal.toObject().is<JSFunction>());
> +        RootedFunction fun(cx, &funVal.toObject().as<JSFunction>());
> +        MOZ_ASSERT(fun->nargs() == nargs);
> +        MOZ_ASSERT(fun->getAllocKind() == gc::AllocKind::FUNCTION_EXTENDED);

Same here

@@ +664,5 @@
> +        RootedFunction fun(cx, &funVal.toObject().as<JSFunction>());
> +        MOZ_ASSERT(fun->nargs() == nargs);
> +        MOZ_ASSERT(fun->getAllocKind() == gc::AllocKind::FUNCTION_EXTENDED);
> +        RootedValue funNameVal(cx, fun->getExtendedSlot(LAZY_FUNCTION_NAME_SLOT));
> +        MOZ_ASSERT(funNameVal.isString());

And here

@@ +666,5 @@
> +        MOZ_ASSERT(fun->getAllocKind() == gc::AllocKind::FUNCTION_EXTENDED);
> +        RootedValue funNameVal(cx, fun->getExtendedSlot(LAZY_FUNCTION_NAME_SLOT));
> +        MOZ_ASSERT(funNameVal.isString());
> +        RootedString funName(cx, funNameVal.toString());
> +        MOZ_ASSERT(funName->isAtom());

And here
Attachment #8703236 - Flags: review?(till)
I'm somewhat unhappy with the restrictions this entire approach imposes on self-hosted code. The fact that every self-hosted function has to be turned into the `var foo = function bar(){};` form means that self-hosted code will look even more different from normal JS. Since we want to open self-hosting to embeddings, that is unfortunate.

Couldn't we instead do something like the following?

- Define functions as in normal JS, with `function RegExpToString(){}`.
- For functions that are installed under different names on different builtins, set a canonical name using an intrinsic:
`_SetCanonicalFunctionName(RegExpToString, "toString");`
- That intrinsic sets the function's atom_, and sets its first extended slot to `true`.
- When cloning functions, the extended slots don't get cloned, so this is only visible in the self-hosting compartment itself.
- In GlobalObject::getSelfHostedFunction, if maybeGetIntrinsicValue succeeds, check if fun->atom() is the same as the `name` parameter.
  - If not, check if fun->atom() is the same as the `selfHostedName` parameter.
    - If so, change fun->atom() to `name`.
    - If not, this must have a canonical name, so assert that the *uncloned* function has its first extended slot set to `true`.
    - For this check to work, GlobalObject::getSelfHostedFunction should probably be moved to SelfHosting.cpp, so it can use `GetUnclonedValue`.


With this setup, it should be possible to only do special things in the JS code for those rare cases where the same self-hosted function is installed under multiple names. Self-hosted code itself can still use the original binding name as now, and we can still do the naming of the clones in GlobalObject::getSelfHostedFunction.

I might be overlooking something, but it seems like this should work.
Flags: needinfo?(arai.unmht)
Thank you for reviewing :)
Yeah, your idea looks much better and simpler than mine.

I'd add one more rule:
- In GlobalObject::getSelfHostedFunction, if maybeGetIntrinsicValue fails, check if *uncloned* function's atom is the same as the `name` parameter.
  - If so, use `name` as the newly created function's name.
  - If not, check if that the *uncloned* function's first extended slot is `true`
    - If so, this must have a canonical name,
      use fun->atom() as the newly created function's name.
    - If not, use `name` as the newly created function's name.


Here's draft patch, it's almost working but fails in SM(cgc).
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=50f0cdbc5e34&selectedJob=14997232

Will investigate it and post them when I can fix.


Another changes are following:

Moved some code from GlobalObject.cpp to SelfHosting.cpp, to access GetUnclonedValue,
also, they're moved from GlobalObject to JSRuntime, to access JSRuntime.selfHostingGlobal_.
  * JSRuntime::cloneLazySelfHostedFunction
  * JSRuntime::assertSelfHostedFunctionHasCanonicalName

Also, added following private methods because of much duplication of GetUnclonedValue callsite:
  * JSRuntime::getUnclonedSelfHostedValue
  * JSRuntime::getUnclonedSelfHostedFunction

Now functions in self-hosted global needs extended slot.
Modified Parser<ParseHandler>::newFunction to use FUNCTION_EXTENDED kind when the function is in self-hosted, and global scope.


> ::: js/src/vm/GlobalObject.cpp
> @@ +659,5 @@
> >  {
> > +    if (GlobalObject::maybeGetIntrinsicValue(cx, global, selfHostedName, funVal)) {
> > +#ifdef DEBUG
> > +        MOZ_ASSERT(funVal.isObject());
> > +        MOZ_ASSERT(funVal.toObject().is<JSFunction>());
> 
> Nit: no need to have the above two asserts: the next line will cause exactly
> the same asserts to be made in the casts.

Thanks :)
Maybe, we can remove a bunch of MOZ_ASSERT from the beginning of each intrinsic defined in SelfHosting.cpp?

e.g.
https://dxr.mozilla.org/mozilla-central/rev/ce643acfab14d95bea2fb6c4f56477413514b686/js/src/vm/SelfHosting.cpp#478
> static bool
> intrinsic_GetNextMapEntryForIterator(JSContext* cx, unsigned argc, Value* vp)
> {
>     CallArgs args = CallArgsFromVp(argc, vp);
>     MOZ_ASSERT(args.length() == 2);
>     MOZ_ASSERT(args[0].toObject().is<MapIteratorObject>());
>     MOZ_ASSERT(args[1].isObject());
> 
>     Rooted<MapIteratorObject*> mapIterator(cx, &args[0].toObject().as<MapIteratorObject>());
>     RootedArrayObject result(cx, &args[1].toObject().as<ArrayObject>());

or perhaps it's intentionally asserted there for readability or some other reason?
Flags: needinfo?(arai.unmht)
Updated the patch.
changes are in comment #11.

About the assertion failure, it happened in GC for worker thread, that tries to access function in main thread.
It happens inside NewScriptedFunction call in JSRuntime::cloneLazySelfHostedFunction,
and it disappears when I enclose RootedFunction for the function in self-hosted global (that is in main thread) with block.
Is this right solution?  or it's just hidden by the change?
Attachment #8703235 - Attachment is obsolete: true
Attachment #8703236 - Attachment is obsolete: true
Attachment #8703237 - Attachment is obsolete: true
Attachment #8703238 - Attachment is obsolete: true
Attachment #8703239 - Attachment is obsolete: true
Attachment #8703235 - Flags: review?(till)
Attachment #8703237 - Flags: review?(till)
Attachment #8703238 - Flags: review?(till)
Attachment #8703239 - Flags: review?(till)
Attachment #8703406 - Flags: review?(till)
separated from previous patch series' Part 1.
Attachment #8703407 - Flags: review?(till)
Comment on attachment 8703406 [details] [diff] [review]
Part 1: Set canonical name in self-hosted builtins.

Review of attachment 8703406 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks great. I added quite a few comments, but that's only because there were things I hadn't fully thought through myself before. My apologies for that. Most importantly, the HAS_SELFHOSTED_CANONICAL_NAME_SLOT flag is only needed in debug mode, which is nice, because that means we don't need to make all functions in the self-hosting compartment extended.

r=me with feedback addressed.

::: js/src/frontend/Parser.cpp
@@ +1630,5 @@
>        default:
> +        MOZ_ASSERT(kind == Statement);
> +        if (options().selfHostingMode && !pc->sc->isFunctionBox()) {
> +            isGlobalSelfHostedBuiltin = true;
> +            allocKind = gc::AllocKind::FUNCTION_EXTENDED;

I just realized that we only need to do this in debug mode. The HAS_SELFHOSTED_CANONICAL_NAME_SLOT flag is only needed to assert correctness, not to change runtime behavior. See the comment in cloneLazySelfHostedFunction for why.

::: js/src/vm/GlobalObject.cpp
@@ +663,5 @@
> +        if (fun->atom() == name)
> +            return true;
> +
> +        if (fun->atom() == selfHostedName) {
> +            // This function was called from self-hosted JS and cloned there.

This is all quite non-obvious so needs careful documenting. Perhaps something like:

"This function was initially cloned because it was called by other self-hosted code, so the clone kept it's self-hosted name, instead of getting the name it's intended to have in content compartments. This can happen when a lazy builtin is initialized after self-hosted code for another builtin used the same function. In that case, we need to change the function's name, which is ok because it can't have been exposed to content before."

@@ +669,5 @@
> +            fun->initAtom(name);
> +            return true;
> +        }
> +
> +        // This function's atom was modified by _SetCanonicalFunctionName.

'The function might be installed multiple times on the same or different builtins, under different property names, so its name might be neither "selfHostedName" nor "name". In that case, its canonical name must've been set using the `_SetCanonicalName` intrinsic.'

::: js/src/vm/Runtime.h
@@ +987,5 @@
>          return global == selfHostingGlobal_;
>      }
>      bool isSelfHostingCompartment(JSCompartment* comp) const;
>      bool isSelfHostingZone(const JS::Zone* zone) const;
> +    bool cloneLazySelfHostedFunction(JSContext* cx, js::HandlePropertyName selfHostedName,

I would call this "createLazySelfHostedFunctionClone".

::: js/src/vm/SelfHosting.cpp
@@ +2082,5 @@
> +                                       MutableHandleFunction fun)
> +{
> +    RootedAtom funName(cx, name);
> +    {
> +        RootedFunction selfHostedFun(cx);

It's unfortunate we need to root this in the first place: it's only needed because GetUnclonedValue does error reporting, which can gc. Limiting the rooted's scope seems like a fine solution, though.

@@ +2086,5 @@
> +        RootedFunction selfHostedFun(cx);
> +        if (!getUnclonedSelfHostedFunction(cx, selfHostedName, &selfHostedFun))
> +            return false;
> +
> +        if (selfHostedFun->atom() != name) {

If you change this check to `!= selfHostedName`, then the HAS_SELFHOSTED_CANONICAL_NAME_SLOT is only needed to assert correctness: if the uncloned function's name isn't `selfHostedName`, then it must have a canonical name, so we use that as the clone's name:

if (selfHostedFun->atom() != selfHostedName) {
    MOZ_ASSERT(selfHostedFun->getExtendedSlot(HAS_SELFHOSTED_CANONICAL_NAME_SLOT).toBoolean());
    funName = selfHostedFun->atom();
}

@@ +2090,5 @@
> +        if (selfHostedFun->atom() != name) {
> +            if (selfHostedFun->getExtendedSlot(HAS_SELFHOSTED_CANONICAL_NAME_SLOT).toBoolean())
> +                funName = selfHostedFun->atom();
> +        }
> +        // Free RootedFunction that may point a function in different thread

s/point a/point to a/

@@ +2091,5 @@
> +            if (selfHostedFun->getExtendedSlot(HAS_SELFHOSTED_CANONICAL_NAME_SLOT).toBoolean())
> +                funName = selfHostedFun->atom();
> +        }
> +        // Free RootedFunction that may point a function in different thread
> +        // before calling NewScriptedFunction.

", which might trigger a GC."

@@ +2148,5 @@
> +}
> +
> +bool
> +JSRuntime::getUnclonedSelfHostedFunction(JSContext* cx, HandlePropertyName name,
> +                                         MutableHandleFunction fun)

This can just return a JSFunction* instead of using an outparam value.

@@ +2178,5 @@
>  
>      return CloneValue(cx, selfHostedValue, vp);
>  }
>  
> +#ifdef DEBUG

If you move the ifdef inside the function's body, you can call it without ifdefs around the calls.

@@ +2179,5 @@
>      return CloneValue(cx, selfHostedValue, vp);
>  }
>  
> +#ifdef DEBUG
> +bool

This should be infallible: if getUnclonedSelfHostedFunction fails, something else must've gone wrong before that.

::: js/src/vm/SelfHosting.h
@@ +10,5 @@
>  #include "jsapi.h"
>  
>  class JSAtom;
>  
> +#define HAS_SELFHOSTED_CANONICAL_NAME_SLOT 0

Perhaps move this into SelfHostingDefines.h because that's where other slot defines live?
Attachment #8703406 - Flags: review?(till) → review+
Comment on attachment 8703407 [details] [diff] [review]
Part 2: Remove alias to selfhosted builtin from Utilities.js.

Review of attachment 8703407 [details] [diff] [review]:
-----------------------------------------------------------------

Yup, thanks.
Attachment #8703407 - Flags: review?(till) → review+
Thank you again :)

(In reply to Till Schneidereit [:till] from comment #14)
> @@ +2148,5 @@
> > +}
> > +
> > +bool
> > +JSRuntime::getUnclonedSelfHostedFunction(JSContext* cx, HandlePropertyName name,
> > +                                         MutableHandleFunction fun)
> 
> This can just return a JSFunction* instead of using an outparam value.

Ohh, this solves the inter-thread rooting issue in createLazySelfHostedFunctionClone.
With this change, the function in self-hosted compartment is rooted only in getUnclonedSelfHostedFunction via RootedValue.
We don't need enclosing block anymore :D
(In reply to Till Schneidereit [:till] from comment #14)
> This is all quite non-obvious so needs careful documenting. Perhaps
> something like:
> 
> "This function was initially cloned because it was called by other
> self-hosted code, so the clone kept it's self-hosted name, instead of

On the off chance this suggested comment is copied in verbatim, note s/it's/its/ here.  :-)

(Interestingly, if you go back far enough "it's" was the common, possibly even most-accepted spelling.  For example, Article 10, Section 10 of the US Constitution uses "it's" as the possessive form of "it": "for executing it's inspection Laws".  http://www.archives.gov/exhibits/charters/constitution_transcript.html#1.10 )
https://hg.mozilla.org/mozilla-central/rev/ef6a8586cb4d
https://hg.mozilla.org/mozilla-central/rev/affc1fd3d231
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: