Closed Bug 869788 Opened 11 years ago Closed 7 years ago

DOMTokenList stringifier should trim/compress white space and only emit unique values

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: heycam, Assigned: ayg)

References

Details

(Keywords: dev-doc-complete)

Attachments

(3 files, 7 obsolete files)

Attached file test
I think this test should pass, according to the DOM Standard.
Yep.  Want to fix?  ;)
Sure... :)
Assignee: nobody → cam
Attached patch patch (obsolete) — Splinter Review
Attachment #749105 - Flags: review?(bzbarsky)
Hmm. I'm worried about the nsAttrValue changes in terms of performance, and also in terms of something depending on our current behavior....
I think these changes are all supported by the spec.  IE will normalise white space when you modify the class list through the DOMTokenList, but it won't remove duplicate entries.  Chrome seems to match our current behaviour, from a couple of quick tests.

Is your worry big enough to file a bug on the spec?
I'm not sure.  Would need to see some real-life performance numbers....
Comment on attachment 749105 [details] [diff] [review]
patch

Olli, can you take this off my hands?
Attachment #749105 - Flags: review?(bzbarsky) → review?(bugs)
Comment on attachment 749105 [details] [diff] [review]
patch

This seems to slow down class attribute setting few %, around ~5%, but depends of course on the case.
Maybe GetAtomArrayString could use a bloom filter on stack, and if a value is 
found twice or more, actually create a set and stringify it and perhaps
update also the atom array to not contain duplicates.
(GetAtomArrayString should be called something else then.)
Atoms have pre-calculated hashcode, so BloomFilter handling should be fast.
How does that sound to you?
Attachment #749105 - Flags: review?(bugs) → review-
Sounds good, I can do that.
See Also: → 910121
FYI, we're planning to make this change in WebKit: https://bugs.webkit.org/show_bug.cgi?id=148589
Flags: needinfo?(cam)
Wouldn't it be better to only compute the duplicate removal when classList is actually accessed?  This should remove the performance problems completely without any optimization, I'd think, unless we're worried about someone calling .classList on an element with 67,000 classes.
Flags: needinfo?(bugs)
Attached patch Different approach (obsolete) — Splinter Review
Here's a different approach.  It will have no perf impact unless you access .classList.  In that case perf isn't great, but probably it's acceptable in real life?  I could change the O(N^2) duplicate check in Parse() to use a hashtable if you think it would speed things up in real-world cases.

I put the code in nsDOMTokenList instead of nsAttrValue because this really doesn't affect behavior when accessed not via nsDOMTokenList.  But I don't understand how all this fits together, so very possibly this is just wrong.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=833592b68654
Assignee: cam → ayg
Status: NEW → ASSIGNED
Flags: needinfo?(bugs)
Attachment #8671366 - Flags: review?(bugs)
(Failing tests are all either unexpected passes, or microdata tests that have been deleted upstream.)
microdata tests shouldn't still start to fail, right?
(we will probably remove microdata API in couple of months, but not right now)
AFAICT, the unexpected fails in the microdata tests were wrong.  But I can't tell for sure, because there's no spec anymore.  :)  Anyway, they've been removed upstream:

https://github.com/w3c/web-platform-tests/commit/e955cd447aa99fd000f211971c24ced87b04fe5b

So they're going away with the next wpt sync, even if we still support the feature for now.
Comment on attachment 8671366 [details] [diff] [review]
Different approach

> nsDOMTokenList::IndexedGetter(uint32_t aIndex, bool& aFound, nsAString& aResult)
> {
>-  const nsAttrValue* attr = GetParsedAttr();
>-
>-  if (attr && aIndex < static_cast<uint32_t>(attr->GetAtomCount())) {
>+  nsAutoString attr;
>+  mElement->GetAttr(kNameSpaceID_None, mAttrAtom, attr);
>+  nsTArray<nsString> list;
>+  Parse(attr, list);
>+  if (aIndex < static_cast<uint32_t>(list.Length())) {
>     aFound = true;
>-    attr->AtomAt(aIndex)->ToString(aResult);
>+    aResult = list[aIndex];
>   } else {
>     aFound = false;
>   }
> }
So you're making indexed getter _parse_  the already parsed attribute value
again each time it is accessed. Don't understand why. Why don't you just rely on 
nsAttrValue* ?
I assume this change would show up rather badly in some microbenchmarks.

>@@ -128,40 +178,27 @@ nsDOMTokenList::AddInternal(const nsAttrValue* aAttr,
>   }
> 
>   nsAutoString resultStr;
> 
>   if (aAttr) {
>     aAttr->ToString(resultStr);
>   }
> 
>-  bool oneWasAdded = false;
>-  nsAutoTArray<nsString, 10> addedClasses;
>+  nsAutoTArray<nsString, 10> list;
>+  Parse(resultStr, list);
> 
>   for (uint32_t i = 0, l = aTokens.Length(); i < l; ++i) {
>-    const nsString& aToken = aTokens[i];
>-
>-    if ((aAttr && aAttr->Contains(aToken)) ||
>-        addedClasses.Contains(aToken)) {
>-      continue;
>+    if (!list.Contains(aTokens[i])) {
>+      list.AppendElement(aTokens[i]);
>     }
>-
>-    if (oneWasAdded ||
>-        (!resultStr.IsEmpty() &&
>-        !nsContentUtils::IsHTMLWhitespace(resultStr.Last()))) {
>-      resultStr.Append(' ');
>-      resultStr.Append(aToken);
>-    } else {
>-      resultStr.Append(aToken);
>-    }
>-
>-    oneWasAdded = true;
>-    addedClasses.AppendElement(aToken);
>   }
> 
>+  Serialize(list, resultStr);
>+
Also here. Why all this parsing and serializing.

>   mElement->SetAttr(kNameSpaceID_None, mAttrAtom, resultStr, true);
... which SetAttr will then parse again so that it can store things in AtomArray



> nsDOMTokenList::RemoveInternal(const nsAttrValue* aAttr,
>                                const nsTArray<nsString>& aTokens)
has similar issue


Sorry, but this looks like a way to make TokenList behave really slowly, and I would be surprised if there isn't some
silly benchmark doing microbenchmarking on this interface too.
Attachment #8671366 - Flags: review?(bugs) → review-
Flags: needinfo?(cam)
I separated the BloomFilter code into a different patch to preserve attribution better, since this patch is still almost entirely Cameron's code.
Attachment #749105 - Attachment is obsolete: true
Attachment #8671366 - Attachment is obsolete: true
Attachment #8672386 - Flags: review?(bugs)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=af39cffe8c3d

I don't understand comment 8 -- why should this slow down attribute setting?  Parsing is only needed when classList is accessed, so why should this code run when the class attribute is set?  Surely we should defer it?  99.9% of the time that a class is set, we don't need to parse.
Attachment #8672387 - Flags: review?(bugs)
The first version was missing a somewhat important line.  :)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=131c142d212d
Attachment #8672387 - Attachment is obsolete: true
Attachment #8672387 - Flags: review?(bugs)
Attachment #8672397 - Flags: review?(bugs)
(In reply to :Aryeh Gregor (working until October 13) from comment #18) 
> I don't understand comment 8 -- why should this slow down attribute setting?
There is that Hashtable creation at least.
Why do we have to do any parsing when the class attribute is set?  Why not wait until the first access of .classList?
Well, the atom array is mostly needed by CSSRuleProcessor.
Sure, we could make nsIContent::GetClasses() or AttrValue::GetAtomCount() to parse the value first time it is needed, but that isn't what we've done so far with AttrValues.
Comment on attachment 8672386 [details] [diff] [review]
Part 1 -- Cameron's patch updated to apply to current code

hard to review this when this has tons of unrelated changes (all those test removals).
Comment on attachment 8672386 [details] [diff] [review]
Part 1 -- Cameron's patch updated to apply to current code

Waiting for some new patch.
Attachment #8672386 - Flags: review?(bugs)
So the eDiscardDuplicates approach takes care of only classList.
What about iframe.sandox = "some tokens" and other similar attributes?
SettableTokenList.value uses the same 'ordered set parser' what element's 'classes' use.

(But then, I don't see anything guaranteeing that iframe.setAttribute("sandbox", "new new token token") leads to unique tokens in the list. element.setAttribute("class", "class1 class1"); is defined to skip duplicates. )
Skip duplicates for unique tokens in the list (for other attribute than class) was covered by solving this bug in DOM spec: https://www.w3.org/Bugs/Public/show_bug.cgi?id=20660

And stringifier for DOMTokenList was reverted to previouse requirements so Firefox has it correct now (like other browsers): https://github.com/whatwg/dom/issues/105
Attached patch New patch (obsolete) — Splinter Review
Okay, how about this?  If you don't like the bloom filter or hash table usage on every assignment, I could try to do those more lazily.

Probably will need test adjustments for the try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c2d7d7f1083
Attachment #8672386 - Attachment is obsolete: true
Attachment #8672397 - Attachment is obsolete: true
Attachment #8672716 - Attachment is obsolete: true
Attachment #8739989 - Flags: review?(bugs)
Comment on attachment 8739989 [details] [diff] [review]
New patch

Not quite finished.  Will re-post for review when I get a green try run.
Attachment #8739989 - Attachment is obsolete: true
Attachment #8739989 - Flags: review?(bugs)
Attached patch New patchSplinter Review
Finished try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=235cdc528e0a

The failures there should be fixed in the new try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6cb5447ee18a494579659fce37dc10caa5d9ec5

I wrote more tests that I'll submit upstream to wpt, that test all the DOMTokenList attributes in the DOM spec (not just classList).  I based it off dom/base/test/test_classList.html, with XUL and mutation events removed.
Attachment #8740436 - Flags: review?(bugs)
Ok, so https://github.com/whatwg/dom/issues/105 makes it still a bit unclear what we want here.
Comment on attachment 8740436 [details] [diff] [review]
New patch

>+  BloomFilter<8, nsIAtom> filter;
>+  if (aDuplicateHandling == eDiscardDuplicates) {
>+    filter.add(classAtom);
>+  }
>+
>+  nsDataHashtable<nsPtrHashKey<nsIAtom>, bool> tokens;
Have you checked whether constructing and deconstructing a hashtable shows up in performance profiles?

>+  enum DuplicateHandling { eAllowDuplicates, eDiscardDuplicates };
>+  void ParseAtomArray(const nsAString& aValue,
>+                    DuplicateHandling aDuplicateHandling = eDiscardDuplicates);
Could you remove the default value. It is hard to say whether this patch might have some odd side effects since 
I don't know in which all cases it is being called. autocomplete seems to be one such special case, and there eAllowDuplicates is being used.

+void
+nsDOMTokenList::Reserialize(const nsAttrValue* aAttr)
+{
+  nsAutoString result;
+  aAttr->GetAtomArrayString(result);
+  mElement->SetAttr(kNameSpaceID_None, mAttrAtom, result, true);
+}
So we don't ever call this method.

> nsDOMTokenList::RemoveInternal(const nsAttrValue* aAttr,
>                                const nsTArray<nsString>& aTokens)
...
>-  mElement->SetAttr(kNameSpaceID_None, mAttrAtom, output, true);
>+  nsAutoString result;
>+  aAttr->GetAtomArrayStringWithSomeRemoved(result, aTokens);
>+  mElement->SetAttr(kNameSpaceID_None, mAttrAtom, result, true);
This is a bit odd. GetAtomArrayStringWithSomeRemoved serializes the attribute (with some parts removed) and then SetAttr parses the same string.
But ok, I guess this simplifies mutation handling quite a bit. Perhaps add a comment why we do it this way.


>@@ -134,17 +142,17 @@ nsDOMTokenList::AddInternal(const nsAttrValue* aAttr,
> {
>   if (!mElement) {
>     return;
>   }
> 
>   nsAutoString resultStr;
> 
>   if (aAttr) {
>-    aAttr->ToString(resultStr);
>+    aAttr->GetAtomArrayString(resultStr);
>   }
So given that we're using GetAtomArrayString now here, why do we still need !nsContentUtils::IsHTMLWhitespace(resultStr.Last() check few lines below?


I think I'd like to see a new patch with the default value for aDuplicateHandling removed. Just to make sure we aren't relying on the old behavior somewhere.
Also, it is not clear whether we'll land this. Need to ask what MS and Blink folks think.
I think I need to still re-read the spec, and figure out what https://github.com/whatwg/dom/issues/105#issuecomment-206327267 refers to.
(It is unclear now that what is in the spec vs. what different browsers have implemented.)
Attachment #8740436 - Flags: review?(bugs) → review-
The DOM Standard changed per https://github.com/whatwg/dom/issues/105.

Follow-up work is bug 1347998.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
It looks like the spec has been finalized (for now! :) ).  The conclusion seems to be that everything normalizes whitespace and removes duplicates, *except* stringification/.value, which returns the attribute value as-is.  So the patches need a bit of fixup but should be good.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Just as a heads-up, this bug is going to need a serious performance investigation before it can land.  That means some measurements on microbenchmarks (and possibly resulting optimization), measurements on actual sites, etc.
Olli, do you still want to review this?

Whoever's going to review this: please say exactly what performance tests you want me to run and I'll run them and see what I can do.  There are some obvious optimizations to do if the results aren't good enough.
Flags: needinfo?(bugs)
Just to put the request from comment 39 in perspective, I suspect that coming up with (not running!) the right set of performance tests here is a pretty nontrivial task.  I'd estimate it to take an hour or two of thought for someone who is already intimately familiar with the patch.
Hmm.  Well, I think we do want to make this change, so I guess it will have to wait until someone has time to come up with the right performance figures.
Think about worst case scenarios and write testcases and profile?

ParseAtomArray is very hot when dealing with class attribute for example.
Call setAttribute("class", <varying list of values>); in a loop? Setting class attribute is very common operation and it already shows up when profiling innerHTML
Flags: needinfo?(bugs)
And to profile the code, addon from https://perf-html.io/ gives often good enough numbers.
Would it be easier if we initialized the token list lazily only when a .classList method is called?  Then attribute setting would just have to null the token list, which would be reinitialized when next needed.  Would that mean we wouldn't have to do performance testing, since .classList methods are probably not hot paths?  We should still initialize the token list efficiently, of course, but if it slows down the first call to DOMTokenList.add/remove/toggle/replace/length somewhat it shouldn't be a big deal, right?
Flags: needinfo?(bugs)
We need to have _a_ token list (in the sense of array of nsIAtom) for CSS matching on class to work right.

We could leave that list as it is right now and perform the first normalization on actual DOMTokenList access, though.
(bz answered, clearing ni?)
Flags: needinfo?(bugs)
Should I add a dirty flag that tracks whether the token list needs normalization, to save repeated normalization if DOMTokenList methods are called repeatedly?  If so, where should I put it?
Flags: needinfo?(bzbarsky)
I haven't thought about this sufficiently to have an offhand answer to that question, sorry.
Flags: needinfo?(bzbarsky)
Yes, there should be a flag somewhere. Like, set the flag whenever an attribute is set or something.
Though, I'm not sure if we have currently any spare bits in nsINode. We may, or I do recall there being a bug about removing some exiting flag. (filed by bholley I think )
Bug 1338723 is what I was thinking.
Notes for reviewing the patch:

1) It's based on bug 1354066.  If this bug gets reviewed faster than that one, you could take over review for that one too if you want.

2) There are some spec changes in the pipe to this stuff, so it will probably need tweaking soon:

https://github.com/whatwg/dom/issues/442
https://github.com/whatwg/dom/issues/443
https://github.com/whatwg/dom/pull/444
https://github.com/whatwg/infra/pull/126

But the changes should be minor, so it still makes sense to review this.
Attachment #8854853 - Flags: review?(bugs)
smaug, do you want to review this, or should someone else?  I'm around for about another week.  Note comment 52.
Flags: needinfo?(bugs)
Attachment #8854853 - Flags: review?(bugs)
Have you done any performance measurements in the worst case scenarios?
How much does RemoveDuplicates(); slow down indexed getter for example when class attribute has several values?
Flags: needinfo?(bugs)
Comment on attachment 8854853 [details]
Bug 869788 - Normalize DOMTokenList for whitespace/dupes;

Reviewing once some performance test has been done. Like, does this slow down indexed getters significantly.
Attachment #8854853 - Flags: review?(bugs)
data:text/html,<!DOCTYPE html>
<span class="a b c d e f g h i j k l m n o p q r s t u v w x y z"></span>
<span class="a a a a a a a a a a a a a a a a a a a a a a a a a a"></span>
<span class="a a a a a a a a a a a a a a a a a a a a a a a a a b"></span>
<script>
var spans = document.querySelectorAll("span");
var results = [];
for (var i = 0; i < 3; i++) {
  var start = window.performance.now();
  var list = spans[i].classList;
  for (var j = 0; j < 1000000; j++) {
    list[7];
  }
  results.push(window.performance.now() - start);
}
document.body.textContent = results.join(", ");
</script>

I get figures of around 530-580 without the patch.  With the patch, the first is around 570-630, and the other two are around 720-760.  This is with 26 classes and is a super-microbenchmark, though.  If I change the inner loop to be spans[i].classList[7] instead of storing in a local variable, it already eliminates almost all the difference.
Flags: needinfo?(bugs)
Comment on attachment 8854853 [details]
Bug 869788 - Normalize DOMTokenList for whitespace/dupes;

https://reviewboard.mozilla.org/r/126806/#review135528

::: dom/base/nsDOMTokenList.cpp:58
(Diff revision 3)
>    }
>    return mElement->GetAttrInfo(kNameSpaceID_None, mAttrAtom).mValue;
>  }
>  
> +void
> +nsDOMTokenList::RemoveDuplicates()

Shouldn't this take nsAttrValue* as a param, since all the callers seem to have such anyhow.

::: dom/base/nsDOMTokenList.cpp:115
(Diff revision 3)
>  nsDOMTokenList::IndexedGetter(uint32_t aIndex, bool& aFound, nsAString& aResult)
>  {
> +  RemoveDuplicates();
> +
>    const nsAttrValue* attr = GetParsedAttr();
>  

Want to add the check here that RemoveDuplicates is called only if aIndex < atom count
Attachment #8854853 - Flags: review-
Comment on attachment 8854853 [details]
Bug 869788 - Normalize DOMTokenList for whitespace/dupes;

https://reviewboard.mozilla.org/r/126806/#review135840

::: dom/base/nsDOMTokenList.cpp:194
(Diff revision 4)
> -    aAttr->ToString(resultStr);
> +    RemoveDuplicates(aAttr);
> +    for (uint32_t i = 0; i < aAttr->GetAtomCount(); i++) {
> +      if (i != 0) {
> +        resultStr.AppendLiteral(" ");
> +      }
> +      resultStr.Append(nsDependentAtomString(aAttr->AtomAt(i)));

I assume there is wpt test for this behavior, that

<div class="a a a">
div.classList.add("a");
changes the attribute value to "a"

::: dom/base/nsDOMTokenList.cpp:359
(Diff revision 4)
> -  WhitespaceTokenizer tokenizer(attribute);
>  
>    bool sawIt = false;
> -  while (tokenizer.hasMoreTokens()) {
> -    auto currentToken = tokenizer.nextToken();
> -    if (currentToken.Equals(aToken) || currentToken.Equals(aNewToken)) {
> +  nsAutoString resultStr;
> +  for (uint32_t i = 0; i < aAttr->GetAtomCount(); i++) {
> +    if (aToken == nsDependentAtomString(aAttr->AtomAt(i)) ||

aAttr->AtomAt(i)->Equals(aToken) would be a bit faster I think
Attachment #8854853 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #61)
> I assume there is wpt test for this behavior, that
> 
> <div class="a a a">
> div.classList.add("a");
> changes the attribute value to "a"

There are a couple like that:

https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.28
https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.40
https://reviewboard-hg.mozilla.org/gecko/rev/7f37a8918fce#l7.40
Flags: needinfo?(bugs)
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s c0bb078e6bce -d e05f84ea2a33: rebasing 391456:c0bb078e6bce "Bug 869788 - Normalize DOMTokenList for whitespace/dupes; r=smaug" (tip)
local [dest] changed testing/web-platform/meta/dom/lists/DOMTokenList-iteration.html.ini which other [source] deleted
use (c)hanged version, (d)elete, or leave (u)nresolved? u
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by ayg@aryeh.name:
https://hg.mozilla.org/integration/autoland/rev/e8f01e428501
Normalize DOMTokenList for whitespace/dupes; r=smaug
https://hg.mozilla.org/mozilla-central/rev/e8f01e428501
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
I've added a section to the docs explaining this behaviour:

https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList#Trimming_of_whitespace_and_removal_of_duplicates

and added a note to the Fx55 release notes:

https://developer.mozilla.org/en-US/Firefox/Releases/55#DOM_HTML_DOM

As a bonus, I've also documented all of the members of DOMTokenList, as many of the pages were missing, and others needed updates.

Let me know if this looks OK!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: