Closed Bug 918771 Opened 11 years ago Closed 9 years ago

XMLHttpRequest (XHR) send() of an HTML document sends it as application/xml, not text/html

Categories

(Core :: DOM: Core & HTML, defect, P5)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: hsteen, Assigned: nika, Mentored)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-needed, site-compat, Whiteboard: [lang=c++])

Attachments

(3 files, 5 obsolete files)

Test case:
http://w3c-test.org/web-platform-tests/master/XMLHttpRequest/send-entity-body-document.htm

Not very XHR-relevant, but the extra xmlns attribute Gecko adds causes this test to say failed. 

Now let the flame wars on which behaviour is better begin.. :-/
Are we sending it as text/html, or as application/xml?  In the former case the xmlns is clearly needed...  And as far as I can tell we always send as application/xml.

Is there an existing bug on sending as text/html when the document is HTML?  And is it web-compatible?
> In the former case the xmlns is clearly needed...

I meant _latter_ case, of course.
The TC uses two documents (loaded as data: URLs in IFRAMEs) - one is text/xml, the other one is text/html. It is indeed being sent with content-type application/xml - that means the test needs fixing, I suppose it might take content-type into account and require a xmlns if it's labelled as XML.
Heh. No browser gets this exactly right so far:

* Chrome doesn't manage to serialize either document, it sends an empty string and no content-type header

* Opera is pretty close: it serializes both documents and sends one as application/xml and the other one as text/html. It gets the charset right too, specifies UTF-8 for both. However, it adds a superfluous prologue - <?xml version="1.0"?> - *even* to the document that it sends with content-type text/html. Oops.. :-p

Plan: test should accept that the document may be sent as text/html or application/xml. It should check encoding label to make sure it is correct. If document is sent as application/xml, the test should require a xmlns attribute specifying the HTML namespace. (If it's sent as text/html I guess I'll just ignore whether there is a xmlns or not).

Sounds good?
Rather than leaving the test's mime type up to chance, it would be better to have three tests:

* text/html (.html)
* application/xhtml+xml (.xhtml)
* application/xml (.xml)
I think the spec should be made clearer: HTML documents should be serialized as text/html.  It tries to say this, but not very clearly due to trying to be terse or something.

Then the bug on our end is simply that we're sending it as application/xml, not text/html. 

But yes, the test should require xmlns for the application/xml version.

> * application/xhtml+xml (.xhtml)

Per spec (or at least its intent), this would get sent as application/xml.
I'm trying to figure out what the encoding-related parts of this test should assert in discussion with Anne:
https://github.com/w3c/web-platform-tests/pull/343#issuecomment-24906702
..and it seems our plan will be to mandate encoding as UTF-8 regardless of what the input document's original encoding was. Gecko does something more complicated at the moment..I think it encodes according to the original encoding, which is what the spec *currently* says, although Anne is planning to change it.
Is always using UTF-8 web-compatible?
Chrome reportedly does not transmit Document objects at all (serializes them to the empty string).
Testcase, please?  Code inspection of https://chromium.googlesource.com/chromium/blink.git/+/99b8c9800ac123eddc3e199088d22569c5294b22/Source/core/xml/XMLHttpRequest.cpp starting line 593 suggests that they do something useful with it, unless createMarkup() is buggy (and my reading of it doesn't see obvious bugs like "produces no output").  Though it also looks like they always send UTF-8.
Boris, if this bug didn't have a test case already it would not exist ;-)

Anyway, loading http://w3c-test.org/web-platform-tests/master/XMLHttpRequest/send-entity-body-document.htm in my Chrome (which claims to be "Google Chrome	29.0.1547.76 (Offisiell delversjon 223446) m" - what's that m for??) shows two failures. The second one is described as 

expected "<html><head></head><body>�</body></html>" but got ""

and I've been stepping through the code in the debugger and checked HTTP traffic earlier and seen that, surprisingly, xhr.send(doc) does indeed send nothing at all.
(In reply to Boris Zbarsky [:bz] from comment #9)
> Is always using UTF-8 web-compatible?

Do websites generally pass documents to the XHR send() method? I don't recall seeing this used in the wild, so I think we're pretty free to spec and implement the approach that makes the most sense.
> Boris, if this bug didn't have a test case already it would not exist ;-)

Sorry I missed the link in comment 0.  :(

> Anyway, loading
> http://w3c-test.org/web-platform-tests/master/XMLHttpRequest/send-entity-body-document.htm 

Thank you for the testcase.

Blink doesn't treat data: iframes as same-origin with their ancestor document.  So the .contentDocument getter there returns null (and reports a security error in Chrome's error console).  Passing null to send() will of course send the empty string....

Having the testcase check that what it's sending is in fact a Document node would be a good addition to the testcase.  And changing it to not rely on the data:-handling vagaries of UAs, I guess.

> Do websites generally pass documents to the XHR send() method? 

That's a good
I meant:

That's a good question.  Worth checking.
D'oh.. I keep forgetting that they have a different data: URL security policy :-/ 

I'll sharpen up that test case (If we want to keep the encoding test aspect of it, I guess it needs to load real URLs from the server rather than, say, use createDocument(). No problem, I can set that up :-).)

Done - 
https://github.com/w3c/web-platform-tests/commit/6c27a3aff577be626758153c3c1ca82dbc337034
http://w3c-test.org/web-platform-tests/submissions/343/XMLHttpRequest/send-entity-body-document.htm
Test now works a lot better in Chrome (except that it balks because Chrome sends everything as UTF-8 but doesn't bother *labelling* it as such.. Everything is just "Content-Type: application/xml"). Looking at the HTTP traffic, it sends the expected source code.

Anyway - if Chrome already sends everything as UTF-8 regardless of input encoding, I think that's a safe thing to do. If you can somehow check if this is used in the wild it would be interesting to have data - in the absence of data I'll happily believe that we're in un-claimed territory and can do whatever we like.
OK.  Can we spin off a separate bug on the encoding issues?  This bug as filed was about the MIME type we serialize as, right?  And that still needs an answer to my compat question from comment 1....  In particular, what other UAs do.
Flags: needinfo?(hsteen)
Summary: Serializing a HTML document adds superfluous xmlns attribute to serialized document element → XMLHttpRequest (XHR) send() of an HTML document sends it as application/xml, not text/html
I'll defer to Hallvord, Anne and Henri here.

I'd love it if we could simply drop support for sending documents directly using XHR. There's always the ability to use the XMLSerializer to do that. If chrome is in fact getting away with that it'd be nice to gather telemetry. But if all browsers are in fact supporting sending XML and/or HTML documents, then I suspect that won't be possible.
(In reply to Boris Zbarsky [:bz] from comment #17)
> what other UAs do.

Below is a cross-browser summary of behaviours exposed by this test - using the fixed TC w/o data: URLs - 
http://w3c-test.org/web-platform-tests/submissions/343/XMLHttpRequest/send-entity-body-document.htm

IE 10:
* Keeps MIME types (XML sent as application/xml, HTML as text/html)
* Keeps original charset (and the content on the wire *does* match the indicated charset too)
* Invalid characters are dropped

Opera (Presto):
* Keeps MIME types  (XML sent as application/xml, HTML as text/html)
* Always sends UTF-8, correctly labelled as UTF-8
* Invalid characters represented by the FFFD replacement character

Chrome / Opera Blink:
* Changes MIME types (always sends application/xml)
* Doesn't add xmlns attributes - even when original document was HTML
* Always sends UTF-8 (but fails to label it as such)
* Invalid characters represented by the FFFD replacement character

Firefox:
* Changes MIME types (always sends application/xml)
* Adds xmlns attributes when original document was HTML (presumably for other documents in known namespaces too)
* Preserves MIME type of original document (correctly labelled and encoded)
* Invalid characters represented by the FFFD replacement character


The test expects the Opera (Presto) implementation - and IMHO that makes sense.

Jonas:
> I'd love it if we could simply drop support for sending documents directly using XHR.

Sorry.. I don't think this functionality is used much, but it doesn't seem safe to drop it given that the feature is actually implemented widely.
Flags: needinfo?(hsteen)
OK.  I could live with the Presto behavior here, I guess.  Given the Chrome behavior, it's probably not rampantly web-incompatible.
Whiteboard: [mentor=bz][lang=c++]
Oops.. Just for clarity, where I wrote

> Firefox:
.
.
> * Preserves MIME type of original document (correctly labelled and encoded)

I meant encoding, not MIME type. So Firefox always changes the MIME type to application/xml, but it uses the encoding of the original document.
Keywords: dev-doc-needed
Hallvord, what do UAs do for things like text/plain or image/gif documents?
Flags: needinfo?(hsteen)
(In reply to Hallvord R. M. Steen from comment #21)
> Oops.. Just for clarity, where I wrote
> 
> > Firefox:
> .
> .
> > * Preserves MIME type of original document (correctly labelled and encoded)
> 
> I meant encoding, not MIME type. So Firefox always changes the MIME type to
> application/xml, but it uses the encoding of the original document.

Hi,

I'm having a go at this bug; but without any patching I'm getting for 
the result of the first test:

assert_equals: expected "<html><head></head><body>�</body></html>" but got "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /web-platform-tests/master/XMLHttpRequest/resources/content.py was not found on this server.</p>\n</body></html>\n"
I forgot to add a needinfo for comment 23.

I mentioned this to bz, and he said it could be a buggy server.
There's some backend work going on here, to be able to run the tests from a Python harness - see "running the tests" section on https://github.com/w3c/web-platform-tests/ . However, they should also be running from w3c-test.org - James, why isn't this working?

(And Boris: sorry for being late responding to the needinfo, will write extra tests today..)
Flags: needinfo?(james)
w3c-test.org is temporarily not usable for tests like this as we have migrated away from PHP but they are still in the process of standing up the python-based server.

In the meantime you can run the tests on your local machine by following the instructions in the readme of https://github.com/w3c/web-platform-tests i.e. clone that repo, add in the submodules, set up your hosts file and |python serve.py| to get a full test environment.
Flags: needinfo?(james)
(In reply to Boris Zbarsky [:bz] from comment #22)
> Hallvord, what do UAs do for things like text/plain or image/gif documents?

When you say "text/plain or image/gif documents" I assume you mean the documents UAs generate if you do for example <iframe src="foo.txt">. In this case, all I tested (Gecko, Presto, Blink) happily serialize the internally generated document and send it. The content-type header matches the one for normal HTML documents in the respective browsers.

(There is obviously no point in standardising this behaviour, or make the test suite assert something specific. We'd have to standardise the markup UAs wrap plain text and images in.. I guess we could test that a text/plain URL loaded in an IFRAME is sent as text/html;charset=UTF-8 but even that seems to be outside of what we should standardise..)

If there's a different way to come up with a "text/plain document" please explain what you have in mind ;-)
Flags: needinfo?(hsteen)
> I assume you mean the documents UAs generate if you do for example
> <iframe src="foo.txt">. 

Yes.

> There is obviously no point in standardising this behaviour

Actually, there obviously is.  The good news is that the spec as written right now ends up sort of having the behavior UAs have, so that's good enough.

> We'd have to standardise the markup UAs wrap plain text and images in..

It's standardized, to a large extent.   See <http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#read-text> and <http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#read-media>.  The only leeway is in what you can put in the <head> of the document.

The upshot is that we should not base our serialization on the MIME type of the document resource but just on the nsIDocument::IsHTML().
I've set up a local test instance, and ran the test again.

I get 1 Pass and 1 Fail; but what is confusing me is that sometimes
the first send() document fails and the second passes, and sometimes
the first send() document passes, and the second fails.

When a test fails, it would fail with the following error:

assert_equals: expected "<html><head></head><body>�</body></html>" but got "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head></head><body>�</body></html>"

Is there something I'm missing? (i.e. I set up the test system wrong)

So the xmlns=... is always added to the responsetext, am I right?
That's the simplest way to observe this bug, yes.
> It's standardized, to a large extent.

OK, I will add those things to the test after all :-)

Edmund: Seems the order the tests will be listed in depends on when iframe load events fire - i.e. a race condition. That's why the failing test(s) might end up in different places in the list. Sorry about that, I probably ought to fix it somehow..
Attached patch proposed patch (v1) (obsolete) — Splinter Review
Assignee: nobody → ewong
Status: NEW → ASSIGNED
Attachment #8372746 - Flags: review?(bzbarsky)
Comment on attachment 8372746 [details] [diff] [review]
proposed patch (v1)

This is a good start!

>+   * @param aRoot The root of the document.

No, the root of the subtree to be encoded.

You need to document what this method does.

Please make this an nsINode instead of nsIDOMNode.

>+  static nsresult SetUpEncoder(nsIDOMNode *aRoot, const nsACString& aCharset,

"nsINode* aRoot" ('*' to the left).

And maybe set up SetUpEncoderForNode?

Also, you either need a boolean to force XML or something?  Otherwise you're making the XML serializer not always serialize XML, which I'm 99% sure will make you fail tests and web compat.

>+             nsIDocumentEncoder **aEncoder);

Please fix the indent.

> +  if (doc->IsHTML()) {

What if "doc" is null?  What you probably want to do is set doc to aNode->OwnerDoc() instead.  And then QI domDoc from doc.

>+  dType.AssignLiteral(NS_DOC_ENCODER_CONTRACTID_BASE);

Why not just pass that to the constructor?

>+  nsString docType = NS_LITERAL_STRING("");

Drop the '=' and everything after it.

>+  bool entireDocument = true;

Once you have aRoot as an nsINode, this would be:

  bool entireDocument = (aRoot == doc);

>+  if (NS_FAILED(rv))
>+    return rv;

Curly braces around if bodies, please.

>+    *aEncoder = encoder.get();
>+    NS_ADDREF(*aEncoder);

  encoder.forget(aEncoder);

> GetRequestBody(nsIDOMDocument* aDoc, nsIInputStream** aResult,

This needs to stop using the XML serializer.  So it'll need to set up the encoder with the "use HTML if needed" option and then EncodeToString() on it.
Attachment #8372746 - Flags: review?(bzbarsky) → review-
(In reply to Boris Zbarsky [:bz] from comment #33)

There are a few things that I don't understand.

Changing nsIDOMNode to nsINode causes quite a few issues that I'm
still figuring out how to resolve.  For example, in 
nsDOMSerializer::SerializeToStream(nsIDOMNode *aRoot, ..) [I'm assuming
this should be changed to "nsIDOMNode* aRoot"?], it calls SetUpEncoder(aRoot...)

Do I change the parameter nsIDOMNode in SerializeToStream() to
nsINode or do I change the parameter aRoot in SetUpEncoder(aRoot, ...)
to something compatible with nsINode (via some conversion or such)?

 
> >+  static nsresult SetUpEncoder(nsIDOMNode *aRoot, const nsACString& aCharset,
> 
> "nsINode* aRoot" ('*' to the left).
> 
> And maybe set up SetUpEncoderForNode?

Do you mean create a new method SetUpEncoderForNode() or rename
SetUpEncoder() to SetUpEncoderForNode()?

> 
> Also, you either need a boolean to force XML or something?  Otherwise you're
> making the XML serializer not always serialize XML, which I'm 99% sure will
> make you fail tests and web compat.

Do you mean in the nsDOMSerializer::SerializeToStream(..) code or in
the GetRequestBody() code?  I'm not sure what I need to do here as I
don't quite understand the code that well yet.

> 
> > +  if (doc->IsHTML()) {
> 
> What if "doc" is null?  What you probably want to do is set doc to
> aNode->OwnerDoc() instead.  And then QI domDoc from doc.

I tried the following:

  nsCOMPtr<nsIDocument> doc = aRoot->OwnerDoc();

But I get the following error:

 error C2248: 'nsINode::GetOwnerDocument' : cannot access protected member declared in class 'nsINode'

> > GetRequestBody(nsIDOMDocument* aDoc, nsIInputStream** aResult,
> 
> This needs to stop using the XML serializer.  So it'll need to set up the
> encoder with the "use HTML if needed" option and then EncodeToString() on it.

I'm stuck with this as I don't know what to replace this code with:

nsCOMPtr<nsIDOMSerializer> serializer = do_CreateInstance(NS_XMLSERIALIZER_CONTRACTID, &rv);

That is if that is what is needed to be done. 

Really sorry for being dumb here.  I do appreciate your patience with me.
> or do I change the parameter aRoot in SetUpEncoder(aRoot, ...)
> to something compatible with nsINode (via some conversion or such)?

Yes:

  nsCOMPtr<nsINode> node = do_QueryInterface(aRoot);
  if (!node) {
    // Throw an exception here
  }

> error C2248: 'nsINode::GetOwnerDocument' : cannot access protected member declared in
> class 'nsINode'

That looks like you did aRoot->GetOwnerDocument(), not aRoot->OwnerDoc()?

> I'm stuck with this as I don't know what to replace this code with:

You replace it with the code that nsDOMSerializer::SerializeToString() (which is all that gets called here) actually executes.  Which is basically a security check, getting the encoder, and then calling EncodeToString().
(In reply to Boris Zbarsky [:bz] from comment #35)
> > error C2248: 'nsINode::GetOwnerDocument' : cannot access protected member declared in
> > class 'nsINode'
> 
> That looks like you did aRoot->GetOwnerDocument(), not aRoot->OwnerDoc()?

Nope.  I did aRoot->OwnerDoc().
> I did aRoot->OwnerDoc()

Is aRoot an nsINode or an nsIDOMNode?  It needs to be the former if you plan to call OwnerDoc() on it.
(In reply to Boris Zbarsky [:bz] from comment #37)
> > I did aRoot->OwnerDoc()
> 
> Is aRoot an nsINode or an nsIDOMNode?  It needs to be the former if you plan
> to call OwnerDoc() on it.

It's an nsINode*.  

nsresult
nsContentUtils::SetUpEncoder(nsINode* aRoot, const nsACString& aCharset,
                             nsIDocumentEncoder **aEncoder)
{
  *aEncoder = nullptr;

  nsCOMPtr<nsIDocument> doc = aRoot->OwnerDoc();
  nsCOMPtr<nsIDOMDocument> domDoc(do_QueryInterface(doc));
OK, are you sure your compile message is from that line?  Because it's clearly talking aboutnsINode::GetOwnerDocument.  Do you still have this line in your code somewhere:

>+    rv = aRoot->GetOwnerDocument(getter_AddRefs(domDoc));

?  If you do, it needs to go away, of course.
(In reply to Boris Zbarsky [:bz] from comment #39)
> OK, are you sure your compile message is from that line?  Because it's
> clearly talking aboutnsINode::GetOwnerDocument.  Do you still have this line
> in your code somewhere:
> 
> >+    rv = aRoot->GetOwnerDocument(getter_AddRefs(domDoc));
> 
> ?  If you do, it needs to go away, of course.

Oh right. Sorry.  I was looking at the wrong line of course.

I changed that to:

   domDoc = aRoot->OwnerDoc();
> I changed that to:

I suspect you don't need that line at all, but hard to tell without seeing the diff.
Mentor: bzbarsky
Whiteboard: [mentor=bz][lang=c++] → [lang=c++]
Still working on this, Edmund?
Flags: needinfo?(ewong)
(In reply to Josh Matthews [:jdm] from comment #42)
> Still working on this, Edmund?

No.  Stumbled and faulted.  Will release it so maybe someone can work on it.
Sorry, bz.  I was hoping to work on this but not knowing the concepts doomed
me. :(
Assignee: ewong → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(ewong)
This seems like the simplest way to achieve what I think this patch wants.
Attachment #8372746 - Attachment is obsolete: true
Attachment #8665017 - Flags: review?(jduell.mcbugs)
I'm not sure if this is even close to the correct way to push changes to web platform tests - so tell me if I should be doing something else!
Attachment #8665019 - Flags: review?(jduell.mcbugs)
Assignee: nobody → michael
Michael, changes to web-platform-tests are better submitted as pull requests on Github :)
https://github.com/w3c/web-platform-tests/
No, this is a totally acceptable way to get a wpt change. We will upstream next time we do a sync.
Cool, didn't know that. Thanks James :)
Comment on attachment 8665017 [details] [diff] [review]
Part 1: Send text/html as MIME type for XHR send() of HTML document

This changes the header, but does it change the body?

That is, given:

  data:text/html,<img>

If you send that via XHR do you get "<img>" in the string, or "<img/>"?
And even more importantly, given:

  data:text/html,<div></div>Hello

do you get "<div></div>Hello" or "<div/>Hello"?  Because if we're claiming to be text/html then the latter is not ok, obviously.
(In reply to Boris Zbarsky [:bz] from comment #51)
> And even more importantly, given:
> 
>   data:text/html,<div></div>Hello
> 
> do you get "<div></div>Hello" or "<div/>Hello"?  Because if we're claiming
> to be text/html then the latter is not ok, obviously.

It encodes it as XHTML - so <img> becomes <img />, but the empty div `<div></div>` stays as such.

It uses nsDOMSerializer to serialize right now - which I think serializes as XHTML - Is there a html serializer which I should use for HTML documents?
Flags: needinfo?(bzbarsky)
There's built-in functionlity for that, yes. Compare what nsParserUtils::Sanitize does to what SetUpEncoder in nsDOMSerializer.cpp does.  It's not exposed on nsIDOMSerializer at the moment, but we should perhaps change that.
Flags: needinfo?(bzbarsky)
Comment on attachment 8665017 [details] [diff] [review]
Part 1: Send text/html as MIME type for XHR send() of HTML document

Review of attachment 8665017 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a dom peer, which is where this code lives (and more importantly, I'm by no means an expert on the exact issue here).  bz has been commenting, so punting to him, or whoever he finds to review (note: it's a 3 line patch).
Attachment #8665017 - Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Attachment #8665019 - Flags: review?(jduell.mcbugs) → review?(bzbarsky)
Comment on attachment 8665017 [details] [diff] [review]
Part 1: Send text/html as MIME type for XHR send() of HTML document

Per the comments....
Attachment #8665017 - Flags: review?(bzbarsky) → review-
Comment on attachment 8665019 [details] [diff] [review]
Part 2: Update Web Platform tests to check for correct behavior

r=me
Attachment #8665019 - Flags: review?(bzbarsky) → review+
Attachment #8665017 - Attachment is obsolete: true
Attachment #8671451 - Flags: review?(bzbarsky)
I made HTML documents be serialized through the FragmentOrElement logic, but I'm really not happy with
the amount of garbage which I dumped into nsContentUtils.cpp - so I'm looking for feedback as to what
I should do there, as well as whether this is doing close-ish to the right thing.
Attachment #8671452 - Flags: feedback?(bzbarsky)
I'm asking for review again because I made more changes due to the html formatting changes. 

I'm not super happy with this either. I feel like the wpt should be changed to do an explicit 
equality (because right now the test would pass if you added doctypes to documents which 
shouldn't have doctypes, etc.

AFAICT the reason why the WPT does indexOf is a problem with presto, which AFAIK isn't being
developed any more - maybe we could get away with doing equality tests now?
Attachment #8665019 - Attachment is obsolete: true
Attachment #8671454 - Flags: review?(bzbarsky)
(In reply to Michael Layzell [:mystor] from comment #59)
> Created attachment 8671454 [details] [diff] [review]
> Part 2: Update Web Platform tests to check for correct behavior
s/Part 2/Part 3
Comment on attachment 8671451 [details] [diff] [review]
Part 1: Send text/html as MIME type for XHR send() of HTML document

This should really happen in the same changeset that changes the actual behavior of GetRequestBody.  Which can be a different changeset than the one that adds the infrastucture for the changed behavior, of course.
Attachment #8671451 - Flags: review?(bzbarsky) → review-
(In reply to Boris Zbarsky [:bz] from comment #61)
> Comment on attachment 8671451 [details] [diff] [review]
> Part 1: Send text/html as MIME type for XHR send() of HTML document
> 
> This should really happen in the same changeset that changes the actual
> behavior of GetRequestBody.  Which can be a different changeset than the one
> that adds the infrastucture for the changed behavior, of course.

That's reasonable - I'll pull the infrastructure apart from the GetRequestBody changes.
Comment on attachment 8671452 [details] [diff] [review]
Part 2: Correctly serialize HTML documents for XHR send()

So I think this changeset should probably be split into two changesets:

1)  Move the code to nsContentUtils without making any real changes to it; just a cut/paste with a rename of the one entry function and maybe the change of its arg type to nsINode.

2)  The changeset to add the doctype handling and such.

It looks like now we have two copies of ShouldEscape.  Why is that?  We don't want two copies of it....  Nor do we want two copies of the StringBuilder stuff, right?

Past that, putting this stuff in nsContentUtils seems fine, though we could also consider an nsSerializationUtils or something.
Attachment #8671452 - Flags: feedback?(bzbarsky) → feedback+
Comment on attachment 8671454 [details] [diff] [review]
Part 2: Update Web Platform tests to check for correct behavior

(In reply to Michael Layzell [:mystor] from comment #59)
> AFAICT the reason why the WPT does indexOf is a problem with presto

No, it does it because for some MIME types (including text/plain and image/gif) the spec doesn't require any particular DOM, so doesn't require a particular serialization here.

I think the test should be fixed to flag those explicitly and do an exact match on the ones where the DOM _is_ specified and a substring match on the ones where it's not.  Separate bug ok for that.

r=me
Attachment #8671454 - Flags: review?(bzbarsky) → review+
Comment on attachment 8671531 [details] [diff] [review]
Part 1: Move Fragment Serialization logic into nsContentUtils::SerializeNodeToMarkup

It looks like you switched the order of AppendEncodedCharacters and AppendEncodedAttributeValue.  Was there a reason for that?  If not, please put them back in the same order so it's clear that the code just got moved.

In AppendEncodedAttributeValue the cases of the switch got outdented by 2. Please put them back the way they were.

r=me.
Attachment #8671531 - Flags: review?(bzbarsky) → review+
Oh, one other thing.  Does FragmentOrElement.cpp still use BloomFilter.h?  If not, please stop including it there.
Comment on attachment 8671532 [details] [diff] [review]
Part 2: Send text/html as MIME type for XHR send() of HTML document

Oh, this is so much simpler to review.... ;)

>+      case nsIDOMNode::DOCUMENT_NODE: {

So this case can only be hit if the caller passed in a document as aRoot and passed aDescendentsOnly == false, right?

How about we just require callers who want to serialize a document to pass aDescendantsOnly == true, assert if someone doesn't do that, and not add this case?

>+    nsContentUtils::SerializeNodeToMarkup(doc, false, serialized);

If this returns false, shouldn't you throw NS_ERROR_OUT_OF_MEMORY or some such instead of silently sending an empty string?

r=me with those fixed.  Nice work!
Attachment #8671532 - Flags: review?(bzbarsky) → review+
Regarding the test using indexOf() instead of equality, I did indeed have an issue with Presto adding an extra <?xml version="1.0"?>, but more generally as the test is in a XMLHttpRequest test suite I'm trying to stick (more or less) to test XHR stuff and avoid veering into other things - like XML or HTML serializations. So the test on purpose avoids asserting things that have more to do with serialization.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: