Closed Bug 289714 Opened 19 years ago Closed 8 years ago

Loaded-as-data XML documents should not generate <parsererror>

Categories

(Core :: XML, defect, P3)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: bzbarsky, Assigned: wisniewskit)

References

Details

(Keywords: dev-doc-complete, site-compat)

Attachments

(2 files, 16 obsolete files)

25.38 KB, patch
Details | Diff | Splinter Review
2.80 KB, patch
Details | Diff | Splinter Review
When we have a well-formedness error in an XML doc loaded as data, should we
really generate a <parseerror> node?  Or should we just clear out the DOM and
return?  It seems like the latter would be a better idea; the question is
whether people out in the wild seriously depend on the former behavior...

Thoughts?  I think this should be pretty easy to fix if we decide to do it (esp.
once bug 251354 is fixed).
Depends on: 251354
I do actually think people depend on the current behaviour. But I also think
it'd be ok to break them as long as we provided some other mechanism for
detecting the parse error. For synconous loads like xmlhttprequest (and doc.load
once we do that) we should IMHO throw an exception.
Sync document.load() returns a boolean to indicate success or failure; it can
return false (as it does now) on parse error.

For xmlhttprequest, don't we already have a way to report errors?  We should use
it if we do.

I agree that we should have a way to indicate whether the document was parsed
correctly, though.  Maybe even add a boolean property on some document interface?
I think that if all the methods of loading indicate failure somehow, then it's
enough to just empty the document on failure since a successful load can never
result in an empty document. Though of course an additional flag wouldn't hurt,
but it might get tricky to figure out when to clear that flag...
OK, makes sense.  Will look into doing this, then.
Oh, and of course, we should check how IE deals with errors.
I don't think XMLHttpRequest has any method other than the <parseerror> element
to report the actual error. If you are going to do something else, why not
standardize on what MS does:
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/xmproparseerror.asp>
> I don't think XMLHttpRequest has any method other than the <parseerror> element
> to report the actual error.

It has an onerror handler, looks like.  Not completely trivial to use from C++,
but really easy from JS.

> why not standardize on what MS does

It's a thought.  Unfortunately, it's not very clearly documented.  For example,
the errorCode property is undocumented.  Do we need compat in errorCode values
if we implement this?  The filepos property doesn't say what it means by
"absolute file position" (is that bytes?  Or what?).  
(In reply to comment #7)

> It has an onerror handler, looks like.  Not completely trivial to use from C++,
> but really easy from JS.

True, but what error information do you get? 

> It's a thought.  Unfortunately, it's not very clearly documented.  For example,
> the errorCode property is undocumented.  Do we need compat in errorCode values
> if we implement this?  

I wouldn't think so, but you can start with 
1)
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/PushModelParser_ErrorCodes.asp>
2)
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcexmldm/html/cerefIXMLParseErrorErrorMessages.asp>

1) is deprecated and 2) is supposedly CE .Net 4.2, but they share values and I
expect are standardized across MSXML implementations.

I would think that we would not have to be identical to their errorCode, nor in
their reason so long as we agreed that non-zero errorCode was an error.

The filepos property doesn't say what it means by
> "absolute file position" (is that bytes?  Or what?).  

If you are really interested, I can experiment and try to find out.

> True, but what error information do you get? 

Not much, really.  Just that there was an error.  Just like the return value
from document.load.

> If you are really interested, I can experiment and try to find out.

If people don't rely on it, I'm not that interested... if they do, I sort of am.
I think we should make this change, and make it as early as we can.

So the things we currently load as data are:  XMLHttpRequest, DOMParser, anything loaded by the syncload service, XMLDocument.load, and some xforms and xslt stuff.

I think we should treat these as following:

  XMLHttpRequest:  Null responseXML (already happens if the response is not XML)
                   and dispatch an error event.  For sync, throw.
  DOMParser:  Throw.
  Async XMLDocument.load: error event
  syncload service: throw
  XForms: ???
  XSLT:  ???

We seem to already have some code in our tree that assumes it should look for a parsererror tag.  We'll have to fix that, of course.

I do think we should do this asap so sites that depend on this have time to get fixed.
Flags: blocking1.9?
Summary: Loaded-as-data XML documents should not generate <parseerror> → Loaded-as-data XML documents should not generate <parsererror>
This is really annoying, there's no reason we should keep doing this for yet another release.
Assignee: xml → peterv
Flags: blocking1.9? → blocking1.9+
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla1.9alpha6
Pushing.
Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1
Depends on: 45566
Assignee: peterv → Olli.Pettay
Status: ASSIGNED → NEW
Depends on: 313854
Let's see if I manage to do this for M9. I should.
Target Milestone: mozilla1.9 M8 → mozilla1.9 M9
Safari has also some kind of error document, which for example DOMParser
returns. Opera throws LSException "PARSE_ERR".
Neither of those, nor IE, seem to use onerror of XHR to indicate error in parsing.
So I'm not so sure how important this is or what should be done in each case.
With XHR you could just set the document to null, I guess.
I'll have to make changes to https://www.squarefree.com/userscripts/valid-xhtml.user.js (assuming this fixes bug 45566), but I'm fine with that.
Bug 399502 should be fixed first to get events from xmldoc.load()
Depends on: 399502
As far as I see, XSLT generates the error page only if the page is loaded
to be visible (not as data). Document loading should be cleaned up a bit, but
for 1.9 XSLT behaves well enough, IMO.
So:
  XMLHttpRequest:  Null responseXML
  DOMParser:  Throw LSException like Opera
  Async XMLDocument.load: error event
  syncload service: throw
  XForms: Handles internally, report error to console as done already now.
  XSLT: when used in scripts, do whatever documented in nsIXSLTProcessor.idl
        (throw nsIXSLTException)

DOMParser could also just return null document?
Attached patch DOMParser throws (obsolete) — Splinter Review
Peter, any comments?
Attachment #285230 - Flags: review?(peterv)
Please include information about the XML parsing error in the exception: line number, column number, reason.  https://www.squarefree.com/userscripts/valid-xhtml.user.js needs that information.
Ah, true, that would be useful.
Though, I don't know how to do it, or can't find any reasonable way to do it.
We've shipped this forever, right?  Why should this block beta?
Yeah, this should not block beta. I'd even say it shouldn't even block the release at this point, but since there is a patch we might as well drive it in.
I'm minusing.  That doesn't mean it won't go in, but we need to prioritize accordingly.  Anyone who objects can renom.
Flags: blocking1.9+ → blocking1.9-
I thought the idea was that this is a pretty major change in behavior so we should get it shipped in beta so that web sites can adjust to it as needed.  That's if we're doing it for 1.9, of course.

Put another way, I would argue that if this doesn't make beta we shouldn't take it for 1.9.
I don't think this is a big enough change that if it doesn't make first beta we can't include it later. Most sites probably has no error handling at all so they assume that the XML is wellformed, and likely only ever gets that.
Whiteboard: [wanted-1.9]
(In reply to comment #21)
> Please include information about the XML parsing error in the exception: line
> number, column number, reason. 

The exception should contain also the JS line number etc where
for example parseFromString is called. Should the stack (nsIException.location)
contain the information about xml parsing error or perhaps inner exception
(nsIException.inner)? Suggestions?
Flags: blocking1.9-
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Attachment #285230 - Flags: review?(peterv)
QA Contact: ashshbhatt → xml
Comment on attachment 285230 [details] [diff] [review]
DOMParser throws

We can't throw for DOMParser parsing errors.
Attachment #285230 - Flags: review-
What's the status of this now? I filed 837463 and it got marked as a duplicate of this, but there hasn't been any activity here for nearly a year.
Blocks: 313854
No longer depends on: 313854
I decided to try my hand at unraveling this again, based on the comments in this bug. To summarize, this patch triggers onerror instead of onload for an XHR that tries and fails to parse its response as XML. The onerror ProgressEvent is also given a sub-object (tentatively named relatedError) with the related parsing failure's info (message, source, source line, and line/column number). This seems to be in-line with what the comments in this bug were aiming at.

However, there are some web platform test issues that I believe need clarification. Firstly, if the XML response body is an empty string, the second test in XMLHttpRequest/responsexml-basic.htm expects the responseXML to be nulled out and for onload to be triggered. This patch would trigger onerror with a "no element found" relatedError, so I added a check to allow 0-length XML responses to be treated as this test currently requires. I'm not sure if the test should be changed instead, though.

There is also DOMParser-parseFromString-xml.html, which actually expects <parsererror> nodes for DOMParser() failures. This runs counter to what the XMLHttpRequest web platform tests expect. However, I'm again not sure if it would be best to change the tests, and have DOMParser() throw exceptions with relatedErrors (or something along those lines). As such I opted to just honor the current tests and have DOMParser() continue to add <parsererror> nodes. This of course makes the patch a bit more complex; a new SetDocumentFlags() call is used by XHRs specifically to tell the XML content sink not to add a <parsererror> node to the resulting document. DOMParser() of course doesn't make this call.

There are also a couple of web platform tests (and one mochitest) that were timing out, simply because they're sending text/plain responses without setting their Content-Type header to reflect that. As such, the XHR expected XML content, failed to parse it, and onerror was triggered by this patch. The tests didn't even think to check for an onerror case, and so they sat there until they timed out. I altered the relevant tests so they return the correct content type, and to also check for and treat onerror as a failure, so they wouldn't have to time out.

Finally, there is XMLHttpRequest/responsetext-decoding.htm. With this patch, four of its tests time out similarly, but it seems like they may intend to treat the content-type as XML. The problem is that their responses are only a BOM or a BOM followed by another BOM, which this patch treats as a non-0-length case for onerror. However, the tests don't seem like they should care whether onload or onerror are fired, as long as the responseText is set appropriately, and so I've simply altered them to pass in either case. I'm not at all sure if that's correct, though.

As for my patch itself, I opted to save the nsIScriptError from the XML content sink on the related nsIDocument during parse-time, then save it on the XMLHttpRequest for future use during the onerror code (since the pointed to the nsIDocument can be nulled before then). I'm not sure if that's the most graceful or appropriate approach, as I'm still new to the codebase.

As mentioned before, I also added SetDocumentLoadFlags() to handle the DOMParser()/XMLHttpRequest divide, since this seemed like the easiest and most obvious way to handle that case. I'm not sure if another similar mechanism would be more appropriate, though.

I'm also not sure if adding a new WebIDL type for ParseErrors is the way to go, but it seemed to be the path of least complexity, so I thought I'd start with that.
Flags: needinfo?(hsteen)
Per the XMLHttpRequest spec:

https://xhr.spec.whatwg.org/#document-response
"let document be a document that represents the result of running the XML parser with XML scripting support disabled on received bytes. If that fails (unsupported character encoding, namespace well-formedness error, etc.), then return null."

So no XML parsing failures will (per spec) cause XHR error events, as far as I can tell.

IMHO it would be nice to align DOMParser and not create those parsererrror elements - but my opinion shouldn't count for much here, I'll rather ask Boris to chime in on this.

Triggering XHR.onerror for responses sent without content-type (and assumed to be XML): no, please. This is not likely to be web compatible.

Maybe silly question: why do we make an effort to make error details available to JS at all? Can't we just report the details in a console error message and be done? Perhaps exposing those details made sense in 2005, but the web has moved on and XML is much less important. JS-implemented XML error handlers is probably a very rare beast..
Flags: needinfo?(hsteen) → needinfo?(bzbarsky)
Sure, if all that's desired is to null out the document before onload, then this patch would certainly become much simpler.

You could be right the exposing the error in a more fine-grained manner may no longer matter much, since I basing that off comment #21 (from 2007). I was concerned because while the console does show a terse error message indicating there was a parsing failure, it doesn't give you specifics (which XHR had the parse failure, what the source was, etc). Perhaps that should be extended instead?
I think reporting more info in the console would be great.
(Right now we often output utterly confusing and maddening "No element found" error messages with no context. There are plenty of questions on Stackoverflow regarding what this error means, improving it would be terrific).

annevk mentioned on IRC that the parsererror element thing is actually added to the DOM parsing standard now - I think it's a wart, but it was considered necessary for web compat so..
https://w3c.github.io/DOM-Parsing/#the-domparser-interface
Flags: needinfo?(bzbarsky)
Alright, here's a much simpler version of the patch which simply nulls out responseXML upon parsing failures, rather than doing anything with onerror. I'll see if I can find a way to improve the console messages, but I'm not sure what would be best. Perhaps even just prefixing them with "Error parsing XML: " would be a good start?
Attachment #8759512 - Attachment is obsolete: true
"Error parsing XML: no element found" would be a great improvement, especially if it's also clearly associated with a URL. I'd suggest changing the wording slightly and say "no root element found".

Also, is un-labelled XML a thing? Do other browsers assume XML if a site sends XML without sending a content-type? Perhaps assuming text/plain would work better :)
True, browsers don't assume XML in that case, as Chrome and Firefox currently return response==responseText and responseXML=null for unspecified content-type responses. My earlier patch was just flawed and I didn't realize it at the time. The new patch doesn't have that kind of issue, thankfully.

As for changing the console error, this patch seems to do the job. The messages now have this format:
  XML Parsing Error: no root element found
  Location: http://w3c-test.org/XMLHttpRequest/resources/status.py?type=application/xml&content=
  Line Number 1, Column 1:

Of course the line/column number info is superfluous, given that it's repeated in the little link to the far right of such messages, but I'd rather re-use this format string since it's already in the localization file.
> I'll rather ask Boris to chime in on this

I don't know that I have any new opinions on this since my last comments in the bug, modulo checking what other browsers do and hence what web compat is likely to require.

> why do we make an effort to make error details available to JS at all

You mean why is there a <parseerror> thing in general?  Because that XML parser behavior was designed around an XML parser parsing content in a browsing context (i.e. yellow screen of death) and the idea was to show the user something at least moderately more useful than just a blank page, I think....  The side-effects on XHR and whatnot were not intended, but the parser behavior predated those things existing, so...

> Also, is un-labelled XML a thing? 

In general (i.e. not just over HTTP), sure.  Happens all the time for local files, right?  ;)

If you create a file with no extension containing this text and fetch it via XHR to a file:// URI, what should happen and why?  What should happen if its extension is .docx?

<?xml version="1.0"?>
<root>aaa</root>

and then try to fetch it via XHR
Attached patch 289714-fix-gmp-test.diff (obsolete) — Splinter Review
A try run uncovered two bugs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f83f3523f74a

This patch fixes the first, where GMP has an "empty" XML document that's not really an XML document at all (no root element). Looking at the test, it's fine to just add an empty root element, since it's currently passing because there's a parsererror that it's ignoring.
Attached patch 289714-fix-xpcshell-tests.diff (obsolete) — Splinter Review
This patch fixes the other related failures in the above try run, where XPC shell tests are reading a non-XML file to test overriding the mime type, which of course fails because now the responseXML is null. As such I change the test to read an actual XML file and make sure it's still overriding the mime type as expected.

A new try run shows that there are only failures that seem completely unrelated to me (running them locally gives different results, which are the same results that I get without this patchset applied): https://treeherder.mozilla.org/#/jobs?repo=try&revision=dab588e6c61e
Attachment #8760015 - Flags: review?(peterv)
Attachment #8760023 - Flags: review?(peterv)
Alright, I've requested review for the non-test-change patches, as I think this is about where it needs to be.
I've rebased the patches, and folded the test-related ones into the main one where they ultimately belong.

Looks like a try run for all of them passes, aside from an unrelated intermittent and some other odd "pulse_actions" task I don't recall scheduling myself, which seems to have stalled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e64abece97eb

Also note that two other bugs should be fixed by these changes as well: bug 521301 (related) and bug 313854 (a dupe, as far as I can tell).
Attachment #8760015 - Attachment is obsolete: true
Attachment #8763159 - Attachment is obsolete: true
Attachment #8763160 - Attachment is obsolete: true
Attachment #8760015 - Flags: review?(peterv)
Attachment #8767029 - Flags: review?(jst)
And here's the rebased version of the additional patch to make the console errors a bit nicer.
Attachment #285230 - Attachment is obsolete: true
Attachment #8760023 - Attachment is obsolete: true
Attachment #8760023 - Flags: review?(peterv)
Attachment #8767030 - Flags: review?(jst)
Rebased.
Assignee: bugs → wisniewskit
Attachment #8767029 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8767029 - Flags: review?(jst)
Attachment #8771722 - Flags: review?(jst)
Attachment #8767030 - Flags: review?(jst) → review+
Comment on attachment 8771722 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

This looks good to me, though someone, maybe James Graham, should sign off on the web-platform test changes as I don't know the intricacies of the files involved in that. And someone who works on toolkit (Justin Dolske?) needs to review the toolkit changes. A few minor nits below.

+XMLDocument::SetDocumentLoadFlags(int32_t flags) {
+  mLoadFlags = flags;
+}
+
+int32_t
+XMLDocument::GetDocumentLoadFlags() {

Function opening brace on its own line in this file please.

- In nsExpatDriver::HandleError():

   if (serr) {
-    rv = serr->InitWithWindowID(description,
+		rv = serr->InitWithWindowID(description,

This looks like a spurious white space change.
Attachment #8771722 - Flags: review?(jst)
Attachment #8771722 - Flags: review?(james)
Attachment #8771722 - Flags: review+
Comment on attachment 8771722 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

Dolske, can you have a look at the toolkit changes here? Or find someone else if there's a better candidate.
Attachment #8771722 - Flags: review?(dolske)
Comment on attachment 8771722 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

Review of attachment 8771722 [details] [diff] [review]:
-----------------------------------------------------------------

wpt changes look fine to me.
Attachment #8771722 - Flags: review?(james) → review+
(In reply to Johnny Stenback  (:jst, jst@mozilla.com) from comment #47)
> Comment on attachment 8771722 [details] [diff] [review]
> This looks like a spurious white space change.

You're right, it seems like I uploaded a strange version of that patch. That line was meant to be changing "description" to "errorText" (to yield more error info). Here's a correct version of the patch (the other patch seems fine).
Attachment #8767030 - Attachment is obsolete: true
Ah, it turns out that this patch did indeed have a spurious change in the same place. Here's a fresh version with your comments addressed and that spurious change removed, carrying over existing r+'es.
Attachment #8771722 - Attachment is obsolete: true
Attachment #8771722 - Flags: review?(dolske)
Attachment #8772960 - Flags: review?(dolske)
Attachment #8772957 - Flags: review?(jst)
Attachment #8772957 - Flags: review?(jst) → review+
Comment on attachment 8772960 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

Review of attachment 8772960 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not the right reviewer for the bulk of this patch, JST reviewed the other part, so flag him for this one too? (bz would be the other obvious choice)

But looking at the /toolkit part, I'm a little confused what this change is doing; it sounds like it you're making malformed XML call onerror instead, but then why the onload changes?

Also:

http://searchfox.org/mozilla-central/rev/6bb28c9903fe38d445cd2d3b17af4eabdade9335/toolkit/mozapps/extensions/content/extensions.xml#1461
http://searchfox.org/mozilla-central/rev/6bb28c9903fe38d445cd2d3b17af4eabdade9335/toolkit/mozapps/extensions/nsBlocklistService.js#654

AIUI this (XMLURI_PARSE_ERROR check) can never happen now?
Attachment #8772960 - Flags: review?(dolske)
No, ultimately I'm not calling onerror, since as per comment 34 it was considered a no-go due to compatibility concerns. As such I'm just nulling out the document to match the current spec. That's why those test changes are necessary.

And yes, it seems like XMLURI_PARSE_ERROR stuff needs to be changed to do a null-check as well. Thanks for mentioning that! I'll do another check for references like that I may have missed and update the patch before requesting JST review this.
As jst asked dolske to find another reviewer and he suggested you, it looks I'm back in your review queue, bz :)

Here's a fresh version of the patch where I've rechecked the codebase to see if I missed any XHR checks for parsererror/XMLURI_PARSE_ERROR as dolske suggested.

A try run seems fine, just stuff that appears unrelated: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe9aa9542945

To sum up the patch, it changes XHRs to return null instead of a document with a <parsererror> root, if parsing fails. It does not onerror or throw, it just returns null. It only affects XHRs, not DOMParser/etc.
Attachment #8772960 - Attachment is obsolete: true
Attachment #8775352 - Flags: review?(bzbarsky)
Have we checked what other browsers do here?

Do you know why there's still a test failure in XMLHttpRequest/responsexml-basic.htm ?
Flags: needinfo?(wisniewskit)
>Have we checked what other browsers do here?

I haven't tested it beyond Firefox and Chrome in the last couple of years, no (back then Firefox was the only one giving parsererror documents for XHRs).

Chrome currently returns a null document instead of a parsererror, so I doubt other browsers have changed, but if you'd like to be more through I'll try to find out ASAP.

>Do you know why there's still a test failure in XMLHttpRequest/responsexml-basic.htm ?

It seems to be failing because getElementById("n2") is returning a node, despite that node being set to a weird namespace. The test is otherwise passing fine.
Flags: needinfo?(wisniewskit)
> if you'd like to be more through I'll try to find out ASAP

That would be much appreciated.

> despite that node being set to a weird namespace

Oh.  That's a bug in the test.  id is a global attribute on all elements, regardless of namespace, nowadays.
Comment on attachment 8775352 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs.diff

>+++ b/browser/base/content/test/general/browser_bug409481.js
>-  ok(root.nodeName != "parsererror", "Sidebar is well formed");

What does this have to do with XMLHttpRequest?  That is, why is this change needed?

>+  ok(sidebar, "Sidebar is well formed");

No, just claiming that there is an element with this id in the document says nothing about what's inside it, which is what the old test tested.

>+++ b/dom/base/nsIDocument.h
>+  virtual void SetDocumentLoadFlags(int32_t flags) {}

This is probably more generic than it needs to be.  How about just an mSuppressParseErrorElement boolean and SuppressParseErrorElement()/SetSuppressParseErrorElement functions?

Needs documentation too.  Including why, for example, this is not the state for all data documents (e.g. because DOMParser _does_ want parseerror).

>+++ b/dom/xhr/XMLHttpRequestMainThread.cpp
>-  if (mResponseCharset != mResponseXML->GetDocumentCharacterSet()) {

No, this is wrong.  Consider the case when we start getting data and start parsing a document.  Partway through that, the page gets responseText.  We need to use the document encoding to generate that, no?  In the new setup this won't work right.  r- for this part.

This should be testable; if we don't have a test testing it already, please add one.

Of course you're right that you have to do this again in OnStopRequest before we drop our mResponseXML reference in the error case.  So some of this logic might need to be factored out into a helper function.

Have you looked through chrome uses of responseXML and made sure they can all handle null?  Or was toolkit/components/search/nsSearchService.js caught by tests?

Have you looked at addons mxr for consumers of this stuff at all?
Attachment #8775352 - Flags: review?(bzbarsky) → review-
Hmm, thanks for that feedback. I'll reconsider the patch accordingly.

>Have you looked through chrome uses of responseXML and made sure they can all handle null?  Or was toolkit/components/search/nsSearchService.js caught by tests?

I did look through the codebase for mentions of parsererror and its error-page URI, but that indeed may not have been enough (at least a try attempt didn't find anything, for what that's worth).

>Have you looked at addons mxr for consumers of this stuff at all?

No, I'm not sure how to do that. Was there documentation on using addons mxr?
I gues dxr now, not mxr.

You go to https://dxr.mozilla.org/addons/source/ (assuming you have the permissions to log in to that; let me know if you don't) and then you can search stuff.  I definitely see a ton of hits on "parsererror" in there....
But some of them might be using DOMParser, not XHR...
Ok, good to know. I don't have a Mozilla LDAP account yet, so I can't log into the site. Is there a specific process I should follow to get an LDAP account?
Hmm...  I actually don't know.  I expect you do have an LDAP account if you can push to try, though you may not know the username/password.  But not all LDAP accounts have access to addons source and I'm not sure what the procedure there is.

In any case, I'm seeing 140000-some hits on parsererror in addons.  And at least some are from XHR.

Maybe it's worth only changing non-system-principal XHR or something.  :(

Jorge, how do you think we should proceed here?
Flags: needinfo?(jorge)
Hmm. Would it be possible to only do so for "older" addons, so that WebExtensions also have the standards-dictated behavior?

Edge 13 and 14 preview also return a null responseXML and keep the response as text. I'm also trying to build WebKit-gtk on my devbox to confirm what happens there.
If you have a pointer to a testcase, I can just try it in Safari for you.

For the rest.... I'm not sure whether we have a way to detect whether our caller is a webextension or not.  If we do, it makes sense to give them the new behavior, yes.
Sure, I have a test-case running at http://thomas.tanrei.ca/xhrtest.html

Unless Safari is different, it should show that responseXML=null and response="<xml>test/ml>"
Yep, that's what it's showing.
Alright, are there other browsers we ought to test, or is it sufficient for our purposes that Chrome, Edge, and Safari don't yield parsererrors?
That's sufficient for web compat, yes.  ;)
I gave the results a quick look and two things stand out:

1) jquery is flagged, so that generates tons of results. Assuming jquery handles this change gracefully, we can discard them.

2) BrandThunder add-ons are flagged, and there are hundreds of those. Adding Mike Kaply who might be able to tell us if they rely on this behavior and whether it's an easy change to adapt to.

I'm only starting to use DXR, so I don't know how to filter these out and see what else is left.

As for giving Thomas access, it would be good to know if he has an LDAP account first :)
Flags: needinfo?(jorge)
If I do have one, I don't know the credentials. Is there someone I should ping to find out?
> assuming jquery handles this change gracefully, we can discard them.

Hmm.  The relevant jQuery code is:


	httpData: function( xhr, type, filter ) {
		var ct = xhr.getResponseHeader("content-type"),
			xml = type == "xml" || !type && ct && ct.indexOf("xml") >= 0,
			data = xml ? xhr.responseXML : xhr.responseText;

		if ( xml && data.documentElement.tagName == "parsererror" )
			throw "parsererror";

I guess in other browsers it ends up throwing something other than "parsererror" is all.

Needinfoing Mike for the brandthunder thing.
Flags: needinfo?(mozilla)
I assume you mean these lines:

chrome/content/bindings/btpfeedreader.xml:              if (this.responseXML.documentElement.nodeName == 'parsererror') {
chrome/content/bindings/btpfeedreader.xml:                if (dom.documentElement.nodeName == 'parsererror') {

In various add-ons. If the behavior breaks, it doesn't bother me. Those are rare error cases anyway (bt is sunsetting a lot of this stuff).

So they're good.


That being said, will there be any way to know it's a parser error versus something else?

parsererrors can be fixed (and that's exactly what I do - attempt to fix up responseText and reparse).

The only way I know that is because of the parsererror on the node...
Flags: needinfo?(mozilla)
>That being said, will there be any way to know it's a parser error versus something else?

I'm not 100% sure on that. Unfortunately, as per comments such as 34 and 36, it was deemed too web-incompatible to just fire an onerror with a more precise reason in this case. We could side-step the issue for addons/system code (as per comment 63) by allowing system/privileged XHRs to continue getting parsererror nodes (though I would prefer if WebExtensions wouldn't allow such a hack). Or perhaps the spec could be extended to add a "parserError" type of attribute on the XHR itself in that kind of event, but I admit that I'm doubtful that kind of idea would gain traction (I'm certainly up for trying, though).

So as it stands right now, per spec one would always get a "successful" XHR (onload/readystatechange=4) on parser errors, but with a null responseXML. I'd imagine there are other ways to end up in the same condition, but thankfully the DOMParser code still has to return parsererror nodes (also for web-compat reasons, funny enough). As such one could at least re-parse the responseText with DOMParser to get a more precise error case, even if that's sub-optimal.

PS: speaking of not allowing an IsSystemXHR() hack for WebExtensions, a quick glance at the codebase shows a isWebExtensionContentScript property which seems almost promising, but I'm not sure if that could be easily used by the XHR code.
Nothing should be stopping us from firing a parseerror event or what not in Chrome if that'd be needed to be able to clearly distinguish between parse errors and other failures...
Alright, here's a version of the patch addressing your review comments, bz.

Ultimately I decided to just keep doing things the same way as they're being done now, for chrome XHRs (ie, keep sending a parsererror element). Web content and WebExtensions will get the spec-compliant behavior, and the biggest change to the patch consists of the checks to verify whether we're in a WebExt or not. I haven't added any test-cases for these bits yet, but I did at least confirm by hand that things work as I would expect, so I figure this is at least review-worthy.

Note that I also corrected the web platform test's issue that you mentioned in comment 57, so we'll now pass that test completely.

A fresh try run is ongoing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eaefaafd2bc5
Attachment #8775352 - Attachment is obsolete: true
Attachment #8776761 - Flags: review?(bzbarsky)
Comment on attachment 8776761 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs-from-web-content.diff

The nsIDocument additions still need documentation.

>@@ -2057,16 +2070,18 @@ XMLHttpRequestMainThread::OnStopRequest(

Worth adding a comment here about how we have to snapshot the charset information before we possibly null out mResponseXML in XML parsing error cases.

>+++ b/dom/xhr/XMLHttpRequestMainThread.h

Would it make sense to add a JSContext argument to the RequestBody overload of Send() and pass one in from all the callsites, then have just one |mInWebExtension = InWebExtension(aCx)| thing?

Why are the failures in testing/web-platform/tests/XMLHttpRequest/responsexml-non-well-formed.htm back with this patch?

I'm not seeing a test for the "get responseText during load" thing....
Attachment #8776761 - Flags: review?(bzbarsky)
Sure, I'll add some comments to nsIDocument and OnStopRequest.


>Would it make sense to add a JSContext argument to the RequestBody overload of Send() and pass one in from all the callsites, then have just one |mInWebExtension = InWebExtension(aCx)| thing?

Yes, I think so. Either way's fine by me, as long as the overall approach looks fine.


>Why are the failures in testing/web-platform/tests/XMLHttpRequest/responsexml-non-well-formed.htm back with this patch?

That's just because I forgot to include the change that marks the tests as passing now.


>I'm not seeing a test for the "get responseText during load" thing....

Right, I'll try to fashion one ASAP.
Bill, is "isWebExtensionContentScript" the right thing to test to detect web extensions code?  Or is there web extension code that is not running in those sandboxes but can do XHR?
Flags: needinfo?(wmccloskey)
This is definitely going to need an intent to ship, by the way...
Keywords: dev-doc-needed
Alright, I just sent one. Hopefully it'll appear on the list soon (I subscribed to the list days ago).
(In reply to Boris Zbarsky [:bz] from comment #79)
> Bill, is "isWebExtensionContentScript" the right thing to test to detect web
> extensions code?  Or is there web extension code that is not running in
> those sandboxes but can do XHR?

No, it's not sufficient. Most web extension code (all of which can do XHRs) runs in the context of a normal web page, not a sandbox. Only content scripts run in a Sandbox.

WebExtension XHRs don't use system principals, in case that helps here.
Flags: needinfo?(wmccloskey)
> Most web extension code (all of which can do XHRs) runs in the context of a normal web page

That should be fine.  We want webextensions to match web pages, so we want to change the behavior of both.

So webextensions aren't running someplace where they get xrays?

I guess the real question is whether IsSystemXHR() would ever be true for a webextension.  I guess sounds like it wouldn't?
(In reply to Boris Zbarsky [:bz] from comment #83)
> > Most web extension code (all of which can do XHRs) runs in the context of a normal web page
> 
> That should be fine.  We want webextensions to match web pages, so we want
> to change the behavior of both.
> 
> So webextensions aren't running someplace where they get xrays?

Content scripts do get Xrays. These are the ones that run in sandboxes (and where that flag is true).

> I guess the real question is whether IsSystemXHR() would ever be true for a
> webextension.  I guess sounds like it wouldn't?

It shouldn't ever be true. If it is, that's a bug.
Aha.  Sounds like we don't need to do the webextension check at all, then.
And ideally add tests for xhr in a webextension, both in and not in a content script.
Yeah, after giving it a test, it seems like both content and background scripts aren't considered SystemXHRs after all, so I'll just try to find out how to add test-cases for both cases, as well as the other test and any comments/documentation.
Alright, I think I've managed to figure out how to test the various things that needed testing, so here's a fresh version of the patch for review.

Try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3edc7fb72bf

Since :bz's gone for a while, and the biggest change here is the web-extension tests, would you mind giving those parts a review, :billm? (I'm not sure who should take a look at them, nor how any reviews should be split at this point).

I'm also not sure how to "correctly" silence ESLint's complaints about "chrome" being undefined in the webextension files I added to the tree. Any suggestions besides just adding "eslint-disable-next-line no-undef" (or equivalent) comments?
Attachment #8776761 - Attachment is obsolete: true
Attachment #8780995 - Flags: review?(wmccloskey)
Comment on attachment 8780995 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs-from-web-content.diff

Review of attachment 8780995 [details] [diff] [review]:
-----------------------------------------------------------------

This probably works, but we have a much nicer way of testing WebExtensions. See here for an example:
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/test/mochitest/test_ext_contentscript_context.html
(There are lots more in the same directory.)

This works better because:
- We don't have to check in XPIs.
- You can put the entire test in one file.
- You can use the browser.test API to communicate between the extension and the test harness.

Also, please use browser.foo instead of chrome.foo.

Sorry to make you re-do this, but I think it will be a lot nicer this way!
Attachment #8780995 - Flags: review?(wmccloskey) → review-
No problem, I wish I'd seen those tests before I wrote the previous version :)

Here's a new version of the patch that models the test after the nicer versions. Note that I had to split the system XHR part of the test out into another mochitest location, where XHRs are run as system XHRs.

---

Also, bz, I just realized that my test for the "match text charset to document's charset" corner case doesn't test what it's supposed to test, so I tried again. Unfortunately, I can't seem to find a way to actually hit that corner case.

For it to happen, the user would have to call GetResponseText before we've received enough of the document to sniff out its charset, since calling xhr.overrideMimeType is not allowed after the document starts loading.

We also don't parse HTML documents unless the responseType is "document", and reading the responseText isn't permitted in that case. (That means I can't just add in a <meta charset> which doesn't end up in the first chunk of data).

The only other way I can think to test this is to send just the first byte of a BOM, and then the rest after the user calls GetResponseText. But when I do this the XML always comes out as UTF-8, and I'm not sure if it's a bug in the XML parser or intentional behavior.

If it's intentional then how else are we supposed to hit this edge case? (Hopefully I'm missing something obvious?)
Attachment #8780995 - Attachment is obsolete: true
Attachment #8782058 - Flags: review?(wmccloskey)
Hmm. XML encoding declarations seem to be completely ignored, so that's a no-go for the corner case as well.
Target Milestone: mozilla1.9beta1 → ---
Comment on attachment 8782058 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs-from-web-content.diff

Review of attachment 8782058 [details] [diff] [review]:
-----------------------------------------------------------------

r+ for the webextension test.
Attachment #8782058 - Flags: review?(wmccloskey) → review+
Thomas, which corner case are you referring to in comment 90?  Do you mean the "No, this is wrong" bit from my review from comment 58?

If so, all you need to do is get responseText once the document charset is known but before OnStartRequest happens.  There should be no need for overrideMimeType in that situation: just an XML document with a "weird" charset (pretty much anything that's not "UTF-8", afaict) declared in the <?xml ...?> bit at the beginning.  Oh, and no charset stuff coming from HTTP headers (easy enough), and some non-ASCII bytes in that document so you can tell apart it being treated as UTF-8 and the specified charset.  Then you need to make sure we can get the responseText before OnStopRequest (e.g. get it off a progress event that we fire before OnStopRequest) and see whether the non-ASCII things are correct.

If you meant some other corner case, please let me know which one?
Flags: needinfo?(wisniewskit)
No, that was indeed the corner case I was referring to. Thanks for the insight, I was certainly over-thinking things.

Here's a new patch that should do the trick, assuming the setTimeout approach used is unlikely to cause intermittent failures. Note that :billm already reviewed the new web extensions test in this version (in case you missed those comments).
Attachment #8782058 - Attachment is obsolete: true
Flags: needinfo?(wisniewskit)
Attachment #8783764 - Flags: review?(bzbarsky)
Comment on attachment 8783764 [details] [diff] [review]
289714-null-responseXML-for-invalid-xml-xhrs-from-web-content.diff

The 1s delay thing in bug289714.sjs seems racy... If we just lose the timeslice for 1s, I see nothing preventing the two pieces being coalesced into a single packet.

I recently saw some discussion going by about guaranteeing progress events, or at least multiple OnDataAvailable calls, by just ensuring the data is large enough (over 32KB?).  Can we do that here?

I guess the problem then would be ensuring that you know what to compare the text to, but you could do a substring match instead of equality match, right?

r=me with this addressed
Attachment #8783764 - Flags: review?(bzbarsky) → review+
Sure, here's a version that uses the same "32k" approach from bug 918719, and so should be less prone to races. Carrying over r+, as I didn't change anything except what you suggested.
Attachment #8783764 - Attachment is obsolete: true
And here's a fresh copy of the other patch.

Try still seems fine with both patches applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec4b5bdf1f50

Carrying over r+ and requesting check-in of both patches.
Attachment #8772957 - Attachment is obsolete: true
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9ccb7a95588
Do not write <parsererror> nodes for invalid XML documents in XMLHttpRequests made by web content. r=billm, r=bz
https://hg.mozilla.org/integration/mozilla-inbound/rev/3aeb398df5c6
Make console XML Parsing notices more informative. r=jst
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c9ccb7a95588
https://hg.mozilla.org/mozilla-central/rev/3aeb398df5c6
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Thomas, is there any technical limitation preventing from using a different string ID in the .properties file, to reflect the updated content?

If we want to make sure all localizations are updated, we'd need a new ID
https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices#Changing_existing_strings
Flags: needinfo?(wisniewskit)
Hmm, I'm not entirely sure what the best course of action is here. hallvors suggested this change in comment 38 to simply tweak that specific text, and I'm not sure if other localizations need to be similarly tweaked.

bz, what do you think is the best course of action here? Some obvious options:
- leave things as they are now (having changed the wording for just the default localization and hoping the others are fine)
- changing nsExpatDriver::HandleError to remap expat's internal error code 3 to a new l10n string id.
- just revert the wording-change and leaving the text as it was.
Flags: needinfo?(wisniewskit) → needinfo?(bzbarsky)
I don't have a strong opinion, really.
Flags: needinfo?(bzbarsky)
Fair enough. I'm veering toward just leaving things as they are, since other localizations won't be any worse than they were, even if their text could be updated (the change in the message isn't *that* substantial). Is that alright with you, flod?
Flags: needinfo?(francesco.lodolo)
OK. I'll make sure to send a note about this string when the Aurora cycle starts, at least we should get a few more locales up to date.
Flags: needinfo?(francesco.lodolo)
Keywords: site-compat
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: