Closed Bug 658949 Opened 13 years ago Closed 13 years ago

data URL with hash - content doesn't match location.

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla6
Tracking Status
firefox6 - ---

People

(Reporter: john.p.baker, Assigned: dholbert)

References

()

Details

(Keywords: addon-compat, regression, Whiteboard: bz nominated near comment 3)

Attachments

(3 files, 3 obsolete files)

1. Go to data:text/html,<h1 id="a">AA</h1>#a
2. Content shows |AA #a|
3. Location bar shows |data:text/html,<h1 id="a">AA</h1>|

Mozilla/5.0 (Windows NT 5.0; rv:6.0a1) Gecko/20110522 Firefox/6.0a1
This also happens when I use character reference (e.g., &#x41;) in data URL.
Per RFC 3986, URIs can contain '#' characters even if they start with scheme.
> URI-reference = URI / relative-ref
> URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
> relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
RFC 2397 should be considered as an explanation about the absolute-URI (not URI) syntax for the data scheme per RFC 3986 4.3:
>      absolute-URI  = scheme ":" hier-part [ "?" query ]
>
>   URI scheme specifications must define their own syntax so that all
>   strings matching their scheme-specific syntax will also match the
>   <absolute-URI> grammar.  Scheme specifications will not define
>   fragment identifier syntax or usage, regardless of its applicability
>   to resources identifiable via that scheme, as fragment identification
>   is orthogonal to scheme definition.

So,
> if (aBaseURI && !spec.IsEmpty() && spec[0] == '#') {
this test should be:
> if (aBaseURI && spec.Find('#') != kNotFound) {
And the following clause should be rewritten in response to this change.
No, that test is fine.  But we do need to change the "absolute data uri" parser to stop at '#' and toss the rest into the ref.  Daniel, do you want to do that?

That said, why the heck is the location bar not showing the '#a' part?  That's really weird...
Yeah, I can look into this.
Component: General → Networking
OS: Windows XP → All
QA Contact: general → networking
Hardware: x86 → All
(side note, not really related to fixing this: if I paste the data URL into my location bar, hit enter, paste it again, hit enter again, THEN the "#a" shows up at the end of the URLbar.  That's probably just because we hit a newURL.equalsExceptRef(currentURL) special-case somewhere.)
Attached patch wip fix (obsolete) — Splinter Review
Here's a WIP fix, basically doing what bz suggested in comment 3. It fixes this part:
> 2. Content shows |AA #a|
but not this part:
> 3. Location bar shows |data:text/html,<h1 id="a">AA</h1>|


(also, interstingly, *after* you've got '#a' showing up in comment 5 -- if you switch tabs away & back, the #a disappears)
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
As noted with an XXXdholbert comment in the wip fix -- I'm not immediately sure what we want to do with the now-parsed-out #ref in nsDataChannel::OpenContentStream().
> but not this part:

Any idea why?  That's really odd...
Figured it out -- nsSimpleURI::Clone fails to copy mIsRefValid right now.  So, clones think they don't have a .ref.  New patch coming soon with that fixed.

(In reply to comment #1)
> This also happens when I use character reference (e.g., &#x41;) in data URL.

@masayuki:  That's a different issue, I think.  If you need to include a "#" character in a data URL, you'll need to encode it as "%23" (as is the case in any normal URL).  If that doesn't work, please file a bug on that.
Attached patch patch 1: fix (obsolete) — Splinter Review
Here's the fix. Overview of what it does:
 - Teaches nsDataHandler::ParseURI about #ref
 - Transfers mIsRefValid flag in nsSimpleURI::Clone (per prev comment)
 - Maintains (and asserts) the invariant that whenever we clear mIsRefValid, we also clear mRef.  (Not strictly necessary, but good for sanity & for memory savings)
Attached patch patch 2: tests (obsolete) — Splinter Review
This tests patch includes the reftest from before (making sure that a data URI with #ref appended doesn't render any differently), along with an addition to xpcshell test test_URIs.js for the clone() issue from comment 9.

I verified that both new tests fail with no patch & pass with the patch.

(In the interest of thoroughness, I also extended the testcases in test_URIs.js a few additional URIs that I encountered while debugging this.)
Attachment #534530 - Attachment is obsolete: true
Attachment #534597 - Flags: review?(bzbarsky)
Comment on attachment 534594 [details] [diff] [review]
patch 1: fix

Note that the fix still has this in nsDataChannel::OpenContentStream:
>      rv = nsDataHandler::ParseURI(spec, contentType, contentCharset,
> -                                 lBase64, dataBuffer);
> +                                 lBase64, dataBuffer, hashRef);
> +
> +    // XXXdholbert What should we do with 'hashRef' here?

IIUC, this is the point at which we actually decode the data portion of the data URI.  In a "real" URL, we'd want to treat hashRef as a "seek-to-this-point-in-the-document" indicator.  So if we ignore it here, that presumably won't work. (but I don't think it did before, and I don't think much else would break...?)  So perhaps this part can be handled in a follow (if there's anything that even needs handling?)
Attachment #534594 - Flags: review?(bzbarsky)
(In reply to comment #12)
> So perhaps this part can be handled in a
> follow (if there's anything that even needs handling?)

er s/follow/followup/
BTW, TryServer says the patches for this bug + bug 659177 currently fail some mochitest-oth tests, including test_moz_document_rules.html[1] -- I'll start investigating those failures after my local rebuild finishes.

(If bz gets to this review before I figure those out[2], I can post the test-failures-fix as a separate patch, if that helps.)

[1] http://tbpl.mozilla.org/?tree=Try&rev=c6c035ce5552
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1306194559.1306198023.19416.gz

[2] (I'll be AFK for a chunk of tonight at my brother's musical performance; should be back around 11pm PDT.  Will do my best to figure out the tryserver orange before that starts...)
Comment on attachment 534594 [details] [diff] [review]
patch 1: fix

>+    // XXXdholbert What should we do with 'hashRef' here?

Absolutely nothing.  The scroll-to-ref code just asks the URI for the ref directly elsewhere.

Which means I don't think we need to return the hashRef at all here. Just make sure to not include it in dataBuffer.

Note that I don't think you need the separate SetRef call in NewURI, since SetSpec should do the right thing, I would think.

r=me with those changes.
Attachment #534594 - Flags: review?(bzbarsky) → review+
test_moz_document_rules does this:

49         var rule = "@-moz-document " + urltests +
50                    " { #display { z-index: " + zIndex + " } }";
51         var sheeturi = "data:text/css," + encodeURI(rule);

and JS encodeURI does not escape '#'.  So this test just needs to be fixed to do that.

Similar issues in browser/base/content/test/browser_bug517902.js and browser/base/content/test/tabview/browser_tabview_bug594958.js
Comment on attachment 534597 [details] [diff] [review]
patch 2: tests

Don't you want about:blank rather than moz-safe-about:blank here?  I think you do.

r=me with that, assuming it still tests what you want to test.  If not, why not?
Attachment #534597 - Flags: review?(bzbarsky) → review+
(In reply to comment #17)
> Don't you want about:blank rather than moz-safe-about:blank here?  I think
> you do.

I already have about:foo in that test (to cover all "about:xxx" type URLs), so about:blank is basically already tested there.

I added "moz-safe-about:blank" because "moz-safe-about:" was a new type of scheme that I hadn't seen before, until today.

> and JS encodeURI does not escape '#'.  So this test just needs to be fixed to do that.

Ah! Great, thanks.  (particularly because I didn't end up getting very far since comment 14 - sadly my home machine has gotten really slow at doing rebuilds for some reason...  Need to look into that)

I'll be sure to fix those tests up before landing.
moz-safe-about is used as the inner URI scheme for safe about: URIs like "about:blank" and a few others.  It's NOT used for random unknown about:foobar URIs.  I'm actually sort of surprised newURI succeeds for "about:foobar"...

Point being that testing "about:blank" should test whatever you wanted tested there.
Landed, starting with a "patch 0" that just fixes up some existing tests (s/#/%23/ in data URIs) to fix the try failures from comment 14 (inferring rubber-stamp=bz on that test-only patch, based on comment 16.)

patch 0:  http://hg.mozilla.org/mozilla-central/rev/7e8a5585577b
patch 1:  http://hg.mozilla.org/mozilla-central/rev/d632d15ccb11
patch 2:  http://hg.mozilla.org/mozilla-central/rev/89f5c8191f3b
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6
Backed out due to orange on Windows Reftests.
See:
http://tbpl.mozilla.org/?tree=Firefox&rev=89f5c8191f3b
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(see bug 623423 comment 6 regarding the orange)
Figured out the orange. We need one more s/#/%23/-within-data-URI fix:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest.xul?rev=04e11820280c&mark=37-41#37
>37 <?xml-stylesheet type="text/css" href="data:text/css,
>38 
>39 #_box_windowsDefaultTheme:-moz-system-metric(windows-default-theme) {
>40   display: none;
>41 }
Here's the same version of patch 0 that I pushed before, but now with one final s/#/%23/ in the reftest.xul chunk quoted above.
(and here's the ready-to-land version of patch 1 that I landed before -- posting it here for convenience w/ relanding, since volkmar has graciously offered to re-land the patches here)
Attachment #534594 - Attachment is obsolete: true
and here's patch 2, ready for checkin (identical to what landed before)
Attachment #534597 - Attachment is obsolete: true
Attachment #534759 - Attachment description: patch 0: fix existing tests (and now also reftest.xul) → patch 0: fix existing tests (and now also reftest.xul) (for checkin)
Pushed:
http://hg.mozilla.org/mozilla-central/rev/41371c9fe361
http://hg.mozilla.org/mozilla-central/rev/c4d58bb8f5ad
http://hg.mozilla.org/mozilla-central/rev/f5c062e99196
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Blocks: 659466
Depends on: 659650
Whiteboard: bz nominated near comment 3
not going to track this but if you think it's safe, please nominate the patch with a risk analysis.
Comment on attachment 534761 [details] [diff] [review]
patch 1: fix (for checkin)

Requesting approval to land on aurora.

Reward:
 (1) Fixes regression from bug 308590, caused by forgetting to set a flag in one case.
 (2) Makes our data-URI-parsing code consistent about handling "#ref" identifiers on full data URIs.
 (3) Includes tests.

Risk:
 - Low, though note that this _intentionally_ breaks some existing content.  In particular -- in data URIs that contain a '#' character, everything after the "#" will now be considered a reference, rather than part of the document.  Bug 308590 was already expected to break such content, but it didn't until this bug here was fixed, because of (2) above.

(Such content is easy to fix with s/#/%23/, which I've done for all cases in mozilla-central, in patch 0 here and in bug 659466 .)
Attachment #534761 - Flags: approval-mozilla-aurora?
Comment on attachment 534761 [details] [diff] [review]
patch 1: fix (for checkin)

(oops, I forgot that this is already in aurora, from landing before the branch point.  Canceling approval request.)
Attachment #534761 - Flags: approval-mozilla-aurora?
Mozilla/5.0 (Windows NT 5.1; rv:6.0) Gecko/20100101 Firefox/6.0

Verified issue using FF 6.0b3 on Windows XP, Windows 7, Ubuntu 11.04 and Mac OS X 10.6 ('#a' part is properly shown in the location bar).

Setting resolution to VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Depends on: 1312700
Blocks: 1367965
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: