Closed Bug 494092 Opened 15 years ago Closed 10 years ago

location bar should use search for lat/long or expressions that contain a decimal, instead of "server not found"

Categories

(Firefox :: Address Bar, defect)

defect
Not set
normal
Points:
3

Tracking

()

VERIFIED FIXED
Firefox 35
Iteration:
35.1

People

(Reporter: dietrich, Assigned: alexbardas)

References

Details

Attachments

(1 file, 7 obsolete files)

type 47.615082,-122.322830 and enter

get "server not found"

should instead use "i feel lucky" search.
Still seems present in current 4.0 betas.
Summary: location bar should use search for lat/long, instead of "server not found" → location bar should use search for lat/long or expressions that contain a decimal, instead of "server not found"
Potential approach for solving this problem:
<Gijs> (1) anything where asciiHost == host and the host includes none of [a-z] -> (2)
<Gijs> (2) anything with fewer than 3 dots and only numbers -> do a search but do a the async lookup thing

<Gijs> jaws: IDN - if I search for "café" right now, that'll use the current "no dots -> search *now*" path
<Gijs> jaws: but e.g. café.local won't, because it has a dot
<Gijs> jaws: the question I had was whether there was something we should do about this.
<Gijs> jaws: the flip side of that is that perhaps it doesn't make sense to treat IDN differently from plain ascii "cafe.local"
<Gijs> or even "mozilla.org" ;)
Gavin, I'd like us to round off the work from bug 693808 by taking this bug as well for the upcoming iteration, does that sound right to you?
Points: --- → 3
QA Whiteboard: [qa+]
Flags: needinfo?(gavin.sharp)
Flags: in-testsuite?
Flags: firefox-backlog+
OS: Mac OS X → All
Hardware: x86 → All
Could we realistically uplift this to 33 for it to ride the trains with bug 693808 and other improvements?

This particular fix on its own doesn't seem that high-value, and this stuff is somewhat regression-prone, so I'm kind of on the fence about digging into this deeper now.
Flags: needinfo?(gavin.sharp)
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #5)
> Could we realistically uplift this to 33 for it to ride the trains with bug
> 693808 and other improvements?
> 
> This particular fix on its own doesn't seem that high-value, and this stuff
> is somewhat regression-prone, so I'm kind of on the fence about digging into
> this deeper now.

This basically affects anything of the form 1234.42 and some math expressions (42.3/7), and was one of two called out by jaws' blogpost, which is linked from the release notes. These expressions also tend to result in errors rather than just slowness, so I think that while not super high value, there is significant value in fixing this for 33.

I also think that now that we have narrowed down how we aim to fix it ("do the right thing" is sufficiently difficult to narrow down for this stuff that that was a "thing"), the actual fix should be particularly complicated.

It's true that this is somewhat regression prone, but I am hopeful that the tests that we're adding with these bugs make it less so. I also think that we are bound to catch regressions pretty quickly, considering how heavily the location bar is used.
(In reply to :Gijs Kruitbosch (Bugmail catchup, needinfo if urgent) from comment #6)
> I also think that now that we have narrowed down how we aim to fix it ("do
> the right thing" is sufficiently difficult to narrow down for this stuff
> that that was a "thing"), the actual fix should be particularly complicated.

err, *shouldn't*. Sigh.
I can work on this bug. Are there more implementation details you can provide?
(In reply to Alex Bardas :alexbardas from comment #8)
> I can work on this bug. Are there more implementation details you can
> provide?

The proposed algorithm is in comment #3. The code lives in http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDefaultURIFixup.cpp , with a hopefully understandably-structured xpcshell test in http://mxr.mozilla.org/mozilla-central/source/docshell/test/unit/test_nsDefaultURIFixup_info.js , where it should be reasonably easy to add input such as that from comment #0 and comment #6, and update the expected output.

I expect that you'll need to either add to or replace this:

http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDefaultURIFixup.cpp?mark=927-942#903

to check if the input before the slash contains anything in the [a-zA-Z] range, and to count the number of dots involved. You can then add another "else if" clause to the if statements below it to search for it, if it satisfies the conditions outlined in comment #3.

Note that because this is docshell code, I can't review it (though I'm happy to give you feedback on it) and you should ask someone on this list instead: https://wiki.mozilla.org/Modules/Core#docshell - realistically, probably :smaug (Olli) as he looked at this code recently and unlike bz isn't on PTO (I think?). :-)
Assignee: nobody → abardas
Status: NEW → ASSIGNED
I've made the improvements and added the tests. I actually had to take into account that there may be at least 2 "-" signs, 1 "," and 2 "." and only numbers. 

I've also added some tests for those cases & it seems to work ok.
Attachment #8471871 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8471871 [details] [diff] [review]
WIP Location bar should search for lat long expressions

Review of attachment 8471871 [details] [diff] [review]:
-----------------------------------------------------------------

OK, so I think my comment was not clear, sorry.

While the summary of this bug is about lat/long expressions, I don't think we should restrict ourselves to those. There are many math expressions that currently lead to broken host lookups instead of search queries, because of the existing logic here.

I think that when implementing this, we should go further, because searching for an actual lat/long expression is just as expensive as doing something simpler, namely: everything without any alphabetical characters in its ascii hostname, and which isn't an IDN hostname (ie host === asciiHost on the nsIURI interface for it) should by default lead to a search query instead of a DNS lookup (although the browser will do the latter to show you a notification that you might want to go to http://userinput/ anyway).

I'm aware that if you had a localhost that was aliased to "1234345364564562" or "1232453.1543534" or "123-1234.34534" then this would break your workflow the first time. I'm OK with that - we'll show a prompt anyway, if the DNS lookup succeeds. But the majority case for all of the above for 99.99% of our users is a search query, not a URL. So I think that's what we should go with.

::: docshell/base/nsDefaultURIFixup.cpp
@@ +993,5 @@
>              aFixupInfo->mFixupUsedKeyword = true;
>          }
>      }
> +    // Check if the input is represented by geographical coordinates:
> +    else if (commaLoc)

So instead of keying this off the commaLoc, I think you could use a loop like this one at the start of the method. In the loop, check for occurrences of [a-zA-Z]. You can probably use nsCRT::IsAsciiAlpha for that. You can store the result in a bool. You can also look for the other chars we're interested in and update the respective uints in that loop, instead of doing repetitive FindChars which will all be doing similar loops.

You should also count the number of dots. For 3 dots with only digits inbetween, it's likely the user meant an IP address, so we can avoid doing a keyword lookup. (you could potentially check if the numbers between the dots and start/end of the string were <= 255, but I wouldn't be super worried about that).

Then this else if can be keyed off hasAsciiAlpha (or some other name; I'm bad at naming things) and whether or not we think this could potentially be an IP(v4) address. In fact, it might make sense to add it as a disjunction clause to the previous condition, because we'll need to check the whitelist, too.

Finally... we should be careful not to regress ipv6 input here (I'm not sure valid ipv6 gets this far; if it does it might be caught by the colon finding code, but either way, we should check that input like [::1] continues to work). :-)

::: docshell/test/unit/test_nsDefaultURIFixup_info.js
@@ +63,5 @@
> +  ["[]", null, null, true, true],
> +  ["47.615082,-122.830", "http://47.615082,-122.830/", null, true, true],
> +  ["47.615082,23.51", "http://47.615082,23.51/", null, true, true],
> +  ["-22.14,-23.51", "http://-22.14,-23.51/", null, true, true],
> +  ["-22.14,23.51-", "http://-22.14,23.51-/", null, false, true],

While I realize this isn't a valid lat/long expression, nor really a useful mathematical expression, I still think we should probably do a search query instead of trying to find this host (anything with a terminal '-' is not a valid hostname, if wikipedia's summary of the relevant RFC is to be believed!).
Attachment #8471871 - Flags: review?(gijskruitbosch+bugs)
Attachment #8471871 - Attachment is obsolete: true
Attachment #8472642 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8472642 [details] [diff] [review]
Location bar should search for more types of expressions

Review of attachment 8472642 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! Bundle of comments below; you'll probably want to ask :smaug for review for the next round. :-)

::: docshell/base/nsDefaultURIFixup.cpp
@@ +933,5 @@
> +    aURIString.BeginReading(iterBegin);
> +    aURIString.EndReading(iterEnd);
> +    nsACString::const_iterator iter = iterBegin;
> +
> +    uint32_t dotLoc = uint32_t(kNotFound);

Nit: I think these declarations and assignment of default values should go before the iter declarations, but perhaps your actual reviewer will disagree... :-)

@@ +945,5 @@
> +    uint32_t foundDigits = 0;
> +    bool hasAsciiAlpha = false;
> +
> +    while (iter != iterEnd)
> +    {

A nit on bracing: you changed it (the brace position) for all the if statements... tbh, in this file, I could never find rhyme or reason to what style was used, so I'd prefer to leave things as-is to preserve blame.

Re-reading, I think you've just applied the official style guide, so maybe your actual reviewer will prefer updating my earlier mistakes in this file (and then some)...

@@ +968,5 @@
> +        } else if (nsCRT::IsAsciiAlpha(*iter)) {
> +            hasAsciiAlpha = true;
> +        } else if (nsCRT::IsAsciiDigit(*iter)) {
> +            ++foundDigits;
> +        }

I think that once foundDots >= 2 && hasAsciiAlpha && foundDigits && (all the locs) != kNotFound, we can leave this loop. However, not sure that micro-optimization is worth the effort of writing.

@@ +980,5 @@
> +    nsAutoCString host;
> +
> +    bool isValidAsciiHost = aFixupInfo->mFixedURI &&
> +            NS_SUCCEEDED(aFixupInfo->mFixedURI->GetAsciiHost(asciiHost)) &&
> +            !asciiHost.IsEmpty();

You're checking this for emptiness and not host... which is interesting because I guess host will be empty, too, in that case. However, I don't think we do anything in this method in that case (or if this is even reachable with an empty host), so the difference is probably academic.

@@ +986,5 @@
> +    bool isValidHost = aFixupInfo->mFixedURI &&
> +            NS_SUCCEEDED(aFixupInfo->mFixedURI->GetHost(host));
> +
> +    // If there are 3 dots and only numbers between them, then don't do a keyword lookup
> +    if (foundDots == 3 && (foundDots + foundDigits == pos)) {

Nit: compare to iterEnd here instead, for less confusion for the reader? Or does that not work / throw warnings because of the different type?

@@ +993,4 @@
>      // We do keyword lookups if a space or quote preceded the dot, colon
> +    // or question mark (or if the latter were not found)
> +    // or when the host is the same as asciiHost and there are no characters from [a-z][A-Z]
> +    else if (((spaceLoc < dotLoc || quoteLoc < dotLoc) &&

Nit: no else after return;, please. :-)

@@ +994,5 @@
> +    // or question mark (or if the latter were not found)
> +    // or when the host is the same as asciiHost and there are no characters from [a-z][A-Z]
> +    else if (((spaceLoc < dotLoc || quoteLoc < dotLoc) &&
> +            (spaceLoc < colonLoc || quoteLoc < colonLoc) &&
> +            (spaceLoc < qMarkLoc || quoteLoc < qMarkLoc)) || qMarkLoc == 0 || 

Nit: trailing space

@@ +1006,5 @@
>      }
>      // ... or if there is no question mark or colon, and there is either no
>      // dot, or exactly 1 and it is the first or last character of the input:
>      else if ((dotLoc == uint32_t(kNotFound) ||
> +            (dotLoc == lastDotLoc && (dotLoc == 0 || dotLoc == aURIString.Length() - 1))) &&

Nit: can get rid of lastDotLoc now and just use foundDots instead to check that there is exactly 1 dot. See https://bugzilla.mozilla.org/show_bug.cgi?id=1042519#c8 .

::: docshell/test/unit/test_nsDefaultURIFixup_info.js
@@ +61,5 @@
>    [".test", "http://.test/", "http://www..test/", true, true],
>    ["mozilla is amazing", null, null, true, true],
>    ["", null, null, true, true],
> +  ["[]", null, null, true, true],
> +  ["localhost", "http://localhost/", "http://www.localhost.com/", false, true],

I'm confused. Earlier, we have:

 ["mozilla", "http://mozilla/", "http://www.mozilla.com/", true, true],

which is almost the same except for the penultimate point, but for similar input.

Is this addition just to test the whitelist? If so, may I suggest changing it to "whitelist" (ditto for the pref, of course) and to put it next to the "mozilla" entry to make this more obvious?
Attachment #8472642 - Flags: review?(gijskruitbosch+bugs) → feedback+
Attachment #8472642 - Attachment is obsolete: true
Attachment #8472685 - Flags: review?(bugs)
We really need to get this stuff out of C++ land at some point :/ 

But I'll review tomorrow.
No tests for ipv6 addresses?
Comment on attachment 8472685 [details] [diff] [review]
Location bar should search for more types of expressions

># HG changeset patch
># Parent 018c774d1b2d9e87eda4f2b03328c8ee4210e766
># User Alex Bardas <abardas@mozilla.com>
>Bug 494092 - location bar should use search for lat/long or expressions that contain a decimal, instead of 'server not found' r=smaug
>
>diff --git a/docshell/base/nsDefaultURIFixup.cpp b/docshell/base/nsDefaultURIFixup.cpp
>--- a/docshell/base/nsDefaultURIFixup.cpp
>+++ b/docshell/base/nsDefaultURIFixup.cpp
>@@ -923,76 +923,117 @@ void nsDefaultURIFixup::KeywordURIFixup(
>     // "nonQualifiedHost?args"
>     // "nonQualifiedHost?some args"
>     // "blah.com."
> 
>     // Note: uint32_t(kNotFound) is greater than any actual location
>     // in practice.  So if we cast all locations to uint32_t, then a <
>     // b guarantees that either b is kNotFound and a is found, or both
>     // are found and a found before b.
>-    uint32_t dotLoc   = uint32_t(aURIString.FindChar('.'));
>-    nsAutoCString tmpURIString(aURIString);
>-    uint32_t lastDotLoc = uint32_t(tmpURIString.RFindChar('.'));
>-    uint32_t colonLoc = uint32_t(aURIString.FindChar(':'));
>-    uint32_t spaceLoc = uint32_t(aURIString.FindChar(' '));
>-    if (spaceLoc == 0) {
>-        // Treat this as not found
>-        spaceLoc = uint32_t(kNotFound);
>+    uint32_t dotLoc = uint32_t(kNotFound);
>+    uint32_t colonLoc = uint32_t(kNotFound);
>+    uint32_t spaceLoc = uint32_t(kNotFound);
>+    uint32_t qMarkLoc = uint32_t(kNotFound);
>+    uint32_t quoteLoc = uint32_t(kNotFound);
>+    uint32_t pos = 0;
>+    uint32_t foundDots = 0;
>+    uint32_t foundDigits = 0;
>+    bool hasAsciiAlpha = false;
>+
>+    nsACString::const_iterator iterBegin;
>+    nsACString::const_iterator iterEnd;
>+    aURIString.BeginReading(iterBegin);
>+    aURIString.EndReading(iterEnd);
>+    nsACString::const_iterator iter = iterBegin;
>+
>+    while (iter != iterEnd)
>+    {
{ to the previous line

>+        if (*iter == '.') {
>+            ++foundDots;
>+            if (dotLoc == uint32_t(kNotFound)) {
>+                dotLoc = pos;
>+            }
>+        }
>+        else if (*iter == ':' && colonLoc == uint32_t(kNotFound)) {
>+            colonLoc = pos;
>+        }
>+        else if (*iter == ' ' && spaceLoc == uint32_t(kNotFound)) {
>+            spaceLoc = pos;
>+        }
>+        else if (*iter == '?') {
>+            qMarkLoc = pos;
>+        }
The old code assigns qMarkLoc to the location of the first '?'. This code to the last. Why?

>+        else if ((*iter == '\'' || *iter == '"') && quoteLoc == uint32_t(kNotFound)) {
>+            quoteLoc = pos;
>+        } else if (nsCRT::IsAsciiAlpha(*iter)) {
>+            hasAsciiAlpha = true;
>+        } else if (nsCRT::IsAsciiDigit(*iter)) {
>+            ++foundDigits;
>+        }
You oddly use both
} else if (expr) {
and
}
else if (expr) {
coding style here.
} else if (expr) { is the right one


>+
>+    bool isValidAsciiHost = aFixupInfo->mFixedURI &&
>+            NS_SUCCEEDED(aFixupInfo->mFixedURI->GetAsciiHost(asciiHost)) &&
>+            !asciiHost.IsEmpty();
8 spaces indentation? 4 spaces should be enough in this file.

>@@ -48,24 +48,33 @@ let testcases = [
>   ["http://www.mozilla.org", "http://www.mozilla.org/", null, false, false],
>   ["http://127.0.0.1/", "http://127.0.0.1/", null, false, false],
>   ["file:///foo/bar", "file:///foo/bar", null, false, false],
>   ["://www.mozilla.org", "http://www.mozilla.org/", null, false, true],
>   ["www.mozilla.org", "http://www.mozilla.org/", null, false, true],
>   ["http://mozilla/", "http://mozilla/", "http://www.mozilla.com/", false, false],
>   ["http://test./", "http://test./", "http://www.test./", false, false],
>   ["127.0.0.1", "http://127.0.0.1/", null, false, true],
>+  ["192.168.10.110", "http://192.168.10.110/", null, false, true],
>   ["1234", "http://1234/", "http://www.1234.com/", true, true],
>   ["host/foo.txt", "http://host/foo.txt", "http://www.host.com/foo.txt", false, true],
>   ["mozilla", "http://mozilla/", "http://www.mozilla.com/", true, true],
>+  ["whitelist", "http://whitelist/", "http://www.whitelist.com/", false, true],
>   ["test.", "http://test./", "http://www.test./", true, true],
>   [".test", "http://.test/", "http://www..test/", true, true],
>   ["mozilla is amazing", null, null, true, true],
>   ["", null, null, true, true],
>-  ["[]", null, null, true, true]
>+  ["[]", null, null, true, true],
>+  ["café.local", "http://café.local/", "http://www.café.local/", false, true],
>+  ["47.6182,-122.830", "http://47.6182,-122.830/", null, true, true],
>+  ["47.6182,23.51", "http://47.6182,23.51/", null, true, true],
>+  ["-22.14,-23.51", "http://-22.14,-23.51/", null, true, true],
>+  ["-22.14,23.51-", "http://-22.14,23.51-/", null, true, true],
>+  ["32.7", "http://32.7/", "http://www.32.7/", true, true],
>+  ["5+2", "http://5+2/", "http://www.5+2.com/", true, true],
> ];
So we really need more tests here. ipv6 stuff , and string with multiple '?' etc.
Attachment #8472685 - Flags: review?(bugs) → review-
Valid ipv6 doesn't get this far, but I've added a condition + tests to make sure we don't regress ipv6 input here. I'd be happy to improve that condition if there are suggestions for it.
Attachment #8472685 - Attachment is obsolete: true
Attachment #8473825 - Flags: review?(bugs)
Comment on attachment 8473825 [details] [diff] [review]
Location bar should search for more types of expressions

Ipv6 addresses must be inside [], didn't cover this case yet.
Attachment #8473825 - Flags: review?(bugs)
I'd still prefer more '?' tests. Some valid urls with ? and some tests where there are
several ? marks (such would have caught the change to '?' handling in the previous patch).

Maybe test also some crazy input, like ?.:'". 
We want the tests to prevent regressions in the future, so the more cases tests covers, the better.
Comment on attachment 8473919 [details] [diff] [review]
Improved the default uri fixup (taking into consideration ipv4, ipv6 uris) & test base

>+    // If there are at least 2 colons, at most 1 slash after the colons,
>+    // only hexadecimal characters ([a-z][0-9]) and the uri is enclosed in [],
>+    // then don't do a keyword lookup (ipv6)
>+    if (foundColons >= 2 && foundSlashes <= 1 && lSBracketLoc == 0 && rSBracketLoc == pos-1 &&
>+            ((foundSlashes && slashLoc > colonLoc) || !foundSlashes) &&
>+            (foundColons + foundSlashes + foundHexAsciiAlpha + foundDigits == pos - 2)) {
>+        return;
>+    }
I don't understand rSBracketLoc == pos-1. ] isn't necessarily the last thing.
(and btw, there should be space before and after - )
In which case would there be / after colons, but before ] ?
Looks like you test only cases when ] is the last character in the url.
Attachment #8473919 - Flags: review?(bugs) → review-
Iteration: --- → 34.3
QA Whiteboard: [qa+]
Flags: qe-verify+
QA Contact: alexandra.lucinet
I've revised the ipv6 check and added some sanity to it.

I can't test for http://[::1]/ because KeywordURIFixup method isn't called.

If i omitted a use case for it, please let me know in the comments.
Attachment #8473919 - Attachment is obsolete: true
Attachment #8475520 - Flags: review?(bugs)
Similar as the last attachment, I've only improved the readability of the ipv6 condition.
Attachment #8475520 - Attachment is obsolete: true
Attachment #8475520 - Flags: review?(bugs)
Attachment #8475558 - Flags: review?(bugs)
QA Contact: alexandra.lucinet → andrei.vaida
(In reply to Olli Pettay [:smaug] from comment #15)
> We really need to get this stuff out of C++ land at some point :/

I filed bug 1057531 for this.
Looks like the merging problems with bug 1057186 might be not be trivial.
I can rebase this patch on top of the one from bug 1057186.

Did you have time to look on the ipv6 condition? If you can give me some feedback for that, I'll use it in the new patch.
Flags: needinfo?(bugs)
Review is coming when it is coming (sorry, have been busy with reviewing other stuff).
Flags: needinfo?(bugs)
Iteration: 34.3 → 35.1
Comment on attachment 8475558 [details] [diff] [review]
Improved the default uri fixup (taking into consideration ipv4, ipv6 uris) & test base

+    uint32_t colonLoc = uint32_t(kNotFound);
firstColonLoc would be easier to understand

Same with dotLoc, and others which are just for the first occurrence.

+    bool isIpv6 = true;
This name is misleading. You're not doing ipv6 validation, but just check if the
input reminds ipv6

>+    while (iter != iterEnd) {
>+        if (pos >= 1 && rSBracketLoc == uint32_t(kNotFound)) {
>+            if (!(lSBracketLoc == 0 &&
>+                  (*iter == ':' || *iter == '.' || *iter == ']' ||
>+                   (*iter >= 'a' && *iter <= 'f') ||
>+                   (*iter >= 'A' && *iter <= 'F') ||
>+                    nsCRT::IsAsciiDigit(*iter)))) {
Strange indentation. Everything inside if() should be aligned, no need for indentation.
But the expression is really hard to read

Maybe
  if (!(lSBracketLoc == 0 &&
        (*iter == ':' ||
         *iter == '.' ||
         *iter == ']' ||
         (*iter >= 'a' && *iter <= 'f') ||
         (*iter >= 'A' && *iter <= 'F') ||
         nsCRT::IsAsciiDigit(*iter)))) {

>+        } else if (*iter == '[') {
>+            lSBracketLoc = pos;
>+        } else if (*iter == ']') {
>+            rSBracketLoc = pos;
So what if there are several [ or ] characters.
I think we want tests for that too (just to ensure we don't do anything unexpected and that we don't regress behavior later)


>+  ["?'.com", "http:///?%27.com", "http://www..com/?%27.com", true, true],
>+  ["' ?.com", null, null, true, true],
>+  ["???", "http:///???", "http://www..com/???", true, true],
>+  ["?? ??", "http:///??%20??", "http://www..com/??%20??", true, true],
>+  ["?mozilla", "http:///?mozilla", "http://www..com/?mozilla", true, true]
Those ..com cases looks like a bug somewhere, but seems like some older bug.

>       // Check the preferred URI
>       if (couldDoKeywordLookup && expectKeywordLookup) {
>         let urlparamInput = encodeURIComponent(testInput).replace("%20", "+", "g");
>+        // the first `?` from test input will be automatically removed
>+        if (urlparamInput.startsWith("%3F")) {
>+          urlparamInput = urlparamInput.replace("%3F", "");
>+        }
I don't understand this. The comment doesn't explain why we want to remove the first '?'


And please merge this over the other patch.
Attachment #8475558 - Flags: review?(bugs) → review-
Blocks: 1057186
No longer blocks: 1057186
Depends on: 1057186
I've rebased it and tried to take into consideration all suggestions.
Attachment #8475558 - Attachment is obsolete: true
Attachment #8487419 - Flags: review?(bugs)
Comment on attachment 8487419 [details] [diff] [review]
Improved the default uri fixup (taking into consideration ipv4, ipv6 uris) & test base

z>+        } else if (*iter == '[') {
>+            lastLSBracketLoc = pos;
>+        } else if (*iter == ']') {
>+            foundRSBrackets++;
>+        } else if (nsCRT::IsAsciiAlpha(*iter)) {
>+            hasAsciiAlpha = true;
>+        } else if (nsCRT::IsAsciiDigit(*iter)) {
>+            ++foundDigits;
>+        }
>+
>+        pos++;
>+        iter++;
>     }
>-    uint32_t qMarkLoc = uint32_t(aURIString.FindChar('?'));
>-    uint32_t quoteLoc = std::min(uint32_t(aURIString.FindChar('"')),
>-                               uint32_t(aURIString.FindChar('\'')));
>+
>+    if (lastLSBracketLoc > 0 || foundRSBrackets != 1) {
>+        looksLikeIpv6 = false;
>+    }
> 
>     nsresult rv;
>+    nsAutoCString asciiHost;
>+    nsAutoCString host;
>+
>+    bool isValidAsciiHost = aFixupInfo->mFixedURI &&
>+        NS_SUCCEEDED(aFixupInfo->mFixedURI->GetAsciiHost(asciiHost)) &&
>+        !asciiHost.IsEmpty();
>+
>+    bool isValidHost = aFixupInfo->mFixedURI &&
>+        NS_SUCCEEDED(aFixupInfo->mFixedURI->GetHost(host)) &&
>+        !host.IsEmpty();
>+
>+    // If there are 3 dots and only numbers between them, then don't do a
>+    // keyword lookup (ipv4)
>+    if (foundDots == 3 && (foundDots + foundDigits == pos)) {
>+        return;
>+    }
>+
>+    // If there are only colons and only hexadecimal characters ([a-z][0-9])
>+    // enclosed in [], then don't do a keyword lookup
>+    if (looksLikeIpv6) {
>+        return;
>+    }
>+
>     // We do keyword lookups if a space or quote preceded the dot, colon
>-    // or question mark (or if the latter were not found):
>-    if (((spaceLoc < dotLoc || quoteLoc < dotLoc) &&
>-         (spaceLoc < colonLoc || quoteLoc < colonLoc) &&
>-         (spaceLoc < qMarkLoc || quoteLoc < qMarkLoc)) ||
>-        qMarkLoc == 0)
>-    {
>+    // or question mark (or if the latter were not found)
>+    // or when the host is the same as asciiHost and there are no
>+    // characters from [a-z][A-Z]
>+    if (((firstSpaceLoc < firstDotLoc || firstQuoteLoc < firstDotLoc) &&
>+         (firstSpaceLoc < firstColonLoc || firstQuoteLoc < firstColonLoc) &&
>+         (firstSpaceLoc < firstQMarkLoc || firstQuoteLoc < firstQMarkLoc)) || firstQMarkLoc == 0 ||
>+        (isValidAsciiHost && isValidHost && !hasAsciiAlpha &&
>+         host.EqualsIgnoreCase(asciiHost.get()))) {
>+
>         rv = KeywordToURI(aFixupInfo->mOriginalInput, aPostData,
>                           getter_AddRefs(aFixupInfo->mPreferredURI));
>-        if (NS_SUCCEEDED(rv) && aFixupInfo->mPreferredURI)
>-        {
>+        if (NS_SUCCEEDED(rv) && aFixupInfo->mPreferredURI) {
>             aFixupInfo->mFixupUsedKeyword = true;
>         }
>     }
>     // ... or if there is no question mark or colon, and there is either no
>     // dot, or exactly 1 and it is the first or last character of the input:
>-    else if ((dotLoc == uint32_t(kNotFound) ||
>-              (dotLoc == lastDotLoc && (dotLoc == 0 || dotLoc == aURIString.Length() - 1))) &&
>-             colonLoc == uint32_t(kNotFound) && qMarkLoc == uint32_t(kNotFound))
>-    {
>+    else if ((firstDotLoc == uint32_t(kNotFound) ||
>+              (foundDots == 1 && (firstDotLoc == 0 || firstDotLoc == aURIString.Length() - 1))) &&
>+              firstColonLoc == uint32_t(kNotFound) && firstQMarkLoc == uint32_t(kNotFound)) {
> 
>-        nsAutoCString asciiHost;
>-        if (aFixupInfo->mFixedURI &&
>-            NS_SUCCEEDED(aFixupInfo->mFixedURI->GetAsciiHost(asciiHost)) &&
>-            !asciiHost.IsEmpty()) {
>-
>-            if (IsDomainWhitelisted(asciiHost, dotLoc)) {
>-                return;
>-            }
>+        if (isValidAsciiHost && IsDomainWhitelisted(asciiHost, firstDotLoc)) {
>+            return;
>         }
> 
>         // If we get here, we don't have a valid URI, or we did but the
>         // host is not whitelisted, so we do a keyword search *anyway*:
>         rv = KeywordToURI(aFixupInfo->mOriginalInput, aPostData,
>                           getter_AddRefs(aFixupInfo->mPreferredURI));
>-        if (NS_SUCCEEDED(rv) && aFixupInfo->mPreferredURI)
>-        {
>+        if (NS_SUCCEEDED(rv) && aFixupInfo->mPreferredURI) {
>             aFixupInfo->mFixupUsedKeyword = true;
>         }
>     }
> }
> 
> bool nsDefaultURIFixup::IsDomainWhitelisted(const nsAutoCString aAsciiHost,
>                                             const uint32_t aDotLoc)
> {
>diff --git a/docshell/test/unit/test_nsDefaultURIFixup_info.js b/docshell/test/unit/test_nsDefaultURIFixup_info.js
>--- a/docshell/test/unit/test_nsDefaultURIFixup_info.js
>+++ b/docshell/test/unit/test_nsDefaultURIFixup_info.js
>@@ -83,16 +83,55 @@ let testcases = [ {
>     input: "http://test./",
>     fixedURI: "http://test./",
>     alternateURI: "http://www.test./",
>   }, {
>     input: "127.0.0.1",
>     fixedURI: "http://127.0.0.1/",
>     protocolChange: true,
>   }, {
>+    input: "192.168.10.110",
>+    fixedURI: "http://192.168.10.110/",
>+    protocolChange: true,
>+  }, {
>+    input: "[::1]",
>+    fixedURI: "http://[::1]/",
>+    alternateURI: "http://[::1]/",
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "[fe80:cd00:0:cde:1257:0:211e:729c]",
>+    fixedURI: "http://[fe80:cd00:0:cde:1257:0:211e:729c]/",
>+    alternateURI: "http://[fe80:cd00:0:cde:1257:0:211e:729c]/",
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "[64:ff9b::8.8.8.8]",
>+    fixedURI: "http://[64:ff9b::8.8.8.8]/",
>+    protocolChange: true
>+  }, {
>+    input: "[64:ff9b::8.8.8.8]/~moz",
>+    fixedURI: "http://[64:ff9b::8.8.8.8]/~moz",
>+    protocolChange: true
>+  }, {
>+    input: "[::1][::1]",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "[::1][100",
>+    fixedURI: "http://[::1][100/",
>+    alternateURI: "http://[::1][100/",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "[::1]]",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>     input: "1234",
>     fixedURI: "http://1234/",
>     alternateURI: "http://www.1234.com/",
>     keywordLookup: true,
>     protocolChange: true,
>     affectedByWhitelist: true,
>   }, {
>     input: "host/foo.txt",
>@@ -120,16 +159,32 @@ let testcases = [ {
>     alternateURI: "http://www..test/",
>     keywordLookup: true,
>     protocolChange: true,
>   }, {
>     input: "mozilla is amazing",
>     keywordLookup: true,
>     protocolChange: true,
>   }, {
>+    input: "search ?mozilla",
>+    keywordLookup: true,
>+    protocolChange: true,
>+  }, {
>+    input: "mozilla .com",
>+    keywordLookup: true,
>+    protocolChange: true,
>+  }, {
>+    input: "what if firefox?",
>+    keywordLookup: true,
>+    protocolChange: true,
>+  }, {
>+    input: "london's map",
>+    keywordLookup: true,
>+    protocolChange: true,
>+  }, {
>     input: "mozilla ",
>     fixedURI: "http://mozilla/",
>     alternateURI: "http://www.mozilla.com/",
>     keywordLookup: true,
>     protocolChange: true,
>     affectedByWhitelist: true,
>   }, {
>     input: "   mozilla  ",
>@@ -185,16 +240,122 @@ let testcases = [ {
>     keywordLookup: true,
>     protocolChange: true,
>   }, {
>     input: "http://whitelisted/",
>     fixedURI: "http://whitelisted/",
>     alternateURI: "http://www.whitelisted.com/",
>     affectedByWhitelist: true,
>     inWhitelist: true,
>+  }, {
>+    input: "café.local",
>+    fixedURI: "http://café.local/",
>+    alternateURI: "http://www.café.local/",
>+    protocolChange: true
>+  }, {
>+    input: "47.6182,-122.830",
>+    fixedURI: "http://47.6182,-122.830/",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "-47.6182,-23.51",
>+    fixedURI: "http://-47.6182,-23.51/",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "-22.14,23.51-",
>+    fixedURI: "http://-22.14,23.51-/",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "32.7",
>+    fixedURI: "http://32.7/",
>+    alternateURI: "http://www.32.7/",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "5+2",
>+    fixedURI: "http://5+2/",
>+    alternateURI: "http://www.5+2.com/",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "moz ?.::%27",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com/?q=search",
>+    fixedURI: "http://mozilla.com/?q=search",
>+    alternateURI: "http://www.mozilla.com/?q=search",
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com?q=search",
>+    fixedURI: "http://mozilla.com/?q=search",
>+    alternateURI: "http://www.mozilla.com/?q=search",
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com ?q=search",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com.?q=search",
>+    fixedURI: "http://mozilla.com./?q=search",
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com'?q=search",
>+    fixedURI: "http://mozilla.com'/?q=search",
>+    alternateURI: "http://www.mozilla.com'/?q=search",
>+    protocolChange: true
>+  }, {
>+    input: "mozilla.com':search",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "[mozilla]",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "':?",
>+    fixedURI: "http://'/?",
>+    alternateURI: "http://www.'.com/?",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "a?.com",
>+    fixedURI: "http://a/?.com",
>+    alternateURI: "http://www.a.com/?.com",
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "?'.com",
>+    fixedURI: "http:///?%27.com",
>+    alternateURI: "http://www..com/?%27.com",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "' ?.com",
>+    keywordLookup: true,
>+    protocolChange: true
>+  }, {
>+    input: "?mozilla",
>+    fixedURI: "http:///?mozilla",
>+    alternateURI: "http://www..com/?mozilla",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>+  }, {
>+    input: "??mozilla",
>+    fixedURI: "http:///??mozilla",
>+    alternateURI: "http://www..com/??mozilla",
>+    keywordLookup: true,
>+    protocolChange: true,
>+    affectedByWhitelist: true
>   }];
> 
> if (Services.appinfo.OS.toLowerCase().startsWith("win")) {
>   testcases.push({
>     input: "C:\\some\\file.txt",
>     fixedURI: "file:///C:/some/file.txt",
>     protocolChange: true,
>   });
>@@ -292,24 +453,30 @@ function run_test() {
> 
>       // Check booleans on input:
>       let couldDoKeywordLookup = flags & urifixup.FIXUP_FLAG_ALLOW_KEYWORD_LOOKUP;
>       do_check_eq(info.fixupUsedKeyword, couldDoKeywordLookup && expectKeywordLookup);
>       do_check_eq(info.fixupChangedProtocol, expectProtocolChange);
>       do_check_eq(info.fixupCreatedAlternateURI, makeAlternativeURI && alternativeURI != null);
> 
>       // Check the preferred URI
>-      let requiresWhitelistedDomain = flags & urifixup.FIXUP_FLAG_REQUIRE_WHITELISTED_HOST
>+      let requiresWhitelistedDomain = flags & urifixup.FIXUP_FLAG_REQUIRE_WHITELISTED_HOST;
>       if (couldDoKeywordLookup) {
>         if (expectKeywordLookup) {
>           if (!affectedByWhitelist || (affectedByWhitelist && !inWhitelist)) {
>-        let urlparamInput = encodeURIComponent(sanitize(testInput)).replace("%20", "+", "g");
>-        let searchURL = kSearchEngineURL.replace("{searchTerms}", urlparamInput);
>-        do_check_eq(info.preferredURI.spec, searchURL);
>-      } else {
>+            let urlparamInput = encodeURIComponent(sanitize(testInput)).replace("%20", "+", "g");
>+            // If the input starts with `?`, then info.preferredURI.spec will omit it
>+            // In order to test this behaviour, remove `?` only if it is the first character
>+            if (urlparamInput.startsWith("%3F")) {
>+              urlparamInput = urlparamInput.replace("%3F", "");
>+            }
>+            let searchURL = kSearchEngineURL.replace("{searchTerms}", urlparamInput);
>+            let spec = info.preferredURI.spec.replace("%27", "'", "g");
>+            do_check_eq(spec, searchURL);
>+          } else {
>             do_check_eq(info.preferredURI, null);
>           }
>         } else {
>           do_check_eq(info.preferredURI.spec, info.fixedURI.spec);
>         }
>       } else if (requiresWhitelistedDomain) {
>         // Not a keyword search, but we want to enforce the host whitelist
>         if (!affectedByWhitelist || (affectedByWhitelist && inWhitelist))
Attachment #8487419 - Flags: review?(bugs) → review+
https://hg.mozilla.org/integration/fx-team/rev/05f6564ede67
Flags: in-testsuite? → in-testsuite+
Keywords: checkin-needed
Whiteboard: [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/05f6564ede67
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 35
Depends on: 1067168
Verified fixed on Nightly 35.0a1 (2014-09-14), testing was performed using Windows 7 64-bit, Ubuntu 14.04 LTS 64-bit and Mac OS X 10.9.4.

Potential issues:
1. Accessing IPv6 URLs using two brackets (e.g. http://[[fe80::250:4ff:fea5:d634]] ) throws the following JS error in the Browser Console:
> [Exception... "Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) 
> [nsIURIFixup.createFixupURI]"  nsresult: "0x804b000a (NS_ERROR_MALFORMED_URI)"  
> location: "JS frame :: resource:///modules/PlacesUIUtils.jsm :: PUIU_createFixedURI :: line 54" 
> data: no]
in urlbarBindings.xml:311
 * please note that a search is performed for that string, which is expected
2. IPv6 URLs that only end in a bracket (e.g. http://fe80::250:4ff:fea5:d634]/ )  are considered valid URLs
 * on the other hand, IPv6 URLs that only start with a bracket (i.e. http://[2001:4860:0:2001::68/ ) are considered search strings
3. Accessing IPv6 URLs without typing the protocol and the first bracket throws the following JS error in the Browser Console:
> NS_ERROR_UNKNOWN_HOST: Component returned failure code: 0x804b001e 
> (NS_ERROR_UNKNOWN_HOST) [nsIDNSService.asyncResolve] 
in browser.js:10658
 * please note that a search is performed for that string, which is expected
4. IPv6 URLs are not considered search strings despite "network.dns.disableIPv6" being true
 * please note that a server error page is displayed instead of a search page

I believe that if the scenarios mentioned above are indeed issues, they can be treated in separate bugs. So I'm marking this one verified. 

Alex, what's your take on these?
Status: RESOLVED → VERIFIED
Flags: needinfo?(abardas)
Depends on: 1067637
(In reply to Andrei Vaida, QA [:avaida] from comment #35)
> Verified fixed on Nightly 35.0a1 (2014-09-14), testing was performed using
> Windows 7 64-bit, Ubuntu 14.04 LTS 64-bit and Mac OS X 10.9.4.
> 
> Potential issues:
> 1. Accessing IPv6 URLs using two brackets (e.g.
> http://[[fe80::250:4ff:fea5:d634]] ) throws the following JS error in the
> Browser Console:
> > [Exception... "Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) 
> > [nsIURIFixup.createFixupURI]"  nsresult: "0x804b000a (NS_ERROR_MALFORMED_URI)"  
> > location: "JS frame :: resource:///modules/PlacesUIUtils.jsm :: PUIU_createFixedURI :: line 54" 
> > data: no]
> in urlbarBindings.xml:311
>  * please note that a search is performed for that string, which is expected
> 2. IPv6 URLs that only end in a bracket (e.g.
> http://fe80::250:4ff:fea5:d634]/ )  are considered valid URLs
>  * on the other hand, IPv6 URLs that only start with a bracket (i.e.
> http://[2001:4860:0:2001::68/ ) are considered search strings
> 3. Accessing IPv6 URLs without typing the protocol and the first bracket
> throws the following JS error in the Browser Console:
> > NS_ERROR_UNKNOWN_HOST: Component returned failure code: 0x804b001e 
> > (NS_ERROR_UNKNOWN_HOST) [nsIDNSService.asyncResolve] 
> in browser.js:10658
>  * please note that a search is performed for that string, which is expected
> 4. IPv6 URLs are not considered search strings despite
> "network.dns.disableIPv6" being true
>  * please note that a server error page is displayed instead of a search page
> 
> I believe that if the scenarios mentioned above are indeed issues, they can
> be treated in separate bugs. So I'm marking this one verified. 
> 
> Alex, what's your take on these?

1. I've tested on a Beta Release (v33). When accessing that url, it throws the same error. This patch doesn't introduce it and I'm not that concerned about it right now because the error it catched and the behavior is correct (the search is performed).

2. I can also reproduce this issue on v33 and KeywordURIFixup method is not called for that input (but that ip is not valid ipv6 so it should be a bug if it's not intentional)

3. Looks like a bug. It may have been introduced by bug 1048618 (it may have not, I'd have to bisect it more), but there should be no errors.

4. Looks like a bug. I can still reproduce it in v33 for example, but that pref should disable ipv6 name lookups. I have to investigate more and maybe fill a bug about it.

Overall, I'd say that some bugs can be filed and these tests should also be performed after a patch for bug 1057531 lands.
Flags: needinfo?(abardas)
(In reply to Alex Bardas :alexbardas from comment #36)
> 1. I've tested on a Beta Release (v33). When accessing that url, it throws
> the same error. This patch doesn't introduce it and I'm not that concerned
> about it right now because the error it catched and the behavior is correct
> (the search is performed).
> 
> 2. I can also reproduce this issue on v33 and KeywordURIFixup method is not
> called for that input (but that ip is not valid ipv6 so it should be a bug
> if it's not intentional)
> 
> 3. Looks like a bug. It may have been introduced by bug 1048618 (it may have
> not, I'd have to bisect it more), but there should be no errors.
> 
> 4. Looks like a bug. I can still reproduce it in v33 for example, but that
> pref should disable ipv6 name lookups. I have to investigate more and maybe
> fill a bug about it.
> 
> Overall, I'd say that some bugs can be filed and these tests should also be
> performed after a patch for bug 1057531 lands.

Thank you for your feedback, Alex. I'll file separate bugs for Issue #3 and Issue #4.
Blocks: 1079707
Blocks: 1079721
Will this also resolve the case where periods are used in a single "word" search without a valid TLD? For example, in all other browsers I'm able to type async.each into the address bar and it's correctly interpreted as a search. Firefox assumes it's an invalid domain.
(In reply to Timothy Strimple from comment #38)
> Will this also resolve the case where periods are used in a single "word"
> search without a valid TLD? For example, in all other browsers I'm able to
> type async.each into the address bar and it's correctly interpreted as a
> search. Firefox assumes it's an invalid domain.

This bug won't fix that case, and I can't find one for that case. Can you please file a bug for this?
Flags: needinfo?(tim)
I have created a bug for this. My first Firefox bug, please let me know if I missed anything!

https://bugzilla.mozilla.org/show_bug.cgi?id=1080682
Flags: needinfo?(tim)
(In reply to Timothy Strimple from comment #40)
> I have created a bug for this. My first Firefox bug, please let me know if I
> missed anything!
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1080682

Thanks, looks good :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: