Closed Bug 1428227 Opened 6 years ago Closed 6 years ago

Google doesn't index bugzilla.mozilla.org any more

Categories

(bugzilla.mozilla.org :: Search, defect)

Production
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: fireattack, Assigned: dylan)

Details

Attachments

(1 file)

45 bytes, text/x-github-pull-request
emceeaich: review+
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.71 Safari/537.36

Steps to reproduce:

Recently I found that Google site search for BMO barely returns any results (a regression, as it worked fine before).

An extreme example is simply searching for "site:bugzilla.mozilla.org":

https://www.google.com/search?q=site%3Abugzilla.mozilla.org

Actual results:

It returns only 3 results here.

DDG and Bing don't seem to be affected:

https://duckduckgo.com/?q=site%3Abugzilla.mozilla.org&t=hg&ia=web
https://www.bing.com/search?q=site%3Abugzilla.mozilla.org

I noticed this problem 2 days ago and thought it might be temporary, but it persists.


It is the same with other keyword(s), compare:

https://www.google.com/search?q=webext+site%3Abugzilla.mozilla.org

with

https://duckduckgo.com/?q=webext+site%3Abugzilla.mozilla.org&t=hg&ia=web
https://www.bing.com/search?q=webext%20site%3Abugzilla.mozilla.org


Expected results:

It should show more results. 

Additional notes:

It also affects the built-in BMO feature, Google search (https://bugzilla.mozilla.org/query.cgi?format=google), since it just redirects you to Google.

I checked https://bugzilla.mozilla.org/robots.txt but didn't see anything strange; it does have `Allow: /show_bug.cgi`.
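
The robots.txt check described above can be reproduced offline with Python's standard `urllib.robotparser`. This is only a minimal sketch: the rules in `SAMPLE_ROBOTS_TXT` are an assumed, simplified stand-in (only `Allow: /show_bug.cgi` comes from this report; the blanket `Disallow: /` is hypothetical), and `is_crawlable` is an illustrative helper name, not anything from BMO.

```python
from urllib.robotparser import RobotFileParser

# Assumed, simplified rules. Only "Allow: /show_bug.cgi" is taken from the
# report; the blanket "Disallow: /" is a hypothetical stand-in.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /show_bug.cgi
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://bugzilla.mozilla.org/show_bug.cgi?id=1428227",
        "https://bugzilla.mozilla.org/buglist.cgi",
    ):
        print(url, is_crawlable(SAMPLE_ROBOTS_TXT, url))
```

A page can pass this check and still be dropped from the index if its HTML carries a `noindex` robots meta tag, which robots.txt does not reveal.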

Another user reports that pages are now marked with <meta name="robots" content="noindex" />; I'm not sure if that's new, or if it has any effect.
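
One way to confirm whether a given page carries that tag is to parse its HTML and collect the robots directives. The sketch below uses only the standard library; `RobotsMetaParser`, `robots_directives`, and `SAMPLE_PAGE` are hypothetical names, and the sample markup is an assumed minimal page built around the tag quoted above.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags (<meta ... />) through here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Assumed minimal page containing the tag quoted in the report.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex" /></head>'
    "<body>Bug 1428227</body></html>"
)

if __name__ == "__main__":
    print(robots_directives(SAMPLE_PAGE))
```

To inspect a live page, the HTML could be fetched with `urllib.request.urlopen` and its decoded body passed to `robots_directives`.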
Only error pages should have noindex. However we're putting nofollow on all links -- maybe we should only do that for links that lead off-site.
Oh wow, would you look at that? The logic is totally wrong. I'll have a fix up in a jiffy.
Assignee: nobody → dylan
Attached file PR
Instead of a "noindex" flag, which is a boolean (and a double negative),
we just mirror the 'robots' meta directive as a variable.
So we set robots = noindex on the error pages, and robots = index everywhere else by default.

I originally reviewed this code, and I didn't test that anything *wasn't* set to noindex.
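
The approach described in this comment can be illustrated with a small sketch. BMO itself is written in Perl with Template Toolkit templates, so the Python below is not the actual patch; `robots_meta`, `render_head`, and the page titles are hypothetical and exist only to show the idea of passing the directive itself, with `index` as the default.

```python
from html import escape


def robots_meta(robots: str = "index") -> str:
    """Render the robots meta tag; `robots` mirrors the directive itself."""
    return f'<meta name="robots" content="{escape(robots, quote=True)}">'


def render_head(title: str, is_error_page: bool = False) -> str:
    """Hypothetical page-head renderer: only error pages opt out of indexing."""
    robots = "noindex" if is_error_page else "index"
    return f"<head><title>{escape(title)}</title>{robots_meta(robots)}</head>"


if __name__ == "__main__":
    # The negative case matters as much as the positive one:
    # ordinary pages must NOT carry noindex.
    assert "noindex" not in render_head("Bug 1428227")
    assert "noindex" in render_head("Invalid Bug ID", is_error_page=True)
    print(render_head("Bug 1428227"))
```

Because the variable holds the directive rather than a negated flag, forgetting to set it leaves a page indexable instead of hidden.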
Attachment #8940040 - Flags: review?(ehumphries)
Note to self: after push, use Google Webmaster Tools to request a re-index.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(dylan)
Resolution: --- → FIXED
Hey, how can I get Google to re-index us? Will it just happen or ... ?
Flags: needinfo?(dylan) → needinfo?(glob)
IIRC it should happen automatically because of the sitemap. https://support.google.com/webmasters/answer/6065812?hl=en hints that this may take several days.

Google's sitemap view does indeed show that a lot of pages are being ignored:
> 1,317,302 Submitted; 3,388 Indexed
https://www.google.com/webmasters/tools/sitemap-list?hl=en&siteUrl=http://bugzilla.mozilla.org/#MAIN_TAB=1&CARD_TAB=-1
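
To put the sitemap numbers quoted above in proportion, the indexed share works out to roughly 0.26% of submitted pages. A quick sketch of the arithmetic (the helper name `parse_count` is illustrative):

```python
def parse_count(text: str) -> int:
    """Parse a count such as "1,317,302" that uses comma thousands separators."""
    return int(text.replace(",", ""))


# Figures as quoted from the sitemap view in this comment.
submitted = parse_count("1,317,302")
indexed = parse_count("3,388")
indexed_fraction = indexed / submitted

print(f"{indexed_fraction:.2%} of submitted pages are indexed")
```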
Flags: needinfo?(glob)