Closed
Bug 1181153
Opened 9 years ago
Closed 9 years ago
Port treestatus to relengapi
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
References
Details
Attachments
(11 files)
51 bytes,
text/x-github-pull-request
|
dustin
:
review+
|
Details | Review |
51 bytes,
text/x-github-pull-request
|
dustin
:
review+
|
Details | Review |
1.88 KB,
text/plain
|
Details | |
40 bytes,
text/x-review-board-request
|
emorley
:
feedback+
|
Details |
46 bytes,
text/x-github-pull-request
|
dustin
:
review+
emorley
:
checkin+
|
Details | Review |
50 bytes,
text/x-github-pull-request
|
freddy
:
review+
|
Details | Review |
40 bytes,
text/x-review-board-request
|
catlee
:
review+
|
Details |
58 bytes,
text/x-github-pull-request
|
automatedtester
:
review+
|
Details | Review |
54 bytes,
text/x-github-pull-request
|
jsantell
:
review+
|
Details | Review |
48 bytes,
text/x-github-pull-request
|
abr
:
review+
|
Details | Review |
43 bytes,
text/x-github-pull-request
|
Details | Review |
Catlee and I got a good start on this at Whistler in https://github.com/catlee/build-relengapi/tree/treestatus, but it needs to be finished up.
Updated•9 years ago
|
Component: Other → TreeStatus
Product: Release Engineering → Tree Management
QA Contact: mshal
Version: unspecified → ---
Assignee | ||
Comment 1•9 years ago
|
||
Attachment #8638170 -
Flags: review?(catlee)
Assignee | ||
Comment 2•9 years ago
|
||
That's landed, but I still need to make the transition.
Assignee | ||
Comment 3•9 years ago
|
||
https://etherpad.mozilla.org/treestatus-migration
Assignee | ||
Comment 4•9 years ago
|
||
Here's what I find in the access logs:

2620:101:80fc:224:baac:6fff:fe38:f64e - - [24/Aug/2015:14:24:28 +0000] "GET /?format=json HTTP/1.1" 200 7642 "-" "Python-urllib/2.7"
  mtv2 corp network

63.245.214.82 - - [24/Aug/2015:14:16:58 +0000] "GET /b2g-inbound?format=json HTTP/1.1" 200 346 "-" "Python-urllib/2.6"
63.245.214.162 - - [24/Aug/2015:14:18:40 +0000] "GET /try?format=json HTTP/1.1" 200 83 "-" "Python-urllib/2.7"
63.245.214.82 - - [24/Aug/2015:14:26:42 +0000] "GET /b2g-inbound?format=json HTTP/1.1" 200 346 "-" "Python-urllib/2.6"
63.245.214.82 - - [24/Aug/2015:14:31:44 +0000] "GET /b2g-inbound?format=json HTTP/1.1" 200 346 "-" "Python-urllib/2.6"
63.245.214.162 - - [24/Aug/2015:14:41:15 +0000] "GET /mozilla-inbound?format=json HTTP/1.1" 200 323 "-" "Python-urllib/2.7"
63.245.214.162 - - [24/Aug/2015:14:53:30 +0000] "GET /try?format=json HTTP/1.1" 200 83 "-" "Python-urllib/2.7"
  scl3/releng NAT, urllib UA

69.59.28.19 - - [24/Aug/2015:14:06:32 +0000] "GET / HTTP/1.1" 200 16590 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
  pingdom

<mumble> - - [24/Aug/2015:14:28:26 +0000] "GET /mozilla-inbound?format=json HTTP/1.1" 200 323 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0"
  users with browsers

63.245.214.162 - - [24/Aug/2015:14:52:31 +0000] "GET /gaia?format=json HTTP/1.0" 200 212 "-" "Twisted PageGetter"
  scl3 NAT, always /gaia

I'm betting that the urllib requests are from the hg hook (63.245.214.162) and b2g-bumper (.82). Pingdom is easy. Users with browsers will follow the redirect. The mtv2 requests baffle me a little: they appear roughly every 1-3 minutes, so I don't think they're on a crontask. I have no idea what's using the Twisted PageGetter -- is there something in the Buildbot code that consults the tree status?
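Editorial sketch, not part of the original comment: one way to group access-log entries like these by client and user agent. It assumes the log uses Apache's combined log format; `LOG_RE` and `tally` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Combined log format:
# client identity user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally(lines):
    """Count requests per (client address, user agent), skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match is None:
            continue  # annotation or malformed line
        counts[(match.group("client"), match.group("agent"))] += 1
    return counts


SAMPLE = [
    '63.245.214.82 - - [24/Aug/2015:14:16:58 +0000] '
    '"GET /b2g-inbound?format=json HTTP/1.1" 200 346 "-" "Python-urllib/2.6"',
    '63.245.214.82 - - [24/Aug/2015:14:26:42 +0000] '
    '"GET /b2g-inbound?format=json HTTP/1.1" 200 346 "-" "Python-urllib/2.6"',
    '63.245.214.162 - - [24/Aug/2015:14:52:31 +0000] '
    '"GET /gaia?format=json HTTP/1.0" 200 212 "-" "Twisted PageGetter"',
    'scl3 NAT, always /gaia',  # free-text annotation, ignored by the parser
]

if __name__ == "__main__":
    for (client, agent), count in tally(SAMPLE).most_common():
        print(count, client, agent)
```

Grouping on the pair rather than on the address alone keeps clients behind the same NAT apart when their user agents differ.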
Flags: needinfo?(bugspam.Callek)
Comment 5•9 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Here's what I find in the access logs:
>
> 2620:101:80fc:224:baac:6fff:fe38:f64e - - [24/Aug/2015:14:24:28 +0000] "GET /?format=json HTTP/1.1" 200 7642 "-" "Python-urllib/2.7"
> mtv2 corp network

*spitballing* maybe nagios?

> 63.245.214.162 - - [24/Aug/2015:14:52:31 +0000] "GET /gaia?format=json HTTP/1.0" 200 212 "-" "Twisted PageGetter"
> scl3 NAT, always /gaia

This is not something in releng-controlled buildbot, I vaguely recall :Pike using buildbot for l10n reasons, and maybe jhford/someone for other b2g reasons (note this is requesting gaia). But this is me not really certain as to the cause of either of those.
Flags: needinfo?(bugspam.Callek)
Comment 6•9 years ago
|
||
Projects I've come across on Github/elsewhere that use TreeStatus, in case it helps with comment 5 (though most will be outside of our infra) and/or planning migration:

https://hg.mozilla.org/hgcustom/version-control-tools/file/default/pylib/mozautomation/mozautomation/treestatus.py
https://github.com/jhford/node-treestatus
https://github.com/AutomatedTester/treestatus-stats
https://github.com/jsantell/mozilla-tree-status
https://github.com/KWierso/treestatus-monitor
https://github.com/adamroach/moz-treestat
https://github.com/glandium/pulsebot/blob/master/treestatus.py
https://github.com/mozfreddyb/treestatusbot
Assignee | ||
Comment 7•9 years ago
|
||
The v6 stuff is not nagios -- nagios is not in mtv2, and anyway is only pinging this host. Thanks, Ed -- I'll check through that list and see what I can figure out. We could leave something in place to transform requests to http://treestatus.mozilla.org/<tree>?format=json into an appropriate call to the new API, and return the result, but that will mean that existing code doesn't change to point to the new service, and we're running two services indefinitely. My feeling is that we should get the stuff we know about, including tree-critical stuff, shifted to use the new API and then shut off the old host so that anything else will break and alert its author.
Comment 8•9 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Here's what I find in the access logs:
>
> The mtv2 requests baffle me a little: they appear roughly every 1-3 minutes, so I don't think they're on a crontask. I have no idea what's using the Twisted PageGetter -- is there something in the Buildbot code that consults the tree status?

Unsure if related - MOC does monitor tree-status, so they will need to change their tool.
Assignee | ||
Comment 9•9 years ago
|
||
That's pingdom, and already on the list.
Assignee | ||
Comment 10•9 years ago
|
||
Comment on attachment 8638170 [details] [review] https://github.com/mozilla/build-relengapi/pull/308 (via github)
Attachment #8638170 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 11•9 years ago
|
||
The Twisted PageGetter is https://github.com/mozfreddyb/treestatusbot/blob/master/irc.py

It's possible that the mtv2 requests are just from a host running one of the relevant Firefox extensions.

I'm starting to change my mind about breaking the old site (comment 7) after modifying the known uses to hit RelengAPI. Keeping the old site running as a translator has the disadvantage of keeping an old service around (with attendant disk, memory, and CPU usage on servers), but the advantage of not disturbing a lot of people with something that is ultimately pretty trivial.

I'm going to change the approach, then: I'll build a replacement for the existing treestatus.mozilla.org which redirects / to the RelengAPI UI but handles /<tree>?format=json and /?format=json as described in comment 7. I'll deploy this first (after testing), and only then migrate as many services as possible (comment 6, mozharness in-tree, mozharness out-of-tree, treeherder, hg hook) to look at RelengAPI.

The replacement will be hosted on the releng cluster in scl3, since the ulterior motive for all of this is to get the service out of phx1. I'll see if I can implement the old site with just Apache directives, to avoid the need for an additional WSGI daemon.
Assignee | ||
Comment 12•9 years ago
|
||
Updated migration procedure in https://etherpad.mozilla.org/treestatus-migration
Assignee | ||
Comment 13•9 years ago
|
||
Attachment #8652898 -
Flags: review?(bugspam.Callek)
Assignee | ||
Comment 14•9 years ago
|
||
The Apache config looks like

    RewriteEngine On
    SSLProxyEngine On
    # proxy requests with ?format=json
    RewriteCond %{QUERY_STRING} ^format=json$
    RewriteRule "^/(.*)" https://api.pub.build.mozilla.org/treestatus/compat/trees/$1 [P,L]
    ProxyPassReverse / https://api.pub.build.mozilla.org/treestatus/compat/trees/
    # and redirect everything else
    RewriteRule "^/(.*)$" https://api.pub.build.mozilla.org/treestatus/$1 [R,L]

but because the target is https, this requires mod_ssl be loaded, which is a little bit complicated.
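Editorial sketch, not part of the original comment: the same routing decision written as a small Python function, which may make the two rewrite rules easier to check by hand. The function name `route` and the returned tuples are hypothetical. The sketch assumes mod_rewrite's default behaviour of carrying the original query string over to a substitution that has none of its own.

```python
from urllib.parse import urlsplit

NEW_BASE = "https://api.pub.build.mozilla.org/treestatus"


def route(old_url):
    """Return (action, target) for a request made to the old treestatus host.

    action is "proxy" when the query string is exactly format=json
    (the first RewriteRule) and "redirect" otherwise (the second one).
    """
    parts = urlsplit(old_url)
    # "^/(.*)" captures everything after the leading slash.
    rest = parts.path[1:] if parts.path.startswith("/") else parts.path
    # Assumption: the original query string is appended unchanged.
    suffix = "?" + parts.query if parts.query else ""
    if parts.query == "format=json":
        return ("proxy", NEW_BASE + "/compat/trees/" + rest + suffix)
    return ("redirect", NEW_BASE + "/" + rest + suffix)


if __name__ == "__main__":
    for url in (
        "http://treestatus.mozilla.org/try?format=json",
        "http://treestatus.mozilla.org/?format=json",
        "http://treestatus.mozilla.org/",
    ):
        print(url, "->", route(url))
```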
Assignee | ||
Comment 15•9 years ago
|
||
Bug 1198837 has the TrafficScript I used to accomplish the same thing. Unfortunately, that does require a DNS change. I may need to use the Apache approach as well, to handle the DNS propagation interval, but at least that's just temporary. https://treestatus.allizom.org currently has this applied.
Assignee | ||
Comment 16•9 years ago
|
||
One-way sync from old to new; this uses transactions to safely delete and re-insert everything without a "blip" of lost data in the interim. It takes about 15 seconds to run on the current production data (I practiced by mirroring that to the relengapi staging instance). Note that this assumes direct access to both databases, which is unusual since they're in different datacenters!
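Editorial sketch, not the attached script: a minimal illustration of the delete-and-reinsert-in-one-transaction idea, using in-memory SQLite so it can run anywhere. The `trees` table and its columns are hypothetical, and the real databases are not SQLite; only the transaction pattern is the point.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a hypothetical `trees` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trees (tree TEXT PRIMARY KEY, status TEXT, reason TEXT)"
    )
    db.executemany("INSERT INTO trees VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def sync_trees(src, dest):
    """Replace every row in dest.trees with the rows from src.trees.

    The DELETE and the INSERTs run in a single transaction, so other
    readers of dest see either the old rows or the new rows, never an
    empty table. If any insert fails, the delete is rolled back too.
    """
    rows = src.execute("SELECT tree, status, reason FROM trees").fetchall()
    with dest:  # commits on success, rolls back on exception
        dest.execute("DELETE FROM trees")
        dest.executemany(
            "INSERT INTO trees (tree, status, reason) VALUES (?, ?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    old = make_db([("try", "open", ""), ("mozilla-inbound", "closed", "bustage")])
    new = make_db([("stale-tree", "open", "")])
    print("copied", sync_trees(old, new), "rows")
```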
Assignee | ||
Updated•9 years ago
|
Attachment #8652898 -
Flags: review?(bugspam.Callek) → review+
Assignee | ||
Comment 17•9 years ago
|
||
Bug 1181153: use the new RelengAPI-based tree status; r?emorley
Attachment #8656663 -
Flags: review?(emorley)
Comment 18•9 years ago
|
||
Comment on attachment 8656663 [details] MozReview Request: Bug 1181153: use the new RelengAPI-based tree status; r?emorley Looks fine to me, but deferring to an hg.m.o peer (owner?) :-)
Attachment #8656663 -
Flags: review?(gps)
Attachment #8656663 -
Flags: review?(emorley)
Attachment #8656663 -
Flags: feedback+
Updated•9 years ago
|
Attachment #8656663 -
Flags: review?(gps)
Comment 19•9 years ago
|
||
Comment on attachment 8656663 [details] MozReview Request: Bug 1181153: use the new RelengAPI-based tree status; r?emorley
https://reviewboard.mozilla.org/r/18229/#review16413

Aside from the API issue, this is good.

::: hghooks/mozhghooks/treeclosure.py:25
(Diff revision 1)
> -treestatus_base_url = "https://treestatus.mozilla.org"
> +treestatus_base_url = "https://api.pub.build.mozilla.org/treestatus/trees/%s"

https://api.pub.build.mozilla.org/treestatus/trees/mozilla-inbound is failing for me. This change as-is will break the hook.
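Editorial sketch, not the hook's actual code: how a client might use the `%s` URL template from the diff above. The helper names are hypothetical, and the response shape (a JSON object with the tree data under a `result` key and a `status` field) is an assumption that should be checked against the live API before relying on it.

```python
import json
from urllib.parse import quote

# URL template taken from the diff under review.
TREESTATUS_URL = "https://api.pub.build.mozilla.org/treestatus/trees/%s"


def tree_url(tree):
    """Build the per-tree URL, percent-encoding the tree name."""
    return TREESTATUS_URL % quote(tree, safe="")


def is_closed(body):
    """Interpret a response body.

    Assumption: the body is JSON shaped like
    {"result": {"status": "open" | "closed" | ..., ...}}.
    """
    data = json.loads(body)["result"]
    return data["status"] == "closed"


if __name__ == "__main__":
    print(tree_url("mozilla-inbound"))
    print(is_closed('{"result": {"status": "closed", "reason": "bustage"}}'))
```

Keeping URL construction and response parsing separate from the network call lets both be tested without touching the service.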
Assignee | ||
Comment 20•9 years ago
|
||
Well the change isn't at fault - there's just no data there yet. But as I said in the review req, this patch won't land until that is the authoritative data source.
Assignee | ||
Comment 21•9 years ago
|
||
Attachment #8664401 -
Flags: review?(cdawson)
Assignee | ||
Comment 22•9 years ago
|
||
Attachment #8664474 -
Flags: review?(fbraun)
Assignee | ||
Comment 23•9 years ago
|
||
Bug 1181153: use the new treestatus API; r?catlee
Attachment #8664484 -
Flags: review?(catlee)
Assignee | ||
Comment 24•9 years ago
|
||
Attachment #8664499 -
Flags: review?(dburns)
Comment 25•9 years ago
|
||
Comment on attachment 8664474 [details] [review] https://github.com/mozfreddyb/treestatusbot/pull/1 Looks good to me.
Attachment #8664474 -
Flags: review?(fbraun) → review+
Comment 26•9 years ago
|
||
Comment on attachment 8664484 [details] MozReview Request: Bug 1181153: use the new treestatus API; r?catlee https://reviewboard.mozilla.org/r/19961/#review18067
Attachment #8664484 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 27•9 years ago
|
||
Attachment #8665022 -
Flags: review?(jsantell)
Assignee | ||
Comment 28•9 years ago
|
||
Attachment #8665029 -
Flags: review?(adam)
Assignee | ||
Comment 29•9 years ago
|
||
Attachment #8665033 -
Flags: review?(mh+mozilla)
Comment 30•9 years ago
|
||
Comment on attachment 8665022 [details] [review] https://github.com/jsantell/mozilla-tree-status/pull/6 Tree Status addon looks good -- will the old API work for users that don't upgrade? (Not a huge deal, this is all for internal mozilla usage for the most part)
Attachment #8665022 -
Flags: review?(jsantell) → review+
Assignee | ||
Comment 31•9 years ago
|
||
Jordan, the old API will keep working for "a while", but I'd like to decommission it within a few months.
Updated•9 years ago
|
Attachment #8664499 -
Flags: review?(dburns) → review+
Assignee | ||
Comment 32•9 years ago
|
||
Comment on attachment 8664401 [details] [review] https://github.com/mozilla/treeherder/pull/995 (from github issue)
Attachment #8664401 -
Flags: review?(cdawson) → review+
Assignee | ||
Comment 33•9 years ago
|
||
When it comes time to make this transition, I'd like to deploy the Apache config in comment 14 on the phx1 generic cluster so that any client with cached DNS gets the right results. There's a way to do this with TrafficScript (see bug 1198837) but as I understand it the phx1 load balancer can't proxy over to an scl3 VIP (in other words, it supports `pool.select`, but has no equivalent to ProxyPass). That's what's led me to do this with Apache.

The rub is, in order to proxy to an HTTPS backend, Apache requires mod_ssl, which does not appear to be installed on the phx1 generic cluster. And that poses a substantial risk to other generic sites.

As I see it, the options are:
- install mod_ssl and make this Apache config change during the transition
- do this with trafficscript instead (if possible)
- accept that, for the duration of the DNS propagation, there are two treestatus instances

The cost of the last is that sheriffs cannot close the trees during that time. It's likely only a few minutes, so probably not horrible.

Richard, since you helped out on bug 1198837, what do you think?
Flags: needinfo?(rsoderberg)
Comment 34•9 years ago
|
||
Short-term? I would migrate treestatus DNS to Dynect (or Route 53) and then take option 3, "two treestatus instances", for ~5 minutes, and then hardhat the old instance permanently. Long-term? I would migrate treestatus to AWS, since it needs to be up even if a datacenter is down.
Flags: needinfo?(rsoderberg)
Assignee | ||
Comment 35•9 years ago
|
||
How does migrating to dynect or route 53 help over just switching the record in Mozilla's DNS? And yes, RelengAPI will be migrating to Heroku someday.
Assignee | ||
Comment 36•9 years ago
|
||
Per irc, we can avoid all of this if we just do the transition in a TCW. Set things up in scl3, then just change the DNS during the TCW when the trees are all marked "closed" anyway. The DNS propagation will end before the TCW does, so no need to manage the split-brain during that time. And if the new service fails, just revert the DNS to the un-touched phx1 deployment.
Assignee | ||
Comment 37•9 years ago
|
||
https://old.etherpad-mozilla.org/treestatus-migration
Assignee | ||
Comment 38•9 years ago
|
||
https://public.etherpad-mozilla.org/p/treestatus-migration
Assignee | ||
Comment 39•9 years ago
|
||
OK, migrated! I've left the old service in phx1 on for the moment, although all DNS is pointed away from it. It's still serving ~0.003 rq/s from browsers viewing treestatus, but there's no need to cut those folks off. All of the patch-landing remains to be done, but can happen at any time now.
Assignee | ||
Comment 40•9 years ago
|
||
The hg hooks are landed. I don't want to do anything further on a Friday afternoon.
Comment 41•9 years ago
|
||
https://hg.mozilla.org/hgcustom/version-control-tools/rev/783405fe84cb77c7b3a5ef10564620a075b94f0a hghooks: update treeclosure tests to work with releng api (bug 1181153)
Updated•9 years ago
|
Attachment #8665033 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 42•9 years ago
|
||
:glandium -- does that mean pulsebot won't get patched? Or that there was a problem with the patch? Or that you're the wrong person to review? Or that you landed the patch?
Flags: needinfo?(mh+mozilla)
Comment 43•9 years ago
|
||
That there was a problem with the patch and that I landed a fix anyways. I was also assuming you saw that I closed the pull request mentioning it was fixed.
Flags: needinfo?(mh+mozilla)
Comment 44•9 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/9379fedaf0e372e3dac9751a0cc57d1eac5b315a Bug 1181153: use the new treestatus in RelengAPI
Updated•9 years ago
|
Attachment #8664401 -
Flags: checkin+
Assignee | ||
Comment 45•9 years ago
|
||
OK, this is largely complete. There is still code out there with the old URLs in it, but (a) it will still work and (b) there are patches on this bug for it. There's a MOC bug to set up a new pingdom alert. The next release of relengapi will include a link from the root page.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Attachment #8665029 -
Flags: review?(adam) → review?
Updated•9 years ago
|
Product: Tree Management → Release Engineering
Comment 46•9 years ago
|
||
Comment on attachment 8665029 [details] [review] https://github.com/adamroach/moz-treestat/pull/1 Had to make a minor tweak here ("Accept: application/json"), but aside from that, this patch works like a charm. Thanks!
Attachment #8665029 -
Flags: review? → review+
Updated•2 years ago
|
Component: Applications: TreeStatus → General