Closed
Bug 1467573
Opened 6 years ago
Closed 6 years ago
the slave-health reboot action fails for any machine
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: riman, Unassigned)
Details
I tried to reboot a few machines from slave-health, but the action does not work. The reboot fails with this output: [Errno 2] No such file or directory
Reporter
Comment 1•6 years ago
https://dxr.mozilla.org/build-central/source/slave_health/js/slave_health.js#528-529
I could not find a "/slaves" file under the path "/slave_health/json/".
https://dxr.mozilla.org/build-central/source/slave_health/js/slave_health.js#5
404 Not Found: https://secure.pub.build.mozilla.org/slaveapi/slaves/
Q: Could you take a look, please?
Flags: needinfo?(q)
Comment 2•6 years ago
Oh, I misunderstood your question on slack! Q won't be able to help with this. I'm not sure who best to poke for issues with slaveapi, at the moment; certainly asking in #releng would be a good step (unless jordan has a better idea).
Flags: needinfo?(q)
Comment 3•6 years ago
I did a small investigation, and it seems the "slaveapi_baseurl" doesn't work; it's used in "slave_shutdown_url" and "slave_reboot_url". Has slaveapi been decommissioned or retired?
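For reference, a minimal Python sketch of how an action URL could be derived from the base URL. The path shape `/slaves/<name>/actions/<action>` is taken from the request line in the slaveapi log quoted later in this bug, and the base URL from the 404 in comment 1; the function name `slave_action_url` is hypothetical, and the real logic lives in slave_health.js and may differ.

```python
from urllib.parse import quote

# Assumption: base URL inferred from the 404 reported in comment 1.
SLAVEAPI_BASEURL = "https://secure.pub.build.mozilla.org/slaveapi"


def slave_action_url(slave, action, baseurl=SLAVEAPI_BASEURL):
    """Build <baseurl>/slaves/<slave>/actions/<action> (hypothetical helper).

    Both path segments are percent-encoded, so a name containing spaces
    ends up as %20 in the request path.
    """
    return "{}/slaves/{}/actions/{}".format(
        baseurl.rstrip("/"),
        quote(slave, safe=""),
        quote(action, safe=""),
    )


if __name__ == "__main__":
    print(slave_action_url("t-xp32-ix-005", "shutdown_buildslave"))
```

Since every action URL hangs off the same base, a broken base URL would take down shutdown and reboot together, which matches the symptom described above.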
Comment 4•6 years ago
AFAIK slaveapi is around until ESR52 dies and takes buildbot with it in early September.
Comment 5•6 years ago
radu, ciduty: I believe slaveapi is busted due to the Python upgrade work that was done throughout puppet, namely: https://github.com/mozilla-releng/build-puppet/pull/71/files#diff-95780eb30d106e421159cda544ff09ec

context:
<jlund> looks like slaveapi is broken. possibly python upgrade related
12:24:03 ↔ bc nipped out
12:30:49 <aki> yeah, https://github.com/mozilla-releng/build-puppet/pull/71/files#diff-95780eb30d106e421159cda544ff09ec landed recently; we could back out the slaveapi portion

radu mentioned a good workaround is to use inventory to find host information and reboot the machine manually. Decent stopgap, but if reversing this patch "just works" I think it's worth doing so.

relevant slaveapi logs from /builds/slaveapi/prod/slaveapi.log on slaveapi1.srv.releng.scl3.mozilla.com:

2018-06-22 09:56:17,199 - ERROR - -=- - Exception on /slaves/t-xp32-ix-005 [GET]
Traceback (most recent call last):
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    return f
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    self.jinja_env.filters[name or f.__name__] = f
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    """
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    the view, and further request handling is stopped.
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/views.py", line 84, in view
    constructor of the class.
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/views.py", line 149, in dispatch_request
    def dispatch_request(self, *args, **kwargs):
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/web/slave.py", line 25, in get
    slave.load_all_info()
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/slave.py", line 38, in load_all_info
    Machine.load_all_info(self)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/machines/base.py", line 33, in load_all_info
    self.load_inventory_info()
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/slave.py", line 60, in load_inventory_info
    info = Machine.load_inventory_info(self)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/machines/base.py", line 43, in load_inventory_info
    info = inventory.get_system(self.fqdn)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/clients/inventory.py", line 99, in get_system
    result = requests.get(str(url), auth=auth).json()["objects"][0]
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/api.py", line 55, in get
    # avoid leaving sockets open which can trigger a ResourceWarning in some
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/api.py", line 44, in request
    :return: :class:`Response <Response>` object
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/sessions.py", line 361, in request
    #: representing multivalued query parameters.
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/sessions.py", line 464, in send
    :param timeout: (optional) How long to wait for the server to send
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/adapters.py", line 363, in send
    """
SSLError: [Errno 2] No such file or director

2018-06-22 12:10:18,984 - INFO - -=- - 10.22.81.90 - - [2018-06-22 12:10:18] "GET /slaves/health%20hg%20mozilla/actions/shutdown_buildslave HTTP/1.1" 200 135 0.000960

2018-06-22 12:10:18,994 - ERROR - -=- - Exception on /slaves/health hg mozilla [GET]
Traceback (most recent call last):
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    return f
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    self.jinja_env.filters[name or f.__name__] = f
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    """
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    the view, and further request handling is stopped.
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/views.py", line 84, in view
    constructor of the class.
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/flask/views.py", line 149, in dispatch_request
    def dispatch_request(self, *args, **kwargs):
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/web/slave.py", line 24, in get
    slave = SlaveClass(slave)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/slave.py", line 24, in __init__
    Machine.__init__(self, name)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/machines/base.py", line 18, in __init__
    answer = resolver.query(name)
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/dns/resolver.py", line 981, in query
    except dns.query.UnexpectedSource as ex:
  File "/builds/slaveapi/prod/lib/python2.7/site-packages/dns/resolver.py", line 910, in query
    if len(qname) > 1:
NXDOMAIN

At a quick glance, requests and dns are both unhappy now. As aki mentions above, we could either back out (revert) that patch, which may also require reinstalling the venv depending on how puppet handles the backout, or try to fix up the slaveapi source. The former would be a good stopgap, since slaveapi dies in early Sept, as fubar mentions in comment 4.

radu or someone from ciduty: could someone try to make a reverse patch of https://github.com/mozilla-releng/build-puppet/pull/71 in puppet and make a PR for review? You will want to revert only the parts that affect slaveapi.
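One possible way to narrow down the SSLError above, sketched in Python. This rests on an assumption the thread does not confirm: a common cause of "SSLError: [Errno 2] No such file or directory" from requests is a CA bundle path that no longer exists on disk, for example after a package upgrade inside a venv. The helper names `ca_path_report` and `requests_bundle_path` are hypothetical; `ssl.get_default_verify_paths()` and `requests.certs.where()` are the real calls being wrapped.

```python
import os
import ssl


def ca_path_report(extra_paths=()):
    """Map each candidate CA file or directory to whether it exists on disk."""
    defaults = ssl.get_default_verify_paths()
    candidates = [
        defaults.cafile,
        defaults.capath,
        defaults.openssl_cafile,
        defaults.openssl_capath,
    ]
    candidates.extend(extra_paths)
    # Entries that are None or empty are simply not configured, so skip them.
    return {path: os.path.exists(path) for path in candidates if path}


def requests_bundle_path():
    """Return the CA bundle path requests reports, or None if requests is absent."""
    try:
        from requests import certs
    except ImportError:
        return None
    return certs.where()


if __name__ == "__main__":
    bundle = requests_bundle_path()
    extra = [bundle] if bundle else []
    for path, exists in sorted(ca_path_report(extra).items()):
        print("{:7} {}".format("ok" if exists else "MISSING", path))
```

Run with the venv's own interpreter so the paths reflect what slaveapi actually sees. A "MISSING" entry is only a hint, not proof of the root cause.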
Flags: needinfo?(riman)
Flags: needinfo?(ciduty)
Comment 6•6 years ago
I will try to come up with a fix instead of a backout. I'm currently setting up a local environment so I can test this issue properly. I have also talked with :riman; if I can't find a fix during this shift, I will hand over everything I found to Radu so he can continue the work.
Flags: needinfo?(riman)
Flags: needinfo?(ciduty)
Reporter
Comment 7•6 years ago
I have created a reverse patch for slaveapi/files/requirements.txt and made a PR for review:
https://github.com/raduiman/build-puppet/pull/1/commits/921be40b58e18772678461069891162d93242909
Could you take a look, please? I did not find other files that affect slaveapi.
Should I reverse slaverebooter/files/requirements.txt too? Could it be related to the reboot issue?
And finally, should I change test/verify-requirements.sh to avoid checking the reversed files?
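To review what a requirements.txt reversal actually changes, a small Python sketch that compares the `name==version` pins in two versions of the file may help. The function names and the sample versions below are illustrative only and are not taken from the real slaveapi files.

```python
import re

# Matches "name==version" at the start of a line; ignores trailing
# comments, environment markers and "--hash" options.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;#]+)")


def parse_pins(text):
    """Return {lowercased package name: pinned version} for a requirements file."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for every pin that differs."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Made-up versions, purely to show the output shape.
    before = "requests==1.0\nFlask==0.10\n"
    after = "requests==2.0\nFlask==0.10\ndnspython==1.15\n"
    for name, (old, new) in changed_pins(before, after).items():
        print("{}: {} -> {}".format(name, old, new))
```

A `None` on either side means the package is pinned in only one of the two files.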
Flags: needinfo?(jlund)
Comment 8•6 years ago
(In reply to Radu Iman[:riman] from comment #7)
> >I have created a revers patch for slaveapi/files/requirements.txt and made a PR for review.
> https://github.com/raduiman/build-puppet/pull/1/commits/921be40b58e18772678461069891162d93242909
> Could you take a look, please? I did not find other files that affect slaveapi.

awesome! can you make a PR from the mozilla-releng/build-puppet (upstream) repo. This PR is on your own account so merging it wouldn't land on mozilla-releng/build-puppet.

> >Should I reverse slaverebooter/files/requirements.txt too? Can it be related to the reboot issue?

I can't recall if slaverebooter predates and is unrelated to slaveapi. I would reverse that file too. callek?

> >And finally, should I change test/verify-requirements.sh to avoid checking the reversed files?

yes probably, good catch!

callek, can you help out radu here. You are more familiar with slaveapi and you reviewed the original patch. Ben is still out on leave.
Flags: needinfo?(jlund) → needinfo?(bugspam.Callek)
Comment 9•6 years ago
(In reply to Jordan Lund (:jlund) from comment #8)
> (In reply to Radu Iman[:riman] from comment #7)
> > >I have created a revers patch for slaveapi/files/requirements.txt and made a PR for review.
> > https://github.com/raduiman/build-puppet/pull/1/commits/921be40b58e18772678461069891162d93242909
> > Could you take a look, please? I did not find other files that affect slaveapi.
>
> awesome! can you make a PR from the mozilla-releng/build-puppet (upstream) repo. This PR is on your own account so merging it wouldn't land on mozilla-releng/build-puppet.

Ok, I do think the requirements revert would be a great first step. I wonder if we have any logs/data somewhere about what specific versions were installed prior to that patch landing, though (in case an unnamed dep bumped with all this). A reinstall of the venv will be the next step if a basic puppet revert isn't enough.

> > >Should I reverse slaverebooter/files/requirements.txt too? Can it be related to the reboot issue?
>
> I can't recall if slaverebooter predates and is unrelated to slaveapi. I would reverse that file too. callek?

Slaverebooter was, roughly speaking, a cron that did the slave reboots for humans, and has been pretty defunct iirc. We shouldn't need to care about it.

> > >And finally, should I change test/verify-requirements.sh to avoid checking the reversed files?
>
> yes probably, good catch!

++

> callek, can you help out radu here. You are more familiar with slaveapi and you reviewed the original patch. Ben is still out on leave.

I've forgotten nearly all I once knew about slaveapi, but I'll certainly try to help where useful.
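On the "unnamed dep" concern: one way to spot packages that are installed in an environment but not named in requirements.txt is sketched below in Python. It assumes an interpreter with `importlib.metadata` (Python 3.8+), which the python2.7 slaveapi venv is not; there, `pip freeze` output would have to be fed in instead. The function names are hypothetical.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions():
    """Return {normalized name: version} for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            versions[normalize(name)] = dist.version
    return versions


def unnamed_deps(installed, pinned_names):
    """Return installed packages whose names do not appear in pinned_names."""
    pinned = {normalize(n) for n in pinned_names}
    return {n: v for n, v in installed.items() if normalize(n) not in pinned}


if __name__ == "__main__":
    # Assumption: the names pinned in requirements.txt are passed in by hand here.
    for name, version in sorted(unnamed_deps(installed_versions(), ["requests"]).items()):
        print("{}=={}".format(name, version))
```

Anything this prints is a transitive dependency whose version the pinned file does not control, so a revert of requirements.txt alone would not necessarily restore it.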
Flags: needinfo?(bugspam.Callek)
Reporter
Comment 10•6 years ago
I have created a new pull request: https://github.com/mozilla-releng/build-puppet/pull/143
Comment 11•6 years ago
On top of the change proposed by Radu, I have added a blacklist entry for the SlaveAPI requirements.txt so that PyUp will no longer upgrade its deps. PR available here: https://github.com/mozilla-releng/build-puppet/pull/145
Comment 12•6 years ago
Slave Health doesn't exist anymore. Closing the bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard