Closed
Bug 1145387 (t-yosemite-r5-0073)
Opened 9 years ago
Closed 9 years ago
t-yosemite-r5-0073 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
Details
(Whiteboard: [buildduty][buildslaves][capacity])
Apparently I broke it by rebooting it: I rebooted the whole OS X 10.10 (Yosemite) pool, and most of the others survived, but this one has since had two PDU reboots and one ssh reboot without ever returning to taking jobs.
Comment 1 • 9 years ago
Re-imaged and returned to production.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 2 • 9 years ago (Reporter)
Made no difference: if the reimage actually did happen, what survives a reimage?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3 • 9 years ago
It's failing to connect to the master in a way I haven't seen before:

2015-03-24 08:52:36-0700 [-] Log opened.
2015-03-24 08:52:36-0700 [-] twistd 10.2.0 (/tools/buildbot-0.8.4-pre-moz6/bin/python2.7 2.7.3) starting up.
2015-03-24 08:52:36-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2015-03-24 08:52:36-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x10d003c20>
2015-03-24 08:52:36-0700 [-] Connecting to buildbot-master107.bb.releng.scl3.mozilla.com:9201
2015-03-24 08:52:36-0700 [-] Watching /builds/slave/talos-slave/shutdown.stamp's mtime to initiate shutdown
2015-03-24 08:52:36-0700 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2015-03-24 08:52:36-0700 [Broker,client] While trying to connect:
    Traceback from remote host -- Traceback (most recent call last):
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
        d = self.portal.login(self, mind, IPerspective)
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
        ).addCallback(self.realm.requestAvatar, mind, *interfaces
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
        callbackKeywords=kw)
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/buildbot-0.8.2_hg_f8e28d877d11_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
        p = self.botmaster.getPerspective(mind, avatarID)
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/buildbot-0.8.2_hg_f8e28d877d11_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
        d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
        _name, args, kw)
      File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
        raise DeadReferenceError("Calling Stale Broker")
    twisted.spread.pb.DeadReferenceError: Calling Stale Broker
2015-03-24 08:52:36-0700 [Broker,client] Lost connection to buildbot-master107.bb.releng.scl3.mozilla.com:9201
2015-03-24 08:52:36-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x10d003c20>
2015-03-24 08:52:36-0700 [-] Main loop terminated.
2015-03-24 08:52:36-0700 [-] Server Shut Down.

So it keeps looping indefinitely through the list of runner tasks.

cc-ing :kmoir for Yosemite insight, and :mrrrgn for runner.
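To confirm the loop from the slave side, one could count how many connection attempts in the slave's twistd.log end in the "Calling Stale Broker" failure. This is a minimal sketch, assuming the log layout shown above (timestamped event lines, with traceback lines in between); the function name count_stale_broker_failures is hypothetical and not part of buildbot or runner.

```python
import re

# Timestamped twistd event line, e.g.
# "2015-03-24 08:52:36-0700 [-] Connecting to host:9201"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[([^\]]*)\] (.*)$"
)


def count_stale_broker_failures(lines):
    """Return (attempts, stale_failures) for an iterable of twistd.log lines.

    Hypothetical helper. An attempt starts at a "Connecting to" event line;
    it counts as a stale-broker failure if the final traceback line
    "...DeadReferenceError: Calling Stale Broker" appears before the next
    "Connecting to" line.
    """
    attempts = 0
    failures = 0
    current_failed = False
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m and m.group(3).startswith("Connecting to "):
            attempts += 1
            current_failed = False
        elif ("DeadReferenceError: Calling Stale Broker" in line
              and attempts and not current_failed):
            failures += 1
            current_failed = True  # count each attempt at most once
    return attempts, failures
```

If every attempt in the log is also a failure, that is consistent with the indefinite loop described above rather than an occasional hiccup.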
Comment 4 • 9 years ago
The master had this in the log:

2015-03-24 10:31:38-0700 [Broker,31889,10.26.56.68] duplicate slave t-yosemite-r5-0073; rejecting new slave and pinging old
2015-03-24 10:31:38-0700 [Broker,31889,10.26.56.68] old slave was connected from IPv4Address(TCP, '10.26.56.68', 49239)
2015-03-24 10:31:38-0700 [Broker,31889,10.26.56.68] new slave is from IPv4Address(TCP, '10.26.56.68', 49204)
2015-03-24 10:31:38-0700 [Broker,31889,10.26.56.68] Peer will receive following PB traceback:
2015-03-24 10:31:38-0700 [Broker,31889,10.26.56.68] Unhandled Error

I updated slavealloc, and the slave attached to my master and started taking jobs. I then re-enabled it as a production slave in slavealloc and rebooted it. It connected to buildbot-master107 again, and the same error messages are occurring.

According to the buildbot tickets, the old TCP connection will eventually time out, but I'm not sure that still applies, since these are such old reports:
http://trac.buildbot.net/ticket/887
http://trac.buildbot.net/ticket/1856

According to netstat on the master (buildbot-master107.bb.releng.scl3.mozilla.com), there aren't any established connections to this IP (10.26.56.68).
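One possible reading of the two logs: the master still holds a reference for t-yosemite-r5-0073 from a connection that no longer exists, and when the new connection arrives it calls the old reference, which raises DeadReferenceError straight out of getPerspective. The new login fails, the stale entry is never cleared, and every reconnection repeats the same failure. The toy model below illustrates that hypothesis only; SlaveTable, RemoteRef, and drop_stale are hypothetical names, this is not buildbot's BotMaster code, and the drop_stale variant is an illustrative alternative, not something buildbot is known to do.

```python
class DeadReferenceError(Exception):
    """Stand-in for twisted.spread.pb.DeadReferenceError in this toy model."""


class RemoteRef:
    """Hypothetical stand-in for a PB reference to a connected slave."""

    def __init__(self, peer, alive=True):
        self.peer = peer    # e.g. ("10.26.56.68", 49239)
        self.alive = alive  # False once the underlying TCP connection is gone

    def call_remote(self, method, *args):
        # Mirrors the traceback above: a stale reference raises immediately
        # instead of returning a failed result.
        if not self.alive:
            raise DeadReferenceError("Calling Stale Broker")
        return (method, args)


class SlaveTable:
    """Toy name -> connection table (hypothetical, not buildbot's code)."""

    def __init__(self, drop_stale=False):
        self.slaves = {}
        self.drop_stale = drop_stale  # hypothetical defensive behaviour

    def login(self, name, ref):
        """Return True if `ref` is accepted, False if the new login is rejected."""
        old = self.slaves.get(name)
        if old is not None:
            try:
                old.call_remote(
                    "print", "master got a duplicate connection; keeping this one")
            except DeadReferenceError:
                if not self.drop_stale:
                    # The error escapes to the new slave's login; the stale
                    # entry stays registered, so the next attempt fails too.
                    raise
                del self.slaves[name]  # old connection is gone: forget it
            else:
                return False  # old slave answered: keep it, reject the new one
        self.slaves[name] = ref
        return True
```

In this model a fresh SlaveTable(), the analogue of a restarted master, has no stale entry to trip over, which fits the symptom that reimaging the slave changed nothing while the problem sat on the master side.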
Comment 5 • 9 years ago
(In reply to Kim Moir [:kmoir] from comment #4)
> according to netstat on the master
> (buildbot-master107.bb.releng.scl3.mozilla.com) there aren't any established
> connections to this ip (10.26.56.68)

I'm going to gracefully restart bm107 and see if that helps.
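The netstat check quoted above can be scripted so it is easy to repeat before and after a master restart. This is a minimal sketch, assuming Linux `netstat -tn` output in which the foreign address is the fifth column in ip:port form and the state is the sixth; the function name established_to is hypothetical, and other netstat variants (for example macOS, which uses ip.port) would need a different parser.

```python
def established_to(netstat_text, ip):
    """Return foreign 'ip:port' endpoints in ESTABLISHED state for `ip`.

    Hypothetical helper. Assumes Linux `netstat -tn` layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip header lines and anything that is not TCP
        foreign, state = fields[4], fields[5]
        host = foreign.rpartition(":")[0]  # strip the port
        if host == ip and state == "ESTABLISHED":
            matches.append(foreign)
    return matches
```

An empty result for the slave's IP, while the master still logs "duplicate slave", is the mismatch described in comment 4.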
Comment 6 • 9 years ago
This slave connected to bm108 and is now passing jobs. bm107 has been restarted.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard