Closed Bug 1058065 Opened 10 years ago Closed 10 years ago

Slaverebooter hung on Aug 25

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: Callek, Assigned: Callek)

Details

Attachments

(1 file)

So, this morning we got a slaverebooter alert:

[07:54:41]	nagios-releng	Mon 04:55:06 PDT [4905] buildbot-master74.srv.releng.usw2.mozilla.com:File Age - /builds/slaverebooter/slaverebooter.log is CRITICAL: FILE_AGE CRITICAL: /builds/slaverebooter/slaverebooter.log is 22034 seconds old and 27621448 bytes (http://m.mozilla.org/File+Age+-+/builds/slaverebooter/slaverebooter.log)
[08:54:41]	nagios-releng	Mon 05:55:06 PDT [4908] buildbot-master74.srv.releng.usw2.mozilla.com:File Age - /builds/slaverebooter/slaverebooter.log is CRITICAL: FILE_AGE CRITICAL: /builds/slaverebooter/slaverebooter.log is 25634 seconds old and 27621448 bytes (http://m.mozilla.org/File+Age+-+/builds/slaverebooter/slaverebooter.log)
[09:24:42]	nagios-releng	Mon 06:25:07 PDT [4909] buildbot-master74.srv.releng.usw2.mozilla.com:File Age - /builds/slaverebooter/slaverebooter.log is OK: FILE_AGE OK: /builds/slaverebooter/slaverebooter.log is 436 seconds old and 27621583 bytes (http://m.mozilla.org/File+Age+-+/builds/slaverebooter/slaverebooter.log)
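
For anyone unfamiliar with what that File Age check is measuring, here is a rough Python sketch of the same idea: compare the log's mtime against the current time and report age and size. The threshold values and the function name are assumptions made up for illustration; the real Nagios check's limits are not stated in this bug.

```python
import os
import time

# Hypothetical thresholds in seconds; the actual Nagios limits are not given in this bug.
WARN_AGE = 3600
CRIT_AGE = 7200


def file_age_status(path, now=None, warn=WARN_AGE, crit=CRIT_AGE):
    """Return (status, age_seconds, size_bytes) for the file at path."""
    st = os.stat(path)
    now = time.time() if now is None else now
    # Age is how long ago the file was last written to.
    age = int(now - st.st_mtime)
    if age >= crit:
        status = "CRITICAL"
    elif age >= warn:
        status = "WARNING"
    else:
        status = "OK"
    return status, age, st.st_size


def format_status(path, status, age, size):
    """Format a result in the same shape as the alert text quoted above."""
    return "FILE_AGE %s: %s is %d seconds old and %d bytes" % (status, path, age, size)
```

The two CRITICAL alerts above are 3600 seconds apart in reported age (22034 and 25634) with an unchanged size of 27621448 bytes, which is what you would expect from a process that stopped writing to its log entirely.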

The alert cleared because simone killed the slaverebooter process on bm74.

When I grep slaverebooter.log for Aug 25, the first matching line is the very last line of the attached log.

The preceding log lines are everything from the last run on Aug 24.

Note the *one* case of "2014-08-24 22:46:52,042 - ERROR - b-linux64-hp-0004 - Caught exception while processing", which is a slaverebooter error. Also telling is that there's no traceback there.
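
If this recurs, a quick way to spot such entries is to scan the log for ERROR lines that are not followed by a traceback. The following is only a sketch: the line format is inferred from the single entry quoted above, and the function name and the assumption that a traceback would start on the very next line are mine, not something confirmed from the slaverebooter source.

```python
import re

# Line format inferred from the one entry quoted above:
# "<date> <time>,<ms> - <LEVEL> - <slave> - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+) - (?P<slave>\S+) - (?P<msg>.*)$"
)


def errors_without_traceback(lines):
    """Return parsed ERROR entries that are not immediately followed by a traceback."""
    lines = [line.rstrip("\n") for line in lines]
    found = []
    for i, line in enumerate(lines):
        m = LINE_RE.match(line)
        if not m or m.group("level") != "ERROR":
            continue
        # Assumption: a Python traceback, if logged, begins on the next line.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not nxt.startswith("Traceback (most recent call last):"):
            found.append(m.groupdict())
    return found
```

Passing an open file handle for slaverebooter.log to `errors_without_traceback` would return one dict per suspicious entry, with `ts`, `level`, `slave`, and `msg` keys.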

I'm going to leave this open and assigned to me for now, only bumping priority atm if it happens again.
There have been at least 1 or 2 other instances of this since this bug was filed; however, there hasn't been one in at least the last 2 weeks (I was on buildduty last week, so I was *hoping* to catch it then).

I'm going to resolve this as WFM for now. If it happens again, pete/coop, I'd suggest not trying to fix it and just letting me know, so I can triage it "live". We can probably still use this bug as a starting point for that.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard