Closed
Bug 1095300
Opened 10 years ago
Closed 9 years ago
slaveapi should be able to determine whether a slave is currently running a job
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: philor, Assigned: coop)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4111])
No description provided.
Comment 1•10 years ago
VS2013 made things faster, then we combined js etc. into libxul, and now it's slower than when we started?
Updated•10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4111]
Comment 2•10 years ago
Per IRC, wontfix this bug (philor said not to change the current logic here, at least if said logic applies to all slaves):

[21:48:58] <Callek> philor: hitting up https://bugzilla.mozilla.org/show_bug.cgi?id=1095300 now, what's your recommended "time to wait before graceful", knowing of course that a failed graceful will *also* take that long to be noticed
[21:49:13] <Callek> as in, max(+6 hours, whatever I set this time to)
[21:49:45] <Callek> (since 6 hours is the slaveapi timeout for gracefuls it can't verify)
[21:50:18] <Callek> philor: is 6 hours, or 7 hours, your recommended time, is what I'm basically asking
[21:50:28] <Callek> philor: also this time is the same across all slaves/jobs
[21:50:45] <philor> Callek: my recommendation is that we build a tool to track end-to-end times
[21:50:53] <philor> and that we make it not the same across all slaves
[21:51:06] <philor> that's... what's that nice word?... suboptimal
[21:51:10] <Callek> philor: basically I'm asking what I can do right now to make things better
[21:51:21] <Callek> since right now it's 5 hours
[21:51:50] <Callek> (I have other priorities that conflict with this atm, which means I can only devote a few minutes today to it)
[21:52:40] <philor> Callek: so, we have slave pools where no job ever takes longer than an hour, and we leave them idle for 5 hours, and then another 6 if they don't graceful, and one slave pool where jobs take 5.5 hours, and we thus lose them for 11 hours every day because they rarely graceful?
[21:52:47] <philor> Callek: please do nothing
[21:53:33] <Callek> philor: well if the graceful comes back as failed, early (which sadly is not that common) then we will reboot earlier than the 6 hours
[21:53:41] <Callek> and if it is successful we reboot right away
[21:53:53] <philor> I can manually deal with Win build slaves better than every single pool can deal with even more idle time
[21:54:04] <Callek> but yeah it's the 5 hours from "last job" --> "checking" that we deal with right now
[21:54:18] <Callek> ok, then I'll mark that bug wontfix, thanks
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Assignee
Comment 3•10 years ago
Couldn't we run different instances of slaverebooter at different cadences, each using a different config file to exclude slave types not on that cadence?
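As a rough sketch of that idea (all pool names, cadence labels, and config keys below are hypothetical, not slaverebooter's actual config format): each instance runs on its own cadence and gets a config that excludes the pools belonging to the other cadences.

```python
# Hypothetical per-cadence pool assignments. Pool names and the
# idle thresholds are illustrative only.
CADENCE_POOLS = {
    "hourly": {"idle_threshold_hours": 2, "pools": ["tst-linux64", "tst-w64"]},
    "slow":   {"idle_threshold_hours": 7, "pools": ["b-2008"]},  # long Win PGO builds
}

def config_for(cadence):
    """Build the config one slaverebooter instance would load: watch its
    own pools, exclude every pool owned by a different cadence."""
    own = CADENCE_POOLS[cadence]
    excluded = sorted(
        pool
        for name, entry in CADENCE_POOLS.items()
        if name != cadence
        for pool in entry["pools"]
    )
    return {
        "watch": list(own["pools"]),
        "exclude": excluded,
        "idle_threshold_hours": own["idle_threshold_hours"],
    }
```

The "slow" instance would then watch only the long-building pool with a generous idle threshold, while the "hourly" instance could recycle test slaves much sooner.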
Reporter
Comment 4•10 years ago
Thanks to not really having a clear picture of what happens when, I think I underestimated how widespread this is. It has less to do with builds that take longer than slaverebooter's idea of idle time, and more to do with how frequently a slave is running a build of any duration after its previous build finished more than slaverebooter's idle threshold ago.

Every single day, we run out of demand as the US workday ends and build up a good percentage of the Windows build pool with idle times of 1, 2, 3, or 4 hours. Then, as NZ and Japan and then Europe wake up and first thing push the things they got review on, we set ourselves up for the quarter-till-1am slaverebooter run to happen while we have a lot of slaves running a build, and thus unable to immediately graceful, but with more than 5 hours since their last job, because they started their current job with several hours of idle time.
Assignee
Comment 5•10 years ago
(In reply to Phil Ringnalda (:philor) from comment #4)
> What we do, every single day, is run out of demand as the US workday ends,
> build up a good percentage of the Windows build pool with idle times of 1 or
> 2 or 3 or 4 hours, and then as NZ and Japan and then Europe wake up and
> first-thing push the things they got review on, we set ourselves up for the
> quarter-till-1am slaverebooter run to happen while we have a lot of slaves
> running a build, and thus unable to immediately graceful, but with more than
> 5 hours since their last job because they started their current job with
> several hours of idle time.

We should really have another check here: specifically, slaveapi should be able to determine quickly whether a given slave is currently running a job. We could gather this data by hitting the buildbot web interface (slow), by tailing twistd.log on the slave, or by extending the buildbot slave code directly to provide a simple JSON endpoint. This would allow slaverebooter to make smarter decisions, and would also provide useful information to people using slave health.
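A minimal sketch of what the JSON-endpoint option might look like from the consuming side, assuming a buildbot-0.8-style status dict with a `runningBuilds` list. The endpoint path and the field names are assumptions modeled on buildbot's JSON status pages, not slaveapi's actual API:

```python
import json
from urllib.request import urlopen

def is_slave_busy(status):
    # A slave counts as busy if its status reports any in-flight builds.
    # "runningBuilds" mirrors buildbot 0.8's JSON status shape; treat it
    # as an assumption about what the endpoint would return.
    return bool(status.get("runningBuilds"))

def check_slave(master_url, slave_name):
    # Hypothetical endpoint path, modeled on buildbot's /json/slaves/<name>.
    url = "%s/json/slaves/%s" % (master_url.rstrip("/"), slave_name)
    with urlopen(url) as resp:
        return is_slave_busy(json.loads(resp.read().decode("utf-8")))

# Canned responses so the decision logic can be exercised without a master:
idle_status = {"connected": True, "runningBuilds": []}
busy_status = {"connected": True,
               "runningBuilds": [{"builderName": "WINNT 5.2 build"}]}
```

With a check like this in front of the graceful-shutdown path, slaverebooter could skip slaves that are mid-build instead of waiting out the full graceful timeout on them.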
Assignee: bugspam.Callek → nobody
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Adjust slaverebooter's idea of how long jobs take, since we're doing Win PGO builds in more than five hours now → slaveapi should be able to determine whether a slave is currently running a job
Assignee
Updated•9 years ago
Assignee: nobody → coop
Assignee
Comment 6•9 years ago
There is still no non-invasive way to do this, and slaverebooter isn't used any more.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
Component: Tools → General