Closed Bug 1323827 Opened 8 years ago Closed 7 years ago

[generic-worker] Worker tries reclaiming resolved task

Categories: Taskcluster :: Workers
Type: defect; Priority: Not set; Severity: normal

Tracking: Not tracked
Status: RESOLVED FIXED

People: Reporter: garndt; Assigned: pmoore

Attachments: 1 file

Looking at this log, the task was resolved, the task environment deleted, and a new user created, but then 6 minutes later an error is logged explaining that a task marked as succeeded (resolved) cannot be reclaimed.

Perhaps this just means that the reclaim timer should be stopped. I'm not sure whether it prevents the worker from claiming new tasks until this error happens.

Dec 15 11:20:10 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:08 Resolving task... 
Dec 15 11:20:10 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:08 Command finished successfully! 
Dec 15 11:20:11 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:09 Trying to remove directory 'Z:\task_1481818611' via os.RemoveAll(path) call as GenericWorker user... 
Dec 15 11:20:41 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39 Looking for existing task users to delete... 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39 Attempting to remove Windows user task_1481818611... 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39 Running command: 'net' 'user' 'task_1481818611' '/delete' 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: The command completed successfully. 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39 Making system call CloseDesktop with args: [7E0] 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39   Result: 1 
Dec 15 11:20:42 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:20:39 Creating Windows User task_1481822439... 
<some logs omitted>
Dec 15 11:24:05 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com HaltOnIdle: Is-Running :: generic-worker is running. 
Dec 15 11:24:05 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com HaltOnIdle: instance appears to be productive. 
Dec 15 11:26:05 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com HaltOnIdle: Is-Running :: generic-worker is running. 
Dec 15 11:26:05 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com HaltOnIdle: instance appears to be productive. 
Dec 15 11:26:33 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:26:31 Not updating status of task Cxie9BRxRKejCv0jYPbWTA run 0 from Succeeded to Reclaimed. This is because you can only update to status Reclaimed if the previous status was one of: [Claimed Reclaimed] 
Dec 15 11:26:33 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:26:31 Encountered exception when reclaiming task Cxie9BRxRKejCv0jYPbWTA: Not updating status from Succeeded to Reclaimed. This is because you can only update to status Reclaimed if the previous status was one of: [Claimed Reclaimed] 
Dec 15 11:26:33 i-0efa08abd827e7f63.gecko-t-win7-32.use1.mozilla.com generic-worker: 2016/12/15 17:26:31 Killing task Cxie9BRxRKejCv0jYPbWTA since I cannot reclaim it
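The rejection in the log comes from a status-transition rule: a run may only move to Reclaimed if its previous status was Claimed or Reclaimed. A minimal Go sketch of such a guard follows; the type, constant and function names are hypothetical, modelled only on the wording of the log message, and are not taken from generic-worker's source.

```go
package main

import "fmt"

// TaskStatus is a hypothetical status type modelled on the statuses
// named in the log (Claimed, Reclaimed, Succeeded).
type TaskStatus string

const (
	Claimed   TaskStatus = "Claimed"
	Reclaimed TaskStatus = "Reclaimed"
	Succeeded TaskStatus = "Succeeded"
)

// allowedPrevious maps a target status to the statuses it may be reached
// from. Only the Reclaimed rule from the log is modelled; transitions to
// statuses without an entry are always rejected in this sketch.
var allowedPrevious = map[TaskStatus][]TaskStatus{
	Reclaimed: {Claimed, Reclaimed},
}

// updateStatus returns an error if moving from current to next is not allowed.
func updateStatus(current, next TaskStatus) error {
	for _, s := range allowedPrevious[next] {
		if s == current {
			return nil
		}
	}
	return fmt.Errorf("not updating status from %s to %s: previous status must be one of %v",
		current, next, allowedPrevious[next])
}

func main() {
	fmt.Println(updateStatus(Claimed, Reclaimed))   // <nil>
	fmt.Println(updateStatus(Succeeded, Reclaimed)) // not updating status from Succeeded to Reclaimed: previous status must be one of [Claimed Reclaimed]
}
```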

Also note that the logs show task users being created, but I thought these new instances (this one was spawned on the 14th) should be using the run-as-generic-user feature (so no task users would be needed), unless I misunderstand how that works.

I agree this is misleading, but the message is just a warning. At the time I wrote this, I didn't see an obvious way to cancel the timer, but I can revisit.

About the logs saying new task users are being created: I'll look into that. Indeed, for tasks with the config setting runTasksAsCurrentUser, no new OS users should be created. Again, it might be a logging issue; I'll check.
Component: Worker → Generic-Worker
OK, there is a trivial way to stop a timer, d'oh...

https://golang.org/pkg/time/#Timer.Stop
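For reference, the linked `Timer.Stop` reports whether the call actually prevented the timer from firing: `true` if the timer was still pending, `false` if it had already expired or been stopped. A small sketch; `stopTwice` is a hypothetical helper, and the one-hour duration merely stands in for a reclaim deadline.

```go
package main

import (
	"fmt"
	"time"
)

// stopTwice creates a timer, stops it before it fires, then stops it
// again, returning both results of Stop.
func stopTwice() (first, second bool) {
	t := time.NewTimer(time.Hour)
	first = t.Stop()  // true: the timer was still pending
	second = t.Stop() // false: the timer was already stopped
	return first, second
}

func main() {
	fmt.Println(stopTwice()) // true false
}
```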
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8819278 - Flags: review?(garndt)
(In reply to Pete Moore [:pmoore][:pete] from comment #2)

> About the logs saying new task users are being created - I'll look into
> that. Indeed for tasks with config setting runTasksAsCurrentUser, no new OS
> users should be created. Again, it might be a logging issue - I'll check.

This worker type (gecko-t-win7-32) doesn't (yet) have runTasksAsCurrentUser set; we (me, Rob, Joel, etc.) were testing on gecko-t-win7-32-beta.

Since beta testing has been successful, I'll raise a PR to move that over to the non-beta worker types too.
(In reply to Pete Moore [:pmoore][:pete] from comment #2)
> At the time I wrote this, I didn't see an obvious way to cancel the timer,
> but I can revisit.

Now that I think about it, this Stop() functionality was probably there from the beginning, but in the back of my mind I was thinking I had to prematurely terminate a goroutine, which of course I didn't. Anyway, it should be resolved now! :-)
(In reply to Pete Moore [:pmoore][:pete] from comment #5)

> This worker type (gecko-t-win7-32) doesn't (yet) have runTasksAsCurrentUser
> set - we (me, Rob, Joel etc) were testing on gecko-t-win7-32-beta.
> 
> Since beta testing has been successful, I'll raise a PR to move that over to
> the non-beta worker types too.

I remember now why we haven't rolled it out yet: we seem to be having shutdowns caused by the "HaltOnIdle" check, and I didn't want to roll out globally until that was fixed. I'll look into this now (and raise a separate bug for it, which I'll mark as blocking this bug).
(In reply to Pete Moore [:pmoore][:pete] from comment #7)
> I'll look into this now (and raise a separate bug for it, that I'll block this bug on).

Actually, rolling out runTasksAsCurrentUser to the non-beta gecko Windows worker types is not related to this bug about cancelling the maxRunTime timer, so I've created bug 1323990 for the worker shutdown fix, and I'll raise another bug for the rollout after that lands.
Attachment #8819278 - Flags: review?(garndt) → review+
Thanks for the clarification about -beta... I forgot we were testing it in parallel with running as a task user.
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/dc962f20e26bd12eb3a8ef833180c41ba9c55677
Bug 1323827: stop task reclaim timer when task completes naturally

https://github.com/taskcluster/generic-worker/commit/6162bff6847893b712b07c4f012b1c29be05742b
Merge pull request #31 from taskcluster/bug1323827

Bug 1323827: stop task reclaim timer when task completes naturally
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Generic-Worker → Workers