Closed
Bug 1277568
Opened 8 years ago
Closed 8 years ago
Generic worker live log artifacts are unreachable after task completes
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ekyle, Assigned: pmoore)
Details
Attachments
(1 file)
Looking at the artifacts [1], the URLs [2] get forwarded to bad "servers". Here is what the error looks like on my side:

> ERROR: HTTPSConnectionPool(
> host='gq275tyaaaavkecitvprfvkbx4nfmuwsh4pn2xddtjjw4aon.taskcluster-worker.net',
> port=60023
> ): Max retries exceeded with url: /log/A2IV1T3zRQKo9ykszrRrNQ (
> Caused by NewConnectionError(
> '<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f4b84adfc90>:
> Failed to establish a new connection: [Errno -5] No address associated with hostname',
> )
> )

My sample from the past hour suggests that only "command_000000.log.live" is affected.

[1] https://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts
[2] http://queue.taskcluster.net/v1/task/50oDK_RFQ92Fh1y_c589nw/artifacts/public/logs/command_000000.log.live
Comment 1 • 8 years ago
These are NSS tasks. Tim/Pete, any idea what's going on here?
Component: General → Task Configuration
Comment 2 • 8 years ago (Assignee)
These are livelog artifacts, which connect you to the worker while the job is running. Once the task has completed, the worker stops serving the livelogs, and the URLs no longer work. I'll check with the team how docker worker handles this.
Comment 3 • 8 years ago (Reporter)
Maybe there is no problem: the "expires" field shows them as already expired.
> "expires": "2016-06-02T08:44:02.271Z"
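As a minimal sketch of the check described in this comment, the snippet below parses an "expires" value in the format quoted above and compares it with a reference time. It assumes the timestamp is always UTC with fractional seconds and a trailing "Z"; parse_expires and is_expired are hypothetical helpers, not part of any Taskcluster client.

```python
from datetime import datetime, timezone


def parse_expires(value: str) -> datetime:
    """Parse a timestamp like '2016-06-02T08:44:02.271Z' into an aware UTC datetime.

    Assumes the trailing 'Z' (UTC) and a fractional-seconds part are always present.
    """
    # When parsing, %f accepts 1 to 6 digits, so '.271' becomes 271000 microseconds.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(artifact: dict, now: datetime) -> bool:
    """Hypothetical helper: True once the artifact's 'expires' time has been reached."""
    return parse_expires(artifact["expires"]) <= now


# Usage with the value from this comment and an arbitrary later check time.
artifact = {
    "name": "public/logs/command_000000.log.live",
    "expires": "2016-06-02T08:44:02.271Z",
}
checked_at = datetime(2016, 6, 2, 9, 0, tzinfo=timezone.utc)
print(is_expired(artifact, checked_at))  # True
```

Comparing aware datetimes keeps the check independent of the local timezone of whoever runs it.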
Comment 4 • 8 years ago
docker-worker streams the task output to two places: the live log service running alongside the task, and a "backing" log, which is a temp file on disk. When the task first starts, the live log artifact it creates is a redirect artifact pointing at the live log endpoint. When the task completes, the "backing" log artifact is created from the temp file holding the task output, and the live log redirect artifact is then re-pointed at that "backing" artifact, so in the end both artifacts refer to the same underlying file.

Redirect artifacts can be recreated to point to a different URL as long as the expiration, content type, and artifact path are the same. This is how the live log redirect artifact can be pointed at the backing log URL after the task completes [1].

[1] https://github.com/taskcluster/docker-worker/blob/master/lib/features/local_live_log.js#L207
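The lifecycle described in this comment can be modelled with a small in-memory sketch. ToyQueue, Artifact, start_task and finish_task are hypothetical stand-ins, not the Taskcluster client API; the artifact names public/logs/live.log and public/logs/live_backing.log and the text/plain content type are assumptions for illustration. The one rule taken from the comment is that a redirect may only be recreated under the same path with the same expiry and content type.

```python
from dataclasses import dataclass

# Assumed artifact names and content type, for illustration only.
LIVE_LOG = "public/logs/live.log"
BACKING_LOG = "public/logs/live_backing.log"
CONTENT_TYPE = "text/plain"


@dataclass(frozen=True)
class Artifact:
    kind: str          # "redirect" or "file" (simplified)
    target: str        # redirect URL, or where the uploaded file lives
    expires: str
    content_type: str


class ToyQueue:
    """In-memory stand-in for the queue; not the real Taskcluster client API."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.artifacts: dict = {}

    def url_for(self, name: str) -> str:
        # URL pattern copied from the queue links quoted in this bug.
        return f"https://queue.taskcluster.net/v1/task/{self.task_id}/artifacts/{name}"

    def create(self, name: str, artifact: Artifact) -> None:
        old = self.artifacts.get(name)
        if old is not None:
            # Only a redirect may be replaced, and only by another redirect with the
            # same expiry and content type (the path matches because it is the key).
            same_shape = (
                old.kind == artifact.kind == "redirect"
                and old.expires == artifact.expires
                and old.content_type == artifact.content_type
            )
            if not same_shape:
                raise ValueError(f"cannot replace artifact {name!r}")
        self.artifacts[name] = artifact


def start_task(queue: ToyQueue, livelog_endpoint: str, expires: str) -> None:
    # While the task runs, the live log redirects to the worker's live log endpoint.
    queue.create(LIVE_LOG, Artifact("redirect", livelog_endpoint, expires, CONTENT_TYPE))


def finish_task(queue: ToyQueue, uploaded_file: str, expires: str) -> None:
    # 1. Upload the temp file as the backing log artifact.
    queue.create(BACKING_LOG, Artifact("file", uploaded_file, expires, CONTENT_TYPE))
    # 2. Re-point the live log redirect at the backing log, so both resolve to the same file.
    backing_url = queue.url_for(BACKING_LOG)
    queue.create(LIVE_LOG, Artifact("redirect", backing_url, expires, CONTENT_TYPE))
```

After finish_task, fetching the live log follows one redirect to the backing log instead of to a worker hostname that may no longer exist, which is the failure reported in the description.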
Comment 5 • 8 years ago (Assignee)
Awesome, I'll do the same in generic worker then! Thanks Greg. I think it might be more intuitive if a redirect artifact could be replaced by an S3 artifact, so that we'd only have one artifact for the log, and it would be mostly transparent to the user whether it was a redirect to the worker or a download from S3. At the moment, having two artifacts with slightly different names that point to the same thing seems confusing. I previously understood the objection to this idea to be that artifacts should be immutable, but given that redirect artifacts can be replaced with different redirect artifacts, maybe we can do it after all. What are your thoughts, Greg and Jonas?
Flags: needinfo?(jopsen)
Flags: needinfo?(garndt)
Comment 6 • 8 years ago
This was part of the motivation for the publish/unpublish discussion: when the live artifact is no longer relevant, it could be unpublished, and the new artifact for the backing log file could be uploaded and published.
Flags: needinfo?(garndt)
Updated • 8 years ago (Assignee)
Summary: Lots of bad artifact URLs → Generic worker live log artifacts are unreachable after task completes
Comment 7 • 8 years ago (Assignee)
This should mirror docker worker behaviour, so that after a live log is no longer available, the livelog artifact is replaced by an alternative redirect artifact which points to the underlying log file. The only difference is that we have one livelog per command in the Generic Worker, rather than just a single live log per task.
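A sketch of the per-command naming this implies, assuming each command's livelog is named like the command_000000.log.live artifact quoted in this bug and that its backing log simply drops the .live suffix (the backing-log name is a hypothetical choice). For a task with a given number of commands, it builds the target each livelog redirect would point at once that command completes, using the queue URL pattern from the links in this bug. livelog_names and completed_redirects are illustrative helpers, not generic-worker code.

```python
# URL pattern taken from the queue links quoted in this bug.
QUEUE_ARTIFACT_URL = "https://queue.taskcluster.net/v1/task/{task_id}/artifacts/{name}"


def livelog_names(command_index: int) -> tuple:
    """Return (livelog name, backing log name) for one command.

    The '.log.live' form matches the artifact in this bug; the backing-log
    name without '.live' is a hypothetical choice for illustration.
    """
    base = f"public/logs/command_{command_index:06d}"
    return f"{base}.log.live", f"{base}.log"


def completed_redirects(task_id: str, command_count: int) -> dict:
    """Map each command's livelog to the URL it should redirect to once that command completes."""
    redirects = {}
    for i in range(command_count):
        live, backing = livelog_names(i)
        # Unlike docker-worker's single live log per task, every command gets its own pair.
        redirects[live] = QUEUE_ARTIFACT_URL.format(task_id=task_id, name=backing)
    return redirects


for live, target in completed_redirects("exampleTaskId", 2).items():
    print(live, "->", target)
```

Each redirect target here is a different artifact name from the livelog itself, so no livelog ends up redirecting to its own URL.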
Updated • 8 years ago
Attachment #8759785 -
Flags: review?(garndt) → review+
Comment 8 • 8 years ago
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/7e8d86367fe35294267c47996eb14064b481c275
Bug 1277568: redirect livelog artifact to underlying log when command completes and livelog is no longer available

https://github.com/taskcluster/generic-worker/commit/29808f82315d3a831565ce3cb1c18ab1e9175193
Merge pull request #8 from taskcluster/bug1277568
Bug 1277568: redirect livelog artifact to underlying log when task command completes
Comment 9 • 8 years ago (Reporter)
I believe it may be worse now:

http://queue.taskcluster.net/v1/task/ZQxYgHVQQDiUx-AuL4RTvA/artifacts/public/logs/command_000000.log.live

ERROR: Exceeded 30 redirects.
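One way a client ends up with "Exceeded 30 redirects" is a redirect chain that never reaches a real file, for example a livelog redirect that ultimately points back at itself; the report does not record what the bad redirect's actual target was. The sketch below follows redirects through a hypothetical in-memory URL-to-target map and stops with an error on a repeated URL or after 30 hops (30 is also the default redirect limit in the Python requests library). resolve and the local TooManyRedirects class are illustrative names, not the requests API.

```python
MAX_REDIRECTS = 30  # same default limit as the Python requests library


class TooManyRedirects(Exception):
    """Local exception for this sketch (not requests.exceptions.TooManyRedirects)."""


def resolve(url: str, redirects: dict, max_redirects: int = MAX_REDIRECTS) -> str:
    """Follow a hypothetical url -> target map until reaching a URL that is not a redirect."""
    seen = set()
    hops = 0
    while url in redirects:
        if url in seen:
            # A cycle would otherwise only be caught by the hop limit.
            raise TooManyRedirects(f"redirect loop at {url}")
        seen.add(url)
        hops += 1
        if hops > max_redirects:
            raise TooManyRedirects(f"Exceeded {max_redirects} redirects.")
        url = redirects[url]
    return url


# A healthy chain: livelog -> backing log, which is a plain file (not in the map).
healthy = {"command_000000.log.live": "command_000000.log"}
print(resolve("command_000000.log.live", healthy))  # command_000000.log
```

Checking for a repeated URL reports a loop immediately, whereas a plain HTTP client only notices once it hits its hop limit.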
Comment 10 • 8 years ago (Reporter)
I think it is perfectly acceptable to set the "expires" property rather than redirecting.
Comment 11 • 8 years ago (Assignee)
(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> I believe it may be worse now:
>
> http://queue.taskcluster.net/v1/task/ZQxYgHVQQDiUx-AuL4RTvA/artifacts/public/logs/command_000000.log.live
>
> ERROR: Exceeded 30 redirects.

Hi Kyle,

That livelog redirect was from an integration test run before this bug had been resolved. It shouldn't redirect like that now. I'll be landing a new release today, and we can confirm it functions the same as docker worker.

Agreed, the redirect you've highlighted is a bad one, but the code that generated it has been fixed.
Comment 12 • 8 years ago (Assignee)
New AMIs have been created for us-west-1, us-west-2 and us-east-1, and the worker types win2012r2 and ttaubert-win2012r2 have been updated. This change is now live. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dc87f618c6f440745c65fdaa781fee1fbceb83d for an example.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 13 • 8 years ago
I think we briefly discussed this on IRC, but it might be a topic for London too.
Flags: needinfo?(jopsen)
Updated • 6 years ago
Product: TaskCluster → Firefox Build System
Comment 14 • 6 years ago (Assignee)
Released in https://github.com/taskcluster/generic-worker/releases/tag/v2.0.0alpha44