Closed Bug 1279019 Opened 8 years ago Closed 8 years ago

Generic worker should log metadata about itself and the worker type it is running on

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: pmoore)

References

Details

Attachments

(1 file)

When we land the Windows taskcluster builds in bug 1244750, people will want to know what the Windows build environment/OS looks like.

We can make this much more transparent by including in-tree the userdata that was used to create the Windows image for the worker.

Ideally we would include:
1) a README with information
2) a subdirectory for the different worker types
3) for each worker type: the userdata; possibly a skeleton of the worker type definition, with secrets stripped out of course (it would quickly bitrot, but is useful so people know how it was set up); and possibly the JSON schema for the task payload or, perhaps easier to digest, an example task payload

What we want to ensure is that, when this stuff lands in-tree and is active and running on treeherder, people have access to the information about what is running on these workers and what task payload the workers expect.

At some point later it would be nice to automate the process of building worker types from data checked into the tree, much like we do with docker images for docker worker.
Depends on: 1279260
pmoore: where in tree do you envisage this living? Can you suggest a path? It seems that adding it to taskcluster/ci/legacy pollutes the intention of that path and I really haven't a clue where else it might live.
Flags: needinfo?(pmoore)
Dustin,

What are your thoughts on this? I'd like at least a README somewhere people can find, which directs them (e.g. via a link) to information about how the Windows worker types were set up and what is on them. I too am not sure where this best lives.

We have /testing/docker - I could imagine having /testing/windows - although "testing" is probably the wrong home for "docker" - it should probably be /taskcluster/docker instead of /testing/docker.

Another option: in the yaml file where we specify the worker type (win2012), we could have a yaml comment with a link to a repository that houses the json file representing the Desired State Config (DSC) for the worker type.

Objectives are:

1) make it transparent for developers to see what is on the Windows machines where they are running their jobs
2) think about how we can make the DSC hackable in future, and where a good place to put it now would be, so that we get there without too much reorganisation later (although maybe docker on Windows or running VMs in packet.net will reach the finish line first and this will become a non-requirement)
Flags: needinfo?(pmoore) → needinfo?(dustin)
I brought this up at the taskcluster meeting today, and Dustin had the brilliant idea of putting a few lines in the log file of the generic worker, to give a link to more information about the worker type.

This seems like the right approach, and it should be done in the generic worker. That way, when it creates a log file, it first outputs some metadata about the worker and the worker type, and then the rest of the regular log.

We could even put it as a configuration setting in the worker type, so you could define in the worker type definition a reference URL where the worker type source is, which could be e.g. a URL to the json DSC file.

This is kinda neat: any time the DSC config changes and a new worker type is generated, the URL can be updated with it, so the link and info in the log file should always be accurate and up-to-date!
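
To make the idea concrete, here is a rough sketch of a log preamble driven by a setting in the worker type definition. It is only an illustration in Python, not generic-worker code: the `workerTypeMetadata` property name is the one the later commit uses, while `sourceUrl`, the example URL, the marker lines and both function names are assumptions.

```python
import io

# Hypothetical worker type definition fragment. "sourceUrl" and the URL
# itself are made-up placeholders, not real generic-worker settings.
worker_type_config = {
    "workerType": "win2012",
    "workerTypeMetadata": {
        "sourceUrl": "https://example.com/worker_types/win2012/dsc.json",
    },
}


def write_log_preamble(log, worker_type, metadata):
    """Write worker type metadata at the top of a task log."""
    log.write(f"=== Worker type: {worker_type} ===\n")
    # Sort the keys so the preamble is stable from one task to the next.
    for key in sorted(metadata):
        log.write(f"{key}: {metadata[key]}\n")
    log.write("=== End of worker metadata ===\n")


def start_task_log(config):
    """Create an in-memory log that begins with the metadata preamble."""
    log = io.StringIO()
    write_log_preamble(
        log, config["workerType"], config.get("workerTypeMetadata", {})
    )
    return log


if __name__ == "__main__":
    print(start_task_log(worker_type_config).getvalue(), end="")
```

The regular command output would then be appended to the same log after the preamble.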
I'm going to remove the block on bug 1244750, as this generic worker change isn't critical for releasing the windows builds on taskcluster.
No longer blocks: 1244750
Component: Platform and Services → Generic-Worker
Summary: Put AMI creation scripts for windows worker types in tree, even if we are not automatically building AMIs from these in-tree locations yet → Generic worker should log metadata about itself and the worker type it is running on
Assignee: nobody → pmoore
Flags: needinfo?(dustin)
Hey Greg,

This adds an extra artifact with some worker type metadata to tasks that the generic worker runs. The idea is that this metadata can point to e.g. https://raw.githubusercontent.com/taskcluster/generic-worker/2d2ad3000787f2c893299e693ea3f59287127f5c/worker_types/win2012r2/userdata so that users can see how the machine was set up.

Originally I thought about adding it to the first log file (there are multiple, one per command in the generic worker), but then I thought it would be cleanest to stick it in a predictable known location, so that tools can easily inspect this artifact across different tasks, etc. So I put it in public/logs/worker_type_metadata.json instead, as it is kind of a structured log about the worker type setup - maybe, kinda.

Thanks!
Attachment #8766277 - Flags: review?(garndt)
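
For readers who want a picture of what the comment above describes, this is a minimal Python sketch of writing the metadata to the fixed artifact location. It is not the code from the pull request (generic-worker is not written in Python). The artifact path and the `workerTypeMetadata` property come from this bug; the `userdata` key, the function name and the task directory layout are assumptions.

```python
import json
import os
import tempfile

# Artifact name taken from this bug; everything else here is illustrative.
ARTIFACT_PATH = "public/logs/worker_type_metadata.json"


def write_metadata_artifact(worker_config, task_dir):
    """Dump the workerTypeMetadata property to the artifact path under task_dir."""
    metadata = worker_config.get("workerTypeMetadata", {})
    target = os.path.join(task_dir, *ARTIFACT_PATH.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
    return target


if __name__ == "__main__":
    # Hypothetical worker config; the "userdata" key is an assumption.
    config = {
        "workerTypeMetadata": {
            "userdata": "https://raw.githubusercontent.com/taskcluster/generic-worker/2d2ad3000787f2c893299e693ea3f59287127f5c/worker_types/win2012r2/userdata",
        }
    }
    with tempfile.TemporaryDirectory() as task_dir:
        print(write_metadata_artifact(config, task_dir))
```

Because the path is the same for every task, a tool can fetch that one artifact name without having to know which command log the metadata ended up in.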
I recall Jonas having some opinions about this..
Flags: needinfo?(jopsen)
Left a few comments on the PR.
Comment on attachment 8766277 [details] [review]
Github Pull Request for generic-worker

I like it!
Attachment #8766277 - Flags: review?(garndt) → review+
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/6362a15e4a38012d359a19ce32b76d94f5e6cf4b
Bug 1279019: create public/logs/worker_type_metadata.json artifact in each task based on property workerTypeMetadata in the worker config

https://github.com/taskcluster/generic-worker/commit/866f0a297e1d87365b82603052750ace5bc7f4a2
Merge pull request #9 from taskcluster/bug1279019

Bug 1279019: add artifact public/logs/worker_type_metadata.json to each task
Released in https://github.com/taskcluster/generic-worker/releases/tag/v2.1.0

See NSS examples: https://treeherder.mozilla.org/#/jobs?repo=nss&revision=1e3e144b7e884a296bc5c34dc86490589b180237&filter-searchStr=win

In a future release, we should bundle up all logs plus this metadata into a single log, like in docker worker. It might be worth demarcating output from each command in "buildbot step"[1] format, to ease treeherder integration, or we might be better off with a different syntax, or no demarcation at all (just concatenation).

Advantages:
1) fewer s3 objects => cost benefit
2) one rather than multiple urls to curl/monitor when watching a task, so easier on devs tailing logs
3) better integration with taskcluster tools livelog inspector
4) more consistent with other workers

Disadvantages:
1) harder to see which step failed

[1] https://github.com/taskcluster/buildbot-step
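
As a rough illustration of the single-log idea, the Python sketch below concatenates the metadata and the per-command logs with a marker line before each section. The marker syntax is invented for the example and is not the buildbot-step format; the function name and the input shape are assumptions too.

```python
import json


def bundle_logs(metadata, command_logs):
    """Join worker metadata and per-command output into one log string.

    command_logs is a list of (command, output) pairs in execution order.
    """
    parts = [
        "=== worker type metadata ===",
        json.dumps(metadata, sort_keys=True),
    ]
    for index, (command, output) in enumerate(command_logs, start=1):
        # The marker line helps with disadvantage 1: it shows where each
        # step starts, so a failing step is easier to locate.
        parts.append(f"=== command {index}: {command} ===")
        parts.append(output.rstrip("\n"))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(bundle_logs({"workerType": "win2012"}, [("echo hi", "hi\n")]), end="")
```

Dropping the marker lines from this sketch gives the plain concatenation variant mentioned above.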
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(jopsen)
Resolution: --- → FIXED
Component: Generic-Worker → Workers