Closed Bug 1452095 Opened 6 years ago Closed 6 years ago

Upgrade mac taskcluster workers to generic-worker 10.8.4

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: dragrom)

Attachments

(2 files, 8 obsolete files)

We've upgraded all the Windows AWS workers to generic-worker 10.7.8 in bug 1399401, and will be upgrading the Windows hardware workers to generic-worker 10.7.8 in bug 1443589. This bug is to bring generic-worker on our gecko mac testers up-to-date too.
Assignee: infra → relops
Component: Infrastructure: Puppet → RelOps
QA Contact: cshields → klibby
Attached patch bug1452095_puppet_v1.patch (obsolete) — Splinter Review
Note, you'll need to download:

  * https://github.com/taskcluster/generic-worker/releases/download/v10.7.8/generic-worker-darwin-amd64

to:

  * /data/repos/EXEs/generic-worker-v10.7.8-darwin-amd64

on the distinguished puppet master, before this can be rolled out.
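
The download step above could be scripted roughly as follows. This is a minimal sketch rather than the procedure RelOps actually followed: the helper name `release_asset` is hypothetical, and the URL and destination patterns are assumptions generalized from the single v10.7.8 example in this comment.

```python
def release_asset(version, platform="darwin-amd64"):
    """Return (download URL, destination path) for a generic-worker release.

    Both patterns are assumptions generalized from the v10.7.8 example above.
    """
    url = (
        "https://github.com/taskcluster/generic-worker/releases/download/"
        f"v{version}/generic-worker-{platform}"
    )
    dest = f"/data/repos/EXEs/generic-worker-v{version}-{platform}"
    return url, dest


url, dest = release_asset("10.7.8")
# Print a command an operator could review before running it by hand.
print(f"curl -L -o {dest} {url}")
```

Printing the command instead of executing it keeps the sketch free of side effects on the puppet master.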
Assignee: relops → pmoore
Status: NEW → ASSIGNED
Attachment #8965761 - Flags: review?(jlorenzo)
Note - RelOps may wish to do some smoke testing of the release before going live, e.g. rolling out to a staging pool and running some try pushes. I'm not sure how this is handled - but this is the patch that would actually do the production upgrade.

I'll defer to Kendall to decide how this needs to be handled.
Flags: needinfo?(klibby)
Thanks, Pete! Dragos has done this a bunch in the past for similar things, so I'll let him tackle it next week when he's back from the Romanian Easter holidays.
Flags: needinfo?(klibby) → needinfo?(dcrisan)
Comment on attachment 8965761 [details] [diff] [review]
bug1452095_puppet_v1.patch

Review of attachment 8965761 [details] [diff] [review]:
-----------------------------------------------------------------

This version bump looks good to me. I just deployed generic-worker-darwin-amd64 onto /data/repos/EXEs/ of releng-puppet2.srv.releng.scl3.mozilla.com. This will be replicated over the next hour.
Attachment #8965761 - Flags: review?(jlorenzo) → review+
I'll continue the work on this bug.
Flags: needinfo?(dcrisan)
Assignee: pmoore → dcrisan
Thanks guys!
Note, it will be worthwhile running

  generic-worker --help

against the old deployed version and the new version and diffing the output between them to see what config properties have been added or changed etc.

In particular, a chain of trust key may be needed, the taskcluster-proxy binary should be installed, and the name taskcluster should resolve to the loopback interface.

Any questions, please needinfo me.

Thanks!
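
The comparison suggested above can also be sketched in Python, assuming the two `generic-worker --help` outputs have already been captured as text. The function name `diff_help` and the short sample strings are hypothetical stand-ins, not real help output.

```python
import difflib


def diff_help(old_text, new_text, old_label="old", new_label="new"):
    """Return unified-diff lines between two captured --help outputs."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Made-up excerpts standing in for the real captured outputs.
old_help = "idleTimeoutSecs\nlivelogGETPort\n"
new_help = "idleTimeoutSecs\nlivelogGETPort\ntaskclusterProxyPort\n"

for line in diff_help(old_help, new_help,
                      "generic-worker_10.2.3", "generic-worker_10.7.8"):
    print(line)
```

Lines prefixed with `+` are config properties that exist only in the newer version, which are the ones worth checking against the worker config.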
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# diff generic-worker_10.2.3 generic-worker_10.7.8 
85a86
>           availabilityZone                  The EC2 availability zone of the worker.
127a129,130
>           instanceID                        The EC2 instance ID of the worker.
>           instanceType                      The EC2 instance Type of the worker.
132a136,137
>           livelogGETPort                    Port number for livelog HTTP GET requests.
>                                             [default: 60023]
137,138d141
<           livelogGETPort                    Port number for livelog HTTP GET requests.
<                                             [default: 60023]
140a144,146
>           privateIP                         The private IP of the worker, used by chain of trust.
>           provisionerBaseUrl                The base URL for API calls to the provisioner in
>                                             order to determine if there is a new deploymentId.
143a150
>           region                            The EC2 region of the worker.
151,154c158,165
<                                             user, each time a task user is created. This is a
<                                             way to provide generic user initialisation logic
<                                             that should apply to all generated users (and thus
<                                             all tasks).
---
>                                             user, after the user has been created, the machine
>                                             has rebooted and the user has logged in, but before
>                                             a task is run as that user. This is a way to
>                                             provide generic user initialisation logic that
>                                             should apply to all generated users (and thus all
>                                             tasks) and be run as the task user itself. This
>                                             option does *not* support running a command as
>                                             Administrator.
166a178,182
>           shutdownMachineOnIdle             If true, when the worker is deemed to have been
>                                             idle for enough time (see idleTimeoutSecs) the
>                                             worker will issue an OS shutdown command. If false,
>                                             the worker process will simply terminate, but the
>                                             machine will not be shut down. [default: false]
174,178d189
<           shutdownMachineOnIdle             If true, when the worker is deemed to have been
<                                             idle for enough time (see idleTimeoutSecs) the
<                                             worker will issue an OS shutdown command. If false,
<                                             the worker process will simply terminate, but the
<                                             machine will not be shut down. [default: false]
182a194,198
>           taskclusterProxyExecutable        Filepath of taskcluster-proxy executable to use; see
>                                             https://github.com/taskcluster/taskcluster-proxy
>                                             [default: taskcluster-proxy]
>           taskclusterProxyPort              Port number for taskcluster-proxy HTTP requests.
>                                             [default: 80]
219a236,238
>     71     The worker was terminated via an interrupt signal (e.g. Ctrl-C pressed).
>     72     The worker is running on spot infrastructure in AWS EC2 and has been served a
>            spot termination notice, and therefore has shut down.
I think the only ones that you will be affected by are taskclusterProxyPort and taskclusterProxyExecutable. They both have reasonable default values. I don't anticipate that tests will try to use the taskcluster proxy, but in case they do, it might be worth:

1) installing the taskcluster-proxy binary in the same directory as the livelog binary
2) updating DNS on the machine so that host "taskcluster" points to 127.0.0.1
3) running a test task that uses the taskcluster-proxy feature (see https://docs.taskcluster.net/reference/workers/generic-worker/docs/features#feature-taskclusterproxy)
The taskclusterProxyExecutable and taskclusterProxyPort config settings can be set in: https://hg.mozilla.org/build/puppet/raw-file/production/modules/generic_worker/templates/generic-worker.config.erb (if not set, their documented default values will be used).
On my local mac, I added this to the /etc/hosts file:

127.0.0.1	taskcluster

My generic-worker config settings look like:

  "taskclusterProxyExecutable": "taskcluster-proxy",
  "taskclusterProxyPort": 8080,

And the taskcluster-proxy binary is located in the same directory as the generic-worker binary.

I'm guessing you can use port 80 in production, rather than 8080 - that is just what I used on my local mac.

Let me know if you hit any issues!
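
A quick way to check the /etc/hosts entry described above is sketched below. It only inspects hosts-file text, so it is an approximation: actual name resolution on the machine may also involve other sources. The function name `resolves_to_loopback` is hypothetical.

```python
def resolves_to_loopback(hosts_text, name="taskcluster"):
    """Return True if `name` is mapped to 127.0.0.1 in hosts-file text."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address == "127.0.0.1" and name in names:
            return True
    return False


# Sample text mirroring the entry added on the local mac above.
sample = "127.0.0.1\tlocalhost\n127.0.0.1\ttaskcluster\n"
print(resolves_to_loopback(sample))
```

On a worker, the text could be read from /etc/hosts and the check run before enabling the taskclusterProxy feature.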
Let's use the default values for taskclusterProxyExecutable and taskclusterProxyPort and see whether the tests fail.

Using t-yosemite-r7-380.test.releng.mdc1.mozilla.com to test the new generic-worker version.

The host date/time: 26 Apr 2018 07:46:05 PDT
We're getting lots of papertrail alerts for t-yosemite-r7-380.test.releng.mdc1.mozilla.com

> open /Users/cltbld/generic-worker.openpgp.key: no such file or directory

Can you create a gpg key for that worker?

You can do that with, e.g.

  $ generic-worker new-openpgp-keypair --file /Users/cltbld/generic-worker.openpgp.key

You'll probably need to do this as part of the puppet setup.
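
Since the key should only be generated once per worker, the step needs to be idempotent. The sketch below shows that guard in Python; it is an assumption about how the step could be structured (similar in spirit to a Puppet exec guarded by a "creates" check), not the contents of any patch on this bug. The function name `keygen_command` is hypothetical; the command line itself is the one given above.

```python
import os

KEY_PATH = "/Users/cltbld/generic-worker.openpgp.key"


def keygen_command(key_path=KEY_PATH, exists=os.path.exists):
    """Return the argv that creates the key, or None if it already exists."""
    if exists(key_path):
        return None  # key already present: nothing to do
    return ["generic-worker", "new-openpgp-keypair", "--file", key_path]


# Prints the command to run, or None when the key file is already there.
print(keygen_command())
```

The `exists` parameter is injectable only so the guard can be exercised without touching the filesystem.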
Flags: needinfo?(dcrisan)
(In reply to Pete Moore [:pmoore][:pete] from comment #16)
> We're getting lots of papertrail alerts for
> t-yosemite-r7-380.test.releng.mdc1.mozilla.com

Where "a lot" == > 2,500 since Friday.
I generated generic-worker.openpgp.key. Looking at other hosts that run generic-worker 10.2.3, I saw that generic-worker.openpgp.key is missing from the /Users/cltbld/ directory. Was this file not used in the previous version, or is it generated when a job runs?
Flags: needinfo?(dcrisan) → needinfo?(pmoore)
(In reply to Dragos Crisan [:dragrom] from comment #18)
> I generated generic-worker.openpgp.key. Looking at other hosts that run
> generic-worker 10.2.3, I saw that generic-worker.openpgp.key is missing from
> the /Users/cltbld/ directory. Was this file not used in the previous version,
> or is it generated when a job runs?

The key file has always been required (see `generic-worker --help`), but the up-front check for it was only added in bug 1358545 which was released in generic-worker v10.6.0.

So the worker config was probably already bad before; it just went unnoticed because nobody ran tasks with artifact signing on it.
Flags: needinfo?(pmoore)
Dragos,

Note, you can watch the generic-worker logs of that machine here:

https://papertrailapp.com/systems/t-yosemite-r7-380.test.releng.mdc1.mozilla.com/events?q=program%3Ageneric-worker

If you see anything untoward, please let me know or raise a bug. Looking at this, I see a lot of logs are being generated for all the cached artifacts on the worker - I will see if I can clean that up in the worker. But if you see anything else that looks less than ideal, please do reach out - we're happy to make improvements. Most of the time we're not looking at mac logs, so we might not realise things like overly-verbose logging etc.

Thanks!
(In reply to Pete Moore [:pmoore][:pete] from comment #20)

> If you see anything untoward, please let me know or raise a bug. Looking at
> this, I see a lot of logs are being generated for all the cached artifacts
> on the worker - I will see if I can clean that up in the worker.

Created https://github.com/taskcluster/generic-worker/pull/86 for this.
Added an exec block to generate the generic-worker signingKey
Attachment #8972805 - Flags: review?(pmoore)
Attachment #8972805 - Flags: review?(jwatkins)
Added an exec block to generate the generic-worker signingKey
Attachment #8972805 - Attachment is obsolete: true
Attachment #8972805 - Flags: review?(pmoore)
Attachment #8972805 - Flags: review?(jwatkins)
Attachment #8972813 - Flags: review?(pmoore)
Attachment #8972813 - Flags: review?(jwatkins)
Comment on attachment 8972813 [details] [diff] [review]
bug1452095_generate_gpg_key.patch

Review of attachment 8972813 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me! Were you able to test it?

One consequence of this approach is that we will have a different signing key per worker, rather than per worker type. This isn't necessarily a bad thing, since:

i) we don't publish the public key, and
ii) we don't sign artifacts on testers

If at some point we start signing artifacts on testers, we can review the key management process.

Nice work!
Attachment #8972813 - Flags: review?(pmoore)
Attachment #8972813 - Flags: review?(aki)
Attachment #8972813 - Flags: review+
It looks good! But note, the worker is taking real production jobs:

https://tools.taskcluster.net/groups/Sg5RsO44QUCE1yc1LGr-_A/tasks/HCeYwI4jQXupByF45WT-6g/details

For future rollouts, it might be best to configure a different (unique) provisionerId/workerType identity, so that you can submit test tasks without affecting production jobs.

In this case, the upgrade seems successful, so no harm is done, but if there had been a problem, the worker could have burned through a large number of production tasks in quick succession, failing them or resolving them incorrectly. As I say, I don't think this is the case this time, so just a tip for any future rollout.

Thanks for taking care of this Dragos! :-)
Flags: needinfo?(pmoore)
Comment on attachment 8972813 [details] [diff] [review]
bug1452095_generate_gpg_key.patch

I think this works, since we don't need the testers' keys to be valid.
Attachment #8972813 - Flags: review?(aki) → review+
The taskcluster-proxy binary installation, and setup of /etc/hosts is still needed before we can go live with this (comment 10, comment 11, comment 12, comment 13).

In order to test that prior to go-live, you will need to change provisionerId and/or workerType so that you can submit a test task that no other worker picks up. Changing those will in turn no doubt require some tweaking of roles/clients.

Ping me or someone else in my team if you get stuck.

See e.g. https://docs.taskcluster.net/reference/workers/generic-worker/docs/features#feature-taskclusterproxy and also bug 1449981 for a comparison with Windows.

To test you can submit a task like this in the task creator[1]:

> provisionerId: aws-provisioner-v1
> workerType: <the new worker type name you give this worker>
> scopes: []
> payload:
>   features: 
>     taskclusterProxy: true
>   maxRunTime: 60
>   command:
>     - - /bin/bash
>       - -vxec
>       - |
>         echo "Querying my own task definition using taskcluster proxy..."
>         wget -O- -q "http://taskcluster/queue/v1/task/${TASK_ID}"
> metadata:
>     name: taskcluster-proxy-test-macOS
>     description: 'Test taskcluster-proxy on Mac'
>     owner: dcrisan@mozilla.com
>     source: 'https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c28'


---
[1] https://tools.taskcluster.net/tasks/create
Flags: needinfo?(dcrisan)
Correction - provisionerId is of course 'releng-hardware'

So something like this, with updated timestamps (click "Update Timestamps" after copy/pasting)...

provisionerId: releng-hardware
workerType: <the new worker type name you give this worker>
created: '2000-01-01T00:00:00.000Z'
deadline: '2000-01-02T00:00:00.000Z'
scopes: []
payload:
  features:
    taskclusterProxy: true
  maxRunTime: 60
  command:
    - - /bin/bash
      - '-vxec'
      - |
        echo "Querying my own task definition using taskcluster proxy..."
        wget -O- -q "http://taskcluster/queue/v1/task/${TASK_ID}"
metadata:
  name: taskcluster-proxy-test-macOS
  description: Test taskcluster-proxy on Mac
  owner: dcrisan@mozilla.com
  source: 'https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29'
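
The `created` and `deadline` placeholders above need real values before the task can be submitted. As an alternative to the "Update Timestamps" button, they could be generated as in the sketch below. The function name `task_timestamps` and the one-day lifetime are assumptions chosen to match the one-day gap in the example; the millisecond ISO 8601 format is copied from the task definition above.

```python
from datetime import datetime, timedelta, timezone


def task_timestamps(now=None, lifetime=timedelta(days=1)):
    """Return (created, deadline) as UTC strings with millisecond precision.

    `now`, if given, must be a timezone-aware datetime.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)

    def fmt(t):
        return t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}Z"

    return fmt(now), fmt(now + lifetime)


created, deadline = task_timestamps()
print(f"created: '{created}'")
print(f"deadline: '{deadline}'")
```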
I'll adjust the code to install and configure taskcluster-proxy.
Flags: needinfo?(dcrisan)
- Installed taskcluster-proxy:
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# taskcluster-proxy
2018/05/07 03:19:34 Version: Taskcluster proxy 4.1.0 (unknown git revision)
2018/05/07 03:19:34 Listening on: :8080
2018/05/07 03:19:34 Client ID must be passed via environment variable TASKCLUSTER_CLIENT_ID or command line option --client-id

- created the test-generic-worker worker type
- ran this job:

provisionerId: releng-hardware
workerType: test-generic-worker
created: '2018-05-07T10:20:46.280Z'
deadline: '2018-05-08T10:20:46.280Z'
scopes: []
payload:
  features:
    taskclusterProxy: true
  maxRunTime: 60
  command:
    - - /bin/bash
      - '-vxec'
      - |
        echo "Querying my own task definition using taskcluster proxy..."

        /usr/local/bin/wget -O- -q "http://queue.taskcluster.net/v1/task/${TASK_ID}"
metadata:
  name: taskcluster-proxy-test-macOS
  description: Test taskcluster-proxy on Mac
  owner: dcrisan@mozilla.com
  source: 'https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29'

- https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/test-generic-worker/workers/mdc1/t-yosemite-r7-380

:pmoore Can you run more jobs on workerType: test-generic-worker and see if the jobs fail?
Flags: needinfo?(pmoore)
- changed the generic-worker version
- added a version tag for taskcluster-proxy
- added an exec block to generate the generic-worker signingKey
- changed the generic-worker config file to add settings for taskcluster-proxy
- installed taskcluster-proxy in /usr/bin
Attachment #8973639 - Flags: review?(pmoore)
Attachment #8973639 - Flags: review?(jwatkins)
Attachment #8972813 - Flags: review?(jwatkins)
(In reply to Dragos Crisan [:dragrom] from comment #31)

> 2018/05/07 03:19:34 Version: Taskcluster proxy 4.1.0 (unknown git revision)

This looks like a bug in the travis script - the git revision should be known. Looking at https://travis-ci.org/taskcluster/taskcluster-proxy/jobs/349954189#L493 I see that the default go install step ran before the alternative version that sets the git revision immediately afterwards: https://travis-ci.org/taskcluster/taskcluster-proxy/jobs/349954189#L524

The reason for this seems to be that the travis-ci install target is not overwritten, so I'm attempting to fix this in https://github.com/taskcluster/taskcluster-proxy/pull/35.
Comment on attachment 8973639 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8973639 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/generic_worker/templates/generic-worker.config.erb
@@ +28,5 @@
>    "sentryProject": "generic-worker",
>    "shutdownMachineOnIdle": false,
>    "shutdownMachineOnInternalError": false,
> +  "taskclusterProxyExecutable": "taskcluster-proxy",
> +  "taskclusterProxyPort": 8080,

Is it possible to run on port 80 to be consistent with docker-worker?
Commits pushed to master at https://github.com/taskcluster/generic-worker

https://github.com/taskcluster/generic-worker/commit/fd23206f12d3f52432cc23d8b6f95596703118f5
Bug 1452095 - don't log every file in worker cache on worker startup

https://github.com/taskcluster/generic-worker/commit/6967cd9f932b37b53b556e6125468323cbe6a2ed
Merge pull request #86 from taskcluster/bug1452095

Bug 1452095 - don't log every file in worker cache on worker startup
Hi Dragos,

This looks good! Apart from the port being 8080, everything is perfect.

[taskcluster 2018-05-07T12:34:37.566Z] Worker Type (test-generic-worker) settings:
[taskcluster 2018-05-07T12:34:37.566Z]   {
[taskcluster 2018-05-07T12:34:37.566Z]     "config": {
[taskcluster 2018-05-07T12:34:37.566Z]       "deploymentId": "",
[taskcluster 2018-05-07T12:34:37.566Z]       "runTasksAsCurrentUser": true
[taskcluster 2018-05-07T12:34:37.566Z]     },
[taskcluster 2018-05-07T12:34:37.566Z]     "generic-worker": {
[taskcluster 2018-05-07T12:34:37.566Z]       "go-arch": "amd64",
[taskcluster 2018-05-07T12:34:37.566Z]       "go-os": "darwin",
[taskcluster 2018-05-07T12:34:37.566Z]       "go-version": "go1.10",
[taskcluster 2018-05-07T12:34:37.566Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.8",
[taskcluster 2018-05-07T12:34:37.566Z]       "revision": "49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
[taskcluster 2018-05-07T12:34:37.566Z]       "source": "https://github.com/taskcluster/generic-worker/commits/49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
[taskcluster 2018-05-07T12:34:37.566Z]       "version": "10.7.8"
[taskcluster 2018-05-07T12:34:37.566Z]     },
[taskcluster 2018-05-07T12:34:37.566Z]     "machine-setup": {
[taskcluster 2018-05-07T12:34:37.566Z]       "config": "https://hg.mozilla.org/build/puppet/raw-file/production/modules/generic_worker/templates/generic-worker.config.erb",
[taskcluster 2018-05-07T12:34:37.566Z]       "docs": "https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/generic_worker"
[taskcluster 2018-05-07T12:34:37.566Z]     }
[taskcluster 2018-05-07T12:34:37.566Z]   }
[taskcluster 2018-05-07T12:34:37.566Z] Task ID: RAVr-tDzQ5qJEEzxwc5ymw
[taskcluster 2018-05-07T12:34:37.566Z] === Task Starting ===
[taskcluster 2018-05-07T12:34:38.773Z] Executing command 0: /bin/bash -vxec 'echo "Querying my own task definition using taskcluster proxy..."
[taskcluster 2018-05-07T12:34:38.773Z] curl -L "http://taskcluster:8080/queue/v1/task/${TASK_ID}"
[taskcluster 2018-05-07T12:34:38.773Z] '
echo "Querying my own task definition using taskcluster proxy..."
+ echo 'Querying my own task definition using taskcluster proxy...'
Querying my own task definition using taskcluster proxy...
curl -L "http://taskcluster:8080/queue/v1/task/${TASK_ID}"
+ curl -L http://taskcluster:8080/queue/v1/task/RAVr-tDzQ5qJEEzxwc5ymw
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   965  100   965    0     0   4115      0 --:--:-- --:--:-- --:--:--  4106
100   965  100   965    0     0   4112      0 --:--:-- --:--:-- --:--:--  4106
{
  "provisionerId": "releng-hardware",
  "workerType": "test-generic-worker",
  "schedulerId": "-",
  "taskGroupId": "RAVr-tDzQ5qJEEzxwc5ymw",
  "dependencies": [],
  "requires": "all-completed",
  "routes": [],
  "priority": "lowest",
  "retries": 5,
  "created": "2018-05-07T12:27:43.940Z",
  "deadline": "2018-05-08T12:27:43.940Z",
  "expires": "2019-05-08T12:27:43.940Z",
  "scopes": [],
  "payload": {
    "features": {
      "taskclusterProxy": true
    },
    "maxRunTime": 60,
    "command": [
      [
        "/bin/bash",
        "-vxec",
        "echo \"Querying my own task definition using taskcluster proxy...\"\ncurl -L \"http://taskcluster:8080/queue/v1/task/${TASK_ID}\"\n"
      ]
    ]
  },
  "metadata": {
    "name": "taskcluster-proxy-test-macOS",
    "description": "Test taskcluster-proxy on Mac",
    "owner": "dcrisan@mozilla.com",
    "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29"
  },
  "tags": {},
  "extra": {}
}[taskcluster 2018-05-07T12:34:39.020Z] Exit Code: 0
[taskcluster 2018-05-07T12:34:39.020Z] === Task Finished ===
[taskcluster 2018-05-07T12:34:39.020Z] Task Duration: 247.713373ms
Flags: needinfo?(pmoore)
Comment on attachment 8973639 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8973639 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/packages/manifests/mozilla/generic_worker.pp
@@ +8,5 @@
>          'packages::mozilla::generic_worker::end': ;
>      }
>  
> +    $tag = 'v10.7.8'
> +    $proxy_tag = 'v4.1.0'

I've made a new release of taskcluster-proxy (v4.1.1) that fixes the issue with the git revision not being stored in the release. Can we bump this to v4.1.1? Sorry for the extra work! This will require downloading it on the distinguished master again, of course.
(In reply to Pete Moore [:pmoore][:pete] from comment #38)
> I've made a new release of taskcluster-proxy (v4.1.1) that fixes the issue
> with the git revision not being stored in the release. Can we bump this to
> v4.1.1? Sorry for the extra work! This will require downloading it on the
> distinguished master again, of course.
Sure, I'll download the new taskcluster-proxy and change the version. Also, in this patch I'll change the proxy port to 80.
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# taskcluster-proxy 
2018/05/07 06:48:52 Version: Taskcluster proxy 4.1.1 (git revision 10806818a42605e3af59895c1f84d7f9a4bac8d0)
2018/05/07 06:48:52 Listening on: :8080
2018/05/07 06:48:52 Client ID must be passed via environment variable TASKCLUSTER_CLIENT_ID or command line option --client-id

provisionerId: releng-hardware
workerType: test-generic-worker
created: '2018-05-07T14:05:21.736Z'
deadline: '2018-05-08T14:05:21.736Z'
scopes: []
payload:
  features:
    taskclusterProxy: true
  maxRunTime: 60
  command:
    - - /bin/bash
      - '-vxec'
      - |
        echo "Querying my own task definition using taskcluster proxy..."
        curl -L "http://taskcluster:80/queue/v1/task/${TASK_ID}"
metadata:
  name: taskcluster-proxy-test-macOS
  description: Test taskcluster-proxy on Mac
  owner: dcrisan@mozilla.com
  source: 'https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29'

[taskcluster 2018-05-07T14:04:29.812Z] Worker Type (test-generic-worker) settings:
[taskcluster 2018-05-07T14:04:29.812Z]   {
[taskcluster 2018-05-07T14:04:29.812Z]     "config": {
[taskcluster 2018-05-07T14:04:29.812Z]       "deploymentId": "",
[taskcluster 2018-05-07T14:04:29.812Z]       "runTasksAsCurrentUser": true
[taskcluster 2018-05-07T14:04:29.812Z]     },
[taskcluster 2018-05-07T14:04:29.812Z]     "generic-worker": {
[taskcluster 2018-05-07T14:04:29.812Z]       "go-arch": "amd64",
[taskcluster 2018-05-07T14:04:29.812Z]       "go-os": "darwin",
[taskcluster 2018-05-07T14:04:29.812Z]       "go-version": "go1.10",
[taskcluster 2018-05-07T14:04:29.812Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.8",
[taskcluster 2018-05-07T14:04:29.812Z]       "revision": "49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
[taskcluster 2018-05-07T14:04:29.812Z]       "source": "https://github.com/taskcluster/generic-worker/commits/49698e74d33e2f45f3dc4e95f118fce2d85d0a37",
[taskcluster 2018-05-07T14:04:29.812Z]       "version": "10.7.8"
[taskcluster 2018-05-07T14:04:29.812Z]     },
[taskcluster 2018-05-07T14:04:29.812Z]     "machine-setup": {
[taskcluster 2018-05-07T14:04:29.812Z]       "config": "https://hg.mozilla.org/build/puppet/raw-file/production/modules/generic_worker/templates/generic-worker.config.erb",
[taskcluster 2018-05-07T14:04:29.812Z]       "docs": "https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/generic_worker"
[taskcluster 2018-05-07T14:04:29.812Z]     }
[taskcluster 2018-05-07T14:04:29.812Z]   }
[taskcluster 2018-05-07T14:04:29.812Z] Task ID: VJJNLd9CSo6zB2seUWJxWQ
[taskcluster 2018-05-07T14:04:29.812Z] === Task Starting ===
[taskcluster 2018-05-07T14:04:30.308Z] Executing command 0: /bin/bash -vxec 'echo "Querying my own task definition using taskcluster proxy..."
[taskcluster 2018-05-07T14:04:30.308Z] curl -L "http://taskcluster:80/queue/v1/task/${TASK_ID}"
[taskcluster 2018-05-07T14:04:30.308Z] '
echo "Querying my own task definition using taskcluster proxy..."
+ echo 'Querying my own task definition using taskcluster proxy...'
Querying my own task definition using taskcluster proxy...
curl -L "http://taskcluster:80/queue/v1/task/${TASK_ID}"
+ curl -L http://taskcluster:80/queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   234  100   234    0     0  20434      0 --:--:-- --:--:-- --:--:-- 21272
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ was not found on this server.</p>
</body></html>
[taskcluster 2018-05-07T14:04:30.334Z] Exit Code: 0
[taskcluster 2018-05-07T14:04:30.334Z] === Task Finished ===
[taskcluster 2018-05-07T14:04:30.334Z] Task Duration: 25.660764ms
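
Note that the run above printed a 404 page yet still finished with Exit Code: 0, because curl exits 0 on HTTP error responses unless it is given --fail (-f). A test task therefore cannot rely on the exit code alone. The sketch below shows one way to validate the response body instead; the function name `is_task_definition` and the choice of fields to check are assumptions, and the sample bodies are shortened stand-ins for the two outputs logged in this bug.

```python
import json


def is_task_definition(body, expected_worker_type=None):
    """Return True if `body` parses as JSON and looks like a task definition."""
    try:
        task = json.loads(body)
    except ValueError:  # not JSON at all, e.g. an HTML error page
        return False
    if not isinstance(task, dict):
        return False
    if "payload" not in task or "workerType" not in task:
        return False
    return expected_worker_type in (None, task["workerType"])


good = ('{"provisionerId": "releng-hardware", '
        '"workerType": "test-generic-worker", "payload": {}}')
bad = "<html><head><title>404 Not Found</title></head></html>"
print(is_task_definition(good, "test-generic-worker"), is_task_definition(bad))
```

In the shell task itself, adding --fail to the curl invocation would make the 404 case produce a non-zero exit code.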
- changed the generic-worker version
- added a version tag for taskcluster-proxy
- added an exec block to generate the generic-worker signingKey
- changed the generic-worker config file to add settings for taskcluster-proxy
- installed taskcluster-proxy in /usr/bin
Attachment #8965761 - Attachment is obsolete: true
Attachment #8972813 - Attachment is obsolete: true
Attachment #8973639 - Attachment is obsolete: true
Attachment #8973639 - Flags: review?(pmoore)
Attachment #8973639 - Flags: review?(jwatkins)
Attachment #8973681 - Flags: review?(pmoore)
Attachment #8973681 - Flags: review?(jwatkins)
Comment on attachment 8973681 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8973681 [details] [diff] [review]:
-----------------------------------------------------------------

I think now that we have a different worker type name, we should be able to make a try push against that worker type, to check that everything is ok.

For worker type naming consistency, it might be better to call the workerType "gecko-t-osx-1010-beta" instead of "test-generic-worker", to be consistent with the names we use on Windows for generic-worker:

  * https://github.com/mozilla-releng/OpenCloudConfig/tree/master/userdata/Manifest

We add a -beta suffix to the production workerType name, or simply -b if we don't have enough characters available (workerType names are limited to 22 chars iirc).
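
The naming rule just described can be sketched as follows. This is illustrative only: the function name `staging_worker_type` is hypothetical, the 22-character limit is the figure recalled "iirc" above and should be treated as an assumption, and "gecko-t-win10-64-gpu" is used purely as an example of a name too long for the -beta suffix.

```python
MAX_WORKER_TYPE_LEN = 22  # limit quoted from memory above; an assumption


def staging_worker_type(production_name, limit=MAX_WORKER_TYPE_LEN):
    """Append -beta, or -b if -beta would exceed the length limit."""
    for suffix in ("-beta", "-b"):
        candidate = production_name + suffix
        if len(candidate) <= limit:
            return candidate
    raise ValueError(f"no staging name fits within {limit} characters")


print(staging_worker_type("gecko-t-osx-1010"))
print(staging_worker_type("gecko-t-win10-64-gpu"))
```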

I think it would be nice to leave this one worker with workerType gecko-t-osx-1010-beta indefinitely, so that it effectively becomes a staging workerType, and we can make try pushes against it when we want to test changes we'd like to roll out to production. Probably one worker is enough for the staging pool, and then wouldn't encroach too much on production, only taking one worker away.

If you are happy to change the worker type name, I'm happy to make a full try push that runs all the macOS test tasks, and then we'll get a good feeling for whether the changes are ok.

Many thanks for all your work on this, I realise it is taking quite a while, but I think the benefit is that in future, testing worker upgrades will be much easier.

Thanks,
Pete

::: modules/packages/manifests/mozilla/generic_worker.pp
@@ +7,5 @@
>          'packages::mozilla::generic_worker::begin': ;
>          'packages::mozilla::generic_worker::end': ;
>      }
>  
> +    $tag = 'v10.7.8'

This is no longer the latest release; we're up to 10.7.12 currently, with new bug fixes. I think it might be better to go for 10.7.12.

@@ +26,5 @@
>                      group  => wheel,
>              }
> +            # install taskcluster proxy, Bug 1452095
> +            file {
> +                '/usr/bin/taskcluster-proxy':

nit: generic-worker is installed to /usr/local/bin, but taskcluster-proxy is installed to /usr/bin - is it intentional that they are installed to different directories? If both are in the path, I guess no problem, but perhaps a little confusing. /usr/local/bin feels like a better place, given the choice.
(In reply to Pete Moore [:pmoore][:pete] from comment #42)

> I think now that we have a different worker type name, we should be able to
> make a try push against that worker type, to check that everything is ok.
> 
> For worker type naming consistency, it might be better to call the
> workerType "gecko-t-osx-1010-beta" instead of "test-generic-worker", to be
> consistent with the names we use on Windows for generic-worker:
> 
>   * https://github.com/mozilla-releng/OpenCloudConfig/tree/master/userdata/Manifest
> 
> We add a -beta suffix to the production workerType name, or simply -b if we
> don't have enough characters available (workerType names are limited to 22
> chars iirc).

Note, when we do this, we should update https://bugzilla.mozilla.org/show_bug.cgi?id=1400012#c12 at the same time. This will allow us to easily make a try push that runs across the entire staging pool for macOS and Windows.
(In reply to Pete Moore [:pmoore][:pete] from comment #42)
>
> I think it would be nice to leave this one worker with workerType
> gecko-t-osx-1010-beta indefinitely, so that it effectively becomes a staging
> workerType, and we can make try pushes against it when we want to test
> changes we'd like to roll out to production. Probably one worker is enough
> for the staging pool, and then wouldn't encroach too much on production,
> only taking one worker away.

I'd take that a step further: take one host each from MDC1 and MDC2 and set them up in puppet as a separate group, so that we can easily make changes to them and so we have redundancy. Bonus points for doing the same with the linux64 moonshots (after they're all built and verified)!
Comment on attachment 8973681 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8973681 [details] [diff] [review]:
-----------------------------------------------------------------

Fix issues noted inline

::: modules/packages/manifests/mozilla/generic_worker.pp
@@ +26,5 @@
>                      group  => wheel,
>              }
> +            # install taskcluster proxy, Bug 1452095
> +            file {
> +                '/usr/bin/taskcluster-proxy':

Pete is correct.  The proxy bin should share the same /usr/local/bin location as the generic-worker.
Attachment #8973681 - Flags: review?(jwatkins) → review+
- installed generic-worker v10.7.12:
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# generic-worker --version
generic-worker 10.7.12 [ revision: https://github.com/taskcluster/generic-worker/commits/5c96402034cfa573c30e773f1cf8b5469dad6b4c ]
- changed workerType to  "workerType": "gecko-t-osx-1010-beta",
- installed taskcluster-proxy to /usr/local/bin/taskcluster-proxy
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# which taskcluster-proxy
/usr/local/bin/taskcluster-proxy
- deleted generic-worker.openpgp.key and ran puppet again; generic-worker.openpgp.key is now created as cltbld rather than root (I don't know why the owner had changed from cltbld to root)

pmoore: can you please send a try push to gecko-t-osx-1010-beta?
Flags: needinfo?(pmoore)
(In reply to Dragos Crisan [:dragrom] from comment #46)

> pmoore: can you please send a try push to gecko-t-osx-1010-beta?

I have to pop out now, but I've prepared a patch you can run in your own try push:

  * https://bug1400012.bmoattachments.org/attachment.cgi?id=8973995

Hope it helps, and let me know if it doesn't, so we can fix it in bug 1400012. Good luck!
Flags: needinfo?(pmoore)
I ran a simple task and received this error:
[taskcluster 2018-05-08T14:31:13.554Z] Worker Type (gecko-t-osx-1010-beta) settings:
[taskcluster 2018-05-08T14:31:13.554Z]   {
[taskcluster 2018-05-08T14:31:13.554Z]     "config": {
[taskcluster 2018-05-08T14:31:13.554Z]       "deploymentId": "",
[taskcluster 2018-05-08T14:31:13.554Z]       "runTasksAsCurrentUser": true
[taskcluster 2018-05-08T14:31:13.554Z]     },
[taskcluster 2018-05-08T14:31:13.554Z]     "generic-worker": {
[taskcluster 2018-05-08T14:31:13.554Z]       "go-arch": "amd64",
[taskcluster 2018-05-08T14:31:13.554Z]       "go-os": "darwin",
[taskcluster 2018-05-08T14:31:13.554Z]       "go-version": "go1.10.2",
[taskcluster 2018-05-08T14:31:13.554Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.12",
[taskcluster 2018-05-08T14:31:13.554Z]       "revision": "5c96402034cfa573c30e773f1cf8b5469dad6b4c",
[taskcluster 2018-05-08T14:31:13.554Z]       "source": "https://github.com/taskcluster/generic-worker/commits/5c96402034cfa573c30e773f1cf8b5469dad6b4c",
[taskcluster 2018-05-08T14:31:13.554Z]       "version": "10.7.12"
[taskcluster 2018-05-08T14:31:13.554Z]     },
[taskcluster 2018-05-08T14:31:13.554Z]     "machine-setup": {
[taskcluster 2018-05-08T14:31:13.554Z]       "config": "https://hg.mozilla.org/build/puppet/raw-file/production/modules/generic_worker/templates/generic-worker.config.erb",
[taskcluster 2018-05-08T14:31:13.554Z]       "docs": "https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/generic_worker"
[taskcluster 2018-05-08T14:31:13.554Z]     }
[taskcluster 2018-05-08T14:31:13.554Z]   }
[taskcluster 2018-05-08T14:31:13.554Z] Task ID: eFJ_LWv9QfivM6ly-dDG4Q
[taskcluster 2018-05-08T14:31:13.554Z] === Task Starting ===
[taskcluster:error] Task not successful due to following exception(s):
[taskcluster:error] Exception 1)
[taskcluster:error] Could not start taskcluster proxy: exec: "taskcluster-proxy": executable file not found in $PATH
[taskcluster:error] 

The value of the PATH environment variable is:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

taskcluster-proxy is installed at:
which taskcluster-proxy
/usr/local/bin/taskcluster-proxy
I ran another task without the proxy feature, just to see the PATH:
echo $PATH
+ echo /usr/bin:/bin:/usr/sbin:/sbin
/usr/bin:/bin:/usr/sbin:/sbin

I fixed this by setting the full path, "taskclusterProxyExecutable": "/usr/local/bin/taskcluster-proxy", in the /etc/generic-worker.config file.

pmoore: any concern with this change?
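To make the failure mode concrete, here is a minimal sketch of PATH-based executable lookup, using Python's shutil.which as a stand-in for the lookup generic-worker performs (Go's exec.LookPath is likewise documented to try a name containing a slash directly instead of consulting PATH). The helper names and the temporary directory playing the role of /usr/local/bin are assumptions made for the demonstration.

```python
import os
import shutil
import stat
import tempfile


def resolve(executable, search_path):
    """Return the path a PATH-style lookup would pick, or None if not found."""
    return shutil.which(executable, path=search_path)


def make_fake_executable(directory, name):
    """Create a tiny executable script standing in for the real binary."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


if __name__ == "__main__":
    # The temporary directory plays the role of /usr/local/bin.
    with tempfile.TemporaryDirectory() as local_bin:
        proxy = make_fake_executable(local_bin, "taskcluster-proxy")
        worker_path = "/usr/bin:/bin:/usr/sbin:/sbin"  # PATH seen in the task
        # Bare name, directory missing from PATH.
        print(resolve("taskcluster-proxy", worker_path))
        # Bare name, directory added to PATH.
        print(resolve("taskcluster-proxy", local_bin + os.pathsep + worker_path))
        # Full path, PATH is not consulted.
        print(resolve(proxy, worker_path))
```

The third call is the equivalent of putting the full path in the config file: the lookup no longer depends on whatever PATH the worker process happens to inherit.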
So, I observed the following:

- taskcluster-proxy listens on port 8080:
taskcluster-proxy
2018/05/08 08:24:20 Version: Taskcluster proxy 4.1.1 (git revision 10806818a42605e3af59895c1f84d7f9a4bac8d0)
2018/05/08 08:24:20 Listening on: :8080

- generic-worker has taskclusterProxyPort set to 80, so the task calls the proxy on port 80.
The result is:
+ curl -L http://taskcluster:80/queue/v1/task/HahrnFTYQu-omWWzd_vkJQ
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   234  100   234    0     0  37729      0 --:--:-- --:--:-- --:--:-- 39000
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /queue/v1/task/HahrnFTYQu-omWWzd_vkJQ was not found on this server.</p>
</body></html>
[taskcluster 2018-05-08T15:22:51.956Z] Exit Code: 0
[taskcluster 2018-05-08T15:22:51.956Z] === Task Finished ===
[taskcluster 2018-05-08T15:22:51.956Z] Task Duration: 19.190439ms

The requested URL was not found.

If I changed the taskclusterProxyPort to 8080, the result was:
  "routes": [],
  "priority": "lowest",
  "retries": 5,
  "created": "2018-05-08T15:15:47.694Z",
  "deadline": "2018-05-09T15:15:47.694Z",
  "expires": "2019-05-09T15:15:47.694Z",
  "scopes": [],
  "payload": {
    "features": {
      "taskclusterProxy": true
    },
    "maxRunTime": 60,
    "command": [
      [
        "/bin/bash",
        "-vxec",
        "echo \"Querying my own task definition using taskcluster proxy...\"\ncurl -L \"http://taskcluster:8080/queue/v1/task/${TASK_ID}\"\n"
      ]
    ]
  },
  "metadata": {
    "name": "taskcluster-proxy-test-macOS",
    "description": "Test taskcluster-proxy on Mac",
    "owner": "dcrisan@mozilla.com",
    "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29"
  },
  "tags": {},
  "extra": {}
}[taskcluster 2018-05-08T15:15:52.903Z] Exit Code: 0
[taskcluster 2018-05-08T15:15:52.903Z] === Task Finished ===
[taskcluster 2018-05-08T15:15:52.903Z] Task Duration: 271.799226ms

pmoore: Is there a way to configure taskcluster-proxy to listen on port 80 instead of 8080?
Flags: needinfo?(pmoore)
:dragrom,

Run:

  taskcluster-proxy --help

or see:

  https://github.com/taskcluster/taskcluster-proxy/blob/master/README.md
Flags: needinfo?(pmoore)
(In reply to Dragos Crisan [:dragrom] from comment #49)
> Run another task without proxy feature, only to see the PATH:
> echo $PATH
> + echo /usr/bin:/bin:/usr/sbin:/sbin
> /usr/bin:/bin:/usr/sbin:/sbin
> 
> I fixed this by adding full path for "taskclusterProxyExecutable":
> "/usr/local/bin/taskcluster-proxy" on /etc/generic-worker.config file
> 
> pmoore: any concern with this change?

Well that is certainly explicit, and will do the trick.

Alternatives:

1) Add an LSEnvironment xml element to https://hg.mozilla.org/build/puppet/file/tip/modules/generic_worker/templates/generic-worker.plist.erb (see https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/LaunchServicesKeys.html#//apple_ref/doc/uid/20001431-106825) to set full PATH for the user agent
2) Export the full path explicitly in https://hg.mozilla.org/build/puppet/file/tip/modules/generic_worker/templates/run-generic-worker.sh.erb

All three solutions would achieve the same. Jake, do you have a preference?

What I like about the way you've set it, Dragos, is that it is the "closest to home": no matter how you invoke the worker with the given generic-worker.config file, you can be sure you'll get the right taskcluster-proxy, regardless of whether it is launched as a user agent by the system or executed manually by a user from the command line, for example.

It also means if you want to troubleshoot why the worker isn't starting, the first place you have to look is in generic-worker.config to see the executable, and you've found your answer straight away. With the full path to the executable in there, you don't then have to search around in the launch bash script to see what it set the PATH to, only to discover it didn't override the inherited PATH, and then look in the .plist launch configuration that called the bash script, to find it isn't set there either, to then try to work out what the system default PATH is when launching launch daemons and user agents.

However, my first thought when I saw it was that it felt a bit ugly, but the more I think about it, the harder I find it to reason why it is a bad thing, so maybe I do like it! But this is why I'd like Jake's opinion, as it doesn't sit completely comfortably with me...
Flags: needinfo?(jwatkins)
Added export PATH to run-generic-worker.sh.erb file
Attachment #8974336 - Flags: review?(pmoore)
Attachment #8974336 - Flags: review?(jwatkins)
Comment on attachment 8974336 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8974336 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/generic_worker/templates/generic-worker.config.erb
@@ +28,5 @@
>    "sentryProject": "generic-worker",
>    "shutdownMachineOnIdle": false,
>    "shutdownMachineOnInternalError": false,
> +  "taskclusterProxyExecutable": "taskcluster-proxy",
> +  "taskclusterProxyPort": 80,

From comment 40 you were getting:

> curl -L "http://taskcluster:80/queue/v1/task/${TASK_ID}"
> + curl -L http://taskcluster:80/queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100   234  100   234    0     0  20434      0 --:--:-- --:--:-- --:--:-- 21272
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>404 Not Found</title>
> </head><body>
> <h1>Not Found</h1>
> <p>The requested URL /queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ was not found on this server.</p>
> </body></html>

This makes me wonder if there is already a http daemon running on the mac workers on port 80 (that generated this 404 message). Have you checked that the port is free, or can you confirm if the proxy works when you set it to run on port 80?

Or maybe this 404 comes from taskcluster-proxy for some other reason - but that seems less likely.
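One quick way to answer the "is the port free?" question from the worker itself is to probe the port over loopback. This is only a sketch: the function name is hypothetical, and it checks 127.0.0.1 only, so a service bound to another interface or to IPv6 only would be missed.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for candidate in (80, 8080):
        state = "in use" if port_in_use(candidate) else "free"
        print("port %d: %s" % (candidate, state))
```

If port 80 reports as in use before taskcluster-proxy has been started, some other daemon already owns it, which would explain a 404 page that does not look like proxy output.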

::: modules/generic_worker/templates/run-generic-worker.sh.erb
@@ +5,5 @@
>  # (/etc/generic-worker.config) we set numberOfTasksToRun to 1 so that the worker
>  # will exit after running a single task. This script is then responsible for
>  # rebooting the machine.
>  
> +# Export PATH environment variable to be used by generic worker tasks

This isn't quite right. The PATH used by gecko macOS test tasks is set here:

https://dxr.mozilla.org/mozilla-central/rev/0cd106a2eb78aa04fd481785257e6f4f9b94707b/taskcluster/taskgraph/transforms/job/mozharness_test.py#268

The PATH exported in this script, by contrast, is the one the generic-worker process itself will have, which it uses to locate taskcluster-proxy and livelog, e.g. here:

https://github.com/taskcluster/generic-worker/blob/v10.7.12/tcproxy/tcproxy.go#L45

Indeed, if the PATH wasn't set explicitly in gecko, env vars set in this script might make it into the default task environment, but that is really a side effect. We want to set it explicitly here to affect the PATH used to launch taskcluster-proxy, rather than the PATH used in a task.
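The distinction between the launcher's own PATH and the PATH handed to a child can be shown with a small sketch. This is not generic-worker code; the function name is hypothetical, and a Python child process stands in for a task.

```python
import os
import subprocess
import sys


def path_seen_by_child(task_path):
    """Launch a child with an explicit PATH and return the PATH it sees."""
    # Copy the parent's environment and override only PATH for the child.
    child_env = dict(os.environ, PATH=task_path)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("parent PATH:", os.environ.get("PATH"))
    print("child PATH: ", path_seen_by_child("/usr/bin:/bin:/usr/sbin:/sbin"))
```

The parent's PATH is untouched by what it passes to the child, so a PATH exported in the launch script governs where the worker finds its helpers, while a task that sets PATH explicitly sees only its own value.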
(In reply to Pete Moore [:pmoore][:pete] from comment #54)
> Comment on attachment 8974336 [details] [diff] [review]
> Bug_1452095_Upgrade_mac_generic_worker.patch
> 
> Review of attachment 8974336 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: modules/generic_worker/templates/generic-worker.config.erb
> @@ +28,5 @@
> >    "sentryProject": "generic-worker",
> >    "shutdownMachineOnIdle": false,
> >    "shutdownMachineOnInternalError": false,
> > +  "taskclusterProxyExecutable": "taskcluster-proxy",
> > +  "taskclusterProxyPort": 80,
> 
> From comment 40 you were getting:
> 
> > curl -L "http://taskcluster:80/queue/v1/task/${TASK_ID}"
> > + curl -L http://taskcluster:80/queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ
> >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> >                                  Dload  Upload   Total   Spent    Left  Speed
> > 
> >   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> > 100   234  100   234    0     0  20434      0 --:--:-- --:--:-- --:--:-- 21272
> > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > <html><head>
> > <title>404 Not Found</title>
> > </head><body>
> > <h1>Not Found</h1>
> > <p>The requested URL /queue/v1/task/VJJNLd9CSo6zB2seUWJxWQ was not found on this server.</p>
> > </body></html>
> 
> This makes me wonder if there is already a http daemon running on the mac
> workers on port 80 (that generated this 404 message). Have you checked that
> the port is free, or can you confirm if the proxy works when you set it to
> run on port 80?
> 
> Or maybe this 404 comes from taskcluster-proxy for some other reason - but
> that seems less likely.
> 
You are right, there is a http daemon running on the mac: 
[root@t-yosemite-r7-380.test.releng.mdc1.mozilla.com ~]# launchctl list|grep httpd
73	0	org.apache.httpd

So, maybe we need to change the proxy port to 8080?

> ::: modules/generic_worker/templates/run-generic-worker.sh.erb
> @@ +5,5 @@
> >  # (/etc/generic-worker.config) we set numberOfTasksToRun to 1 so that the worker
> >  # will exit after running a single task. This script is then responsible for
> >  # rebooting the machine.
> >  
> > +# Export PATH environment variable to be used by generic worker tasks
> 
> This isn't quite right. The PATH used by gecko macOS test tasks is set here:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 0cd106a2eb78aa04fd481785257e6f4f9b94707b/taskcluster/taskgraph/transforms/
> job/mozharness_test.py#268
> 
> This PATH affects what the generic-worker process itself will have, which it
> uses to locate taskcluster-proxy and livelog, e.g. here:
> 
> https://github.com/taskcluster/generic-worker/blob/v10.7.12/tcproxy/tcproxy.
> go#L45
> 
> Indeed if the PATH wasn't set explicitly in gecko, it may be that env vars
> set in this script would make it into the default task environment, but that
> is really a side-effect, we want to set it explicitly here to affect the
> PATH used to launch taskcluster-proxy, rather than the PATH to be used in a
> task.
Why is apache running on a worker?
I checked other workers, and the http daemon is running on them too. Another example:

[root@t-yosemite-r7-0022.test.releng.scl3.mozilla.com ~]# launchctl list|grep httpd
71	0	org.apache.httpd

Does http need to run on a worker?
Flags: needinfo?(pmoore)
:dragrom, The taskcluster-proxy needs to run on port 80, as requests to taskcluster-proxy should be made with urls like the following to be consistent with docker-worker:

  http://taskcluster/<service>/v1/<path>


:catlee, any idea why Mac workers would have an apache http service running on them - a throwback from buildbot days?
Flags: needinfo?(pmoore) → needinfo?(catlee)
No, I don't know.
Flags: needinfo?(catlee)
:dividehex suggests that it might be for running talos tests, and that Wander might know? Wander, do you know why apache is running on OSX workers?
Flags: needinfo?(wcosta)
(In reply to Kendall Libby [:fubar] from comment #60)
> :dividehex suggests that it might be for running talos tests, and that
> Wander might know? Wander, do you know why apache is running on OSX workers?

No idea.
Flags: needinfo?(wcosta)
Apache ships with macOS, and runs by default, so the first question I would ask would be "what are we doing to turn off the default Apache, and is it failing to work?"
(In reply to Phil Ringnalda (:philor) from comment #62)
> Apache ships with macOS, and runs by default, so the first question I would
> ask would be "what are we doing to turn off the default Apache, and is it
> failing to work?"

The taskcluster-proxy needs to run on port 80, as requests to taskcluster-proxy should be made with urls like the following to be consistent with docker-worker:

  http://taskcluster/<service>/v1/<path>
If we need the apache server to run on mac workers, I can add a check in the puppet code to determine whether generic-worker runs on docker or on a mac worker, and change the proxy port accordingly.

:pmoore Are you OK with this approach if we need to keep apache server running on mac workers?
Flags: needinfo?(pmoore)
Attachment #8974336 - Flags: review?(pmoore)
Attachment #8974336 - Flags: review?(jwatkins)
Removed the review request until I know what additional changes I need to make for the taskcluster proxy port.
Attachment #8973681 - Flags: review?(pmoore)
(In reply to Kendall Libby [:fubar] from comment #60)
> :dividehex suggests that it might be for running talos tests, and that
> Wander might know? Wander, do you know why apache is running on OSX workers?

That seems likely from a cursory glance at:

  https://github.com/mozilla/build-puppet/search?utf8=%E2%9C%93&q=apache&type=

Joel, do you know if talos needs an apache web service running on port 80 to work? Background is, I wanted taskcluster-proxy to be available on port 80, but this seems to clash.
Flags: needinfo?(pmoore) → needinfo?(jmaher)
The other option, of course, is that we put taskcluster-proxy behind apache, and apache can proxy inbound requests to http://taskcluster/<path> to the taskcluster-proxy running on a different port (e.g. 8080).

This might not be such a bad thing, I suspect if talos relies on apache, it is using a similar integration, where apache acts as a forwarding service. Maybe I'm being naive and optimistic. xD
Dragos, can you paste the apache config here from a live worker (removing any secrets, of course, if there are any)?

This will help to establish if it is just a front end for some other listener(s).

Adding an additional taskcluster module in Apache might not be too much trouble in puppet - I haven't done it myself, but maybe there is native support?

Dustin, do you know much about this?
Flags: needinfo?(dustin)
Flags: needinfo?(dcrisan)
Maybe just an extra apache config file that declares a host "taskcluster" and routes traffic through to the same url on port 8080, and then we run taskcluster-proxy on port 8080 (assuming that one is free)...
Here is httpd.conf file:

cat /private/etc/apache2/httpd.conf
#
# This is the main Apache HTTP server configuration file.  It contains the
# configuration directives that give the server its instructions.
# See <URL:http://httpd.apache.org/docs/2.4/> for detailed information.
# In particular, see 
# <URL:http://httpd.apache.org/docs/2.4/mod/directives.html>
# for a discussion of each configuration directive.
#
# Do NOT simply read the instructions in here without understanding
# what they do.  They're here only as hints or reminders.  If you are unsure
# consult the online docs. You have been warned.  
#
# Configuration and logfile names: If the filenames you specify for many
# of the server's control files begin with "/" (or "drive:/" for Win32), the
# server will use that explicit path.  If the filenames do *not* begin
# with "/", the value of ServerRoot is prepended -- so "logs/access_log"
# with ServerRoot set to "/usr/local/apache2" will be interpreted by the
# server as "/usr/local/apache2/logs/access_log", whereas "/logs/access_log" 
# will be interpreted as '/logs/access_log'.

#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# Do not add a slash at the end of the directory path.  If you point
# ServerRoot at a non-local disk, be sure to specify a local disk on the
# Mutex directive, if file-based mutexes are used.  If you wish to share the
# same ServerRoot for multiple httpd daemons, you will need to change at
# least PidFile.
#
ServerRoot "/usr"

#
# Mutex: Allows you to set the mutex mechanism and mutex file directory
# for individual mutexes, or change the global defaults
#
# Uncomment and change the directory if mutexes are file-based and the default
# mutex file directory is not on a local disk or is not appropriate for some
# other reason.
#
# Mutex default:/private/var/run

#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, instead of the default. See also the <VirtualHost>
# directive.
#
# Change this to Listen on specific IP addresses as shown below to 
# prevent Apache from glomming onto all bound IP addresses.
#
#Listen 12.34.56.78:80
Listen 80

#
# Dynamic Shared Object (DSO) Support
#
# To be able to use the functionality of a module which was built as a DSO you
# have to place corresponding `LoadModule' lines at this location so the
# directives contained in it are actually available _before_ they are used.
# Statically compiled modules (those listed by `httpd -l') do not need
# to be loaded here.
#
# Example:
# LoadModule foo_module modules/mod_foo.so
#
LoadModule authn_file_module libexec/apache2/mod_authn_file.so
#LoadModule authn_dbm_module libexec/apache2/mod_authn_dbm.so
#LoadModule authn_anon_module libexec/apache2/mod_authn_anon.so
#LoadModule authn_dbd_module libexec/apache2/mod_authn_dbd.so
#LoadModule authn_socache_module libexec/apache2/mod_authn_socache.so
LoadModule authn_core_module libexec/apache2/mod_authn_core.so
LoadModule authz_host_module libexec/apache2/mod_authz_host.so
LoadModule authz_groupfile_module libexec/apache2/mod_authz_groupfile.so
LoadModule authz_user_module libexec/apache2/mod_authz_user.so
#LoadModule authz_dbm_module libexec/apache2/mod_authz_dbm.so
#LoadModule authz_owner_module libexec/apache2/mod_authz_owner.so
#LoadModule authz_dbd_module libexec/apache2/mod_authz_dbd.so
LoadModule authz_core_module libexec/apache2/mod_authz_core.so
#LoadModule authnz_ldap_module libexec/apache2/mod_authnz_ldap.so
LoadModule access_compat_module libexec/apache2/mod_access_compat.so
LoadModule auth_basic_module libexec/apache2/mod_auth_basic.so
#LoadModule auth_form_module libexec/apache2/mod_auth_form.so
#LoadModule auth_digest_module libexec/apache2/mod_auth_digest.so
#LoadModule allowmethods_module libexec/apache2/mod_allowmethods.so
#LoadModule file_cache_module libexec/apache2/mod_file_cache.so
#LoadModule cache_module libexec/apache2/mod_cache.so
#LoadModule cache_disk_module libexec/apache2/mod_cache_disk.so
#LoadModule cache_socache_module libexec/apache2/mod_cache_socache.so
#LoadModule socache_shmcb_module libexec/apache2/mod_socache_shmcb.so
#LoadModule socache_dbm_module libexec/apache2/mod_socache_dbm.so
#LoadModule socache_memcache_module libexec/apache2/mod_socache_memcache.so
#LoadModule watchdog_module libexec/apache2/mod_watchdog.so
#LoadModule macro_module libexec/apache2/mod_macro.so
#LoadModule dbd_module libexec/apache2/mod_dbd.so
#LoadModule dumpio_module libexec/apache2/mod_dumpio.so
#LoadModule echo_module libexec/apache2/mod_echo.so
#LoadModule buffer_module libexec/apache2/mod_buffer.so
#LoadModule data_module libexec/apache2/mod_data.so
#LoadModule ratelimit_module libexec/apache2/mod_ratelimit.so
LoadModule reqtimeout_module libexec/apache2/mod_reqtimeout.so
#LoadModule ext_filter_module libexec/apache2/mod_ext_filter.so
#LoadModule request_module libexec/apache2/mod_request.so
#LoadModule include_module libexec/apache2/mod_include.so
LoadModule filter_module libexec/apache2/mod_filter.so
#LoadModule reflector_module libexec/apache2/mod_reflector.so
#LoadModule substitute_module libexec/apache2/mod_substitute.so
#LoadModule sed_module libexec/apache2/mod_sed.so
#LoadModule charset_lite_module libexec/apache2/mod_charset_lite.so
#LoadModule deflate_module libexec/apache2/mod_deflate.so
#LoadModule xml2enc_module libexec/apache2/mod_xml2enc.so
#LoadModule proxy_html_module libexec/apache2/mod_proxy_html.so
LoadModule mime_module libexec/apache2/mod_mime.so
#LoadModule ldap_module libexec/apache2/mod_ldap.so
LoadModule log_config_module libexec/apache2/mod_log_config.so
#LoadModule log_debug_module libexec/apache2/mod_log_debug.so
#LoadModule log_forensic_module libexec/apache2/mod_log_forensic.so
#LoadModule logio_module libexec/apache2/mod_logio.so
LoadModule env_module libexec/apache2/mod_env.so
#LoadModule mime_magic_module libexec/apache2/mod_mime_magic.so
#LoadModule expires_module libexec/apache2/mod_expires.so
LoadModule headers_module libexec/apache2/mod_headers.so
#LoadModule usertrack_module libexec/apache2/mod_usertrack.so
##LoadModule unique_id_module libexec/apache2/mod_unique_id.so
LoadModule setenvif_module libexec/apache2/mod_setenvif.so
LoadModule version_module libexec/apache2/mod_version.so
#LoadModule remoteip_module libexec/apache2/mod_remoteip.so
LoadModule proxy_module libexec/apache2/mod_proxy.so
LoadModule proxy_connect_module libexec/apache2/mod_proxy_connect.so
LoadModule proxy_ftp_module libexec/apache2/mod_proxy_ftp.so
LoadModule proxy_http_module libexec/apache2/mod_proxy_http.so
LoadModule proxy_fcgi_module libexec/apache2/mod_proxy_fcgi.so
LoadModule proxy_scgi_module libexec/apache2/mod_proxy_scgi.so
#LoadModule proxy_fdpass_module libexec/apache2/mod_proxy_fdpass.so
LoadModule proxy_wstunnel_module libexec/apache2/mod_proxy_wstunnel.so
LoadModule proxy_ajp_module libexec/apache2/mod_proxy_ajp.so
LoadModule proxy_balancer_module libexec/apache2/mod_proxy_balancer.so
LoadModule proxy_express_module libexec/apache2/mod_proxy_express.so
#LoadModule session_module libexec/apache2/mod_session.so
#LoadModule session_cookie_module libexec/apache2/mod_session_cookie.so
#LoadModule session_dbd_module libexec/apache2/mod_session_dbd.so
LoadModule slotmem_shm_module libexec/apache2/mod_slotmem_shm.so
#LoadModule slotmem_plain_module libexec/apache2/mod_slotmem_plain.so
#LoadModule ssl_module libexec/apache2/mod_ssl.so
#LoadModule dialup_module libexec/apache2/mod_dialup.so
LoadModule lbmethod_byrequests_module libexec/apache2/mod_lbmethod_byrequests.so
LoadModule lbmethod_bytraffic_module libexec/apache2/mod_lbmethod_bytraffic.so
LoadModule lbmethod_bybusyness_module libexec/apache2/mod_lbmethod_bybusyness.so
#LoadModule lbmethod_heartbeat_module libexec/apache2/mod_lbmethod_heartbeat.so
LoadModule unixd_module libexec/apache2/mod_unixd.so
#LoadModule heartbeat_module libexec/apache2/mod_heartbeat.so
#LoadModule heartmonitor_module libexec/apache2/mod_heartmonitor.so
#LoadModule dav_module libexec/apache2/mod_dav.so
LoadModule status_module libexec/apache2/mod_status.so
LoadModule autoindex_module libexec/apache2/mod_autoindex.so
#LoadModule asis_module libexec/apache2/mod_asis.so
#LoadModule info_module libexec/apache2/mod_info.so
#LoadModule cgi_module libexec/apache2/mod_cgi.so
#LoadModule dav_fs_module libexec/apache2/mod_dav_fs.so
#LoadModule dav_lock_module libexec/apache2/mod_dav_lock.so
#LoadModule vhost_alias_module libexec/apache2/mod_vhost_alias.so
LoadModule negotiation_module libexec/apache2/mod_negotiation.so
LoadModule dir_module libexec/apache2/mod_dir.so
#LoadModule imagemap_module libexec/apache2/mod_imagemap.so
#LoadModule actions_module libexec/apache2/mod_actions.so
#LoadModule speling_module libexec/apache2/mod_speling.so
#LoadModule userdir_module libexec/apache2/mod_userdir.so
LoadModule alias_module libexec/apache2/mod_alias.so
#LoadModule rewrite_module libexec/apache2/mod_rewrite.so
#LoadModule php5_module libexec/apache2/libphp5.so
LoadModule hfs_apple_module libexec/apache2/mod_hfs_apple.so

<IfModule unixd_module>
#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.  
#
# User/Group: The name (or #number) of the user/group to run httpd as.
# It is usually good practice to create a dedicated user and group for
# running httpd, as with most system services.
#
User _www
Group _www

</IfModule>

# 'Main' server configuration
#
# The directives in this section set up the values used by the 'main'
# server, which responds to any requests that aren't handled by a
# <VirtualHost> definition.  These values also provide defaults for
# any <VirtualHost> containers you may define later in the file.
#
# All of these directives may appear inside <VirtualHost> containers,
# in which case these default settings will be overridden for the
# virtual host being defined.
#

#
# ServerAdmin: Your address, where problems with the server should be
# e-mailed.  This address appears on some server-generated pages, such
# as error documents.  e.g. admin@your-domain.com
#
ServerAdmin you@example.com

#
# ServerName gives the name and port that the server uses to identify itself.
# This can often be determined automatically, but we recommend you specify
# it explicitly to prevent problems during startup.
#
# If your host doesn't have a registered DNS name, enter its IP address here.
#
#ServerName www.example.com:80

#
# Deny access to the entirety of your server's filesystem. You must
# explicitly permit access to web content directories in other 
# <Directory> blocks below.
#
<Directory />
    AllowOverride none
    Require all denied
</Directory>

#
# Note that from this point forward you must specifically allow
# particular features to be enabled - so if something's not working as
# you might expect, make sure that you have specifically enabled it
# below.
#

#
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/Library/WebServer/Documents"
<Directory "/Library/WebServer/Documents">
    #
    # Possible values for the Options directive are "None", "All",
    # or any combination of:
    #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
    #
    # Note that "MultiViews" must be named *explicitly* --- "Options All"
    # doesn't give it to you.
    #
    # The Options directive is both complicated and important.  Please see
    # http://httpd.apache.org/docs/2.4/mod/core.html#options
    # for more information.
    #
    Options FollowSymLinks Multiviews
    MultiviewsMatch Any

    #
    # AllowOverride controls what directives may be placed in .htaccess files.
    # It can be "All", "None", or any combination of the keywords:
    #   AllowOverride FileInfo AuthConfig Limit
    #
    AllowOverride None

    #
    # Controls who can get stuff from this server.
    #
    Require all granted
</Directory>

#
# DirectoryIndex: sets the file that Apache will serve if a directory
# is requested.
#
<IfModule dir_module>
    DirectoryIndex index.html
</IfModule>

#
# The following lines prevent .htaccess and .htpasswd files from being 
# viewed by Web clients. 
#
<FilesMatch "^\.([Hh][Tt]|[Dd][Ss]_[Ss])">
    Require all denied
</FilesMatch>

#
# Apple specific filesystem protection.
#
<Files "rsrc">
    Require all denied
</Files>
<DirectoryMatch ".*\.\.namedfork">
    Require all denied
</DirectoryMatch>

#
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here.  If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog "/private/var/log/apache2/error_log"

#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel warn

<IfModule log_config_module>
    #
    # The following directives define some format nicknames for use with
    # a CustomLog directive (see below).
    #
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    LogFormat "%h %l %u %t \"%r\" %>s %b" common

    <IfModule logio_module>
      # You need to enable mod_logio.c to use %I and %O
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
    </IfModule>

    #
    # The location and format of the access logfile (Common Logfile Format).
    # If you do not define any access logfiles within a <VirtualHost>
    # container, they will be logged here.  Contrariwise, if you *do*
    # define per-<VirtualHost> access logfiles, transactions will be
    # logged therein and *not* in this file.
    #
    CustomLog "/private/var/log/apache2/access_log" common

    #
    # If you prefer a logfile with access, agent, and referer information
    # (Combined Logfile Format) you can use the following directive.
    #
    #CustomLog "/private/var/log/apache2/access_log" combined
</IfModule>

<IfModule alias_module>
    #
    # Redirect: Allows you to tell clients about documents that used to 
    # exist in your server's namespace, but do not anymore. The client 
    # will make a new request for the document at its new location.
    # Example:
    # Redirect permanent /foo http://www.example.com/bar

    #
    # Alias: Maps web paths into filesystem paths and is used to
    # access content that does not live under the DocumentRoot.
    # Example:
    # Alias /webpath /full/filesystem/path
    #
    # If you include a trailing / on /webpath then the server will
    # require it to be present in the URL.  You will also likely
    # need to provide a <Directory> section to allow access to
    # the filesystem path.

    #
    # ScriptAlias: This controls which directories contain server scripts. 
    # ScriptAliases are essentially the same as Aliases, except that
    # documents in the target directory are treated as applications and
    # run by the server when requested rather than as documents sent to the
    # client.  The same rules about trailing "/" apply to ScriptAlias
    # directives as to Alias.
    #
    ScriptAliasMatch ^/cgi-bin/((?!(?i:webobjects)).*$) "/Library/WebServer/CGI-Executables/$1"

</IfModule>

<IfModule cgid_module>
    #
    # ScriptSock: On threaded servers, designate the path to the UNIX
    # socket used to communicate with the CGI daemon of mod_cgid.
    #
    #Scriptsock cgisock
</IfModule>

#
# "/Library/WebServer/CGI-Executables" should be changed to whatever your ScriptAliased
# CGI directory exists, if you have that configured.
#
<Directory "/Library/WebServer/CGI-Executables">
    AllowOverride None
    Options None
    Require all granted
</Directory>

<IfModule mime_module>
    #
    # TypesConfig points to the file containing the list of mappings from
    # filename extension to MIME-type.
    #
    TypesConfig /private/etc/apache2/mime.types

    #
    # AddType allows you to add to or override the MIME configuration
    # file specified in TypesConfig for specific file types.
    #
    #AddType application/x-gzip .tgz
    #
    # AddEncoding allows you to have certain browsers uncompress
    # information on the fly. Note: Not all browsers support this.
    #
    #AddEncoding x-compress .Z
    #AddEncoding x-gzip .gz .tgz
    #
    # If the AddEncoding directives above are commented-out, then you
    # probably should define those extensions to indicate media types:
    #
    AddType application/x-compress .Z
    AddType application/x-gzip .gz .tgz

    #
    # AddHandler allows you to map certain file extensions to "handlers":
    # actions unrelated to filetype. These can be either built into the server
    # or added with the Action directive (see below)
    #
    # To use CGI scripts outside of ScriptAliased directories:
    # (You will also need to add "ExecCGI" to the "Options" directive.)
    #
    #AddHandler cgi-script .cgi

    # For type maps (negotiated resources):
    #AddHandler type-map var

    #
    # Filters allow you to process content before it is sent to the client.
    #
    # To parse .shtml files for server-side includes (SSI):
    # (You will also need to add "Includes" to the "Options" directive.)
    #
    #AddType text/html .shtml
    #AddOutputFilter INCLUDES .shtml
</IfModule>

#
# The mod_mime_magic module allows the server to use various hints from the
# contents of the file itself to determine its type.  The MIMEMagicFile
# directive tells the module where the hint definitions are located.
#
#MIMEMagicFile /private/etc/apache2/magic

#
# Customizable error responses come in three flavors:
# 1) plain text 2) local redirects 3) external redirects
#
# Some examples:
#ErrorDocument 500 "The server made a boo boo."
#ErrorDocument 404 /missing.html
#ErrorDocument 404 "/cgi-bin/missing_handler.pl"
#ErrorDocument 402 http://www.example.com/subscription_info.html
#

#
# MaxRanges: Maximum number of Ranges in a request before
# returning the entire resource, or one of the special
# values 'default', 'none' or 'unlimited'.
# Default setting is to accept 200 Ranges.
#MaxRanges unlimited

#
# EnableMMAP and EnableSendfile: On systems that support it, 
# memory-mapping or the sendfile syscall may be used to deliver
# files.  This usually improves server performance, but must
# be turned off when serving from networked-mounted 
# filesystems or if support for these functions is otherwise
# broken on your system.
# Defaults: EnableMMAP On, EnableSendfile Off
#
#EnableMMAP off
#EnableSendfile on

TraceEnable off

# Supplemental configuration
#
# The configuration files in the /private/etc/apache2/extra/ directory can be 
# included to add extra features or to modify the default configuration of 
# the server, or you may simply copy their contents here and change as 
# necessary.

# Server-pool management (MPM specific)
Include /private/etc/apache2/extra/httpd-mpm.conf

# Multi-language error messages
#Include /private/etc/apache2/extra/httpd-multilang-errordoc.conf

# Fancy directory listings
Include /private/etc/apache2/extra/httpd-autoindex.conf

# Language settings
#Include /private/etc/apache2/extra/httpd-languages.conf

# User home directories
#Include /private/etc/apache2/extra/httpd-userdir.conf

# Real-time info on requests and configuration
#Include /private/etc/apache2/extra/httpd-info.conf

# Virtual hosts
#Include /private/etc/apache2/extra/httpd-vhosts.conf

# Local access to the Apache HTTP Server Manual
#Include /private/etc/apache2/extra/httpd-manual.conf

# Distributed authoring and versioning (WebDAV)
#Include /private/etc/apache2/extra/httpd-dav.conf

# Various default settings
#Include /private/etc/apache2/extra/httpd-default.conf

# Configure mod_proxy_html to understand HTML4/XHTML1
<IfModule proxy_html_module>
Include /private/etc/apache2/extra/proxy-html.conf
</IfModule>

# Secure (SSL/TLS) connections
#Include /private/etc/apache2/extra/httpd-ssl.conf
#
# Note: The following must must be present to support
#       starting without SSL on platforms with no /dev/random equivalent
#       but a statically compiled-in mod_ssl.
#
<IfModule ssl_module>
SSLRandomSeed startup builtin
SSLRandomSeed connect builtin
</IfModule>

Include /private/etc/apache2/other/*.conf

#
# uncomment out the below to deal with user agents that deliberately
# violate open standards by misusing DNT (DNT *must* be a specific
# end-user choice)
#
#<IfModule setenvif_module>
#BrowserMatch "MSIE 10.0;" bad_DNT
#</IfModule>
#<IfModule headers_module>
#RequestHeader unset DNT env=bad_DNT
#</IfModule>
Flags: needinfo?(dcrisan)
And the /private/etc/apache2/other/talos.conf file:

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/builds/slave/talos-data/talos" 

#
# This should be changed to whatever you set DocumentRoot to.
#
<Directory "/builds/slave/talos-data/talos">

    #
    # Possible values for the Options directive are "None", "AFll",
    # or any combination of:
    #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
    #
    # Note that "MultiViews" must be named *explicitly* --- "Options All"
    # doesn't give it to you.
    #
    # The Options directive is both complicated and important.  Please see
    # http://httpd.apache.org/docs/2.2/mod/core.html#options
    # for more information.
    #
    Options Indexes FollowSymLinks MultiViews

    #
    # AllowOverride controls what directives may be placed in .htaccess files.
    # It can be "All", "None", or any combination of the keywords:
    #   Options FileInfo AuthConfig Limit
    #
    AllowOverride None

    #
    # Controls who can get stuff from this server.
    #
    Require all granted
    Order allow,deny
    Allow from all

</Directory>
Thanks!

After looking at https://httpd.apache.org/docs/trunk/vhosts/examples.html#proxy I bet you can do something like this:

Create /private/etc/apache2/other/taskcluster-proxy.conf with content:

<VirtualHost *:*>
    ProxyPreserveHost On
    ProxyPass        "/" "http://localhost:8080/"
    ProxyPassReverse "/" "http://localhost:8080/"
    ServerName taskcluster
</VirtualHost>


And then run taskcluster-proxy on port 8080 (check it is free first, of course!)
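The "check it is free first" step could be sketched roughly as follows. This is only an illustration, not part of the patch: the helper name port_is_free is hypothetical, and it only detects a TCP listener on the loopback address.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port.

    Hypothetical helper for illustration only; it checks for a listener
    on the given address and nothing more.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means something is already listening on the port.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("port 8080 free:", port_is_free(8080))
```

If this reports that 8080 is taken, any other unprivileged port would do, as long as the ProxyPass lines above point at the same port.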

Note, I'm no Apache config expert, but it has to be worth a shot!
I'm not sure if the other directives would also need to be inside a VirtualHost, or if doing this might steal traffic away from them. Anyway, let's see if we can work out how to do it - I think this would be the best way forward.
Thanks for checking, Pete. Talos used Apache in the past, but for the last ~4 years it has used a dynamic Python webserver. If Apache is running, it could be left over from old setup scripts, or it could be there for something else.
Flags: needinfo?(jmaher)
Thanks, Joel! Dragos is going to check the logs and see if anything's been using it, too. If it looks clear, let's get rid of it.
I don't know enough to be helpful here.
Flags: needinfo?(dustin)
I had a look at the t-yosemite-r7-0022.test.releng.scl3.mozilla.com worker:

cat /var/log/apache2/access_log

127.0.0.1 - - [08/May/2018:11:58:17 -0700] "GET http://ia.media-imdb.com/images/M/MV5BMTQ2NzUxMTAxN15BMl5BanBnXkFtZTcwMzEyMTIwMg@@._V1._SX94_SY140_.jpg HTTP/1.1" 404 276
::1 - - [08/May/2018:15:38:19 -0700] "GET /404.ogv HTTP/1.1" 404 205
::1 - - [08/May/2018:15:38:19 -0700] "GET /404.wav HTTP/1.1" 404 205
::1 - - [08/May/2018:15:38:20 -0700] "GET /404.webm HTTP/1.1" 404 206
::1 - - [09/May/2018:03:45:42 -0700] "GET /browser/devtools/client/netmonitor/test/sjs_cors-test-server.sjs HTTP/1.1" 404 262
127.0.0.1 - - [09/May/2018:05:09:46 -0700] "GET /content-security-policy/generic/positiveTest.js HTTP/1.1" 404 245
127.0.0.1 - - [09/May/2018:05:30:57 -0700] "GET /test-1 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:57 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-1 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-1 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-1 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:30:59 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:31:00 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:31:00 -0700] "GET /test-1 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:31:00 -0700] "GET /test-2 HTTP/1.1" 404 204
127.0.0.1 - - [09/May/2018:05:31:00 -0700] "GET /test-1 HTTP/1.1" 404 204
::1 - - [09/May/2018:05:54:42 -0700] "GET /browser/devtools/client/netmonitor/test/sjs_cors-test-server.sjs HTTP/1.1" 404 262
::1 - - [09/May/2018:10:28:25 -0700] "GET /a.png HTTP/1.1" 404 203
127.0.0.1 - - [09/May/2018:16:35:24 -0700] "GET /xhr/resources/status.py?content=hello3 HTTP/1.1" 404 221
::1 - - [09/May/2018:16:54:55 -0700] "GET /serverGone.gif HTTP/1.1" 404 212
::1 - - [09/May/2018:21:49:49 -0700] "GET http://ia.media-imdb.com/images/M/MV5BMTQ2NzUxMTAxN15BMl5BanBnXkFtZTcwMzEyMTIwMg@@._V1._SX94_SY140_.jpg HTTP/1.1" 404 276
::1 - - [10/May/2018:01:47:46 -0700] "GET /a.png HTTP/1.1" 404 203
127.0.0.1 - - [10/May/2018:07:35:52 -0700] "GET http://ia.media-imdb.com/images/M/MV5BMTQ2NzUxMTAxN15BMl5BanBnXkFtZTcwMzEyMTIwMg@@._V1._SX94_SY140_.jpg HTTP/1.1" 404 276
::1 - - [10/May/2018:09:56:57 -0700] "GET /404.ogv HTTP/1.1" 404 205
::1 - - [10/May/2018:09:56:57 -0700] "GET /404.wav HTTP/1.1" 404 205
::1 - - [10/May/2018:09:56:58 -0700] "GET /404.webm HTTP/1.1" 404 206

Looks like the Apache server is in use.
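For a quick overview of a log like the one above, a small tally script may help. This is a sketch that assumes the "common" LogFormat from the httpd.conf shown earlier; the function name summarize and the sample lines (copied from the excerpt) are only for illustration.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def summarize(lines):
    """Count requests per status code and per requested target."""
    by_status = Counter()
    by_target = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        by_status[match.group("status")] += 1
        by_target[match.group("target")] += 1
    return by_status, by_target


sample = [
    '::1 - - [08/May/2018:15:38:19 -0700] "GET /404.ogv HTTP/1.1" 404 205',
    '127.0.0.1 - - [09/May/2018:05:30:57 -0700] "GET /test-1 HTTP/1.1" 404 204',
    '127.0.0.1 - - [09/May/2018:05:30:58 -0700] "GET /test-1 HTTP/1.1" 404 204',
]
statuses, targets = summarize(sample)
```

Note that every request in the excerpt above was answered with a 404, so the server is being reached but does not appear to be serving any of the requested files.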
sjs_cors-test-server.sjs is a specific type we use for our xpcshell webserver, that is launched dynamically and binds to many ports on the machine.  Can you find the log of the test that ran on that machine? I would like to look for errors.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #78)
> sjs_cors-test-server.sjs is a specific type we use for our xpcshell
> webserver, that is launched dynamically and binds to many ports on the
> machine.  Can you find the log of the test that ran on that machine? I would
> like to look for errors.

Here is the list of all jobs that ran on that machine:
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/scl3/t-yosemite-r7-0022
Sadly, this only shows the last 17 hours of tests (20 jobs). Two of them could have run the devtools/client/netmonitor/test/browser_net* tests, but they were the wrong chunks.

Are there references to inspectedwindow-reload-target.sjs?  I see it accessed in this log:
https://taskcluster-artifacts.net/WfYWgBlYQrSm7m7SCXln-Q/0/public/logs/live_backing.log

If we do not see that in the apache logs, then I assume we are using the xpcshell server properly.


It looks like the 404.* files are from dom/media/test/test_info_leak.html, but I don't see a job of that type having run.  This is why I really want more than 20 jobs in the history (there is already a bug on file).

Could we turn the Apache server off and see what fails?  I don't believe we have this running on Linux or Windows machines; if we can confirm that, it would give us more confidence to turn this off.  If we do turn it off, maybe do it on a controlled set of machines on a Monday morning and coordinate with the sheriffs?
Searching online, I found the following for OS X:

Normally only root processes can bind to port 80 (or to any port below 1024).

taskcluster-proxy is launched by the cltbld user, so it cannot bind to port 80. I'll have a look at how to let taskcluster-proxy use port 80 directly, maybe by running this process as root.
The easiest way is to use Apache as a front-end proxy and run the process on a non-privileged port. For this, we will need to change "taskclusterProxyPort" from 80 to 8080 in /etc/generic-worker.config.
So, to be more explicit: we can keep using http://taskcluster/<path> calls, but we will need to use Apache as a front-end proxy and run the process on a non-privileged port. For this, we will need to change "taskclusterProxyPort" from 80 to 8080 in /etc/generic-worker.config. I can generate generic-worker.config dynamically, based on the OS: if it is installed on OS X, the value of taskclusterProxyPort will be 8080; otherwise it will be 80. Then we can use the Apache web server as a proxy.
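The per-OS port choice described here could look roughly like the sketch below. It is written in Python purely for illustration (the real change would live in the Puppet template); the function names and the use of "darwin" as the OS identifier are assumptions.

```python
# Ports below this number normally require root privileges to bind.
PRIVILEGED_PORT_LIMIT = 1024


def needs_root(port):
    """Return True if binding to this port normally requires root."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def taskcluster_proxy_port(os_name):
    """Pick the taskclusterProxyPort value for a given OS.

    Assumption: "darwin" identifies the macOS workers; every other
    platform keeps the default port 80.
    """
    return 8080 if os_name.lower() == "darwin" else 80
```

With this choice, taskcluster-proxy on the mac workers can run as the unprivileged cltbld user, while Apache keeps answering on port 80 and forwards to 8080.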

As a note: the httpd daemon is stopped on t-yosemite-r7-380.test.releng.mdc1.mozilla.com, so port 80 is not in use at the moment.

pmoore: Does this approach work for you?
Flags: needinfo?(pmoore)
(In reply to Dragos Crisan [:dragrom] from comment #83)
> So, to be more explicit: we can keep using http://taskcluster/<path> calls,
> but we will need to use Apache as a front-end proxy and run the process on a
> non-privileged port. For this, we will need to change
> "taskclusterProxyPort" from 80 to 8080 in /etc/generic-worker.config. I
> can generate generic-worker.config dynamically, based on the OS: if it is
> installed on OS X, the value of taskclusterProxyPort will be 8080;
> otherwise it will be 80. Then we can use the Apache web server as a proxy.
> 
> As a note: the httpd daemon is stopped on
> t-yosemite-r7-380.test.releng.mdc1.mozilla.com, so port 80 is not in use
> at the moment.
> 
> pmoore: Does this approach work for you?

Hi Dragos,

Yes, no problem (see comment 72).

We're currently using generic-worker on macOS and Windows - but Windows machines are managed with https://github.com/mozilla-releng/OpenCloudConfig - therefore you shouldn't currently need to worry too much about other platforms.

Good luck! :-)
Pete
Flags: needinfo?(pmoore)
Depends on: 1461901
Note this dependency on bug 1461901 is just for the fix described in comment 21 above - really I should have put that in a separate bug rather than dirtying this one with it. But this shouldn't block any of the work you're doing on the mac. It should also get released to production in the next day or two.
Sorry for the delayed NI response. I can see things have progressed so I'll try to pick up here.

First regarding the initial NI...  I'm not a fan of altering PATH in an environment to find an executable when explicitly calling the executable by full path is available and doing so doesn't introduce complexity.  There are situations where we don't make assumptions of the location of an executable and PATH would be a fit.  That isn't the case here.  Rule of thumb, explicit is almost always better (and more readable to the humans). :-)  Regarding the 'LSEnvironment xml element',  I would suggest not doing fancy things with apple launchd services.  History says apple will remove or change the use of the element in future releases and it will just create more work and troubleshooting on our end.  KISS

Second, regarding 'why we run apache?!?'...  Joel's correct, the python webserver should "have" replaced apache.  But if you look at the current puppet manifests, they clearly show the talos module is still being installed/set up across all OSes. [1]  If apache is still actually being used by talos, then we should fix that and force the test(s) to use the python webserver unless there is a strong reason against it.  Then we can update the puppet talos module to ensure http/apache is not running, especially considering how heavy-handed apache is with resources.  Although, that is a lot of work and it now looks like we might need a proxy for our proxy. :-(

Last point, it is unfortunate taskcluster proxy assumes the use of port 80 especially considering it is in the privileged port range and the most popular port of all time.  But I lament and move on.

TLDR; The path of least resistance, on these macOS generic worker talos hosts, is to run taskcluster proxy on an unprivileged port and supply apache with proxy* directives to keep the port 80 assumption and redirect it to the taskcluster proxy.  ...godspeed


[1] https://hg.mozilla.org/build/puppet/file/tip/modules/talos/manifests/init.pp#l6
Flags: needinfo?(jwatkins)
We will soon be publishing this taskcluster-proxy URL in an env var (TASKCLUSTER_ROOT_URL), at which point we're not necessarily tied to port 80 anymore.
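Once that variable exists, a task could build its proxy URLs from it instead of hard-coding http://taskcluster. A minimal sketch, assuming the variable holds the proxy base URL and that the /queue/v1/task/<taskId> path stays the same; the helper names are hypothetical.

```python
import os

# Fallback used while the proxy is only reachable via the port 80 hostname.
DEFAULT_PROXY_URL = "http://taskcluster"


def proxy_base_url(environ=None):
    """Return the proxy base URL, preferring TASKCLUSTER_ROOT_URL if set."""
    env = os.environ if environ is None else environ
    return env.get("TASKCLUSTER_ROOT_URL", DEFAULT_PROXY_URL).rstrip("/")


def task_definition_url(task_id, environ=None):
    """Build the queue URL for a task definition behind the proxy."""
    return f"{proxy_base_url(environ)}/queue/v1/task/{task_id}"
```

For example, with the variable unset, task_definition_url("abc") yields http://taskcluster/queue/v1/task/abc; with it set to a URL on an unprivileged port, the same code would use that port and no front-end proxy would be needed.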
I just looked at some talos jobs and we do not use port :80 for loading talos pages.  I really think we can ignore talos here.
- changed the generic-worker version
- added a version tag for taskcluster-proxy
- added an exec block to generate the generic-worker signingKey
- changed the generic-worker config file by adding information for taskcluster-proxy
- installed taskcluster-proxy in /usr/bin
- added a proxy httpd file to redirect curl http://taskcluster to port 8080 (taskcluster-proxy listens on this port)
Attachment #8973681 - Attachment is obsolete: true
Attachment #8974336 - Attachment is obsolete: true
Attachment #8979877 - Flags: review?(pmoore)
Attachment #8979877 - Flags: review?(jwatkins)
I ran a short test, and everything works as expected:

[taskcluster 2018-05-23T08:31:48.184Z] Worker Type (gecko-t-osx-1010-beta) settings:
[taskcluster 2018-05-23T08:31:48.184Z]   {
[taskcluster 2018-05-23T08:31:48.184Z]     "config": {
[taskcluster 2018-05-23T08:31:48.184Z]       "deploymentId": "",
[taskcluster 2018-05-23T08:31:48.184Z]       "runTasksAsCurrentUser": true
[taskcluster 2018-05-23T08:31:48.184Z]     },
[taskcluster 2018-05-23T08:31:48.184Z]     "generic-worker": {
[taskcluster 2018-05-23T08:31:48.184Z]       "go-arch": "amd64",
[taskcluster 2018-05-23T08:31:48.184Z]       "go-os": "darwin",
[taskcluster 2018-05-23T08:31:48.184Z]       "go-version": "go1.10.2",
[taskcluster 2018-05-23T08:31:48.184Z]       "release": "https://github.com/taskcluster/generic-worker/releases/tag/v10.7.12",
[taskcluster 2018-05-23T08:31:48.184Z]       "revision": "5c96402034cfa573c30e773f1cf8b5469dad6b4c",
[taskcluster 2018-05-23T08:31:48.184Z]       "source": "https://github.com/taskcluster/generic-worker/commits/5c96402034cfa573c30e773f1cf8b5469dad6b4c",
[taskcluster 2018-05-23T08:31:48.184Z]       "version": "10.7.12"
[taskcluster 2018-05-23T08:31:48.184Z]     },
[taskcluster 2018-05-23T08:31:48.184Z]     "machine-setup": {
[taskcluster 2018-05-23T08:31:48.184Z]       "config": "https://hg.mozilla.org/build/puppet/raw-file/production/modules/generic_worker/templates/generic-worker.config.erb",
[taskcluster 2018-05-23T08:31:48.184Z]       "docs": "https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/generic_worker"
[taskcluster 2018-05-23T08:31:48.184Z]     }
[taskcluster 2018-05-23T08:31:48.184Z]   }
[taskcluster 2018-05-23T08:31:48.185Z] Task ID: CjbJbHpiSf2wwoJSmpzisQ
[taskcluster 2018-05-23T08:31:48.185Z] === Task Starting ===
[taskcluster 2018-05-23T08:31:49.180Z] Executing command 0: /bin/bash -vxec 'echo "Querying my own task definition using taskcluster proxy..."
[taskcluster 2018-05-23T08:31:49.180Z] curl -L "http://taskcluster/queue/v1/task/${TASK_ID}"
[taskcluster 2018-05-23T08:31:49.180Z] '
echo "Querying my own task definition using taskcluster proxy..."
+ echo 'Querying my own task definition using taskcluster proxy...'
Querying my own task definition using taskcluster proxy...
curl -L "http://taskcluster/queue/v1/task/${TASK_ID}"
+ curl -L http://taskcluster/queue/v1/task/CjbJbHpiSf2wwoJSmpzisQ
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   962  100   962    0     0   4038      0 --:--:-- --:--:-- --:--:--  4042
{
  "provisionerId": "releng-hardware",
  "workerType": "gecko-t-osx-1010-beta",
  "schedulerId": "-",
  "taskGroupId": "CjbJbHpiSf2wwoJSmpzisQ",
  "dependencies": [],
  "requires": "all-completed",
  "routes": [],
  "priority": "lowest",
  "retries": 5,
  "created": "2018-05-23T08:31:42.589Z",
  "deadline": "2018-05-24T08:31:42.589Z",
  "expires": "2019-05-24T08:31:42.589Z",
  "scopes": [],
  "payload": {
    "features": {
      "taskclusterProxy": true
    },
    "maxRunTime": 60,
    "command": [
      [
        "/bin/bash",
        "-vxec",
        "echo \"Querying my own task definition using taskcluster proxy...\"\ncurl -L \"http://taskcluster/queue/v1/task/${TASK_ID}\"\n"
      ]
    ]
  },
  "metadata": {
    "name": "taskcluster-proxy-test-macOS",
    "description": "Test taskcluster-proxy on Mac",
    "owner": "dcrisan@mozilla.com",
    "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1452095#c29"
  },
  "tags": {},
  "extra": {}
}[taskcluster 2018-05-23T08:31:49.429Z] Exit Code: 0
[taskcluster 2018-05-23T08:31:49.429Z] === Task Finished ===
[taskcluster 2018-05-23T08:31:49.429Z] Task Duration: 248.602262ms
I have not pushed to try so far.
pmoore: can you help me with this and run some tests on "workerType": "gecko-t-osx-1010-beta"? Thanks.
Flags: needinfo?(pmoore)
Comment on attachment 8979877 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8979877 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but Pete should be the final r+ to land this.
Attachment #8979877 - Flags: review?(jwatkins) → review+
(In reply to Jake Watkins [:dividehex] from comment #86)

> First regarding the initial NI...  I'm not a fan of altering PATH in an
> environment to find an executable when explicitly calling the executable by
> full path is available and doing so doesn't introduce complexity.  There are
> situations where we don't make assumptions of the location of an executable
> and PATH would be a fit.  That isn't the case here.  Rule of thumb, explicit
> is almost always better (and more readable to the humans). :-)  Regarding
> the 'LSEnvironment xml element',  I would suggest not doing fancy things
> with apple launchd services.  History says apple will remove or change the
> use of the element in future releases and it will just create more work and
> troubleshooting on our end.  KISS

On reflection I agree with Jake here, and would like to rescind my objection in comment 52. Indeed setting full path in taskclusterProxyExecutable in generic-worker.config probably offers the most explicit and maintainable solution, as you did originally Dragos. Sorry for the noise on this.

> 
> Second, regarding 'why we run apache?!?'...  Joel's correct, python
> webserver should "have" replaced apache.  But if you look at the current
> puppet manifests they clearly show the talos module is still being
> install/setup across all OSes. [1]  If apache is still actually being used
> by talos, then we should fix that and force the test(s) to use the python
> webserver unless there is a strong reason against it.  Then we can update
> the puppet talos module to ensure http/apache is not running especially
> considering how heavy handed apache is with resources.  Although, that is a
> lot of work and it now looks like we might need a proxy for our proxy. :-(

++

Agreed apache httpd service purging should be done, probably best in another bug.

> Last point, it is unfortunate taskcluster proxy assumes the use of port 80
> especially considering it is in the privileged port range and the most
> popular port of all time.  But I lament and move on.

I agree - and am glad to hear in comment 87 that soon this will no longer be the case.

> 
> TLDR; The path of least resistance, on these macOS generic worker talos
> hosts, is to run taskcluster proxy on an unprivileged port and supply apache
> with proxy* directives to keep the port 80 assumption and redirect it to the
> taskcluster proxy.  ...godspeed
> 

This aligns with comment 72, so I think we're in agreement here.

> 
> [1] https://hg.mozilla.org/build/puppet/file/tip/modules/talos/manifests/init.pp#l6
Comment on attachment 8979877 [details] [diff] [review]
Bug_1452095_Upgrade_mac_generic_worker.patch

Review of attachment 8979877 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/generic_worker/templates/generic-worker.config.erb
@@ +27,5 @@
>    "runTasksAsCurrentUser": true,
>    "sentryProject": "generic-worker",
>    "shutdownMachineOnIdle": false,
>    "shutdownMachineOnInternalError": false,
> +  "taskclusterProxyExecutable": "taskcluster-proxy",

Let's set the full path here, as you did before.

::: modules/generic_worker/templates/run-generic-worker.sh.erb
@@ +7,5 @@
>  # rebooting the machine.
>  
> +# Export PATH environment variable to be used by generic worker tasks
> +export PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
> +

We should probably do this in generic-worker.config.erb, as you did before, based on Jake's feedback. Sorry!
Attachment #8979877 - Flags: review?(pmoore) → review+
Flags: needinfo?(pmoore)
- commit command: hg commit -m "Enable windows beta worker types; try: -b do -p macosx64 -u all -t none"
- push command: hg push -f -r . ssh://hg.mozilla.org/try
Attachment #8983457 - Flags: review?(pmoore)
- commit command: hg commit -m "Enable OSX beta worker types; try: -b do -p macosx64 -u all -t none"
- push command: hg push -f -r . ssh://hg.mozilla.org/try
Attachment #8983459 - Flags: review?(pmoore)
- commit command: hg commit -m "Enable OSX beta worker types; try: -b do -p macosx64 -u all -t none"
- push command: hg push -f -r . ssh://hg.mozilla.org/try
Attachment #8983457 - Attachment is obsolete: true
Attachment #8983459 - Attachment is obsolete: true
Attachment #8983457 - Flags: review?(pmoore)
Attachment #8983459 - Flags: review?(pmoore)
Attachment #8983461 - Flags: review?(pmoore)
Comment on attachment 8983461 [details] [diff] [review]
Bug_1452095_task_to_push_in_try.patch

Review of attachment 8983461 [details] [diff] [review]:
-----------------------------------------------------------------

Looks perfect! :-)
Attachment #8983461 - Flags: review?(pmoore) → review+
Note, the latest release of generic-worker is now 10.8.4, which we've rolled out to Windows workers in bug 1465113.
Yesterday I ran a set of tests on the beta worker: https://treeherder.mozilla.org/#/jobs?repo=try&revision=054e7e6d4c501b5bfd894602507619137c13657a
Everything went well.

Today I pushed another patch to try, to run all the macosx64 tests on the beta worker type:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e35d7e2f7bd620439362df6bc4bb8a04b3f492de
The try push looks great! Nice work Dragos.

Soon the remaining tasks that did not have time to run will expire, but this is nothing to be concerned with. Since we only have one worker, and the deadline for tasks to run is 24 hours, it isn't feasible that all tasks will run in this time. However, everything that has run so far has passed, so this gives us enough assurance that we should be able to roll out to production.
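To see why a single worker cannot get through a full try push within the deadline, a back-of-the-envelope estimate is enough. The sketch below is illustrative only: the average task duration of 30 minutes is a made-up number, not a measurement from this push.

```python
def max_tasks_before_deadline(workers, deadline_hours, avg_task_minutes):
    """Upper bound on tasks that can finish before the deadline.

    Assumes tasks run back to back with no idle time or reboots,
    so the real number would be lower.
    """
    return int(workers * deadline_hours * 60 // avg_task_minutes)


# One beta worker, the 24 hour task deadline, and a hypothetical
# average of 30 minutes per task.
estimate = max_tasks_before_deadline(workers=1, deadline_hours=24, avg_task_minutes=30)
```

Under those assumed numbers, one worker completes at most 48 tasks before the deadline, so any push that schedules more than that will leave tasks to expire.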

Thanks also for testing with 10.8.4; I'm updating bug title to reflect this change. This now includes even more enhancements to the worker, which saves us a future upgrade later, which is very much appreciated. ++
Summary: Upgrade mac taskcluster workers to generic-worker 10.7.8 → Upgrade mac taskcluster workers to generic-worker 10.8.4
Merged the PR yesterday. Monitored the workers and everything looks good. Moving the bug to RESOLVED FIXED.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED