Closed Bug 1387541 Opened 7 years ago Closed 6 years ago

check for signing servers' ssl cert expiration

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: arny)

Details

Attachments

(4 files)

Related: bug 673303, bug 1373986
https://community.letsencrypt.org/t/it-there-a-command-to-show-how-many-days-certificate-you-have/11351 shows several scripts that help show ssl cert expiration dates. Some can run on the server itself, given a path (e.g. ssl-cert-check) or

    openssl x509 -noout -dates -in /path/to/cert.pem

This can run outside the server, as long as it can reach the ssl port:

    echo | openssl s_client -connect SERVER:PORT -servername FQDN 2>/dev/null | openssl x509 -noout -dates

So it's a question of whether we want to run this check on each server, or have a central location checking the other locations.
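For alerting, the dates do not have to be parsed by hand: `openssl x509` has a `-checkend` option that takes a number of seconds and sets its exit status according to whether the certificate will have expired by then. A minimal sketch, where the function name and the choice to treat an unreadable file as "expiring" are assumptions of this example:

```shell
#!/bin/sh
# expires_within_days FILE DAYS (hypothetical helper name)
# Returns 0 if the first certificate in FILE expires within DAYS days,
# or if openssl cannot read it; returns 1 if it stays valid past the window.
expires_within_days() {
    cert_file=$1
    days=$2
    # -checkend takes seconds; openssl exits 0 when the cert is still valid then.
    if openssl x509 -noout -checkend "$((days * 86400))" -in "$cert_file" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

Called as `expires_within_days /path/to/cert.pem 90 && echo "renew soon"`, this works the same way on a certificate saved from the `s_client` pipeline above.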
Priority: -- → P2
Some of our certs are going to expire in about 6 months. We'll get bitten by this again unless we get notifications ahead of time :)
Component: Release Automation → Buildduty
QA Contact: catlee → jlund
Usually the certificate issuer sends a notice to the cert's admin email 90 days before the cert expires, so the admin can renew it in time.

If you want a second check, maybe we can set up a cron job with a script like https://github.com/Matty9191/ssl-cert-check so we get notices before the cert expires.

The above script can check both local and remote certificates. It can also work as a nagios plugin.

It's probably best to run the check from a central location, if all the ssl domains are reachable remotely.
Thanks Attila.

90 days should be sufficient but it will be a matter of where it's sent and who catches it. Do we know who owns it?
According to the new SSL standards, the admin email can be one of administrator/postmaster/webmaster/admin/hostmaster@domain, or any other email listed in the domain's whois data. This is the email that was used to validate the domain while the ssl certificate was being issued. Another address that can receive the notice is the email account used to place the SSL order.

Can anyone give me the list of the ssl domains so I can check them?
E.g. the mozilla.org cert is issued by www.digicert.com
@aki - ciduty (buildduty) would like to help here but we have some questions. At a minimum it sounds like we need a check on each of our signing servers from an outside host that has the right network flows. If the date of expiry is within some near window, we alert via nagios.

They need some context about how these certs were generated (by us) and where they are used (e.g. signingscript -> signing server). As well as where you would recommend we run these checks and how.

aki - would you have 15min to meet with attila (arny)?
Assignee: nobody → acraciun
I'll leave it to Attila to reach out to you
The certs are self-signed, per https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-SSL

They're currently used from all buildbot masters, all buildbot workers, funsize signing workers, partner-repack1, and signing scriptworkers. When esr52 dies and we have an off-cycle partner repack story from taskcluster, we'll narrow this list down to just the signing scriptworkers.

I believe the ports are the signing ports: 9100, 9110, 9120. I also believe the cert is shared across all signing servers and ports, so as long as that assumption is true, checking one may be sufficient.
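The assumption that one cert is shared across all signing servers and ports could be tested by comparing fingerprints. A rough sketch, to be run from a host that can reach the signing ports; the function names and the placeholder host name are illustrative and not taken from the real setup:

```shell
#!/bin/sh
# fetch_cert HOST PORT: print the PEM certificate presented on HOST:PORT.
fetch_cert() {
    echo | openssl s_client -connect "$1:$2" -servername "$1" 2>/dev/null |
        openssl x509
}

# fingerprint: read one PEM certificate on stdin, print its SHA-256 fingerprint.
fingerprint() {
    openssl x509 -noout -fingerprint -sha256
}

# Hypothetical usage (signing.example.com is a placeholder; ports as listed above):
#   for port in 9100 9110 9120; do
#       fetch_cert signing.example.com "$port" | fingerprint
#   done | sort -u | wc -l    # 1 means every port presented the same cert
```

If the count is 1 for every server, checking a single host and port would cover all of them.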
It may be possible to also do `openssl x509 -noout -dates -in` against the host cert.

The signing scriptworker host.cert is here: https://github.com/mozilla-releng/signingscript/blob/master/signingscript/data/host.cert  We use that for everything but esr52.

The esr52 host.cert is here: https://hg.mozilla.org/build/tools/file/tip/release/signing/host.cert

These are public files, so we could even create a cron'ed taskcluster task that runs periodically, that:

* clones tools from hg
* clones signingscript from git
* runs `openssl x509 -noout -dates -in host.cert` against both host.cert files
* checks the notBefore and notAfter dates to make sure they're still valid, and will still be valid for x amount of time (90 days?)
** if true, exit 0
** if false, exit non-zero
* has taskclusterNotify set so it emails on failure, similar to the cot-gpg-keys expiration monitoring hook https://tools.taskcluster.net/hooks/project-releng/cot-gpg-keys%2Fexpiration
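The exit-code logic in the list above could look roughly like this. It is a sketch only: the function names are made up for illustration, `cert_epoch` assumes GNU `date` (for `-d`), and fetching the two host.cert files is left out. Note that `openssl x509` reads a single certificate from its input.

```shell
#!/bin/sh
# window_ok NOW NOT_BEFORE NOT_AFTER MIN_DAYS  (all times in epoch seconds)
# Succeeds only if the cert is already valid and stays valid for MIN_DAYS more.
window_ok() {
    now=$1; not_before=$2; not_after=$3; min_days=$4
    [ "$now" -ge "$not_before" ] && [ "$not_after" -ge "$((now + min_days * 86400))" ]
}

# cert_epoch FIELD FILE: FIELD is startdate or enddate.
# Turns "notAfter=Aug 30 00:00:00 2018 GMT" into epoch seconds (GNU date assumed).
cert_epoch() {
    date -d "$(openssl x509 -noout "-$1" -in "$2" | cut -d= -f2)" +%s
}

# check_cert FILE MIN_DAYS: exit status 0 if fine, non-zero otherwise.
check_cert() {
    window_ok "$(date +%s)" "$(cert_epoch startdate "$1")" "$(cert_epoch enddate "$1")" "$2"
}
```

A cron'ed task would download both host.cert files, call `check_cert host.cert 90` on each, and exit non-zero if either call fails so that the failure notification fires.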
(In reply to Aki Sasaki [:aki] from comment #9)

> The esr52 host.cert is here:
> https://hg.mozilla.org/build/tools/file/tip/release/signing/host.cert

We should use default, not tip: https://hg.mozilla.org/build/tools/file/default/release/signing/host.cert

> These are public files, so we could even create a cron'ed taskcluster task
> that runs periodically, that:
> 
> * clones tools from hg
> * clones signingscript from git

We could also just wget those 2 URLs and avoid git and mercurial completely.
Aki, I have tried to connect to the signing hosts listed at https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-Hosts on the ports you mentioned, but it looks like there are no open ports. I have also checked the signing-linux-18.srv.releng.usw2.mozilla.com server for any process listening on these ports, but there is nothing.

Maybe we need to check the certs locally on each server?

Also, at https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-Inventory the ones that expire soonest are:

Releases & Nightlies Authenticode (SHA2)	6/23/2017	6/28/2019	Linux signing machines

Releases & Nightlies GPG 6/22/2017 6/22/2019 All signing machines


Is the list up to date?
(In reply to Attila Craciun [:arny] from comment #11)
> Aki, I have tried to connect to Signing-Host from
> https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-Hosts on the
> ports you mentioned, but looks like there are no open ports. Also, I have
> checked the signing-linux-18.srv.releng.usw2.mozilla.com server for any
> process running on this ports but there is nothing. 
> 
> Maybe we need to check the certs locally on each server? 

It's possible. We have a firewall that restricts those ports to an allowlist of IPs: https://hg.mozilla.org/build/puppet/file/tip/modules/fw/manifests/networks.pp#l191

signing-linux-18 is a signing scriptworker; it talks to the signing servers, e.g. signing4 and mac-v2-signing1. The ports and certs are on the signing servers, not the signing scriptworkers.

Right now, for the ssl certs, host.cert is probably the easiest solution. I'm not sure if the openssl check checks both certs, or if we have to split it apart for expiration checking.

> Also, at
> https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-Inventory the
> quickest that expire are:
> 
> Releases & Nightlies Authenticode (SHA2)	6/23/2017	6/28/2019	Linux signing
> machines
> 
> Releases & Nightlies GPG 6/22/2017 6/22/2019 All signing machines
> 
> 
> Is the list up to date?

It should be, but we're not always good at keeping that up to date. Otherwise we could just check the list rather than running automation to check expiration :)

For signing key expiration, you're not going to be able to log into the signing servers to check the keys, which will make writing the nagios checks difficult. If we can get you throwaway or dep keys of each type, you might be able to write the checks against those, and then we can deploy those checks on the production instances.
I tried to ssh to signing4.srv.releng.scl3.mozilla.com with my user. It asks me for DUO, but then I get:

debug1: Offering RSA public key: /home/arny/.ssh/id_rsa
debug1: Authentications that can continue: publickey
debug1: No more authentication methods to try.
Permission denied (publickey).

I don't think I'll need ssh access because:

This https://hg.mozilla.org/build/tools/file/default/release/signing/host.cert has two certificates for the following FQDNs:

Valid From: August 30, 2017
Valid To: August 30, 2018
mac-v2-signing1.srv.releng.scl3.mozilla.com
mac-v2-signing2.srv.releng.scl3.mozilla.com
mac-v2-signing3.srv.releng.scl3.mozilla.com
mac-v2-signing4.srv.releng.scl3.mozilla.com
mac-v2-signing6.srv.releng.scl3.mozilla.com
mac-v2-signing7.srv.releng.scl3.mozilla.com



(looks like the one below was renewed recently)
Valid From: January 31, 2018
Valid To: January 31, 2019
mac-v2-signing8.srv.releng.mdc1.mozilla.com
mac-v2-signing9.srv.releng.mdc1.mozilla.com
mac-v2-signing10.srv.releng.mdc1.mozilla.com

mac-depsigning1.srv.releng.mdc1.mozilla.com
mac-depsigning2.srv.releng.mdc1.mozilla.com
mac-depsigning3.srv.releng.mdc1.mozilla.com

signing7.srv.releng.mdc1.mozilla.com
signing8.srv.releng.mdc1.mozilla.com
signing9.srv.releng.mdc1.mozilla.com

signing4.srv.releng.scl3.mozilla.com
signing5.srv.releng.scl3.mozilla.com
signing6.srv.releng.scl3.mozilla.com

Are all of these the signing servers? If yes, then probably the best option is to create one new certificate for all the above servers, valid for at least 3 years, and upload it to the repo in place of the current host.cert? :)
Yup, looks like these are the signing servers, according to https://github.com/mozilla/build-puppet/blob/master/modules/buildmaster/templates/passwords.py.erb

So, we can do:

1) we can create a cronjob on each server to check the ssl expiry date with

   openssl x509 -noout -dates -in /path/to/cert.pem

2) Add a nagios plugin to check this.

3) Use another central location to do the checks. However, once we have nagios, I think that is the best central location :).
(In reply to Attila Craciun [:arny] from comment #13)
> I tried to ssh to signing4.srv.releng.scl3.mozilla.com with my user, it ask
> me for DUO but later I get:

yeah, you won't be able to access the signing servers directly. fwiw - I don't have access either :)
Aki, what do you think?
If there is no way to check the cert remotely from a machine with valid network flows, this is probably not in ciduty's scope.
It seems like a cron-hook to check the two host.certs is a good path forward.
Why are both certificates in one file?

The openssl x509 -noout -dates -in /path/to/cert.pem command checks only the first cert in the file, so we have to split the file like:

cat host.cert |awk  'split_after == 1 {n++;split_after=0}  /-----END CERTIFICATE-----/ {split_after=1}{print > "cert" n ".pem"}'
- this will produce cert.pem and cert1.pem, which can be checked with openssl.
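A variant of the split that makes the file names predictable is sketched below. As written above, the one-liner names the first file cert.pem (because n is still unset) and the second cert1.pem; this version numbers from 0 and ignores anything outside the BEGIN/END markers. The function name and the output directory argument are assumptions of this example.

```shell
#!/bin/sh
# split_bundle BUNDLE OUT_DIR (hypothetical helper name)
# Writes each PEM certificate in BUNDLE to OUT_DIR/cert0.pem, cert1.pem, ...
split_bundle() {
    bundle=$1
    out_dir=$2
    awk -v dir="$out_dir" '
        # Start a new output file at every BEGIN marker.
        /-----BEGIN CERTIFICATE-----/ { out = sprintf("%s/cert%d.pem", dir, n++) }
        # Copy lines only while inside a certificate.
        out { print > out }
        # Close the file at the END marker and stop copying.
        /-----END CERTIFICATE-----/ { close(out); out = "" }
    ' "$bundle"
}
```

Each resulting file can then be passed to `openssl x509 -noout -dates -in`.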

How about the notification? We can use a central server to pull the file, split it, check it, and send notifications. Is there any Linux box with a mail server? Like a jumphost?
(In reply to Attila Craciun [:arny] from comment #19)
> Why are both certificates in one file?

Signtool only allows for one cert file, so we put the certs in the same file.

> The openssl x509 -noout -dates -in /path/to/cert.pem checks only the first
> ssl, so we have to split the files like:
> 
> cat host.cert |awk  'split_after == 1 {n++;split_after=0}  /-----END
> CERTIFICATE-----/ {split_after=1}{print > "cert" n ".pem"}'
> - this will produce cert.pem and cert1.pen which can be checked with openssl.

Sounds good to me :)

> How about the notification? We can use an central server, pull the file,
> split it, check it and send notifications. Any Linux box with a mail
> servers? Like a jumphost?

Also sounds good. I'm open about the server. Is there a machine that ciduty uses for things like this? Cruncher or aws-manager2? If you write a script, where to run it is a little easier than otherwise. Running it on a machine would probably require a small puppet module; we could have our pick of machines to run it on once that's written. Or we could write a cron hook to run the script and run it on taskcluster.
Product: Release Engineering → Infrastructure & Operations
Attached image email.png
Today I tested the ssl check script from comment 3 on rejh1.srv.releng.mdc1.mozilla.com and it works fine (see attachment). I manually split the cert file and checked one of the certs from the command line.

Now, if you all agree to use this jumphost, I can write the script that does all the "hard work" and then, via puppet, put it in /etc/cron.weekly.
Van, what do you think about using one of the jumphosts?
(In reply to Attila Craciun [:arny] from comment #22)
> Van, what do you think using one of the jumphost?

unping van, relops own the jumphost infra afaik. 

Attila, as aki pointed out in comment 20, you should be able to use many servers within our network. The jumphosts are used when we are trying to reach a host from outside our network. e.g. from our local machine.

Given that, I believe we could use something like our miscellaneous worker host, cruncher: https://dxr.mozilla.org/build-central/source/puppet/manifests/moco-nodes.pp#318

To do so, we would want to add a new cron job via puppet. Here are the cron jobs we currently have on cruncher: https://dxr.mozilla.org/build-central/source/puppet/modules/cruncher/manifests/cron.pp
Flags: needinfo?(acraciun)
(In reply to Jordan Lund (:jlund) from comment #23)
> (In reply to Attila Craciun [:arny] from comment #22)
> > Van, what do you think using one of the jumphost?
> 
> unping van, relops own the jumphost infra afaik. 
> 
> Attila, as aki pointed out in comment 20, you should be able to use many
> servers within our network. The jumphosts are used when we are trying to
> reach a host from outside our network. e.g. from our local machine.
> 
> Given that, I believe we could use something like our miscellaneous worker
> host, cruncher:
> https://dxr.mozilla.org/build-central/source/puppet/manifests/moco-nodes.
> pp#318
> 
> To do so, we would want to add a new cron job via puppet. Here are the cron
> jobs we currently have on cruncher:
> https://dxr.mozilla.org/build-central/source/puppet/modules/cruncher/
> manifests/cron.pp

needinfo attila: does this help?
Yes, got it. I'll do a patch and upload it here for review.
Flags: needinfo?(acraciun)
Attached patch check_ssl.diff
Attached the patch.
Attachment #8981392 - Flags: review?(jlund)
Attachment #8981392 - Flags: review?(aki)
Comment on attachment 8981392 [details] [diff] [review]
check_ssl.diff

Thanks!
Attachment #8981392 - Flags: review?(aki) → review+
Comment on attachment 8981392 [details] [diff] [review]
check_ssl.diff

Review of attachment 8981392 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks arny! Looks good. We should fix up the email name since as it stands, this will go to a black hole. r- till that gets addressed.

Since this landed already, a follow up patch is fine.

::: modules/cruncher/manifests/cron.pp
@@ +12,4 @@
>          '/etc/cron.d/allthethings':
>              mode    => '0644',
>              content => template('cruncher/allthethings_cron.erb');
> +        '/usr/local/bin/cert_check.sh':

perhaps the bash script that cron runs should be added here: https://dxr.mozilla.org/build-central/source/puppet/modules/cruncher/manifests/init.pp#4 or in its own file, e.g. check_cert.pp

Not a big deal, it's also reasonable to leave it beside the cron definition here.

::: modules/cruncher/templates/check_sign_srv_ssl_exp.sh.erb
@@ +11,5 @@
> +
> +cat $CERT | awk  'split_after == 1 {n++;split_after=0}  /-----END CERTIFICATE-----/ {split_after=1}{print > "/tmp/cert" n ".pem"}'
> +
> +sh $CHECKSSL -c $CERT1 -x $DAYS -ab -e builddutty@mozilla.com
> +sh $CHECKSSL -c $CERT2 -x $DAYS -ab -e builddutty@mozilla.com

s/builddutty/buildduty/

Better yet, we can now use ciduty@mozilla.com
Attachment #8981392 - Flags: review?(jlund) → review-
Updated notification email.
Attachment #8982134 - Flags: review?(riman)
Attachment #8982134 - Flags: review?(riman) → review+
Pushed by acraciun@mozilla.com:
https://hg.mozilla.org/build/puppet/rev/a9f248ebbabe
Updated notification email for the signing servers' ssl cert expiration. Bug 1387541. r=riman.
:jlund: I checked with :dragrom when I made the first patch, and he suggested putting it in cron.pp. I wanted to put it somewhere else, but since the cron job needs that bash script, we wanted to make sure it is there every time the job runs.
Now that the script is in place, how can I access the server to run the check manually? Maybe a program like mailx is missing. I tried to ssh (with my ssh user) but got:

channel 2: open failed: administratively prohibited: open failed
Stdio forwarding request failed: Session open refused by peer
ssh_exchange_identification: Connection closed by remote host
(In reply to Attila Craciun [:arny] from comment #32)
> Now that the script is in place, how can I can access the server so I can do
> a manually run for the check. Maybe there is a missing program like mailx. I
> have tried to ssh but get (with my ssh user).
> 
> channel 2: open failed: administratively prohibited: open failed
> Stdio forwarding request failed: Session open refused by peer
> ssh_exchange_identification: Connection closed by remote host

Sorry, please needinfo me as I don't check cc bugmail daily.

Strange, you should be able to get to cruncher through the jumphost:

[jlund@cruncher-aws.srv.releng.usw2.mozilla.com ~]$ sudo cat /home/acraciun/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDXYkHS2dBgka49LH7lwQaJx3ORpFUiTSvNCcA5rzPPFRql1y6/qcLEDmw0+v4JRFMubrrp1WfL5XLGqyPc2RBX5r3Z0eJbI9rEWxU3Lkp02XR7oRpOoY8Ffhx0yZ4nxXU+BaA3NK+i4kM6UMhH0e7KCPW+9ACKNEgUNSI2UZmsS+kL/veIcfyipPDVX44u06I1ytekZ/x/uXRZcZjf1jMsVYacuOUyfzAvkfW+5XEFxvwkf71uSE73C/dw7Tt3TCsi5qptqN4AcYF5XobksykN8fwwjWbza+JuQhiTz9LvfQfvOhhdvbv/PphC1+pmSaAjVHfiMPUmjykRD/9XcfJMQN7eY8jACA1h2JAb+kG4ABnmQElzCPH8ygerwaWQ682cxMmGzwqFRDEhP9Qo4J2qR9lPaznOooHk2psOsS/h0dtFWGJEfgOIJkmb9Nf8HDzkUEx0n3rY9yqbxu6HZw7EJisKOqLudIJ4eZa/AudSnkcsFDV1SWx3zC32IRv3N76uwqNjHgU7qLOeyFH1y2U91QdxdkjTiSuCe0TzNW5zjSzOSyVEfjTyHqFZuSCT55ZvG/hrelrrNuQVgV050o9e/WWp8dsZCzddMaoVw+VJnCpt/eZSwhTtxn/rOiUqQ/gVDLhdE9jI/IrJndjCVcc3D0Ayq89QrfEapFrs1Vahqw== Mozilla_key
The ssh login works now. I ran a check and it looks like I made a mistake in the script name. This patch fixes it.
Attachment #8984805 - Flags: review?(bcrisan)
Attachment #8984805 - Flags: review?(bcrisan) → review+
:aki, can you push the patch to github? Looks like I don't have access.

fatal: unable to access 'https://github.com/mozilla/build-puppet.git/': The requested URL returned error: 403
Flags: needinfo?(aki)
You need to fork the repo, push the patch to a branch on your own fork, and create a PR. Do you want to try that, or would you like me to?
Flags: needinfo?(aki)
Forked the repo, pushed, and created the PR. Does it look good?
Flags: needinfo?(aki)
Approved&merged. Thanks!
Flags: needinfo?(aki)
:arny - can we close this bug? Has it been verified?
Flags: needinfo?(acraciun)
:aki - I'll close it next week, after I receive the cron email with the warning.

How do we proceed once we get the warning about the ssl expiration? Do we generate a new cert and push it to the repo?
Flags: needinfo?(acraciun) → needinfo?(aki)
Sure, or file a bug for a new ssl cert for someone in releng to create and deploy. As long as we have heads up about the expiration, we're good.
Flags: needinfo?(aki)
Looks like the warning emails were sent from root@localhost.localdomain, and ciduty AT mozilla dot com was not accepting them.
Created PR with a patch https://github.com/mozilla-releng/build-puppet/pull/101/commits/cccea32cca5c7c5890f475a6029e108c7471facf 

Please merge, thanks.
Flags: needinfo?(aki)
Done, thanks!
Flags: needinfo?(aki)
It worked! The email was sent from root@cruncher-aws.srv.releng.usw2.mozilla.com. I think we are done here. Please reopen if I'm wrong.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard