Closed Bug 1391275 Opened 7 years ago Closed 7 years ago

Windows buildbot test jobs fail: Unable to establish SSL connection.

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Unassigned)

References

Details

(Whiteboard: [stockwell infra])

Log from a previously working job:

--06:50:11--  https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/utils/archiver_client.py
           => `archiver_client.py'
Resolving hg.mozilla.org... 63.245.215.25, 63.245.215.102
Connecting to hg.mozilla.org|63.245.215.25|:443... connected.
WARNING: Certificate verification error for hg.mozilla.org: certificate signature failure
HTTP request sent, awaiting response... 200 Script output follows
Length: 12,179 (12K) [text/x-python]

    0K .......... .                                          100%    8.42 MB/s

06:50:11 (8.42 MB/s) - `archiver_client.py' saved [12179/12179]

Note that the certificate verification warning above was already being logged before the actual error started occurring.
Let the record show that we're executing the following command:

  $ 'bash' '-c' 'wget -Oarchiver_client.py --no-check-certificate --tries=10 --waitretry=3 https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/utils/archiver_client.py'

We then run the downloaded archiver_client.py and use it to download other things, like mozharness.

The use of --no-check-certificate completely undermines security. We might as well be using plaintext. This was a pre-existing security bug and it is a good thing this insecure download stopped working. Unfortunately, it is causing a tree closure :/
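
For illustration only, here is a minimal sketch of what a verified download could look like in Python, assuming a modern Python 3 with a usable system CA store. The names `verified_context` and `fetch_with_retries` are hypothetical and are not part of archiver_client.py or the buildbot configuration; the retry parameters only loosely mirror the --tries=10 --waitretry=3 flags in the wget command above.

```python
import ssl
import time
import urllib.request


def verified_context():
    """Return a TLS context that verifies the certificate chain and hostname."""
    # create_default_context() enables CERT_REQUIRED and hostname checking
    # and loads the system's trusted CA certificates.
    return ssl.create_default_context()


def fetch_with_retries(url, tries=10, wait=3, opener=None, sleep=time.sleep):
    """Download url and return its bytes, retrying on network errors.

    opener and sleep are injectable (hypothetical hooks) so the retry
    logic can be exercised without touching the network.
    """
    if opener is None:
        ctx = verified_context()

        def opener(u):
            return urllib.request.urlopen(u, context=ctx, timeout=30)

    last_error = None
    for attempt in range(1, tries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except OSError as exc:  # URLError and ssl.SSLError subclass OSError
            last_error = exc
            if attempt < tries:
                sleep(wait)
    raise last_error
```

Unlike --no-check-certificate, a failed verification here surfaces as an error instead of being downgraded to a warning.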
The issue is between the client's OpenSSL version, which sends an SSLv2 ClientHello, and Zeus, which drops it (after the upgrade).

:fox2mike will go downgrade Zeus and IT will follow up with Zeus
 * Support for SSLv2 ClientHellos has been removed in this release.  Following
   the earlier removal of support for SSLv2, the traffic manager has previously
   accepted an SSLv2-compatible ClientHello when acting as a TLS server, so
   long as it indicated that the client was willing to upgrade to SSLv3/TLS
   records for the remainder of the connection.  Following the upgrade, the
   traffic manager will respond to SSLv2-compatible ClientHellos with an SSL
   alert message, before dropping the connection.
   VTM-34550
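
To make the distinction in the release note concrete, the sketch below classifies the first bytes of a connection as either an SSLv2-compatible ClientHello or a TLS handshake record. It is based on the record layouts described in RFC 5246 (Appendix E.2 for the SSLv2-compatible form) and is an assumption-laden illustration, not how the traffic manager implements the check; `classify_client_hello` is a hypothetical name.

```python
def classify_client_hello(data: bytes) -> str:
    """Roughly classify the leading bytes sent by a TLS/SSL client.

    Returns "sslv2-compatible", "tls-record", or "unknown".
    """
    if len(data) < 5:
        return "unknown"
    # SSLv2-compatible hello: 2-byte length with the high bit set,
    # followed by msg_type 1 (CLIENT-HELLO) and the offered version.
    if data[0] & 0x80 and data[2] == 0x01:
        return "sslv2-compatible"
    # SSLv3/TLS record: content type 22 (handshake), major version 3.
    if data[0] == 0x16 and data[1] == 0x03:
        return "tls-record"
    return "unknown"


# Hand-written sample prefixes, not captured traffic.
legacy_hello = bytes([0x80, 0x2E, 0x01, 0x03, 0x01, 0x00, 0x15])
modern_hello = bytes([0x16, 0x03, 0x01, 0x00, 0x2F, 0x01])
print(classify_client_hello(legacy_hello))
print(classify_client_hello(modern_hello))
```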

OpenSSL 0.9.x appears to, by default, send an SSLv2 ClientHello unless instructed otherwise through, for instance, "openssl s_client -tls1" or whatever your Python/Wget/Curl/etc mechanisms are for indicating a similar "minimum protocol version of TLS 1.0".
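
As a rough illustration of the "minimum protocol version" idea in Python, the sketch below pins a floor on the negotiated TLS version. It assumes Python 3.7+ linked against a recent OpenSSL, which is not the environment of the affected workers; `tls_only_context` is a hypothetical helper, and the TLS 1.2 default is my choice rather than anything proposed in this bug.

```python
import ssl


def tls_only_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Return a verifying client context that refuses anything below `minimum`."""
    ctx = ssl.create_default_context()
    # minimum_version is available in Python 3.7+ with OpenSSL 1.1.0g or newer.
    ctx.minimum_version = minimum
    return ctx


ctx = tls_only_context()
print(ctx.minimum_version.name)
```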

It appears that Zeus has broken compatibility with all OpenSSL 0.9.x clients. Revert and discuss with Zeus Actual.
Just noting here for the record that we rolled Zeus back to 17.1 by 1010 PDT.
I just reopened inbound/autoland since the retriggers were coming back green.
Richard's analysis looks right. It looks like you can maybe get wget to do this with
"--secure-protocol=protocol" (https://linux.die.net/man/1/wget) [the fact that TLSv1 is the newest version they list is terrifying]. Maybe time to switch to curl?

GPS, good catch on --no-check-certificate
I'm not sure what --secure-protocol implies beyond ClientHello - it may simply be specifying the Hello version, or it might have deeper effects in the wget code.
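
For reference, the --secure-protocol option discussed above could be added to the existing command roughly as sketched here. The helper `wget_command` is hypothetical; it only builds the argument list and does not run wget. Whether this option actually changes the ClientHello format on OpenSSL 0.9.x builds is not confirmed anywhere in this bug, so treat that as an open assumption.

```python
def wget_command(url, output, protocol="TLSv1", tries=10, waitretry=3):
    """Build a wget argument list that pins the protocol and keeps cert checks on."""
    return [
        "wget",
        "-O", output,
        "--secure-protocol=%s" % protocol,
        "--tries=%d" % tries,
        "--waitretry=%d" % waitretry,
        # --no-check-certificate is deliberately left out.
        url,
    ]


cmd = wget_command(
    "https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/utils/archiver_client.py",
    "archiver_client.py",
)
print(" ".join(cmd))
```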

Recompiling wget (and python) against openssl 1.x instead of the current 0.x would likely resolve the issue without any changes to code, command lines, and so forth.
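
One quick way to see which OpenSSL a given Python build is linked against is the `ssl` module's version constants. The sketch below is only an illustration; `linked_against_legacy_openssl` is a hypothetical helper, and the 1.0.0 cutoff simply reflects the "1.x instead of 0.x" distinction made in the comment above.

```python
import ssl


def linked_against_legacy_openssl(version_info=ssl.OPENSSL_VERSION_INFO):
    """Return True if the (major, minor, ...) tuple is older than OpenSSL 1.0.0."""
    return tuple(version_info[:3]) < (1, 0, 0)


# Human-readable version string of the library this interpreter uses.
print(ssl.OPENSSL_VERSION)
print(linked_against_legacy_openssl())
```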
(In reply to Gregory Szorc [:gps] from comment #3)
> Let the record show that we're executing the following command:
> 
>   $ 'bash' '-c' 'wget -Oarchiver_client.py --no-check-certificate --tries=10
> --waitretry=3
> https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/utils/
> archiver_client.py'
> 
> We then run the downloaded archiver_client.py and use it to download other
> things, like mozharness.
> 
> The use of --no-check-certificate completely undermines security. We might
> as well be using plaintext. This was a pre-existing security bug and it is a
> good thing this insecure download stopped working. Unfortunately, it is
> causing a tree closure :/

bug 798025 :-(
Whiteboard: [stockwell infra]
Priority: -- → P1
The solution was to roll back the Zeus code. Webops is investigating moving to the LTS branch of the Zeus code until we can decommission the machines using older SSL versions.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard