Closed Bug 936248 Opened 11 years ago Closed 11 years ago

EC2 jobs failing due to lost network traffic

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: All / Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 940356

People

(Reporter: KWierso, Unassigned)

Details

Attachments

(1 file)

Some of these were network timeouts, too.

This seems to be limited to use1, so I suspect this is the usual use1 connection problem.

Arzhel's seeing "%-RT_IPSEC_REPLAY: Replay packet detected on IPSec tunnel" suggesting that the Amazon end is re-transmitting packets and that both the original and re-transmitted packets are eventually arriving.
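
For context, here is a minimal sketch of how a receiver-side anti-replay check generally works. It is a simplified version of the sliding-window idea described in RFC 4303, not the actual implementation on our routers, and the class and method names are hypothetical. It illustrates why a packet that arrives twice with the same sequence number would be rejected and reported as a replay:

```python
class ReplayWindow:
    """Simplified anti-replay sliding window (illustrative only)."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => sequence (highest - i) was already seen

    def accept(self, seq):
        """Return True if the packet is new, False if it is a replay or too old."""
        if seq > self.highest:
            shift = seq - self.highest
            # Slide the window forward and mark the new highest as seen.
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False   # older than the window: dropped
        if self.bitmap & (1 << offset):
            return False   # already seen: the case a replay alarm reports
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
first = window.accept(7)    # original packet is accepted
second = window.accept(7)   # a late duplicate of the same packet is rejected
```

Under this model the duplicate is dropped harmlessly, so the log message is more likely a symptom of packets being delayed or duplicated in transit than the direct cause of the timeouts.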
Summary: EC2 jobs failing due to Error <urlopen error timed out> while getting http://pypi.pvt.build.mozilla.org/pub/blobuploader-1.0b.tar.gz (from http://pypi.pvt.build.mozilla.org/pub/) error → EC2 jobs failing due to lost network traffic
Does this seem solvable? Do we still need to gather more data?

Should releng continue to set up in-house buildbot masters to minimize cross-colo traffic (bug 927129)?
(In reply to Chris Cooper [:coop] from comment #3)
> Does this seem solvable? Do we still need to gather more data?
> 
> Should releng continue to set up in-house buildbot masters to minimize
> cross-colo traffic (bug 927129)?

I believe this was a different part of our system.

EC2 host trying to read from an in-house host:
tst-linux32-ec2-134 -> http://pypi.pvt.build.mozilla.org

A better bug would be to set up a pypi host on EC2.
That's true, but difficult, since pypi is part of the releng web cluster and is thus based on infra puppet and assumes the presence of a Zeus load balancer and a NetApp backend. Following this line of thought, you'd need to move *everything* into AWS: clobberer, npm-mirror, ftp, hgmo, gitmo, ...

And maybe that is the right solution, but it's not a decision we should back into lightly or without some serious thought, as it will be a lot of work, and far *more* work if we don't approach it systematically.
Arzhel is debugging this in bug 940356.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard