Closed
Bug 936248
Opened 11 years ago
Closed 11 years ago
EC2 jobs failing due to lost network traffic
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 940356
People
(Reporter: KWierso, Unassigned)
Details
Attachments
(1 file)
Comment 1•11 years ago
Some of these were network timeouts, too. This seems to be limited to use1, so I suspect this is the usual use1 connection problem. Arzhel is seeing "%-RT_IPSEC_REPLAY: Replay packet detected on IPSec tunnel", which suggests that the Amazon end is re-transmitting packets and that both the original and the re-transmitted packets eventually arrive.
Summary: EC2 jobs failing due to Error <urlopen error timed out> while getting http://pypi.pvt.build.mozilla.org/pub/blobuploader-1.0b.tar.gz (from http://pypi.pvt.build.mozilla.org/pub/) error → EC2 jobs failing due to lost network traffic
Comment 2•11 years ago
smokeping for the last 30h
Comment 3•11 years ago
Does this seem solvable? Do we still need to gather more data? Should releng continue to set up in-house buildbot masters to minimize cross-colo traffic (bug 927129)?
Comment 4•11 years ago
(In reply to Chris Cooper [:coop] from comment #3)
> Does this seem solvable? Do we still need to gather more data?
>
> Should releng continue to set up in-house buildbot masters to minimize
> cross-colo traffic (bug 927129)?

I believe this involves a different part of our system: an EC2 host trying to read from an in-house host:

tst-linux32-ec2-134 -> http://pypi.pvt.build.mozilla.org

A better bug would be to set up a pypi host on EC2.
Comment 5•11 years ago
That's true, but difficult, since pypi is part of the releng web cluster and is therefore based on infra puppet and assumes the presence of a Zeus load balancer and a NetApp backend. Following this line of thought, you'll need to move *everything* into AWS: clobberer, npm-mirror, ftp, hgmo, gitmo, and so on. That may be the right solution, but it's not a decision we should back into lightly or without serious thought. It will be a lot of work, and far *more* work if we don't approach it systematically.
Comment 6•11 years ago
Arzhel is debugging this in bug 940356.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard