Closed
Bug 1137322
Opened 9 years ago
Closed 9 years ago
osx test slaves are failing to download a test zip from similar rev
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jlund, Unassigned)
Details
there was a spike in failed download attempts from ftp. sheriffs reported 5 or 6 instances close together around 09:30 PT.

log example: https://treeherder.mozilla.org/logviewer.html#?job_id=7024399&repo=mozilla-inbound

snippet:
09:24:14 INFO - Can't download from https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1424962152/firefox-39.0a1.en-US.mac.tests.zip to /builds/slave/talos-slave/test/build/firefox-39.0a1.en-US.mac.tests.zip!

host example of a slave trying to download from ftp: t-snow-r4-0061
builder: mozilla-inbound_snowleopard_test-mochitest-2
Reporter
Comment 1•9 years ago
sheriffs have reported that the failures are coming from hosts that share the same revision. Usul has reported no anomalies within ftp.

I am starting to suspect that, since this is the same rev and thus far all osx, we uploaded a corrupted test zip for that revision:
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1424962152/firefox-39.0a1.en-US.mac.tests.zip
and that is why the test jobs trying to download it are having issues.

sheriffs are pushing a new rev. I'll leave this bug open while we wait on results.
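One way to test the "corrupted test zip" suspicion on a downloaded copy is to ask Python's standard zipfile module to verify the archive. The sketch below is only an illustration: the helper name zip_is_intact is hypothetical and is not part of the test harness discussed in this bug.

```python
import zipfile
import zlib


def zip_is_intact(path):
    """Return True if `path` opens as a zip and every member passes its CRC check.

    Hypothetical helper for illustration; not taken from the harness in this bug.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        # Truncated, unreadable, or missing files all count as "not intact".
        return False
```

A file that was fetched while it was still being written would usually fail this check, because the zip central directory sits at the end of the archive.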
Reporter
Updated•9 years ago
Summary: downloading from ftp is sporadically failing → osx test slaves are failing to download a test zip from similar rev
Comment 2•9 years ago
The opt Mac build was double-retriggered, finishing 3 minutes apart. That means that while test jobs triggered by the first retrigger were downloading, the second retrigger was uploading over the top. That doesn't go well for us, and never will as long as retriggers upload over the top of the previous job.
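The overwrite race described above can be sketched on a local filesystem. This is a minimal illustration, assuming the uploader and the readers share a POSIX-style filesystem; how the real ftp upload works is not described in this bug, and publish_atomically is a hypothetical name. The idea is to write the new file under a temporary name and then rename it into place, so a reader sees either the complete old file or the complete new one, never a partially written one.

```python
import os
import tempfile


def publish_atomically(data: bytes, dest_path: str) -> None:
    """Write `data` to a temp file next to `dest_path`, then rename it into place.

    Hypothetical sketch; not the upload mechanism used by the build machines.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file must live in the same directory (same filesystem)
    # so the final rename does not turn into a copy.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites an existing destination in a single step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This only avoids half-written files. It does not change the fact that a second retrigger replaces the first build's artifacts.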
Comment 3•9 years ago
(In reply to Phil Ringnalda (:philor) from comment #2)
> The opt Mac build was double-retriggered, finishing 3 minutes apart. That
> means that while test jobs triggered by the first retrigger were
> downloading, the second retrigger was uploading over the top. That doesn't
> go well for us, and never will as long as retriggers upload over the top of
> the previous job.

Is this happening because we're pulling from latest-* rather than from the revision-specific dir? Do we have a bug on file for that already?
Comment 4•9 years ago
No, it's happening because retriggers intentionally replace the job they redo, in-place. Even though we had the same conversation over and over when we first saw this, I can't remember catlee's part of it, where he explains why that's a good thing. It is supposedly a good thing even though retriggering a job causes all evidence of the original job to completely disappear, causes this failure if you retrigger twice, and makes the results of retriggering a build because you didn't like its test results completely nondeterministic, because you absolutely cannot tell whether the new test jobs downloaded the new build or the old one.
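The "cannot tell which build was downloaded" problem could be narrowed if each test job logged a digest of the file it actually fetched. The sketch below assumes nothing about the real harness: sha256_of_file is a hypothetical helper, and it only helps if the build side records the same digest at upload time.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks.

    Hypothetical helper for illustration; not part of the harness in this bug.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two test jobs that log different digests for the same URL did not download the same build.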
Reporter
Comment 5•9 years ago
sounds like this could be improved. However, since it doesn't seem to happen regularly (correct me if I'm wrong), I think this will sit at a lower priority than what we currently have on our plate. filed bug 1138512 to track the effort, but I won't leave it open without an assignee.
No longer depends on: 1138512
Reporter
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard