Closed Bug 1184422 Opened 9 years ago Closed 9 years ago

repo failures breaking b2g builds

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

ARM
Gonk (Firefox OS)
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Unassigned)

A bad commit landed on upstream master for "repo". The b2g build system insists on taking updates for "repo", so all build slaves got into a bad state, forcing the trees to be closed.
Gaia and trunk trees were closed at 18:11 PT for this.
Severity: normal → blocker
workaround to get trees open again:
 - stop mirroring from https://gerrit.googlesource.com/git-repo to git.mozilla.org
 - strip offending commit from tip of master on vcs-sync host
 - force push to git.mozilla.org (required temporary unset of denyNonFastForward protection)
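
A minimal sketch of the steps above, in Python. The bug does not record the exact commands that were run, so the details here are assumptions: the function name `plan_rollback`, the remote name `mirror`, the `server` / `sync-host` labels, and the use of `git update-ref` to move master back to the last good commit. The protection mentioned above corresponds to git's `receive.denyNonFastForwards` setting. The function only builds the command lists; it does not run anything.

```python
from typing import List, Tuple

# (where the command would be run, argv). The location labels are hypothetical.
Step = Tuple[str, List[str]]


def plan_rollback(good_sha: str, remote: str = "mirror", branch: str = "master") -> List[Step]:
    """Build the git commands for dropping a bad tip commit from a mirror.

    Stopping the upstream mirroring job is assumed to have been done
    separately; it is not a git command and is not modelled here.
    """
    ref = f"refs/heads/{branch}"
    return [
        # Temporarily lift the non-fast-forward protection on the receiving repo.
        ("server", ["git", "config", "--unset", "receive.denyNonFastForwards"]),
        # Point the branch back at the last known good commit on the sync host.
        ("sync-host", ["git", "update-ref", ref, good_sha]),
        # Overwrite the published branch with the rewound one.
        ("sync-host", ["git", "push", "--force", remote, f"{ref}:{ref}"]),
        # Put the protection back once the push has landed.
        ("server", ["git", "config", "receive.denyNonFastForwards", "true"]),
    ]


if __name__ == "__main__":
    for where, argv in plan_rollback("<last-good-sha>"):
        print(f"[{where}] {' '.join(argv)}")
```

Keeping the plan as data makes it easy to review before anyone touches the production mirror.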
Trees reopened at 21:02 PT.
(In reply to Hal Wine [:hwine] (use NI) from comment #2)
> workaround to get trees open again:
>  - stop mirroring from https://gerrit.googlesource.com/git-repo to
> git.mozilla.org

NOTE: once the issue is fixed, the vcs-sync mirroring of git-repo will need to be restarted. A bug should be filed under Developer Services :: General requesting that.
We're seeing frequent 600s timeouts again like we had a month ago. I wonder if we ended up with a corrupted repo during all of this again. Trees re-closed.
https://treeherder.mozilla.org/logviewer.html#?job_id=11756591&repo=mozilla-inbound
No output during the ten minutes before the timeout. Will try to hop on a node to take a look:


22:44:34 INFO - repo has been initialized in /builds/slave/b2g_m-in_l64-b2g-haz_dep-00000/build
22:54:34 INFO - Automation Error: mozprocess timed out after 600 seconds running ['./config.sh', '-q', 'emulator-jb', '/builds/slave/b2g_m-in_l64-b2g-haz_dep-00000/build/tmp_manifest/emulator-jb.xml']
22:54:34 ERROR - timed out after 600 seconds of no output
22:54:34 ERROR - Return code: -9
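
For context on the log above, here is a rough Python sketch of a "no output" timeout. It is not the mozprocess implementation; the function name `run_with_output_timeout` and its return shape are made up for illustration. It kills the child when nothing has been printed for `idle_seconds`. On POSIX, a child killed with SIGKILL is reported by Python's subprocess module as return code -9, which matches the "Return code: -9" line.

```python
import queue
import subprocess
import sys
import threading
from typing import List, Tuple


def run_with_output_timeout(argv: List[str], idle_seconds: float) -> Tuple[int, bool]:
    """Run argv and kill it if it prints nothing for idle_seconds.

    Returns (return_code, timed_out).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines: queue.Queue = queue.Queue()

    def pump() -> None:
        # Forward each output line to the watcher; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    timed_out = False
    while True:
        try:
            line = lines.get(timeout=idle_seconds)
        except queue.Empty:
            # Nothing arrived within the idle window: give up and kill the child.
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
    return proc.wait(), timed_out


if __name__ == "__main__":
    silent = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_output_timeout(silent, idle_seconds=2.0))
```

Note that the timer resets on every line of output, so a command that stays chatty never trips it, while a silent hang does.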
Were there any changes made to the "repo forall"-based repository verification step that was being done?
Component: Gaia::Build → Buildduty
Product: Firefox OS → Release Engineering
QA Contact: bugspam.Callek
The last time we saw this, and there was a corrupted repo, the smoking gun was a hung git process on a host. There wasn't any log output indicating which repository was hanging everything because repo spawns VCS commands in parallel.
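
One way to make a hang like this attributable is to put a timeout on each repository's command and report a status per repository. The Python sketch below assumes each command can be launched separately; it is not how repo itself schedules work, and the names `run_one`, `run_all` and the status strings are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_one(argv: List[str], timeout: float) -> str:
    """Run one command and summarise how it ended."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "timeout"
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"


def run_all(commands: Dict[str, List[str]], timeout: float, workers: int = 4) -> Dict[str, str]:
    """Run the named commands in parallel and return a status for each name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_one, argv, timeout) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    demo = {
        "fast": [sys.executable, "-c", "pass"],
        "hung": [sys.executable, "-c", "import time; time.sleep(60)"],
    }
    for name, status in run_all(demo, timeout=5.0).items():
        print(f"{name}: {status}")
```

In real use the keys would be repository paths and the commands would be the git invocations in question, so the log names the repository that stalled.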
No corrupted repo this time.
Okay, closing this bug. This bug is about a specific issue with 'repo' which was addressed by the workaround in comment 2.

The 600s timeout appears to be a separate issue that happened to coincide in time. It has the same tree impact, but it is reported differently and appears to have a different root cause. A new bug will be opened for that.

The root cause of this bug remains the way 'repo' is being deployed by mozharness in buildbot land. I'll open another bug to find a long-term fix for that.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1184594
See Also: → 1184635
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard