Closed Bug 937771 Opened 11 years ago Closed 11 years ago

Run linux64-br-haz builder on b2g-inbound/mozilla-inbound/fx-team/mozilla-central and try-by-default

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED

People

(Reporter: sfink, Assigned: sfink)

References

Details

Attachments

(3 files, 2 obsolete files)

The hazard builders seem to be working, and people are constantly adding new hazards. So we really want to run these per-checkin on mozilla-inbound.
Blocks: 898606
Depends on: 916677
This patch needs review both as in "is this the right way to do it?" (especially wrt the static analysis exclusions) and as in "is releng ok with us doing this?"

Specifically, this will add a JS shell clobber build + full-browser clobber build + analysis for every inbound and try push. (All 3 of those are done in the linux64-br-haz builder's mozharness script.) It's definitely some added load, but currently we're seeing several regressions a week that this would catch, and these regressions are going to become very serious (often security-sensitive) once we turn on exact rooting for GGC.

The clobber build is needed because the analysis only gets to see the code that was compiled, so we need to make sure to compile all of the code.

I haven't hooked it up yet, but these builds will not just be used for catching analysis regressions. They also upload the analysis results, and I plan to fetch those automatically to feed into an IRC bot and a web reporting mechanism. (It's done via pulsebuildmonitor, so I only need the buildbot stuff to do the upload and report the job results as usual.)
Attachment #830974 - Flags: review?(bhearsum)
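For context on the mechanics: the builder is enabled through buildbot-configs' mozilla/config.py, which keeps static analysis off the release branches via an exclusion loop (quoted in the review further down). Below is a minimal sketch of that pattern, assuming simplified data structures rather than the real ones; it is not the actual patch.

    # Minimal, simplified sketch of the exclusion-loop pattern config.py uses
    # for static analysis (see the hunk quoted in the review below); the data
    # structures and branch set here are illustrative only.
    BRANCHES = {
        'mozilla-inbound': {'platforms': {'linux64': {}, 'linux64-br-haz': {}}},
        'b2g-inbound':     {'platforms': {'linux64': {}, 'linux64-br-haz': {}}},
        'mozilla-beta':    {'platforms': {'linux64': {}, 'linux64-br-haz': {}}},
    }

    # Static analysis happens only on m-c and derived branches, so the hazard
    # platform gets pruned from everything else.
    for branch in ('mozilla-beta',):
        BRANCHES[branch]['platforms'].pop('linux64-br-haz', None)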
Found in triage.
Component: Other → General Automation
QA Contact: joduinn → catlee
(In reply to Steve Fink [:sfink] from comment #1)
> Created attachment 830974 [details] [diff] [review]
> Run linux64-br-haz builder on m-c derived branches and try
> 
> This patch needs review both as in "is this the right way to do it?"
> (especially wrt the static analysis exclusions) and as in "is releng ok with
> us doing this?"

So we're talking about one extra build per push. How much time does the job take?
Flags: needinfo?(sphink)
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> So we're talking about one extra build per push. How much time does the job
> take?

The last one I ran took 1 hour 50 minutes: https://tbpl.mozilla.org/php/getParsedLog.php?id=30127285&tree=Try&full=1

The breakdown looks something like:

 6 minutes - check out the code and set up the mock environment
 6 minutes - optimized JS shell clobber build
 1 hour 21 minutes - full clobber browser build with analysis plugin enabled
 15 minutes - analyzing results
 15 seconds - uploading results and finishing up

Whoa! That seems horribly slow to me, but then I compared it to a standard linux64 opt build - https://tbpl.mozilla.org/php/getParsedLog.php?id=30494152&tree=Try - and that came to 1 hour 44 minutes. I hadn't realized the builders were that slow.

Anyway, it seems it's pretty much the same cost as a regular linux64 opt build.
Flags: needinfo?(sphink)
(In reply to Steve Fink [:sfink] from comment #4)
> (In reply to Ben Hearsum [:bhearsum] from comment #3)
> > So we're talking about one extra build per push. How much time does the job
> > take?
> 
> The last one I ran took 1 hour 50 minutes:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=30127285&tree=Try&full=1

Given that these jobs scale to EC2, we shouldn't be impacting turnaround times by adding them, so it's probably okay to start running them. We can always shut them off later if we think they cost too much. Any thoughts, John or Chris?
Flags: needinfo?(joduinn)
Flags: needinfo?(catlee)
(thanks bhearsum!)

sfink: some quick thoughts from catlee/myself at offsite:

1) Glad to hear these additional jobs will *only* run on linux64, which we can run on AWS... meaning no impact to our in-house capacity/load.

2) Any idea how often you expect these hazard builds to catch a problem? If it's "not often", we could save ourselves some financial cost by scheduling them to run on a periodic basis (like we do for PGO builds). If these hazard builds catch problems "often", we can schedule them to run per-checkin. (Predicting "often" for future events is a gut call; you know these builds best, so tell us what feels right here!)
Flags: needinfo?(joduinn)
(bah - missed the needinfo checkbox, sorry for noise)
Flags: needinfo?(catlee) → needinfo?(sphink)
(In reply to John O'Duinn [:joduinn] from comment #6)
> (thanks bhearsum!)
> 
> 2) Any idea how often you expect these hazard builds to catch a problem? If
> it's "not often", we could save ourselves some financial cost by scheduling
> them to run on a periodic basis (like we do for PGO builds). If these hazard
> builds catch problems "often", we can schedule them to run per-checkin.
> (Predicting "often" for future events is a gut call; you know these builds
> best, so tell us what feels right here!)

As I said in comment 0, these would catch several problems a week, although given the last week or two, it's looking like even more frequent than that.

Hopefully, the regression rate will drop eventually, at which time we could definitely consider running them on a schedule. But right now, my general gut feel is "very, very often!" (We end up having to chase these down ourselves right now; with this build, we could get the committer to fix it.)
Flags: needinfo?(sphink)
Flags: needinfo?(joduinn)
Summary of irc w/sfink:

1) Currently, sfink is running this on his own machine, and getting notified every 4 hours if a regression happens. This gives the JS team a 4 hour regression window to find and fix any regression.

2) In advance of work week next week, sfink would like this new build type stood up in production. Having these run in production, and per checkin, would allow sheriffs (not the JS team) to monitor and back out any bustages.

3) These hazard builds take ~90 minutes each. As these new builds all run on AWS, we're not worried about impact to production load, but we do want to make sure we are being prudent with cash. Some options are:
3a) per-checkin on some branches
3b) per-checkin on all branches
3c) periodic 4? times per day (like our PGO builds)
3d) nightly

4) Based on the comments above and IRC, it sounds like nightly and periodic don't feel right. For now, we've agreed to go with:
* per-checkin on the 3 *inbound branches (~33% of checkins)
* try-by-default (~50% of checkins)


This is a significant increase in load, but sfink thinks this is worth it for now, so let's do this and get it running in advance of the workweek. There's a big downtime scheduled for this weekend which will complicate matters, but let's see if we can do this.

After the JS workweek is over, sfink to revisit with RelEng and see if this load feels worthwhile and should be continued, or whether we should switch to a reduced cadence. Also, sfink to follow up with the sheriffs to see if try-by-default is required or if we can instead use try-by-opt-in.
Flags: needinfo?(joduinn)
Summary: Run linux64-br-haz builder on m-c derived branches and try → Run linux64-br-haz builder on b2g-inbound/mozilla-inbound/fx-team/mozilla-central and try-by-default
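To make the scheduling options above concrete: per-checkin corresponds to a change-triggered scheduler, while the PGO-style periodic option is a timed scheduler firing a few times a day. The sketch below uses stock upstream buildbot classes and an assumed builder name (production masters go through buildbotcustom's own scheduling setup, so this is illustrative only).

    # Illustrative only: stock buildbot schedulers, not the buildbotcustom
    # configuration actually used on Mozilla's production masters.
    from buildbot.changes.filter import ChangeFilter
    from buildbot.schedulers.basic import SingleBranchScheduler
    from buildbot.schedulers.timed import Nightly

    # Builder name assumed from the naming pattern seen later in this bug.
    HAZ_BUILDER = 'linux64-br-haz_mozilla-inbound_dep'

    # Options 3a/3b: run the hazard builder on every push to a branch.
    per_checkin = SingleBranchScheduler(
        name='linux64-br-haz-per-checkin',
        change_filter=ChangeFilter(branch='mozilla-inbound'),
        treeStableTimer=None,            # fire for each push
        builderNames=[HAZ_BUILDER],
    )

    # Option 3c: run it four times a day instead (PGO-style cadence).
    periodic = Nightly(
        name='linux64-br-haz-periodic',
        branch='mozilla-inbound',
        hour=[0, 6, 12, 18], minute=0,
        builderNames=[HAZ_BUILDER],
    )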
Don't run these on try by default yet, unless the sheriffs request it. I have sent email to the sheriffs. Until they respond, I believe this patch represents our currently agreed-upon set of builds.
Attachment #832749 - Flags: review?(bhearsum)
Attachment #830974 - Attachment is obsolete: true
Attachment #830974 - Flags: review?(bhearsum)
"The sheriffs" are far from monolithic, but I request it in the strongest possible terms. Allowing ASan to not be try_by_default was a terrible mistake, and I have no reason to believe that this would not be one as well.
I have one sheriff response in comment 11, so reverting this to try-by-default. If opinions change by bhearsum-morning, then bhearsum is welcome to set it whichever way. :-)
Attachment #832755 - Flags: review?(bhearsum)
Attachment #832749 - Attachment is obsolete: true
Attachment #832749 - Flags: review?(bhearsum)
Comment on attachment 832755 [details] [diff] [review]
Run linux64-br-haz builder on m-c derived branches and try

Review of attachment 832755 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me; we can reconfig for it sometime this morning.

::: mozilla/config.py
@@ +2322,5 @@
>  
>  # Static analysis happens only on m-c and derived branches.
>  for branch in ("mozilla-aurora", "mozilla-beta", "mozilla-release",
>                 "mozilla-b2g18", "mozilla-b2g18_v1_0_1",
>                 "mozilla-b2g18_v1_1_0_hd", "mozilla-esr17", "mozilla-esr24"):

Not really relevant to this patch, but I wonder if we can improve this conditional somehow. Maybe some helper function that returns "m-c and derived branches", similar to the items_before() helper. Maybe something that returns all branches that have a given gecko version....
Attachment #832755 - Flags: review?(bhearsum) → review+
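One hypothetical shape for that helper is sketched below; nothing like it exists in config.py today, and the name, signature, and version numbers are invented purely for illustration.

    # Hypothetical helper, roughly in the spirit of items_before().
    def branches_with_gecko_at_least(branch_versions, min_version):
        """Return branch names whose gecko major version is >= min_version."""
        return sorted(name for name, version in branch_versions.items()
                      if version >= min_version)

    # Example: pick out "m-c and derived branches" without hard-coding the
    # exclusion list (version numbers are illustrative, not authoritative).
    trunk_like = branches_with_gecko_at_least(
        {'mozilla-esr24': 24, 'mozilla-beta': 26, 'mozilla-aurora': 27,
         'mozilla-central': 28},
        min_version=27)
    # -> ['mozilla-aurora', 'mozilla-central']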
Oh, right, I need to add it to trychooser.

Ugly. We're going to someday have to either structure the platforms options, or come up with a different mechanism for these "build test" jobs.
Attachment #832981 - Flags: review?(emorley)
in production
So the problem with the b2g-inbound hazard builds turned out to be that there's a catchall rule for b2g builds:

        /b2g.*_dep/i.test(name) ? "Unknown B2G Device Image Build" :

and it came before the hazard rules:

        /-br-haz/.test(name) ? "Static Rooting Hazard Analysis, Full Browser" :
        /-sh-haz/.test(name) ? "Static Rooting Hazard Analysis, JS Shell" :

The build name in question is

        linux64-br-haz_b2g-inbound_dep
Attachment #833008 - Flags: review?(ryanvm)
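The fix in attachment 833008 is simply to reorder the tests so the hazard rules run before the b2g catch-all. Here is a small Python stand-in for TBPL's first-match-wins rule list (the real rules are JavaScript inside TBPL; this only illustrates why the ordering matters).

    # Python stand-in for TBPL's first-match-wins name rules: -br-haz/-sh-haz
    # must be tested before the b2g catch-all.
    import re

    RULES = [
        (re.compile(r'-br-haz'), 'Static Rooting Hazard Analysis, Full Browser'),
        (re.compile(r'-sh-haz'), 'Static Rooting Hazard Analysis, JS Shell'),
        (re.compile(r'b2g.*_dep', re.I), 'Unknown B2G Device Image Build'),
    ]

    def classify(name):
        for pattern, label in RULES:
            if pattern.search(name):
                return label
        return name

    # With the hazard rules first, the b2g-inbound hazard build is labelled
    # correctly instead of falling into the catch-all:
    assert classify('linux64-br-haz_b2g-inbound_dep') == \
        'Static Rooting Hazard Analysis, Full Browser'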
Attachment #833008 - Flags: review?(ryanvm) → review+
Attachment #832981 - Flags: review?(emorley) → review+
Comment on attachment 833008 [details] [diff] [review]
Reorder build name tests to fix b2g-inbound hazards

https://hg.mozilla.org/webtools/tbpl/rev/a694339933b9
Attachment #833008 - Flags: checked-in+
Depends on: 939954
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #18)
> Comment on attachment 833008 [details] [diff] [review]
> Reorder build name tests to fix b2g-inbound hazards
> 
> https://hg.mozilla.org/webtools/tbpl/rev/a694339933b9

TBPL patch in production.
Attachment #832755 - Flags: checked-in+
Attachment #832981 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Depends on: 1045165
Component: General Automation → General