Closed Bug 913658 Opened 11 years ago Closed 6 years ago

Need buildername regular expressions and associated properties published to an API

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jeads, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3329] )

Attachments

(1 file)

Many applications that use build/test data derive platform, build, and job type information by parsing buildername strings. The buildername property is overloaded with many different pieces of information and exists in several string formats, but it is not structured, so retrieving useful information requires regular expressions. Maintaining these lists of regexes and property inferences introduces fragility into a number of applications.

We tried to address this by adding explicit properties to the pulse data stream (Bug 862595, Bug 862612), but these have not been implemented yet, perhaps because of resource limitations.

Another way of providing this information, one that would not require modifying buildbot directly, would be to publish the list of buildername regular expressions along with their associated property inferences in a structured format. We've built a data structure that does this in treeherder. It looks like this:

    'fedora64': {
        'regexes': [
            re.compile('Rev3 Fedora 12x64 .+'),
            re.compile('jetpack-.*-fedora64'),
            re.compile('^Linux x86-64'),
        ],

        'attributes': {
            'os': 'linux',
            'os_platform': 'Fedora 12',
            'arch': 'x86_64',
            'vm': False
        }
    },

    ...

The same sort of approach is taken with the build and job type inferences. The complete structure can be found at https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/buildbot.py#L25; it was adapted from the buildername regular expressions found in http://mxr.mozilla.org/build/source/buildapi/buildapi/model/util.py.
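For readers who have not seen the treeherder code, here is a minimal sketch of how a table like this can be consulted. It is not the treeherder implementation: `PLATFORMS` is a one-entry copy of the snippet above, `infer_platform` is a hypothetical helper, and using `search` with first-match-wins semantics is an assumption.

```python
import re

# One-entry copy of the structure quoted above (the real table has many entries).
PLATFORMS = {
    'fedora64': {
        'regexes': [
            re.compile(r'Rev3 Fedora 12x64 .+'),
            re.compile(r'jetpack-.*-fedora64'),
            re.compile(r'^Linux x86-64'),
        ],
        'attributes': {
            'os': 'linux',
            'os_platform': 'Fedora 12',
            'arch': 'x86_64',
            'vm': False,
        },
    },
}


def infer_platform(buildername, table=PLATFORMS):
    """Hypothetical helper: attributes of the first matching entry, or None."""
    for entry in table.values():
        # Assumption: a pattern may match anywhere unless it is anchored with ^.
        if any(rx.search(buildername) for rx in entry['regexes']):
            return entry['attributes']
    return None


attrs = infer_platform('Linux x86-64 mozilla-inbound build')
unknown = infer_platform('some builder nobody wrote a regex for')
```

Here `attrs` is the `fedora64` attribute dict and `unknown` is `None`. A buildername that no pattern covers falls through silently, which is the failure mode the rest of this bug is about.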

The main shortcoming of this strategy is that as buildernames are added or modified, the list of regular expressions and properties becomes obsolete.

If a buildername regular expression/property structure could be published in a JSON file, downstream applications could ingest it dynamically. This would have the following benefits:

1.) Any modification to buildername regular expressions or property inferences would be immediately incorporated into downstream applications.

2.) Release Engineering could control the way that buildernames are parsed and the way that properties are inferred. Most downstream applications fail to do this accurately in one way or another.

3.) There would be a single source of regular expressions and inferred properties on which we could all rely. This would enable us to safely use try/except logic when parsing buildernames so any exceptions could be detected and reported immediately.

4.) The world would be a better place. Mozillians working on these applications would be more pleasant to be around.
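
To make benefits 1.) and 3.) concrete, a downstream consumer might look roughly like the sketch below. The JSON layout in `PUBLISHED` is an assumption (no such BuildAPI file exists yet), and `load_table`, `platform_for`, `parse_all`, and `UnknownBuildername` are hypothetical names. A real consumer would fetch the document over HTTP rather than embed it.

```python
import json
import re

# Assumed layout of the published file: regexes as plain strings, attributes as JSON.
PUBLISHED = '''
{
  "fedora64": {
    "regexes": ["Rev3 Fedora 12x64 .+", "jetpack-.*-fedora64", "^Linux x86-64"],
    "attributes": {"os": "linux", "os_platform": "Fedora 12",
                   "arch": "x86_64", "vm": false}
  }
}
'''


class UnknownBuildername(Exception):
    """Raised when no published regex matches a buildername."""


def load_table(text):
    """Parse the published JSON and compile each pattern once."""
    raw = json.loads(text)
    return {
        name: {
            'regexes': [re.compile(p) for p in entry['regexes']],
            'attributes': entry['attributes'],
        }
        for name, entry in raw.items()
    }


def platform_for(buildername, table):
    for entry in table.values():
        if any(rx.search(buildername) for rx in entry['regexes']):
            return entry['attributes']
    raise UnknownBuildername(buildername)


def parse_all(buildernames, table):
    """Return (parsed, unknown) so unmatched names can be reported, not dropped."""
    parsed, unknown = {}, []
    for name in buildernames:
        try:
            parsed[name] = platform_for(name, table)
        except UnknownBuildername:
            unknown.append(name)
    return parsed, unknown


table = load_table(PUBLISHED)
parsed, unknown = parse_all(
    ['Linux x86-64 mozilla-inbound build', 'mystery builder'], table)
```

With a single authoritative file, anything that ends up in `unknown` is a gap in the published data rather than in one application's private regex list.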
This would probably belong in BuildAPI.
Component: Release Automation → Tools
QA Contact: bhearsum → hwine
We can do better than publishing regular expressions (whose syntax can differ depending on the programming language, for example). Instead, I would publish a set of string tags/properties that describe each builder.

We'd have tags like {linux, 32-bit, osx, windows, pgo, non-pgo, debug, release, asan, xpcshell-test, mochitest-1}. Each builder (and buildbot job by extension) can have N tags applied to it. A builder is thus a unique set of active tags.

We could still "index" jobs by their builder name. But having tags is much more powerful in that it facilitates rapid filtering without requiring downstream consumers to invent the filtering ontology. Instead, they simply apply "is a"/"in" checks against the set of active tags for a builder/job. Want to easily obtain all xpcshell test jobs for Win32 non-debug, non-PGO builds? Simple: just filter on {win32, release, non-pgo, xpcshell-test} in <builder tags>.
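
A minimal sketch of that filter, assuming tags are published as a mapping from builder name to a set of strings. The builder names and the `builders_with` helper are invented for illustration.

```python
# Hypothetical builder -> tags mapping; names and tag assignments are made up.
BUILDER_TAGS = {
    'builder-a': {'win32', 'release', 'non-pgo', 'xpcshell-test'},
    'builder-b': {'win32', 'debug', 'non-pgo', 'xpcshell-test'},
    'builder-c': {'linux', 'release', 'pgo', 'mochitest-1'},
}


def builders_with(required, builder_tags=BUILDER_TAGS):
    """Return, sorted, the builders whose tag set contains every required tag."""
    required = set(required)
    # `required <= tags` is Python's subset test on sets.
    return sorted(name for name, tags in builder_tags.items() if required <= tags)


win32_release_xpcshell = builders_with(
    {'win32', 'release', 'non-pgo', 'xpcshell-test'})
all_xpcshell = builders_with({'xpcshell-test'})
```

The consumer never parses a builder name; the whole query is a subset check.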
Publishing regular expressions is certainly not an ideal solution. I would prefer no regular expressions and explicit properties but that doesn't look like it's doable in a reasonable amount of time, so we're looking for some kind of compromise that can be accomplished soon.

Tags would definitely be a significant improvement over regular expressions. As described by gps, they would work great for filtering, but without a bit more metadata our options for ordering and grouping relevant data would be limited.

So in addition to the tag list associated with a builder name, {linux, 32-bit, osx, windows, pgo, non-pgo, debug, release, asan, xpcshell-test, mochitest-1}, we could also publish some metadata associated with the tags to help group/order by them when necessary. So maybe something like this:

{ os: { android, mac, linux, win ... },
  os_platform: { 'OS X 10.8.2', 'OS X 10.7.2', 'fedora 17', 'Ubuntu 12.04' ... },
  arch: { x86, x86_64 ... },
  build_type: { opt, debug ... },
  job_type: { build, unittest, talos, repack ... },

  ...
}

This could be a single file published to the buildapi that we could ingest as needed. This would provide downstream consumers with a single source of tag metadata.
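
As a rough illustration of why the category metadata helps, the sketch below groups builders by one category. Everything in it is an assumption: `TAG_METADATA` mirrors the outline above with the elided entries left out, and the builder names, tag assignments, and `group_by` helper are hypothetical.

```python
from collections import defaultdict

# Category -> tags belonging to it, mirroring the outline above (elided parts omitted).
TAG_METADATA = {
    'os': {'android', 'mac', 'linux', 'win'},
    'arch': {'x86', 'x86_64'},
    'build_type': {'opt', 'debug'},
    'job_type': {'build', 'unittest', 'talos', 'repack'},
}

# Hypothetical builders and their tags.
BUILDER_TAGS = {
    'builder-a': {'linux', 'x86_64', 'opt', 'build'},
    'builder-b': {'linux', 'x86', 'debug', 'unittest'},
    'builder-c': {'win', 'x86', 'opt', 'talos'},
}


def group_by(category, builder_tags=BUILDER_TAGS, metadata=TAG_METADATA):
    """Group builder names by the tag they carry in the given category."""
    groups = defaultdict(list)
    for name in sorted(builder_tags):
        values = builder_tags[name] & metadata[category]
        # Builders with no tag in the category go under None; if a builder
        # somehow has several, the alphabetically first one is used.
        key = min(values) if values else None
        groups[key].append(name)
    return dict(groups)


by_os = group_by('os')
by_build_type = group_by('build_type')
```

Without the metadata a consumer has no way to know that `linux` and `win` are alternatives along the same axis while `opt` and `debug` belong to another.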
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3322]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3322] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3327]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3327] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3329]
I took a stab at this today because I didn't want to write yet another set of regexes to categorize different types of jobs.

Attached is the mapping of buildbot buildername to a set of tags I've given each type of job.

Please have a look through and see if the naming makes sense. I'll proceed onto the test side of things once the build side is nailed down.

The idea is that these tags will be available as properties on the job, as well as via reports like allthethings.json.
:catlee-- This looks really promising! For reference, we have a file we use to test our code against a bunch of buildernames and the values we expect after running our regexes against them. Here's that test file: https://github.com/mozilla/treeherder-service/blob/master/tests/etl/test_buildbot.py

But for your API, we wouldn't need all that. Could it provide all of this information, for example?
(Note: this buildername may or may not be out of date. We have a few older ones in the tests.)

{
    'Linux x86-64 mozilla-inbound leak test spidermonkey_tier_1-rootanalysis build': {
        'build_type': 'debug',
        'job_type': 'build',
        'platform': {
            'arch': 'x86_64',
            'os': 'linux',
            'os_platform': 'linux64',
            'vm': False
        }
    }
}

Note: bonus points if we could have some human-readable name in it, too:
    'name': 'SpiderMonkey Root Analysis Build'

This would really be huge for us. We would LOVE to not have to handle regexes for these ourselves.  :)
Hmm-- I think I take back what I said in Comment 5. I think we DO need the ``name`` section, like we have in the tests. At the least, we need those values in your structure one way or another. Does that sound feasible?

Thanks for tackling this!!  :)
Not sure what you mean - the buildbot name is already in this data structure. e.g.

  "Linux x86-64 mozilla-inbound leak test spidermonkey_tier_1-rootanalysis build": [
    "platform:linux64-debug", 
    "product:spidermonkey", 
    "schedule:perpush", 
    "type:build"
  ], 

I think a human-friendly name is doable, e.g. "name:SpiderMonkey Root Analysis Build" as a tag?
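
If the tags keep this "key:value" shape, a consumer could turn them back into properties with something like the sketch below. The `tags_to_properties` helper is hypothetical, and treating each prefix as a unique key is an assumption about the attached mapping.

```python
def tags_to_properties(tags):
    """Split "key:value" tags into a dict; assumes each key appears at most once."""
    props = {}
    for tag in tags:
        # partition splits on the first colon only, so values may contain colons.
        key, sep, value = tag.partition(':')
        if not sep:
            raise ValueError('tag has no "key:" prefix: %r' % tag)
        props[key] = value
    return props


# Tags from the example above, plus the proposed human-friendly name tag.
TAGS = [
    "platform:linux64-debug",
    "product:spidermonkey",
    "schedule:perpush",
    "type:build",
    "name:SpiderMonkey Root Analysis Build",
]

props = tags_to_properties(TAGS)
```

This gives `props['type'] == 'build'` and `props['name'] == 'SpiderMonkey Root Analysis Build'`. Note that build type and architecture would still have to be split out of a value like `linux64-debug`, so separate tags for those may be easier on consumers.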
Component: Tools → General
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE