Closed Bug 810439 Opened 12 years ago Closed 12 years ago

mozharness support for mozpool

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: jhopkins)

References

Details

(Whiteboard: [mozharness][mozpool])

Attachments

(4 files, 1 obsolete file)

http://hg.mozilla.org/build/bmm/file/default/API.txt
http://hg.mozilla.org/build/bmm/file/default/README.md

Mark says Dustin may have a staging server set up shortly, which would help with developing and verifying that this actually works.
You should be able to do initial development against a local copy.  The README.md pretty much explains how to do that, but Mark or I can help, too.
Attached patch: small changes (Splinter Review)
s,bmm-inventorysync,mozpool-inventorysync,
add argparse as a dependency, since mozpool-inventorysync requires it.
Attachment #680914 - Flags: review?(dustin)
Comment on attachment 680914 [details] [diff] [review]
small changes

Thanks!
Attachment #680914 - Flags: review?(dustin) → review+
Looking good!

Is mozpool-server adequate for your testing?  Currently all of the hardware in scl1 is hooked up to the same pool of devices, some of which are in production, so it could be a little tricky to get you a real-life server to play with.

When we get the power situation sorted out (just came up last night), I could probably isolate a rack for you as a different pool, where you would not worry about affecting production pandas.  Let me know if/when that becomes desirable.
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> Looking good!
> 
> Is mozpool-server adequate for your testing?  Currently all of the hardware
> in scl1 is hooked up to the same pool of devices, some of which are in
> production, so it could be a little tricky to get you a real-life server to
> play with.

Thanks!
I'm making do so far, but I've only written/tested the read-only portions of the API.  I'm guessing the mozpool requests are harmless, but I'd like to double-check.  I'm certain that at least some of the lifeguard/BMM API isn't safe to test against my laptop + production pandas.

> When we get the power situation sorted out (just came up last night), I
> could probably isolate a rack for you as a different pool, where you would
> not worry about affecting production pandas.  Let me know if/when that
> becomes desirable.

That would be cool if/when it's doable.  I'm not sure I need an entire rack, but if that's the easiest unit to isolate, I can expand the testing matrix with the extra pandas.  I imagine we'll need a staging pool when we're trying to get the tests up and running anyway.
Incidentally, see
  http://hg.mozilla.org/build/mozpool/file/66b87baf681b/README.md#l108
specifically the part about hitting any mobile-imaging server.  In a foopyish world, it'd probably be best for each foopy to hit its "local" imaging server first, and if that server fails then select another at random.  In a foopyless world, it'd be best to just hit them randomly.  In either case, the idea is to make sure there's an even distribution of requests across imaging servers, to limit the damage if one imaging server goes down.

We can easily configure Puppet to keep the list of imaging servers up to date.
...and the other part of the idea is that if a particular imaging server is down, work goes on without it: foopies or buildslaves contact the other imaging servers instead of failing.
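
For illustration, a minimal sketch of that selection logic (the server list, port, and health check are assumptions; in practice Puppet would supply the list):

import random
import socket

def is_reachable(server, port=80, timeout=5):
    # crude health check: can we open a TCP connection to the imaging server?
    try:
        socket.create_connection((server, port), timeout).close()
        return True
    except (socket.error, socket.timeout):
        return False

def pick_imaging_server(servers, local_server=None):
    """Prefer the "local" imaging server if there is one (foopy-ish world);
    otherwise try the rest in random order so load spreads evenly."""
    candidates = list(servers)
    random.shuffle(candidates)
    if local_server in candidates:
        candidates.remove(local_server)
        candidates.insert(0, local_server)
    for server in candidates:
        if is_reachable(server):
            return server
    raise RuntimeError("no mobile-imaging server reachable")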
The pandas in chassis 3 can/should be used as the staging server for this if needed.
Well, given I'm still using those, let me know first so I don't mess things up!
I have this about a quarter written.  John Hopkins may be taking over.
Blocks: 802317
The requests part has been fully implemented now, by the way. There's a slight change to the API, but not much. I have explained in API.txt how requests work (high-level summary: send a request with details about the reservation and desired image, then poll the request object until its state changes to 'ready' or an error state).

You can test this locally, although you'll have to cycle through the imaging/power-cycling tests yourself (not difficult). I am adding a section to the MozPool wiki (https://wiki.mozilla.org/index.php?title=Auto-tools/Projects/MozPool) about testing locally. By the way, the wiki page is a bit out of date now; prefer the README.txt in the source for a better, more comprehensive description. I'll update the wiki page when I have some time.
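
A rough sketch of the polling half of that flow, assuming the request URL returned by the device-request call and the 'state' field described in API.txt (the error-state check is an assumption):

import time
import requests

def wait_for_request_ready(request_url, interval=30, max_polls=20):
    """Poll the mozpool request object until it becomes ready or we give up."""
    for _ in range(max_polls):
        state = requests.get(request_url).json()['state']
        if state == 'ready':
            return True
        if 'failed' in state or state == 'closed':
            # assumption: terminal/error states are reported via 'state'
            return False
        time.sleep(interval)
    return False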
In patch format for posterity.
Moved the testing section to its own page since it got long: https://wiki.mozilla.org/Auto-tools/Projects/MozPool/LocalTesting

It's still linked from the main mozpool page.
Very slight API change that doesn't look like it affects what you've written so far: rather than returning both a "request" object and "request_url" string from a device-request call, there's now just a "request" object that has an additional "url" attribute. API.txt has been updated accordingly.
Assignee: aki → jhopkins
I'm going to dupe bug 778249 to bug 777714 rather than having two releng tracking bugs.
No longer blocks: 778249
Another small modification to the API that I think makes more sense: /request/list/ now, by default, doesn't include closed requests.  /request/list/?include_closed=1 will return all.  API.txt has been updated accordingly.
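
For illustration (the host is a placeholder):

import requests

base = 'http://mobile-imaging-001.example.com'  # placeholder host
open_requests = requests.get(base + '/request/list/').json()
all_requests = requests.get(base + '/request/list/', params={'include_closed': 1}).json()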

There might be other small modifications as I build out the UI.  I will document them all here.  I expect to have the first version done by the end of week, after which I will endeavour to make everything backwards compatible.
A few small changes landing soon. The request "expires" attribute (from request/list/ or from device/*/request/) is now in UTC (to avoid timezone madness!).  request/list/ now has "device_state" (if a device has been assigned) and no longer includes "state_counters" and "state_timeout" (internal usage).
Should the milestone of this bug be "B2G C2 (20nov-10dec)"?
(In reply to Armen Zambrano G. [:armenzg] from comment #20)
> Should the milestone of this bug be "B2G C2 (20nov-10dec)"?

Yes.
I emailed Alex a list of bugs to nominate on Monday(?), but he's been busy with the merge plus a mass of emails, etc.  We should proceed as if this were already marked as due Dec 10.
blocking-basecamp: --- → ?
Hopefully this is the last API change. /device/.../request/ now takes a JSON object as POST data with these keys (for b2g imaging, the only supported request at the moment):

'assignee': as before (free form, suggested to be email address or slave hostname)
'duration': as before, length of request in seconds
'image': must be 'b2g'
'b2gbase': URL to panda b2g build

So in other words, the changes are that 'image' is now required and, for the moment, must be 'b2g'; and instead of a 'boot_config' object with 'version' and 'b2gbase' keys, there is now just a 'b2gbase' key in the main object (no 'boot_config' anymore).

This will ease future features (for Fennec automation, autophone, etc.) while hiding implementation details (PXE configs, low-level images, etc.) from test clients.

This should be deployed to production this afternoon (EST).
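
A client-side sketch of the new call (host, device name, and URLs are placeholders; the keys are the four listed above):

import json
import requests

device = 'panda-0001'  # placeholder device name
payload = {
    'assignee': 'talos-r4-b2g-001.example.com',  # free form: email address or slave hostname
    'duration': 60 * 20,                         # length of the request in seconds
    'image': 'b2g',                              # required; only 'b2g' is supported for now
    'b2gbase': 'http://example.com/b2g/latest/', # URL to the panda b2g build
}
url = 'http://mobile-imaging-001.example.com/device/%s/request/' % device
response = requests.post(url, data=json.dumps(payload)).json()
request_url = response['request']['url']  # poll this until the request is 'ready'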
blocking-basecamp: ? → ---
Attachment #687265 - Flags: feedback?(dustin)
Landed http://hg.mozilla.org/build/mozharness/rev/3a7a1decf93c to silence spurious pyflakes errors caused by the requests module.
Comment on attachment 687265 [details] [diff] [review]
[mozharness] client-side api + unit tests, initial b2g test + config

Cool, thanks for this.
I'm writing the comments first, and can try to help fix some of these issues afterwards.

I'm r-'ing because there's a lot here to fix, and some of it will affect the rest of mozharness negatively.

However, I do want to see this fixed up and landed soon, and I'm willing to help.  I started an "aki" branch off the tip of mozpool: http://hg.mozilla.org/users/jhopkins_mozilla.com/mozharness-mozpool

Hopefully that will help?

===

Pyflakes says:

scripts/b2g_panda.py:15: 'PythonErrorList' imported but unused
scripts/b2g_panda.py:16: 'TBPL_WARNING' imported but unused
scripts/b2g_panda.py:16: 'TBPL_RETRY' imported but unused
scripts/b2g_panda.py:16: 'TBPL_SUCCESS' imported but unused
scripts/b2g_panda.py:16: 'TBPL_FAILURE' imported but unused
scripts/b2g_panda.py:20: 'INFO' imported but unused
scripts/b2g_panda.py:20: 'OutputParser' imported but unused
scripts/b2g_panda.py:74: local variable 'c' is assigned to but never used

Pet peeve nit: could you remove trailing whitespace?  You've got quite a bit.
:%s, *$,,
will fix it in vim.

I have the following in my .vimrc:

" Use the below highlight group when displaying bad whitespace is desired
highlight BadWhitespace ctermbg=red guibg=red
" Display tabs at the beginning of a line in Python mode as bad.
au BufRead,BufNewFile *.py,*.pyw,*.cfg match BadWhitespace /^\t\+/
" Make trailing whitespace be flagged as bad.
au BufRead,BufNewFile *.py,*.pyw,*.cfg,*.c,*.h match BadWhitespace /\s\+$/

I think jlund had something that automatically removed trailing whitespace whenever he edited a file; I haven't gone that far.

>+config = {
>+    # Values for the foopies
>+    "exes": {
>+        'python': '/tools/buildbot/bin/python',
>+        'virtualenv': ['/tools/buildbot/bin/python', '/tools/misc-python/virtualenv.py'],
>+    },
>+    "virtualenv_path": "venv",

We may get this for free if you add virtualenv_config_options to self.config_options.

>-        self.debug("Temporary files: %s and %s" % (tmp_stdout_filename, tmp_stderr_filename))
>+        #XXX: changed from self.debug to self.log due to this error:
>+        #     TypeError: debug() takes exactly 1 argument (2 given)
>+        self.log("Temporary files: %s and %s" % (tmp_stdout_filename, tmp_stderr_filename))

I see you lost my pastebin; I'll explain why you hit that error below.
Sadly, it would have been a lot less work to fix this when I gave you the pastebin, rather than after you added a bunch of unittests using the wrong structure.

Ideally, we don't land this debug->log change at all.
If the time crunch makes this the sanest option, change this to
    self.log("Temporary files: %s and %s" % (tmp_stdout_filename, tmp_stderr_filename), level=DEBUG)
and we'll need a followup bug to revert this once the unittests are fixed.

>+def check_mozpool_status(status):
>+    if not mozpool_status_ok(status):
>+        if status == 409:
>+            raise MozpoolConflictException()
>+        import pprint
>+        raise Exception('mozpool status not ok, code %s' % pprint.pformat(status))

Raising an uncaught Exception here seems rather fragile.  Is this mainly for debugging purposes, or are we planning to keep it this way?

We could change this to an UnknownMozpoolException or something if we want to catch it and throw an error.

If we actually want to exit the process at this point, I prefer self.fatal() over raising an exception, unless we explicitly want to be able to catch that exception downstream.  We're catching MozpoolConflictException; we're not catching the Exception.

self.fatal() allows us to log the message appropriately; you could take advantage of it if you moved check_mozpool_status() into MozpoolHandler and started calling self.check_mozpool_status().
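
Roughly, as a method on MozpoolHandler (mozpool_status_ok and MozpoolConflictException come from the patch; the wording of the fatal message is just a sketch):

def check_mozpool_status(self, status):
    if mozpool_status_ok(status):
        return
    if status == 409:
        # still raised, so callers can catch it and retry when the device is busy
        raise MozpoolConflictException()
    # unexpected status: log it properly and exit instead of raising a bare Exception
    self.fatal("Unexpected mozpool status code: %s" % status)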

>diff --git a/mpunit.sh b/mpunit.sh

At some point in the future, I'd like this to be testable standalone, without being networked and/or having mozpool-server running locally.  We'll most likely need a mock response to do that.  But for this first pass, this was the perfect approach.
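
One possible shape for that, purely illustrative (the patch target and canned payload are assumptions):

import requests
from mock import patch  # the standalone 'mock' package on Python 2

with patch('requests.get') as mock_get:
    mock_get.return_value.status_code = 200
    mock_get.return_value.json.return_value = {'state': 'ready'}
    # anything under test that calls requests.get() now sees the canned response,
    # so the unittests can run without a network or a local mozpool-server
    assert requests.get('http://mobile-imaging/request/1/').json()['state'] == 'ready'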

>+#TODO - adjust these values
>+MAX_RETRIES = 20
>+RETRY_INTERVAL = 30
>+REQUEST_DURATION = 60 * 20

At some point we may want this in a config file, or overrideable via config/commandline, but this is fine for now.

>+
>+class PandaTest(TestingMixin, BaseScript, VirtualenvMixin, MozpoolMixin, LogMixin):

You don't need LogMixin (BaseScript already inherits it).
You will need BuildbotMixin for self.read_buildbot_config(), self.set_buildbot_property(), and self.buildbot_status().

>+    virtualenv_modules = [
>+        'requests',
>+    ]

We may need mozdevice to use devicemanager.py for SUT calls, but this is fine for now.

>+        assert self.mozpool_device is not None, '--mozpool-device is required'
>+        self.mozpool_assignee = self.config.get('mozpool_assignee')
>+        if self.mozpool_assignee is None:
>+            self.mozpool_assignee = self.buildbot_config.get('properties')["slavename"]
>+        assert self.mozpool_assignee is not None, '--mozpool-assignee is required'

Could you change these asserts to self.fatal(message)?

>+    def query_abs_dirs(self):
>+        if self.abs_dirs:
>+            return self.abs_dirs
>+        abs_dirs = super(PandaTest, self).query_abs_dirs()
>+        dirs = {}
>+        for key in dirs.keys():
>+            if key not in abs_dirs:
>+                abs_dirs[key] = dirs[key]
>+        self.abs_dirs = abs_dirs
>+        return self.abs_dirs

We don't need this method redefined here at all if we're not going to define any directories, but I'm fine leaving it til we know for sure.

>+    def _build_arg(self, option, value):
>+        """
>+        Build a command line argument
>+        """
>+        if not value:
>+            return []
>+        return [str(option), str(value)]

We don't seem to be using this at all, but maybe we will when we populate run_tests().

>+    def _force_device_state(self, device, assignee, duration, desired_state):
>+        mph = self.query_mozpool_handler()
>+        dd = mph.query_device_status(device)
>+        old_state = dd['state']
>+        mph.device_state_change(device, assignee, duration, old_state, desired_state)

Same here.

>+    def retry_sleep(self, sleep_time=1, max_retries=5, error_message=None):
>+        for x in range(1, max_retries + 1):
>+            yield x
>+            sleep(sleep_time)
>+        if error_message:
>+            self.log(error_message, ERROR)

I think you either mean self.log(error_message, level=ERROR) or self.error(error_message).

>+        raise Exception('Retries limit exceeded')

This should be a fatal().
We may want to do something like self.buildbot_status(TBPL_RETRY) as well if that's the behavior we want from buildbot.
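
Putting those suggestions together, the generator might end up roughly like this (sleep comes from time in the script; the tbpl_status handling is a guess at the desired buildbot behavior):

def retry_sleep(self, sleep_time=1, max_retries=5, error_message=None, tbpl_status=None):
    for x in range(1, max_retries + 1):
        yield x
        sleep(sleep_time)
    if error_message:
        self.error(error_message)
    if tbpl_status:
        self.buildbot_status(tbpl_status)
    self.fatal('Retries limit exceeded')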

>+            except MozpoolConflictException:
>+                self.log("Device unavailable.  Retry#%i.." % retry, WARNING)

I think you mean self.warning(...) or level=WARNING

>+
>+        self.request_url = response['request']['url']
>+        self.log("Got request, url=%s" % self.request_url)

I'm going to guess this needs to be self.info()

>+            if state == 'ready':
>+                return
>+            self.log("Waiting for request 'ready' stage.  Current state: '%s'" % state)

self.info()

>+    def close_request(self):
>+        mph = self.query_mozpool_handler()
>+        mph.close_request(self.request_url)
>+        print("Request '%s' deleted on cleanup" % self.request_url)
>+        self.request_url = None

>+# MozpoolTest {{{1
>+class MozpoolTest(VirtualenvMixin, MozpoolMixin, BaseScript):

Do you need this anymore?
If not we can nuke it. (scripts/mozpooltest.py)

>+class BaseMozpoolTest(unittest.TestCase, VirtualenvMixin, MozpoolMixin, BaseScript):

The reason you're having problems with debug() is that unittest.TestCase has its own debug() method, which conflicts.  You need to change this class to inherit only from unittest.TestCase.

You can then set self.mozpool_script to a BaseScript object using MozpoolMixin.

To make the move less painful, you can then

    self.mph = self.mozpool_script.mph

This would have been easiest to do early on, rather than after all of these had been written to use the wrong structure.  I can try to help here.

There are also a lot of unused imports etc here.
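
Roughly what that layout looks like (imports and constructor arguments elided; setUp details are an assumption):

import unittest

class MozpoolScript(VirtualenvMixin, MozpoolMixin, BaseScript):
    pass

class BaseMozpoolTest(unittest.TestCase):
    def setUp(self):
        # build the mozharness script object the same way the old test classes did
        self.mozpool_script = MozpoolScript()
        # keeps the existing test bodies working with minimal edits:
        self.mph = self.mozpool_script.mph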
Attachment #687265 - Flags: review?(aki) → review-
http://hg.mozilla.org/users/jhopkins_mozilla.com/mozharness-mozpool/rev/8907c9fc1da3 may have fixed some of your unittest issues.

However, I'm not passing my unittests locally; not sure if that's because my db is set up differently than yours, or if I just broke something.
Attachment #687265 - Attachment is obsolete: true
Attachment #687265 - Flags: feedback?(dustin)
Attachment #687985 - Flags: review?(aki)
Comment on attachment 687985 [details] [diff] [review]
[mozharness] updated patch

>+    def wait_for_request_ready(self):
>+        mph = self.query_mozpool_handler()
>+        for retry in self.retry_sleep(sleep_time=RETRY_INTERVAL, max_retries=MAX_RETRIES,
>+                error_message="INFRA-ERROR: Request did not become ready in time",
>+                tbpl_status=TBPL_RETRY):
>+            response = mph.query_request_status(self.request_url)
>+            state = response['state']
>+            if state == 'ready':
>+                return
>+            self.log("Waiting for request 'ready' stage.  Current state: '%s'" % state)

self.info()?
Attachment #687985 - Flags: review?(aki) → review+
Comment on attachment 687985 [details] [diff] [review]
[mozharness] updated patch

Landed in http://hg.mozilla.org/build/mozharness/rev/da5349059059
Attachment #687985 - Flags: checked-in+
Keeping this bug open as we will need some more client-side changes in response to server-side updates in bug 817035.
Depends on: 817035
Attachment #689285 - Flags: review?(dustin) → review+
Comment on attachment 689285 [details] [diff] [review]
[mozharness] add 'environment' to request_device()

Landed in http://hg.mozilla.org/build/mozharness/rev/c3e91b76976e
Attachment #689285 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness