Closed Bug 1083237 Opened 10 years ago Closed 6 years ago

Wrong test suite duration when using mozlog as test reporter

Categories

(Remote Protocol :: Marionette, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: parkouss, Unassigned)

Details

(Keywords: pi-marionette-runner)

Attachments

(1 file, 1 obsolete file)

If you run some Marionette tests like this:

$ cd mozilla-central/testing/marionette/client/marionette
$ python runtests.py --binary=/path/to/firefox --xml-output=file1.xml --log-xunit=file2.xml tests/unit/test_click.py

It will output two files with some content which looks like:

file1.xml:

<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="1" name="Marionette" skips="0" tests="2" time="6.7347099781">
	<testcase classname="test_click.TestClick" name="test_click" time="0.189944982529">
		<failure message="test failure">Traceback (most recent call last):
  File &quot;/home/jp/dev/mozilla/testing/marionette/client/marionette/marionette_test.py&quot;, line 267, in run
    testMethod()
  File &quot;/home/jp/dev/mozilla/testing/marionette/client/marionette/tests/unit/test_click.py&quot;, line 17, in test_click
    self.assertNotEqual(&quot;Clicked&quot;, self.marionette.execute_script(&quot;return document.getElementById('mozLink').innerHTML;&quot;))
AssertionError: 'Clicked' == u'Clicked'</failure>
	</testcase>
	<testcase classname="test_click.TestClick" name="testClickingALinkMadeUpOfNumbersIsHandledCorrectly" time="0.352576971054"/>
</testsuite>

file2.xml:

<?xml version="1.0" encoding="utf8"?>
<testsuite errors="0" failures="1" skips="0" tests="2" time="0.589000">
	<testcase classname="test_click_py TestClick_testClickingALinkMadeUpOfNumbersIsHandledCorrectly" name="test_click_py TestClick_testClickingALinkMadeUpOfNumbersIsHandledCorrectly" time="0.353000"/>
	<testcase classname="test_click_py TestClick_test_click" name="test_click_py TestClick_test_click" time="0.190000">
		<failure message="Expected PASS, got FAIL">AssertionError: 'Clicked' == u'Clicked'
Traceback (most recent call last):
  File &quot;/home/jp/dev/mozilla/testing/marionette/client/marionette/marionette_test.py&quot;, line 267, in run
    testMethod()
  File &quot;/home/jp/dev/mozilla/testing/marionette/client/marionette/tests/unit/test_click.py&quot;, line 17, in test_click
    self.assertNotEqual(&quot;Clicked&quot;, self.marionette.execute_script(&quot;return document.getElementById('mozLink').innerHTML;&quot;))
</failure>
	</testcase>
</testsuite>

Only the  "time" attribute of "testsuite" tag is interesting for us here. You can see that there is a big difference between "6.7347099781" (in file1.xml obtained with the --xml-output flag) and "0.589000" (in file2.xml obtained with the --log-xunit flag). Obviously "0.589000" is wrong here.

This is caused by the call of logger.start_suite (http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/runner/base.py#637) which is done after the creation of Marionnette and the http server startup.

Maybe moving this call up in the code (before marionette and http server creation) will resolve the issue here.
(In reply to Julien Pagès from comment #0)
> Maybe moving this call up in the code (before marionette and http server
> creation) will resolve the issue here.

Thanks for raising this Julien! I think it would make sense to move the start_suite up to where we set the start_time to give a better indication of the length of time the suite takes to run. Would you be interested in working on this?
Yes sure. :)

I was thinking that it is not really "obvious" that the current behavior is wrong; but to be consistent, it seems better to fix it like this. Just a thought.
Assignee: nobody → j.parkouss
Attachment #8505569 - Flags: review?(dave.hunt)
Comment on attachment 8505569 [details] [diff] [review]
Wrong test suite duration when using mozlog as test reporter

Review of attachment 8505569 [details] [diff] [review]:
-----------------------------------------------------------------

Just one comment, but deflecting review to Malini for her thoughts on this patch.

::: testing/marionette/client/marionette/runner/base.py
@@ +611,5 @@
>          self.marionette = Marionette(**self._build_kwargs())
>  
>      def run_tests(self, tests):
>          self.reset_test_stats()
>          self.start_time = time.time()

I think start_time should be immediately before suite_start.
Attachment #8505569 - Flags: review?(dave.hunt) → review?(mdas)
Comment on attachment 8505569 [details] [diff] [review]
Wrong test suite duration when using mozlog as test reporter

Review of attachment 8505569 [details] [diff] [review]:
-----------------------------------------------------------------

looks good, but I agree with davehunt, can you upload a modified patch?
I have updated the patch and ran some of the marionette tests locally with success.
Attachment #8505569 - Attachment is obsolete: true
Attachment #8505569 - Flags: review?(mdas)
Attachment #8507540 - Flags: review?(mdas)
Attachment #8507540 - Flags: review?(mdas) → review+
I think I understand what's going on:

By moving up the code, we call BaseMarionetteTestRunner.add_test before creating the marionette instance.

But in the add_test method, there is a need for the device when using manifest.ini files (http://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/runner/base.py#732)

Then the device ask for capabilities (http://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/runner/base.py#525)

And finally capabilities ask for the marionette instance, which is None (http://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/runner/base.py#515), and that leads to the exception : AttributeError: 'NoneType' object has no attribute 'start_session'.

It may be harder than I thought to fix this this way;

Maybe another solution for this problem is to add the ability for mozlog logger to specify the time the suite started. Something like:

time = time.time()
...
self.logger.suite_start(self.tests, time=time)

I don't think this works currently, but it could be an easy way to solve the problem. What do you think ? I can try the above code and open a bug if it does not work to implement it if you like the idea.
Could we just move the logger.suite_start up and leave the add_test calls where they are?
Good point. A lot easier. :)
Looking at this again, it is possible to call start_suite with an empty tests list identifiers, as stated in the doc: http://mozbase.readthedocs.org/en/latest/mozlog_structured.html.

I don't know what it implies (calling with or without test identifiers - this may be good to put it in the documentation once we know), but the results here are the same with the command line used in comment 0. But this time it won't break testing with manifest files. :)

Dave's solution may be the best if we don't care about passing the test identifiers in the logger.suite_start call. Tell me if that's OK for you and I will update the patch accordingly.
Since working on patches for bug 1065402 and bug 1087251 it appears that we're going to need to pass more details when we call start_suite, and some of these details will also require a Marionette session. I'm therefore thinking that we should consider instead moving the `self.start_time = time.time()` down to where the `self.logger.suite_start` call is, rather than moving the `self.logger.suite_start` up. This does mean that the Marionette initialisation is outside of the suite duration, but perhaps that's acceptable as a one-off cost.
I don't know if the reported time needs to reflect more the real time spent (with browser starting and all) or the time spent just for the tests, once initialization is done.

There is still the possible implementation indicated in comment 9 if the first proposal is retained.
(In reply to Julien Pagès from comment #14)
> I don't know if the reported time needs to reflect more the real time spent
> (with browser starting and all) or the time spent just for the tests, once
> initialization is done.

I think ideally we'd include the setup, but I'd prefer we had consistency between these start times. The logging can be timestamped so it would still be possible to determine the setup costs.

> There is still the possible implementation indicated in comment 9 if the
> first proposal is retained.

That won't work once we need to pass additional values to start_suite that require a Marionette session.
(In reply to Dave Hunt (:davehunt) from comment #15)
> > There is still the possible implementation indicated in comment 9 if the
> > first proposal is retained.
> 
> That won't work once we need to pass additional values to start_suite that
> require a Marionette session.

Hm it should work because the call to suite_start will stay at the current place, it's just that time is passed as an optional argument to override the starting time of a suite. So marionette will be instanciated when suite_start will be called.

But it may look strange to have a behaviour that allow to override the starting time. :)
(In reply to Julien Pagès from comment #16)
> (In reply to Dave Hunt (:davehunt) from comment #15)
> > > There is still the possible implementation indicated in comment 9 if the
> > > first proposal is retained.
> > 
> > That won't work once we need to pass additional values to start_suite that
> > require a Marionette session.
> 
> Hm it should work because the call to suite_start will stay at the current
> place, it's just that time is passed as an optional argument to override the
> starting time of a suite. So marionette will be instanciated when
> suite_start will be called.
> 
> But it may look strange to have a behaviour that allow to override the
> starting time. :)

Oh, sorry. I misunderstood. James: Do you think having an optional time argument in start_suite makes sense?
Flags: needinfo?(james)
Yeah, so I see the problem here.

On the one hand I don't think it's really that bad if the runtime is off by a ~constant factor (for marionette tests it's likely to be off by at least ~5s due to the 5s pause after marionette starts the browser). On the other hand it could be useful to include this if we are measuring end-to-end time.

I think that adding a time parameter to suite start is very hacky. It feels like the problem is an incorrect design in mozlog; suite-start should have been two messages; one just recording the startup and one providing the extra info about the tests that will be run.

Maybe the solution is to add a harness_start and harness_end action? Chris: wdyt?
Flags: needinfo?(james) → needinfo?(cmanchester)
I only mostly skimmed the bug so far. I apologize if I'm missing something here.

Why do I want to know the overall time for a test suite? I think I would want to know so I can prioritize efforts to optimize test run times, for the sake of load on our infrastructure and for those running and debugging tests locally. For both of these end users I'm interested to know if the test harness is taking an unexpectedly long time, so I'd say keep the longer suite time. 

On the other hand, if these reports are already used in a way that is sensitive at a granularity that would be disturbed by this jump in time or there's an overarching convention in xunit test output that the suite time be reported as the time between the first test starting and the last test ending regardless of harness overhead, I'd think we could modify xunit.py to keep a running sum of test times for this purpose, or just set its own suite start time based on the first test_start action seen. As far as I know these runs/chunks tend to be in the 10-15 minute range, so I'd be surprised if this were the case.

I don't think adding a time argument to suite_start is desirable for reasons others have mentioned. I don't think adding harness_start and harness_end makes a great deal of sense in this context because the process calling the harness process would do a better job with that (mozharness already uses the resource monitoring that lives in mozbase and could itself use some integration with mozlog).

There isn't too much benefit to passing a list of test identifiers to suite_start today, but I'd like to preserve that as far as possible because it gives us the potential for cool stuff later, like a formatter that tells us what % of the way we are through a test run.
Flags: needinfo?(cmanchester)
So we can't move the start_suite up because we need a Marionette instance for the arguments we're passing, and we're against passing an alternate start time when we call start_suite. Unless there's any other proposals I think that means we'll just have to accept that suite duration will not include the Marionette initialisation time. If that's the case, perhaps we should move the line setting self.start_time _down_ to where we call start_suite so that at least these two values will be roughly equivilent?
I had another proposal: add harness_start. Alternatively, I guess, use the time of the first message of any type that's logged.

I don't necessarily agree with chmanchester's argument that the calling process can always do a better job with the time; that assumes that the logs are being handled synchronously whereas in this case they are (or could be) processed asynchronously, so we will miss any context that isn't saved in the logs themselves. But there will certainly always be a time between initalising the process and constructing the logger that we won't be able to account for. Having said that I'm not sure if this is such a big problem that adding the extra boilerplate is necessary. I suppose we could avoid it by making the logger itself emit a message when a logger when a given name is first constructed. We wouldn't be able to handle the logger being destroyed automagically (since __del__ is unreliable), but maybe that doesn't matter as much.
I have to apologize -- I was just reading this over again and realized I misconstrued the goal of the bug when I made comment 19. I thought the goal was to make us report a shorter time to conform to the existing reporting, not to adjust mozlog to report a longer time. So all that was to propose we wontfix the bug, but by my own argument there is something to fix here. 

Anyhow, I agree with jgraham. There is potential value to accessing a time earlier than when the set of tests to run has been determined. The two ways I would think to go about this would be: recording a time that corresponds to instantiating the logger made available to formatters (but not necessarily an action in the logger) and keeping existing suite_start messages about where they are as jgraham suggested, or moving suite start somewhere as early as possible in the execution of the harness and having a new action that gives the set of tests to be run when it's available.
Chris: Do you have a preference between the two options? Could you perhaps provide Julien some guidance for working on a patch - if he's still interested.
Flags: needinfo?(cmanchester)
For the purposes of this bug, I think a reasonable way forward would be to record a start time in the formatter's __init__. This will be very close to the instantiation of the logger for current users and ought to reflect a more complete duration than basing start time on suite_start.
Flags: needinfo?(cmanchester)
(In reply to Chris Manchester [:chmanchester] from comment #24)
> For the purposes of this bug, I think a reasonable way forward would be to
> record a start time in the formatter's __init__. This will be very close to
> the instantiation of the logger for current users and ought to reflect a
> more complete duration than basing start time on suite_start.

Does that mean that all formatters must declare and use their own start time instead of the value given by the structured logger ? It seems to me that it is not really a formater responsibility, and it may end in duplicating logic and code in xunit formatter, html formatter, etc.
(In reply to Julien Pagès from comment #25)
> (In reply to Chris Manchester [:chmanchester] from comment #24)
> > For the purposes of this bug, I think a reasonable way forward would be to
> > record a start time in the formatter's __init__. This will be very close to
> > the instantiation of the logger for current users and ought to reflect a
> > more complete duration than basing start time on suite_start.
> 
> Does that mean that all formatters must declare and use their own start time
> instead of the value given by the structured logger ? It seems to me that it
> is not really a formater responsibility, and it may end in duplicating logic
> and code in xunit formatter, html formatter, etc.

Right, it's not ideal. If we're worried about duplicating the code we could just set something in BaseFormatter. It's also conceivable that someone would instantiate a formatter than expected and this number wouldn't be exactly what we want.

I'm thinking we could do it for the xunit formatter, for the particular issue we've identified with marionette tests. I'm just not sure this warrants a change in the api that would require all existing callers to be modified.
Assignee: j.parkouss → nobody
I'm not really interested in working on this. Futhermore, I'm not sure it is really important (marionette is starting faster now, that reduced the difference).
Given Marionette starts faster now, is this something we need to be concerned about?
Flags: needinfo?(cmanchester)
Clarifying what is meant be suite_start to differentiate between harness overhead and test execution time has some potential benefit, so marking this bug fixed might not make sense, but I doubt the case motivating this bug is cause for concern right now (I guess it was the 5 second sleep we killed).
Flags: needinfo?(cmanchester)
[mass update] Setting Harness bugs to all P3
Priority: -- → P3
I believe this is still a problem, but from the context of reading
this bug I understand it is less of an issue than it used to be
because Marionette is faster at starting.  In any case, it is
unlikely that we will take any action on this bug after it has been
stranded for three years.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: