Closed Bug 962918 Opened 10 years ago Closed 6 years ago

Save screenshot images in S3 server when tests do not pass on Travis

Categories

(Testing Graveyard :: JSMarionette, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: evanxd, Unassigned)

References

Details

(Whiteboard: [priority][p=13][productivity])

Attachments

(4 files, 1 obsolete file)

Save the below kind of JSON file to the S3 server when there is a failed test for a PR on Travis.
The JSON format follows the Travis format:
{
  "id": "build-id",
  "message": "Bug 888888 - I am a bug",
  "pull_request": {
    "id": "#15575",
    "url": "https://github.com/mozilla-b2g/gaia/pull/15575"
  },
  "duration": "6 min 19 sec",
  "finished_datetime": "1995-12-17T03:24:00",
  "screenshots": [
    "base64 encoded image 1",
    "base64 encoded image 2"
  ]
}
And the file name is "build-id.json".
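As a sketch only, the payload above could be assembled like this in Node; the field names mirror the JSON example in this bug, while the `build` object shape and `screenshots` input are hypothetical placeholders, and the S3 upload itself is out of scope:

```javascript
// Assemble the failure-report payload described above. The field
// names mirror the JSON example in this bug; the `build` and
// `screenshots` inputs are hypothetical placeholders.
function buildFailureReport(build, screenshots) {
  return {
    id: build.id,
    message: build.message,
    pull_request: {
      id: build.prId,
      url: build.prUrl
    },
    duration: build.duration,
    finished_datetime: build.finishedAt,
    // base64-encoded images, most recent last
    screenshots: screenshots
  };
}

// The report is saved as "<build-id>.json".
function reportFileName(build) {
  return build.id + '.json';
}
```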

Then we will have this kind of JSON file for each PR.
The question we need to discuss is how many screenshot images we should send to the server for each one.

Do we just need to send the last state of the b2g-desktop client before the tests failed,
or should we send all states from the beginning of the test run?
Flags: needinfo?(jrburke)
Flags: needinfo?(gaye)
We could refer to Travis to provide useful info in our debugging tool.
Assignee: nobody → evanxd
Whiteboard: [priority][p=13]
I would be fine with just the last N screenshots before the failure, maybe the last 3-5 state screenshots, as in the last 3-5 states of the UI after the last 3-5 marionette calls. Ideally that number would be configurable so we could rerun the test with more screenshots configured if needed.
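A minimal sketch of the "last N screenshots" idea in plain JavaScript, assuming the capture itself happens elsewhere and only the bookkeeping is shown here; the limit would come from configuration:

```javascript
// Keep only the last N screenshots; older ones are dropped as new
// ones arrive. `limit` would come from configuration (e.g. an env
// var) so a failing run can be repeated with more screenshots.
function ScreenshotBuffer(limit) {
  this.limit = limit;
  this.shots = [];
}

ScreenshotBuffer.prototype.push = function (base64Image) {
  this.shots.push(base64Image);
  if (this.shots.length > this.limit) {
    this.shots.shift(); // drop the oldest screenshot
  }
};

ScreenshotBuffer.prototype.dump = function () {
  return this.shots.slice();
};
```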
Flags: needinfo?(jrburke)
Whiteboard: [priority][p=13] → [priority][p=13][productivity]
Assignee: evanxd → nobody
Blocks: 962923
I would prefer all the screenshots sorted by timestamp if there's no reason not to. Storage is so cheap and travis is free :)
Flags: needinfo?(gaye)
there is one big problem with taking screenshots at each marionette command. **taking screenshots changes the test output!!!**

there is something really weird about the Travis environment that makes some of the marionette tests fail once in a while (maybe because they are executed too slowly?), so I was doing some investigation and had the *brilliant* idea to take screenshots just before I tried to click on the element (to make sure the element is really there), but after I introduced the screenshot to the whole process the tests that were failing before simply stopped failing.

see builds: https://travis-ci.org/mozilla-b2g/gaia/builds/17565064 (w/ screenshots - only test that fails is not related to bug that I was checking) and https://travis-ci.org/mozilla-b2g/gaia/builds/17563401 (w/o screenshot it fails between 20-50% of the time).

this kind of bug is also known as [Heisenbug](https://en.wikipedia.org/wiki/Heisenbug)
Hi Miller,

Yes, as far as I know, it might take a few seconds (2-5) to take one screenshot.

If we take a screenshot for each marionette server call, running the marionette tests might take too much time on each run.

I am thinking we could just capture one screenshot once a test fails. Then we don't need to take screenshots beforehand, which takes less time.

What do you guys think?
(In reply to Gareth Aye [:gaye] from comment #4)
> I would prefer all the screenshots sorted by timestamp if there's no reason
> not to. Storage is so cheap and travis is free :)

Hi Gareth,

Do you think the time it takes to capture a screenshot is an issue?
It might take a few seconds (2-5) to take one screenshot locally,
and it will take even more time on Travis.

Thanks.
We might capture a screenshot at each marionette command just for the tests added or updated in a pull request.
It will save time and resources for debugging.

But I'm not sure the issue from Comment 5 won't appear once we follow this approach.
We could test for that.

What do you guys think?
Flags: needinfo?(gaye)
Flags: needinfo?(jrburke)
Flags: needinfo?(mmedeiros)
the screenshot issue that I described was related to a race condition. The keyboard was not disappearing fast enough in some cases, and the screenshot added a delay, "fixing" the problem.

the screenshot is probably useful for checking CSS/markup related issues, so checking only whether the test files changed might not be enough to detect potential problems. Maybe another option is to take screenshots of all tests for the app that had code changes. And re-run failed tests (taking screenshots only on the 2nd run).

Adding +5s per test seems like a lot of time and might not be justified. I think we need a clear "user story" describing what problems we are trying to solve and the benefits of the screenshots. That way it will be easier to balance pros/cons.
Flags: needinfo?(mmedeiros)
For my needs, I just wanted more information when an error happened, and a screenshot seemed like one of those things. So for my purposes, just having a screenshot when it failed, taken after the failure happens, would be enough for me to start with.
Flags: needinfo?(jrburke)
Can we dig into why taking screenshots takes 2-5s?
(In reply to Gareth Aye [:gaye] from comment #11)
> Can we dig into why taking screenshots takes 2-5s?

Sure, we could needinfo mdas (ateam member) for this.
It seems to be a marionette server issue.
We might get the answer from ateam.
Hi mdas,

I have a question about the "screenShot" command on the marionette server side.
Why does it take 2 to 5 seconds to capture a screenshot
after the client sends the "screenShot" command to the marionette server?
It is a little bit slow. Is it possible to make it run faster?

It seems to be a marionette server issue, so you might know more than us.
Thanks.
Flags: needinfo?(gaye) → needinfo?(mdas)
AutomatedTester is more familiar with the screenshot internals, I'm ni?ing him
Flags: needinfo?(dburns)
that's probably down to the environment; we copy the DOM over to a canvas object and then ask it for a base64 data URI which we return to the client code.

I have never seen this in a normal* machine. Does this happen only on Travis or everywhere?

* a developers machine
Flags: needinfo?(mdas)
Flags: needinfo?(dburns)
Hi David,

This happened on a local laptop machine.
It might be slower on Travis.

We would like to capture screenshots on Travis for debugging purposes.
It will capture many screenshots.
So is it possible to make it run faster?

Thanks.
since screenshot delay might change the test behavior, and it takes so long, I think we should couple it with the assertion library, so when you do `assert.equal(foo, 'bar')` it will ONLY take the screenshot if the assertion fails (before the error is thrown). That should give the most accurate result and won't affect the assertion result.

It would also need to take screenshots on timeouts, since that is a common type of error as well (could happen on setup/teardown/test).

I'm also not sure how the screenshot is implemented, but I think it should block any rendering/timers/script while it's running, otherwise it will be hard to catch errors caused by CSS transitions or anything similar.
(In reply to Miller Medeiros [:millermedeiros] from comment #18)
> since screenshot delay might change the test behavior, and it takes so long,
> I think we should couple it with the assertion library, so when you do
> `assert.equal(foo, 'bar')` it will ONLY take the screenshot if the assertion
> fails (before the error is thrown). That should give the most accurate
> result and won't affect the assertion result.

scratch that. it probably won't make any difference if it happens before or after.. I was just being dumb.
Thanks to :mcav for the idea, I think this could work well in the meantime. What do you guys think?
Attachment #8393720 - Flags: review?(gaye)
Attachment #8393720 - Flags: review?(evanxd)
Hi Kevin,

Good job!

But if we capture screenshots at each marionette command like https://github.com/mozilla-b2g/marionette-js-client/pull/106, do we have the same problem as in Comment 5?

Thanks.
Well this only takes screenshots during a timeout event, so this isn't a full solution, but it should still help quite a bit. (We could also try taking screenshots at every step - but then we'd definitely run into the same issues as comment 5).
Anything is better than nothing; it seems like :kgrandon's patch would be super nice for a lot of situations.
Not that this should delay landing, but we can also/alternatively listen for the mocha "fail" event which should cover more than just timeouts:
https://github.com/visionmedia/mocha/blob/master/lib/runner.js#L200
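A sketch of that suggestion: mocha's Runner is an EventEmitter that emits "fail" with the failing test and the error, so a hook could look like the following. `takeScreenshot` and `log` are hypothetical stand-ins for the marionette client's screenshot call and whatever sink stores the result:

```javascript
// Capture a screenshot on mocha's "fail" event, which fires for
// assertion failures as well as timeouts. `takeScreenshot` and
// `log` are hypothetical stand-ins for the marionette client's
// screenshot call and the storage sink.
function attachFailureScreenshots(runner, takeScreenshot, log) {
  runner.on('fail', function (test, err) {
    log({
      test: test.title,
      error: err.message,
      screenshot: takeScreenshot()
    });
  });
}
```

This would cover more than the timeout-only patch, at the cost of one screenshot per failure rather than per step.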
(In reply to Andrew Sutherland (:asuth) from comment #24)
> Not that this should delay landing, but we can also/alternatively listen for
> the mocha "fail" event which should cover more than just timeouts:
> https://github.com/visionmedia/mocha/blob/master/lib/runner.js#L200

Yeah - we can definitely look at doing that, I'm not too familiar with our Marionette stack and how all of the events get propagated, so if someone wants to submit a patch please do so :)

In the meantime - review ping? :) This would definitely help with trying to open the tree today. Thanks!
Comment on attachment 8393720 [details] [review]
Quick solution - log the screenshot

LGTM! Thanks kgrandon!
Attachment #8393720 - Flags: review?(gaye) → review+
Comment on attachment 8393720 [details] [review]
Quick solution - log the screenshot

Thanks for the review!
Attachment #8393720 - Flags: review?(evanxd)
Nice, thanks to Kevin for the work.
B2g-inbound was busted for a while, it's now fixed so I pushed a new try: https://tbpl.mozilla.org/?tree=Try&rev=e7df4381de5f
Since Evan just asked for everybody's opinion, I'll list a "crazy" idea here...

If we can redirect Travis' xvfb, we can capture all frames during the test and forward them over the network. For example, write a dummy xvfb, record it, and then forward the recording to YouTube. Then we can see how differently the Travis and local tests perform.
Not crazy at all.  :lightsofapollo already has written some wrapper scripts to do this using xvfb and ffmpeg to perform the recording:
https://github.com/lightsofapollo/x-recorder

I'm currently setting the e-mail app up to optionally do this as a prototype as I finish up its integration test cleanup/overhaul.  I'm planning to treat the webm video as a failure build artifact.
Hi all,

So based on Andrew's work at Comment 33, we could have a debugging tool to record any specific (or all) test case on Travis.

For example, we could configure https://github.com/mozilla-b2g/gaia/blob/master/tests/travis_ci/marionette_js/script#L3 as `... make test-integration TEST_FILES=apps/calendar/test/marionette/caldav_test.js CAPTURE=1` to record the test as a WebM video. Then we could store the video on the S3 server or YouTube for debugging purposes.

I think we could do the above instead of saving screenshot images to the S3 server. What do you guys think?
Problem with the commit in comment 28: it outputs stuff on standard output that causes |make test-perf| to generate invalid JSON. Will submit an update.
(In reply to Hubert Figuiere [:hub] from comment #35)
> Problem with commit in comment 28: it output stuff on the standard output
> that cause |make test-perf| to generate an invalid JSON. Will submit and
> update.

But this only happens on a timeout correct? So why does make test-perf trigger a timeout?
|make test-perf| triggers a timeout because of various conditions I'm working on fixing. The timeout is not the problem; the problem is that we send stuff to the output, breaking the JSON that the reporter outputs.
Hi Andrew,
What do you think about Comment 34?
Flags: needinfo?(bugmail)
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #34)
> For example, we could configure
> https://github.com/mozilla-b2g/gaia/blob/master/tests/travis_ci/
> marionette_js/script#L3 as `... make test-integration
> TEST_FILES=apps/calendar/test/marionette/caldav_test.js CAPTURE=1` to record
> the test as a webm video. Then we could store the video in the S3 server or
> youtube for the debugging purpose.
> 
> I think we could do the above idea to instead of saving screenshot images in
> S3 server. How do you guys think?

Yes, with a specific approach building on what we currently do:

The gaia-ui-tests already uses travis-artifacts to upload stuff to S3.  In its after_script it tries to upload everything from the artifacts/ directory of Gaia to S3.  There are inherent issues with secrets so the keys only get decrypted on branch builds (not pull requests, which could include stuff that tried to leak the sensitive keys) and in general it's brittle, so it already sucks a bit on Travis.  It's my understanding that our post-Travis solution does not have this problem since the privileged test-infrastructure code can just suck the results out of the docker instance's artifacts/ dir and upload it for branch builds, pull requests, etc.

I think it makes sense for us to accumulate logs/videos/screenshots/whatever inside a per-test or per-test-file directory.  If a failure occurs, we keep the contents of that directory by default.  If no failure occurs we delete / do not upload the files.

We should indeed support some type of flag that would let individual pull-requests request that these directories be kept for all tests for a subset of the tests.  We should bias this so that it's easier to request a specific app/test than all apps/tests since usually people will only care about a single app.

Because there is the potential to synchronize videos with logs/etc. (I'm currently working on this a bit), I think we'd want to stick with raw webm videos served from s3.  Youtube probably wouldn't enable us to embed the video in a <video> tag, plus it's arguably bad form to spam youtube with automated build videos that no one cares about but us, and frequently not even us.
Flags: needinfo?(bugmail)
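The keep-on-failure policy described above can be sketched as a single predicate; the `result` shape and `keepPatterns` list here are hypothetical illustrations, not an existing API:

```javascript
// Decide whether a test's artifact directory (logs/videos/screenshots)
// should be kept for upload: keep it when the test failed, or when the
// test path matches an app/test pattern explicitly requested in the
// pull request. `result` and `keepPatterns` are hypothetical shapes.
function shouldKeepArtifacts(result, keepPatterns) {
  if (result.failed) {
    return true;
  }
  return (keepPatterns || []).some(function (pattern) {
    return result.testPath.indexOf(pattern) !== -1;
  });
}
```

Matching on a path substring biases the flag toward requesting a specific app (e.g. `apps/email`) rather than everything, as suggested above.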
If we are only talking about images, you could just throw it to Imgur :P

FYI http://twolfson.com/2014-02-25-visual-regression-testing-in-travis-ci
Blocks: 987383
This blocks bug 987383, which means that the make test-perf automation is completely broken.
Hi Hubert,
Sorry, why does this bug block Bug 987383?
Because you print non-JSON data to stdout, causing |make test-perf| to generate invalid JSON. Don't randomly output to stdout.
I am proposing we revert the patch from attachment 8393720 [details] [review] and figure out a longer-term solution.
Flags: needinfo?(gaye)
Flags: needinfo?(evanxd)
(In reply to Hubert Figuiere [:hub] from comment #44)
> Because you print non JSON data to stdout causing |make test-perf| to
> generate invalid JSON. Don't randomly output to stdout.

Can you provide more context about the problem?  Is this a case of:
- make test-perf assuming that no data except JSON will ever be output to stdout or stderr?
- interleaved writes to stdout or stderr by multiple sub-processes that results in a non-coherent output stream?
- something else?

It sounds like it's the first case, but then that just seems like bad architecture for make test-perf if any writes to stdout or stderr break it.
(In reply to Andrew Sutherland (:asuth) from comment #46)
> (In reply to Hubert Figuiere [:hub] from comment #44)
> > Because you print non JSON data to stdout causing |make test-perf| to
> > generate invalid JSON. Don't randomly output to stdout.
> 
> Can you provide more context about the problem?  Is this a case of:
> - make test-perf assuming that no data except JSON will ever be output to
> stdout or stderr?

Just stdout. I'm fine with whatever is output to stderr.

> It sounds like it's the first case, but then that just seems like bad
> architecture for make test-perf if any writes to stdout or stderr break it.

Just to stdout, again.

Yes, that's the way it has always worked - it is very UNIX-ish. It is surely not something a random npm module should write to; implementing that feature by dumping to stdout IS bad design.
For a quick fix, can make test-perf simply override client.onScriptTimeout? (Hacks on top of hacks, yaay).

I'm not against backing this out, but I'd prefer to override that method, or find the root cause of the timeouts and eliminate them. The screenshots have already helped us fix some intermittent tests, so it would be a shame to lose this.
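As a quick-fix sketch, overriding the handler could look like this; the `onScriptTimeout` hook name comes from this bug's discussion, and the `client` object here is a stand-in for the real marionette client:

```javascript
// Quick-fix sketch: null out the screenshot-on-timeout handler for
// test-perf runs so nothing extra is written to stdout, and return a
// function that restores the original handler afterwards. `client`
// is a stand-in for the marionette client object.
function silenceTimeoutScreenshots(client) {
  var original = client.onScriptTimeout;
  client.onScriptTimeout = null;
  return function restore() {
    client.onScriptTimeout = original;
  };
}
```

Note this assumes the client treats a null handler as "do nothing", which is what the later follow-up to support nulling of onScriptTimeout addresses.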
(In reply to Kevin Grandon :kgrandon from comment #48)

> I'm not against backing this out, but I'd prefer to override that method, or
> find the root cause of the timeouts and eliminate them. The screenshots have
> already helped us fix some intermittent tests, so it would be a shame to
> lose this.

Finding the root cause wouldn't fix it, because if a similar problem arises, it would cause the same issue.
Why can't test-perf write its output to the GAIA/artifacts/ directory where we put other desired byproducts (currently gaia-ui-tests' HTML output, but soon other things) so they can be uploaded to S3 or otherwise stored as publicly accessible data (depends on the continuous integration driver)?

And/or since it sounds like test-perf directly just wants to upload stuff to datazilla, why can't it write to a named file and then just upload that named file?

It's been a huge developer ergonomics problem in general that the b2g-desktop stdout/stderr disappears into a black hole.  I don't think we'll improve things by requiring that to be the case.  We should treat the test runs and b2g-desktop runs as gecko-like wild-wild-west dysfunctional stdout/stderrs and log to specific files for things we care about.
(In reply to Andrew Sutherland (:asuth) from comment #50)
> Why can't test-perf write its output to the GAIA/artifacts/ directory where
> we put other desired byproducts (currently gaia-ui-tests' HTML output, but
> soon other things) so they can be uploaded to S3 or otherwise stored as
> publicly accessible data (depends on the continuous integration driver)?

1. Because that's not what was done originally.
2. It is not convenient, as by default we are supposed to show the result to the developer.
3. Because of the way it is designed, there are multiple consecutive outputs that get wrapped by a shell script.
4. We use mocha reporting; I don't even think we can do that easily with mocha.


> And/or since it sounds like test-perf directly just wants to upload stuff to
> datazilla, why can't it write to a named file and then just upload that
> named file?

See number 3.
 
> It's been a huge developer ergonomics problem in general that the
> b2g-desktop stdout/stderr disappears into a black hole.  I don't think we'll
> improve things by requiring that to be the case.  We should treat the test
> runs and b2g-desktop runs as gecko-like wild-wild-west dysfunctional
> stdout/stderrs and log to specific files for things we care about.

And in the |make test-perf| case we don't care about b2g-desktop, which also broke differently anyway, so that's not the problem right now. Also, I do believe it doesn't pollute stdout. You can write what you want on stderr.

But this is really akin to bikeshedding right now.

Something broke due to a side effect. Let's fix the side effect.
Hub - can we just do something like this for now? Note: I can't seem to test this because test-perf is timing out during the download for some reason. Let me know what you think.
Attachment #8409833 - Flags: review?(hub)
Comment on attachment 8409833 [details] [review]
Follow-up, override onScriptTimeout in startup tests

-These are not the only two files that would need this.

-Also, why not set it to null like it was before?

-And it should be done in the setup function where we set timeouts.

While I don't like this hack, it has two benefits:

1. we stop the bikeshedding
2. we fix this.

So with the fixes above I'll r+
Attachment #8409833 - Flags: review?(hub) → review-
Attachment #8404962 - Attachment is obsolete: true
IMHO this is an issue with make test-perf more than anything else...
Flags: needinfo?(gaye)
Kevin, if you don't mind I'll take your patch, finish it, and attach it to bug 987383. Thanks a lot for the help.
Flags: needinfo?(evanxd)
No longer blocks: 987383
Landed a follow-up to actually support nulling of onScriptTimeout, which may have broken existing tests and this use case: https://github.com/mozilla-b2g/marionette-js-client/commit/f453009b7e6ab863eaf11c79f784bfd08cecb218
Good catch on that one. Thanks !
I've written a Firefox add-on to render the base64 to image tag, and append it on the log. Here is the repo:

https://github.com/snowmantw/travis-logger-render

It's coarse, but it saves me lots of time copying and pasting the base64 to convert it into a PNG file.
If we upload the image to a CDN server, this extension can still work if we leave an <img> tag in the log.
Greg, nice work!
Hi Kevin,

We should switch to the System frame before we capture the screenshot.

Currently, if the lockscreen app overlaps the settings app (this has actually happened before), we could NOT see the situation from the screenshot. We can only see the settings screen when we capture the settings frame. Once we capture the system frame, we can see the whole Firefox OS screen. It is better for debugging.
Attachment #8438308 - Flags: review?(kgrandon)
Comment on attachment 8438308 [details] [review]
Patch - Switch to System frame before capture the screenshot

Seems fine to me, thanks!
Attachment #8438308 - Flags: review?(kgrandon) → review+
Bulk closed as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1422750
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Product: Testing → Testing Graveyard