Closed Bug 984726 Opened 10 years ago Closed 6 years ago

[Gaia] Add screenshot reftests

Categories

(Testing Graveyard :: JSMarionette, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: evanxd, Unassigned)

References

Details

We would like to have UI tests to keep the result of css changes and the UI of each app always correct.

For this, we could compare screenshots of correct version and the patch version.
If they are same, we could believe the patch is good for css changes and UI.
Summary: UI test for style changes. → UI test for UI and css changes.
Summary: UI test for UI and css changes. → Tests for UI and CSS changes.
There is one way we already had WIP patch to achieve this.

It is a local tool to generate a report includes screenshots of current and previous version in the git branch and the diff results between these two versions.
Please refer to http://bugzil.la/926801 - Generate a report shows screenshots in current and previous version.

How do you guys think?
Flags: needinfo?(gaye)
Flags: needinfo?(jlal)
Flags: needinfo?(mike)
Hey Evan,

This is an interesting but tricky problem. How do you see this functionality being used? During continuous integration? Or only locally during development?
Flags: needinfo?(mike)
Hi Mike,

The tool in Comment 1 is only for locally development.

For continuous integration, I think we could add screenshot tests in Gaia to check CSS and UI related changes in patches.

The below is an idea to run the screenshot tests on Travis.
We could store all correct screenshot images in a GitHub repo,
then we could add marionette tests to capture screenshot and compare with the correct one. If they are same, the test is passed.

Once we have these thing on Travis, I think we could have less and less CSS and UI related regression.

How do you think?

Thanks.
And do you guys have any good idea to skip the CSS and UI related regression?
I would like to know that.

I hope we could have a solution to skip the CSS and UI related regression.
Thanks.
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #4)
> And do you guys have any good idea to skip the CSS and UI related regression?
> I would like to know that.
> 
> I hope we could have a solution to skip the CSS and UI related regression.
> Thanks.

If I get your meaning, then you may be interested in TravisCI's "allow_failures" option.
Hi Mike,

Sorry, we don't have any "allow_failures" currently.
See it at https://github.com/mozilla-b2g/gaia/blob/master/.travis.yml.

So what do you mean that interested in TravisCI's "allow_failures" option.
Thanks.
Hi Kevin and Marcus,

I have an idea to skip the css and UI related regression.

We could store all correct screenshot images in a GitHub repo,
then we could add marionette tests to capture screenshot and compare it with the correct one. If they are same, the test is passed.

Once we have these thing on Travis, I think we could have less and less CSS and UI related regression.

How do you guys think?

Thanks.
Flags: needinfo?(kgrandon)
Summary: Tests for UI and CSS changes. → Test cases to skip css and UI related regression.
Flags: needinfo?(m)
Hi, so it looks like you want to implement reftests for gaia. This is definitely a good thing and would help catch a lot of the regressions that we have been seeing. I am a bit worried about fonts, and rendering on different platforms. If we only deal with one CI env (travis), it may not be too bad - we just need a way to easily generate new screenshots.

We should try to get this working with a single test first. Also - why do you want it to be an allowed failure? Any jobs we add to travis now need to fail the tree if they fail.

I'm also thinking that you could do this today inside of a marionette test by calling screenshot() and comparing to a data URI in a file on disk. It would be interesting to put together a prototype that did this. Perhaps we could add a folder inside of each app, apps/calendar/test/ref/add_account_test.js for example.
Flags: needinfo?(kgrandon)
I'm renaming this bug to something that I think is a little bit more clear to me.

Also if you are not familiar you may want to see how the platform handles this: https://developer.mozilla.org/en/docs/Creating_reftest-based_unit_tests (though our cases are quite a bit different)
Summary: Test cases to skip css and UI related regression. → [Gaia] Add screenshot reftests
OS: Mac OS X → All
Hardware: x86 → All
I think reftests could be neat if implemented in Gaia, per what Kevin says.
Flags: needinfo?(m)
(In reply to Kevin Grandon :kgrandon from comment #8)
> Hi, so it looks like you want to implement reftests for gaia. This is
> definitely a good thing and would help catch a lot of the regressions that
> we have been seeing. I am a bit worried about fonts, and rendering on
> different platforms. If we only deal with one CI env (travis), it may not be
> too bad - we just need a way to easily generate new screenshots.
Yes, I think we could run the reftests on Travis first.

> We should try to get this working with a single test first. Also - why do
> you want it to be an allowed failure? Any jobs we add to travis now need to
> fail the tree if they fail.
I think there might be some misunderstanding happened here. I do not want it to be an allowed failure. I hope patches must pass the tests.
 
> I'm also thinking that you could do this today inside of a marionette test
> by calling screenshot() and comparing to a data URI in a file on disk. It
> would be interesting to put together a prototype that did this. Perhaps we
> could add a folder inside of each app,
> apps/calendar/test/ref/add_account_test.js for example.
Sure, let's do a prototype to get the detailed feedback.
(In reply to Marcus Cavanaugh [:mcav] <mcav@mozilla.com> from comment #10)
> I think reftests could be neat if implemented in Gaia, per what Kevin says.

Thanks. :)
Assignee: nobody → evanxd
Depends on: 993247
We could add reference test for keyboard app as a prototype of the idea in Comment 7.
I assume you've checked with Rudy or a keyboard peer first? We can do it for calendar no problem, but I just want to make sure the module owner is up to speed :)
Depends on: 993276
Hi Kevin,

Yes, Rudy knew these things.
And the test(Bug 993247) for keyboard app is blocked by Bug 952443(about b2g focus issue).

Sure, let's do this for Calendar first, and the bug is Bug 993276.
TL;DR; I think it's a bad idea, this won't make us move any faster.

I don't think the amount of work/bureaucracy is justified. This will probably cause more headaches than it will avoid. There is no such a thing as a pixel-perfect layout.

The calendar app changes the UI based on the current date... It will also make it really hard to make CSS changes and still pass the test (small design tweaks). You basically need to make the change, execute test on travis, grab a screenshot from travis, update the reference image and push a new commit.. People will make mistakes, the tree will be closed because of perma-red tests... We won't know if error was caused by a real regression or if it was a valid update.

If people are making mistakes than they should be encouraged to write more unit/integration tests (that works locally) to validate the correctness of the app, and also to always test things manually before doing a pull request. Visual errors are very easy to catch by a human, specially since the apps doesn't have that many different screens.
As a counterpoint, I do think this makes sense on an opt-in basis for apps, probably via a plugin helper that lets you checkpoint the current visual state with a name.  Ex:
  client.screenshot.saveNow('email-message-list-full');
  ...
  client.screenshot.saveNow('email-message-list-empty');

Specifically as a value proposition:
- email has seen visual regressions from building blocks changing things.  (It would still be nice if apps each had their own specific checkout of building blocks, of course.)
- it can be a hassle to manually verify that there aren't visual regressions in e-mail because there are so many different cards.  It would be amazing if I could see all the e-mail cards that changed from a single commit.  (Note that this is mainly for building block changes again, and with web components and scoped CSS and such the chances for fallout are hopefully reduced.)

Obviously apps would only want to trigger snapshots in cases where:
- They can keep a pixel-identical layout.  So date-dependent and other potential random factors need to be eliminated.
- The developers care about how all of the given things look.

It might also make sense to include some type of normalization transform mechanism in the screenshot process where it's a hassle to control the data values.  For example, give it some selectors that it should synchronously replace the textContent of with 'lorem ipsum', synchronously screenshot, then synchronously put everything back.
Good idea, Andrew. We could capture screenshots on an pot-in basis for apps.

So I think we could do like `make test-integration APP=email SCREENSHOT=1` to show the screenshots in the terminal. The default value of `SCREENSHOT` argument is 0, and it means we will not capture and print screenshots.

And we could just reuse https://github.com/mozilla-b2g/marionette-debug and do the below script in test code to capture and print the screenshot.
```
client.debug.screenshot('email-message-list-full');
client.debug.screenshot('email-message-list-empty');
```

How do you guys think?
Flags: needinfo?(mike)
Hi Gareth, James, Mike,

How do you think about Comment 18?
This kind of test is good for catching regressions for apps that heavily rely on shared styles. To have tolerance for changes introduced by string changes or font, maybe we can have a threshold and warns developers only when the amount of difference is over a certain threshold. And we should be able to update the reference screenshots very easily as it may be painful doing the update when changing shared styles.
We need to revisit this soonish... I think having reftests is useful I will write up something post verticalhome (we could use this for some tests there at a minimum)
Flags: needinfo?(jlal)
Flags: needinfo?(mike)
Assignee: evan → nobody
Flags: needinfo?(gaye)
Bulk closed as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1422750
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.