Closed Bug 971135 Opened 10 years ago Closed 10 years ago

[Clock] Port all Clock integration tests from Python to JavaScript in conjunction with MarionetteJS / Gaia UI testing work

Categories

(Firefox OS Graveyard :: Gaia::Clock, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
1.4 S4 (28mar)

People

(Reporter: mcav, Assigned: mcav)

References

Details

(Whiteboard: [p=13])

Attachments

(1 file)

Now that bug 907177 has landed, we have the facility to add proper
JavaScript marionette integration tests for Clock. We should port all
existing Python tests to JavaScript, and subsequently remove the
Python tests.
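
To make the shape of these tests concrete, here is a minimal sketch of what a
MarionetteJS integration test for Clock could look like. It assumes the
marionette-js-runner conventions used elsewhere in Gaia (the global
marionette/setup/test functions, marionette.client(), and the marionette-apps
plugin for client.apps); the selector is an illustrative guess rather than
Clock's actual markup:

  marionette('Clock - alarm list', function() {
    var assert = require('assert');
    var CLOCK_ORIGIN = 'app://clock.gaiamobile.org';
    var client = marionette.client();

    setup(function() {
      // Launch Clock and switch the Marionette context into its frame.
      client.apps.launch(CLOCK_ORIGIN);
      client.apps.switchToApp(CLOCK_ORIGIN);
    });

    test('the alarm list starts out empty', function() {
      // '#alarms .alarm-cell' is a placeholder selector for illustration.
      var alarms = client.findElements('#alarms .alarm-cell');
      assert.equal(alarms.length, 0);
    });
  });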

The numerous advantages to doing this include:

- Decreasing the barrier to entry for contributors, who should be able
  to read, write, and modify any applicable integration tests when
  they work on Clock;

- Reducing the maintenance burden for current and future
  integration tests by not maintaining two completely orthogonal
  testing paths;

- Avoiding code duplication and encouraging code reuse in writing
  tests and test utilities for manipulating Clock;

- We already see intermittent failures reported via e-mail for the
  Python tests. It's not in Mozilla's (or FXOS's) best interest to
  have people spending a lot of time contributing work to Python tests
  (likely in maintenance mode) when they could instead be contributing
  to the growing, vibrant JS marionette testing initiative.

- When undergoing refactoring and major feature changes (as is
  expected in the next few releases), Python tests will inherently
  need to be changed. It would serve Firefox OS better to take that
  opportunity to port tests over, rather than try to fixup existing
  Python tests. This allows us to consolidate without wasting effort.

- The burden on the developer is already high, in making sure the unit
  tests and JS marionette tests continue to pass with each pull
  request. Python tests add an unnecessary third layer of complexity
  that weighs on developers' workflow.

Possible concerns regarding removing the Python tests, which will
be addressed in this process:

Q: Will we lose test coverage?
A: No; existing Python tests will be ported to JavaScript and/or
   deduplicated where applicable.

Q: The Python tests have existed for a long time and do their job; it
   seems wasteful to just get rid of them wholesale. Why break what isn't
   broken?
A: Deleting code is one very effective way to ensure a codebase does
   not sink under its own weight. We avoid duplicating effort when
   writing code; why not do the same for tests?

Q: I only know how to write Python tests. Let me continue to do so.
A: By encouraging people to write Python tests, we perpetuate
   duplicated effort. The same argument could be made for JavaScript
   developers who have to learn Python in order to contribute to the
   Python tests. However, we are never going to write our applications
   in Python. I say this as someone who used to love Python. Effort
   would be better spent teaching any remaining Python developers how
   to use the (very similar) JavaScript API.

I am only advocating this for Clock, not for other apps, which are
welcome to do as they see fit. Please raise any concerns about this
proposal here. Unless I hear compelling arguments otherwise, I intend
to do the following:

1. If, in the course of working on Clock, existing Python tests break,
   I will port the broken test to JavaScript, and remove the Python
   version, upon r+ from the bug's reviewer.

2. Porting will not block existing work; I don't intend to go out of
   my way to refactor everything at once. In general, this is a
   medium-term initiative to ensure Clock's maintainability and
   testability down the road.

3. Eventually, all Python tests will be rewritten and reviewed, saving
   ourselves time and effort down the road as we maintain Clock in the
   months and years ahead.
(In reply to Marcus Cavanaugh [:mcav] <mcav@mozilla.com> from comment #0)
> Now that bug 907177 has landed, we have the facility to add proper
> JavaScript marionette integration tests for Clock. We should port all
> existing Python tests to JavaScript, and subsequently remove the
> Python tests.

This is a bad idea. This essentially removes device & OOP coverage of the clock workflows, which overall decreases the scope of coverage these tests provide. The OOP issue is serious, as desktop builds won't capture the key gecko regressions triggered by e10s.

> 
> The numerous advantages to doing this include:
> 
> - Decreasing the barrier to entry for contributors, who should be able
>   to read, write, and modify any applicable integration tests when
>   they work on Clock;

This is incorrect. The barrier to entry has already proven manageable with the Python tests, as there's a well-built community around those tests too.

> 
> - Reducing the burden of maintainability for current and future
>   integration tests without maintaining two completely orthogonal
>   testing paths;

They aren't the same - desktop in process builds won't trigger the e10s paths that are found on device.

> 
> - Avoiding code duplication and encouraging code reuse in writing
>   tests and test utilities for manipulating Clock;

Technically, each set of tests should be aiming to solve different problems. In-tree tests cover what's driven by Gaia, whereas on-device tests focus on the full picture.

> 
> - We already see intermittent failures reported via e-mail for the
>   Python tests. It's not in Mozilla's (or FXOS's) best interest to
>   have people spending a lot of time contributing work to Python tests
>   (likely in maintenance mode) when they could instead be contributing
>   to the growing, vibrant JS marionette testing initiative.

JS marionette tests have the same problem, if not a worse one. These tests were shut off on TBPL on Linux machines for having a high, unsustainable intermittent failure rate. Both sets of tests use the same fundamental UI interaction pattern, so the intermittent-failure problem persists across both frameworks.

> 
> - When undergoing refactoring and major feature changes (as is
>   expected in the next few releases), Python tests will inherently
>   need to be changed. It would serve Firefox OS better to take that
>   opportunity to port tests over, rather than try to fixup existing
>   Python tests. This allows us to consolidate without wasting effort.

No, it would not - that would greatly endanger us by leaving no on-device coverage & no e10s coverage. This snake is dead, too - we've already made a front-facing decision that devs can use JS and the UI automation team can use Python.

> 
> - The burden on the developer is already high, in making sure the unit
>   tests and JS marionette tests continue to pass with each pull
>   request. Python tests add an unnecessary third layer of complexity
>   that weighs on developers' workflow.

I tend to disagree with this - the Python tests fill the current testing gap by providing e10s coverage & on-device coverage.

> 
> Possible concerns regarding removing the Python tests, which will
> be addressed in this process:
> 
> Q: Will we lose test coverage?
> A: No; existing Python tests will be ported to JavaScript and/or
>    deduplicated where applicable.
>

This is wrong. B2G Desktop builds currently run in process, not out of process. They are completely different code paths. There are also other factors that come into play when tests run on device that desktop builds won't capture.
 
> Q: The Python tests have existed for a long time and do their job; it
>    seems wasteful just get rid of them wholesale. Why break what isn't
>    broken?
> A: Deleting code is one very effective way to ensure a codebase does
>    not sink under its own weight. We avoid duplicating effort when
>    writing code; why not do the same for tests?

This is a dead snake of a discussion - we've already concluded that each set of tests serves a different purpose. Each group is solving a different problem here - for QA, the intent of the Python tests was an automated solution for our smoketests on device. Moving to JS tests right now isn't going to fit our equivalent need - we need e10s & on-device coverage. Those gaps need to be filled before we reconsider opening this discussion.

> 
> Q: I only know how to write Python tests. Let me continue to do so.
> A: By encouraging people to write Python tests, we perpetuate
>    duplicated effort. The same argument could be made for JavaScript
>    developers who have to learn Python in order to contribute to the
>    Python tests. However, we are never going to write our applications
>    in Python. I say this as someone who used to love Python. Effort
>    would be better spent teaching any remaining Python developers how
>    to use the (very similar) JavaScript API.

Wrong. Each group is open to choose whatever language that fits them best. We aren't a dictatorship community on programming language choice.

> 
> I am only advocating this for Clock, not for other apps, who are
> welcome to do as they see fit. Please raise any concerns about this
> proposal here. Unless I hear compelling arguments otherwise, I intend
> to do the following:

This has already been discussed extensively - I'm not in favor of considering this until e10s support & on device support is seen in the JS tests.

> 
> 1. If, in the course of working on Clock, existing Python tests break,
>    I will port the broken test to JavaScript, and remove the Python
>    version, upon r+ from the bug's reviewer.

You better not do that - that's up to Zac's team to decide what they want to do with their tests. We've already had two directors (Faramarz & Lucas) confirm that developers cannot turn off Python tests as they please without consulting Zac's team first, given that he's the module owner of those tests.

> 
> 2. Porting will not block existing work; I don't intend to go out of
>    my way to refactor everything at once. In general, this is a
>    medium-term initiative to ensure Clock's maintainability and
>    testability down the road.
> 
> 3. Eventually, all Python tests will be rewritten and reviewed, saving
>    ourselves time and effort down the road as we maintain Clock in the
>    months and years ahead.

Given that I'm one of the two smoketest QA leads who own smoketest management, I'm rejecting the above proposal. When you solve the e10s problem & on-device problem with the JS tests, then you can feel free to discuss this with our team again. But until those two needs are met, what's proposed here is out of the question. In that regard, I'm moving this to WONTFIX until a timeframe comes when the JS tests fix the two problems I've cited above.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
> This is a bad idea. This is essentially removes device & OOP coverage of the
> clock workflows, which overall decreases the scope of test coverage covered
> by these tests. The OOP issue is serious, as desktop builds won't capture
> the key gecko regressions triggered by e10s. 

The problem here is not the concept of Python tests themselves, then, but the scope of on-device support. Obviously we don't want to lose support for on-device coverage -- you're right that we should resolve the OOP issue before removing the Python tests.

> > - Decreasing the barrier to entry for contributors, who should be able
> >   to read, write, and modify any applicable integration tests when
> >   they work on Clock;
> 
> This is incorrect. The entry point has already proven out in the Python
> tests as well, as there's a well built community around those tests as well. 

No, it _would_ reduce the barrier to entry. There are many more developers who know JavaScript than who know Python, and that number is only going to grow over time.

> They aren't the same - desktop in process builds won't trigger the e10s
> paths that are found on device.

Let's fix that.

> > - Avoiding code duplication and encouraging code reuse in writing
> >   tests and test utilities for manipulating Clock;
> 
> Technically each sets of tests should be aiming to solve different problems.
> In tree tests aim for what's driven by Gaia, where as on device tests focus
> on the full picture.

Gaia integration tests are intended to test the full picture as well; the problem is that they don't run OOP. The problem isn't JS-vs-Python, it's that MarionetteJS doesn't yet run on-device. 

> JS marionette tests have the same problem, if not, worse actually. These
> tests were shut off on TBPL on linux machines for having a high,
> unsustainable, intermittent failure rate. Both sets of tests use the same
> fundamental UI interaction pattern as well, so the intermittent failure
> problem here ends up being persistent across each framework.

Failures for tests are generally due to poor test-writing, not systemic problems with Marionette itself. The benefit to having tests in JavaScript is that the developers who actually work on and fix bugs in apps are very well-equipped to fix JS tests, and less well-equipped to fix Python tests.

> No it would not - that would greatly endanger us in not having on device
> coverage & no e10s coverage. This snake is dead also - we've already made a
> front-facing decision that devs can use JS, the UI automation team can use
> Python.

Who is responsible for these tests? Is the UI automation team going to rewrite tests when new features are developed and the old tests are no longer relevant? If the automation team is going to deal with breakages and new tests themselves, that's great. My concern is that these tests, apart from OOP differences, largely test exactly the same thing, and it doesn't make sense for Gaia developers to write two sets of identical tests. If your team is responsible for maintaining, writing, and fixing the Python tests, I'm much less concerned about the duplication of effort.

> This is wrong. B2G Desktop builds currently run in process, not out of
> process. They are completely different code paths. There's also other
> factors that come into play when tests run on device as well that desktop
> builds won't capture.

Good point, let's address that by fixing MarionetteJS.

> Wrong. Each group is open to choose whatever language that fits them best.
> We aren't a dictatorship community on programming language choice.

I don't want to remove Python tests for the sake of language unification -- it's more that there's very clearly duplicated effort here, and there's a very real case for attempting to reduce that duplicated effort.

We aren't just a bunch of groups in silos; we're a community working to build a product together. If we can make changes to improve the situation for the project as a whole, we should, even if that means we can't please everybody. It's not a matter of language preference, it's about unnecessary effort.

> This has already been discussed extensively - I'm not in favor of
> considering this until e10s support & on device support is seen in the JS
> tests. [...]
> You better not do that - that's up to zac's team to decide what
> they want to do with their tests. We've already had two directors
> (Faramarz & Lucas) confirm that developers cannot turn off Python
> tests as they please without consulting Zac's team first, given
> that he's the module owner on those tests.

As you repeatedly state:

> Given that I'm one of the two smoketest QA leads who own smoketest
> management, I'm rejecting the above proposal. When you solve the e10s
> problem & on device problem with the JS tests, then you can feel free
> to discuss this with our team again. But until those two needs are met,
> what's proposed here is out of question. On that regard, I'm moving
> this to a WONTFIX until a timeframe comes when the JS tests fix the
> two problems I've cited above.


It sounds like you're in agreement that in a theoretical world where JS Marionette handles these cases properly, there would be benefit to reconsidering removing the Python tests. So let's not WONTFIX this, let's mark it as blocked on bug 973351, for updating MarionetteJS to address the issues you've stated.

Thanks for clarifying the remaining issues with MarionetteJS; it's clear that we shouldn't remove the Python tests until those other problems are resolved. That said, it also seems clear that _when_ those problems are resolved, there are noteworthy benefits to unifying our integration test workflow.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Depends on: 973351
Summary: [Clock] Port all Clock integration tests from Python to JavaScript → [Clock] Port all Clock integration tests from Python to JavaScript when MarionetteJS supports on-device testing
I'm okay with your updated suggestions. Btw, bug 913584 is the bug to follow for turning on OOP in b2g desktop builds.
Depends on: 914584
(In reply to Jason Smith [:jsmith] from comment #3)
> I'm okay with your updated suggestions. Btw, bug 913584 is the bug to follow
> for turning on OOP in b2g desktop builds.

That bug # should be - bug 914584.
I disagree. Neither b2g-desktop nor the ability to run marionette js tests on devices should block this. I understand why the python group has used gaia apps to test other parts of the firefox os stack in the past. Gaia apps are great test cases for the other parts of firefox os. Other groups are welcome to use gaia for this purpose (it's open source : ), but they should fork gaia first (or expect for gaia changes to break them).

Marcus provided lots of good reasons above for replacing python tests with javascript ones as they break. The counterpoint that the js tests do not test the platform and hardware isn't (and shouldn't) be a concern to gaia. We are very busy making an awesome phone frontend in gaia and our priority is not to be a testcase for the platform groups and hardware vendors. The js marionette tests that exercise apps running on b2g-desktop are meant to test the apps. Other groups should have gaia-independent testing that ensures that apps that work on b2g-desktop also work on devices.

(Also pinging stephend because he asked me about this the other day.)
Flags: needinfo?(stephen.donner)
What I meant was b2g-desktop OOP***
No longer depends on: 914584, 973351
(In reply to Gareth Aye [:gaye] from comment #5)
> I disagree. Neither b2g-desktop nor the ability to run marionette js tests
> on devices should block this. I understand why the python group has used
> gaia apps to test other parts of the firefox os stack in the past. Gaia apps
> are great test cases for the other parts of firefox os. Other groups are
> welcome to use gaia for this purpose (it's open source : ), but they should
> fork gaia first (or expect for gaia changes to break them).

None of these arguments counter any of the points I've claimed. Sorry, but trading test coverage for maintainability is simply a fundamentally wrong methodology to go with. You need to solve the on-device & OOP case before this even gets considered. Otherwise, we get a net loss in test coverage from what this bug argues for.

> 
> Marcus provided lots of good reasons above for replacing python tests with
> javascript ones as they break. The counterpoint that the js tests do not
> test the platform and hardware isn't (and shouldn't) be a concern to gaia.
> We are very busy making an awesome phone frontend in gaia and our priority
> is not to be a testcase for the platform groups and hardware vendors. The js
> marionette tests that exercise apps running on b2g-desktop are meant to test
> the apps. Other groups should have gaia-independent testing that ensures
> that apps that work on b2g-desktop also work on devices.

It is a concern for QA's UI Automation. We have many gecko regressions that get caught by smoketests & the associated UI automation. The goal of those tests is to automate manual smoketests on device. The Marionette JS framework doesn't meet that need right now, so if we move tests over to that framework, we'll end up losing coverage that hits gecko & lower as a side effect. We have a top objective in our QA organization to get out of the business of doing manual smoketests & move towards an automation-driven model. How exactly are we supposed to hit that objective if we're losing device coverage like this bug proposes?

> 
> (Also pinging stephend because he asked me about this the other day.)

This isn't up for negotiation. The platform *is* a concern for QA's smoketests, as we have many gecko regressions that are captured by these tests when we execute them daily. Losing this coverage would hurt us badly, as we would be forced to do even more manual testing than we already do, which goes in the entirely opposite direction of our 2014 goals. Remember that a quality department looks at the full picture, not just one piece of the system.

I'm re-adding the dependencies, as I don't think any of your arguments make a compelling case for moving the Python tests to JS. The ultimate goal QA needs is automation that models on-device smoketests, an objective Marionette JS currently fails to meet.
Depends on: 914584, 973351
Also - there are failures even at the Gaia level that will only show up on device, not in b2g desktop running in process. A good recent example was with the rocketbar - we saw two cases where e.me search was only returning results on b2g desktop builds. When it ran on device, we got no results due to a server-side error on e.me's side. This would fail our manual smoketest for the rocketbar, as it would prevent someone from being able to add an e.me app to the homescreen.
(In reply to Jason Smith [:jsmith] from comment #7) 
> None of these arguments counter any of the points I've claimed.

Okay I will address them individually.

> This is a bad idea. This is essentially removes device & OOP coverage of the clock workflows, which overall decreases the scope of test coverage covered by these tests. The OOP issue is serious, as desktop builds won't capture the key gecko regressions triggered by e10s.

Clock is a webapp. Clock's tests are intended to catch clock regressions. Clock does not have any code that is special-cased for devices or processes/threads. Active gaia clock development is not meant to test gecko or gonk. We are focusing on improving clock's functionality. If anyone is interested in testing gecko or gonk they should either fork gaia or write their own test cases.

> JS marionette tests have the same problem, if not, worse actually. These tests were shut off on TBPL on linux machines for having a high, unsustainable, intermittent failure rate.

Almost all of the intermittent failures in the past month have been related to timeouts checking out gaia from hg. jgriffin landed a patch recently to help this situation. I haven't put a lot of effort into pinging releng to make the suite visible again, but I'll probably do that in the near future.

> They aren't the same - desktop in process builds won't trigger the e10s paths that are found on device.

They are the same from gaia's perspective. Gaia writes apps against web apis which are the same on devices, emulators, browsers, etc. Other groups are invited to fork gaia for their own purposes though!

> Technically each sets of tests should be aiming to solve different problems. In tree tests aim for what's driven by Gaia, where as on device tests focus on the full picture.

I am totally okay with other groups writing tests against gecko and gonk. Gaia is happy to offer other groups our codebase, but I doubt we'll compromise on workflow to help groups which are using our codebase to test other code/hardware.

> Both sets of tests use the same fundamental UI interaction pattern as well, so the intermittent failure problem here ends up being persistent across each framework.

You are correct that the python tests and javascript tests both consume the marionette server APIs. Not all intermittent failures are caused by server bugs, though (most are not...).

> No it would not - that would greatly endanger us in not having on device coverage & no e10s coverage. This snake is dead also - we've already made a front-facing decision that devs can use JS, the UI automation team can use Python.

The problem Marcus is getting at is that it's a pain in the butt when changes to gaia apps break python tests. We are *helping* QA. You get our codebase for free and you may use it as you please to test gecko and gonk. You are asking too much if, in addition to using our project as a testcase, you also want for us to update your code when ours changes.

> I tend to disagree with this - the Python tests serve filling the current testing gap of a need for e10s coverage & on device coverage.

I agree that devices and platform changes need to be tested. I disagree that it is gaia's responsibility.

> This is wrong. B2G Desktop builds currently run in process, not out of process. They are completely different code paths. There's also other factors that come into play when tests run on device as well that desktop builds won't capture.

There are no code paths in clock which are special cased for whether the app is running in the same process as other code or not. Which of the factors that you're thinking about should clock care about? If clock has to care about whether it's running on a device, then there is a platform bug to fix. Seriously! The web is an abstraction and our clock app does *not* want to look under the hood.

> This is a dead snake of a discussion - we've already concluded that each set of tests serve different purposes. Each group is solving a different problem here - for QA, the intentional need of the Python tests was for an automated solution for our smoketests on device. Moving to JS tests here right now isn't go fit our equivalent need - we need e10s & on device coverage. Those gaps need to be filled first before we reconsider opening this discussion.

If QA needs e10s & on device coverage and you need for gaia to not break this coverage, then get off of our lawn (don't use gaia : ). Write your own web pages!

> Wrong. Each group is open to choose whatever language that fits them best. We aren't a dictatorship community on programming language choice.

Yes. Go write your python code! Gaia will not maintain it for you, though! It is worth mentioning that we are a web operation and that by using javascript we are eating our own dogfood :).

> You better not do that - that's up to zac's team to decide what they want to do with their tests.

Zac and whomever else can write whatever tests they want. They should not assume that gaia will not change. We have lots of good work planned for the next few months!

> We've already had two directors (Faramarz & Lucas) confirm that developers cannot turn off Python tests as they please without consulting Zac's team first, given that he's the module owner on those tests.

Here Marcus means tests that have been transplanted onto gaia's continuous builds on Travis. If QA doesn't want to be downstream from gaia, that's ok. You all can fork gaia or write your own test cases.

> When you solve the e10s problem & on device problem with the JS tests, then you can feel free to discuss this with our team again. But until those two needs are met, what's proposed here is out of question.

It is not gaia's job to test gecko and hardware... seriously! We have bugs to fix and improvements to make. Let us do our jobs! Don't use our project and then complain because the changes that we make aren't what you were hoping for.

> Sorry, but the argument of maintainability for loss of test coverage is simply a
> fundamentally wrong methodology to go with. You need to solve the on device
> & oop case before this even gets considered. Otherwise, we get a net loss in
> test coverage from this bug argues for

> It is a concern for QA's UI Automation. We have many gecko regressions that
> get caught by smoke tests & the associated UI automation. The goal of those
> tests to automate manual smoketests on device. The Marionette JS framework
> doesn't meet that need right now, so if we move tests over to that
> framework, we'll end up losing coverage that hits gecko & lower by caveat.
> We have a top objective in our QA organization that we have to get out of
> business of doing manual smoketests & move towards an automation-driven
> model. How exactly are we supposed hit that objective if we're losing device
> coverage like what this bug proposes?

If it's a concern for you, fork gaia or don't depend on us!
No longer depends on: 914584, 973351
(In reply to Gareth Aye [:gaye] from comment #9)
> (In reply to Jason Smith [:jsmith] from comment #7) 
> > None of these arguments counter any of the points I've claimed.
> 
> Okay I will address them individually.
> 
> > This is a bad idea. This is essentially removes device & OOP coverage of the clock workflows, which overall decreases the scope of test coverage covered by these tests. The OOP issue is serious, as desktop builds won't capture the key gecko regressions triggered by e10s.
> 
> Clock is a webapp. Clock's tests are intended to catch clock regressions.
> Clock does not have any code that is special-cased for devices or
> processes/threads. Active gaia clock development is not meant to test gecko
> or gonk. We are focusing on improving clock's functionality. If anyone is
> interested in testing gecko or gonk they should either fork gaia or write
> their own test cases.

This is poor methodology. Our goal is to ship a functioning operating system, not a functioning app. You need the full package here.

> 
> > They aren't the same - desktop in process builds won't trigger the e10s paths that are found on device.
> 
> They are the same from gaia's perspective. Gaia writes apps against web apis
> which are the same on devices, emulators, browsers, etc. Other groups are
> invited to fork gaia for their own purposes though!

Wrong. Gaia also deals with cases of creating its own processes. If you run in process, then your tests are running incorrectly - you won't hit the same code path. We've seen many cases of this leading to tests passing in tree, but failing on device. The nuwa landing provided an excellent example of this, where the keyboard broke entirely, because in-tree tests are run in process, which won't catch a failure that would be experienced on device. This also caused timezone changes to not be reflected in the UI on device, but not in tree. These situations inherently set us up for broken on-device builds, which ends up hurting everyone - people end up getting blocked in their testing. I remember the Systems FE standup when this happened, where everyone said they were blocked because the keyboard didn't work on device. See the problem here? You can't just assume Gaia development can happen on b2g desktop builds alone. At some point, you have to test your solution on device. Otherwise, you are risking checking in bad code that won't work on device.

> 
> > Technically each sets of tests should be aiming to solve different problems. In tree tests aim for what's driven by Gaia, where as on device tests focus on the full picture.
> 
> I am totally okay with other groups writing tests against gecko and gonk.
> Gaia is happy to offer other groups our codebase, but I doubt we'll
> compromise on workflow to help groups which are using our codebase to test
> other code/hardware.

It isn't testing gecko & gonk. It would be testing the whole picture together end to end.

We already discussed this extensively and agreed to have the Python tests in the mainline tree, so I don't see why this argument is being reopened. Nobody is going to win here - we place a higher risk of breaking on-device builds if we're not maintaining a full suite of tests on checkin. Skipping maintenance of one suite of tests will surely make this situation worse.

> 
> > No it would not - that would greatly endanger us in not having on device coverage & no e10s coverage. This snake is dead also - we've already made a front-facing decision that devs can use JS, the UI automation team can use Python.
> 
> The problem Marcus is getting at is that it's a pain in the butt when
> changes to gaia apps break python tests. We are *helping* QA. You get our
> codebase for free and you may use it as you please to test gecko and gonk.
> You are asking too much if, in addition to using our project as a testcase,
> you also want for us to update your code when ours changes.

To my understanding, there was already agreement way back at the Oslo work week that this workflow was okay. And we are not just testing gecko & gonk - we're testing the whole thing together. Given the high regression rate on trunk, trying to move to a model where we decrease the maintenance of tests on checkin is going to make the situation worse, not better.

> 
> > I tend to disagree with this - the Python tests serve filling the current testing gap of a need for e10s coverage & on device coverage.
> 
> I agree that devices and platform changes need to be tested. I disagree that
> it is gaia's responsibility.

I think you are wrong. You are risking checking in broken code - you might check in code that works on b2g desktop, but fails on device.

> 
> > This is wrong. B2G Desktop builds currently run in process, not out of process. They are completely different code paths. There's also other factors that come into play when tests run on device as well that desktop builds won't capture.
> 
> There are no code paths in clock which are special cased for whether the app
> is running in the same process as other code or not. Which of the factors
> that you're thinking about should clock care about? If clock has to care
> about whether it's running on a device, then there is a platform bug to fix.
> Seriously! The web is an abstraction and our clock app does *not* want to
> look under the hood.

The OOP nature of gecko has a large, unexpected effect on how Gaia operates in many situations. I didn't expect timezone notifications to break in the FTE. I didn't expect the keyboard to break when nuwa landed. In reality, you are inevitably connected to gecko's operations & below. If something breaks under you, then you are going to want to know before you check in. Otherwise, you could end up checking in broken code by triggering an underlying bug not seen on b2g desktop.

> 
> > This is a dead snake of a discussion - we've already concluded that each set of tests serve different purposes. Each group is solving a different problem here - for QA, the intentional need of the Python tests was for an automated solution for our smoketests on device. Moving to JS tests here right now isn't go fit our equivalent need - we need e10s & on device coverage. Those gaps need to be filled first before we reconsider opening this discussion.
> 
> If QA needs e10s & on device coverage and you need for gaia to not break
> this coverage, then get off of our lawn (don't use gaia : ). Write your own
> web pages!

As I said - dead snake. We already had this discussion & came to agreement that it was okay to have these tests in the Gaia tree. I fail to see where your argument is providing value here - nobody is going to win from the approach you are taking. It simply risks placing ourselves in situations with busted full device builds.

> 
> > Wrong. Each group is open to choose whatever language that fits them best. We aren't a dictatorship community on programming language choice.
> 
> Yes. Go write your python code! Gaia will not maintain for you though! It is
> worth mentioning that we are a web operation and that by using javascript we
> are eating our own dogfood :).

This is a dead snake. We already concluded that this was okay to have Python tests in the Gaia tree. This is just trolling at this point with no added value in this argument.

> 
> > You better not do that - that's up to zac's team to decide what they want to do with their tests.
> 
> Zac and whomever else can write whatever tests they want. They should not
> assume that gaia will not change. We have lots of good work planned for the
> next few months!

I don't think they are making those assumptions - they have a full-time team that tries to keep the tests up to date with Gaia's changes.

> 
> > We've already had two directors (Faramarz & Lucas) confirm that developers cannot turn off Python tests as they please without consulting Zac's team first, given that he's the module owner on those tests.
> 
> Here Marcus means tests that have been transplanted onto gaia's continuous
> builds on Travis. If QA doesn't want to be downstream from gaia, that's ok.
> You all can fork gaia or write your own test cases.

Dead snake. You are just trolling at this point on something that was agreed on that was okay to do.

> 
> > When you solve the e10s problem & on device problem with the JS tests, then you can feel free to discuss this with our team again. But until those two needs are met, what's proposed here is out of question.
> 
> It is not gaia's job to test gecko and hardware... seriously! We have bugs
> to fix and improvements to make. Let us do our jobs! Don't use our project
> and then complain because the changes that we make aren't what you were
> hoping for.

Absolutely wrong. Ultimately, Gaia is dependent on the quality state of gecko & below. If you are checking in code that is going to work in b2g desktop with Gaia, but cause an underlying bad Gecko bug, then you risk producing a busted on device build. That is simply not acceptable.

> 
> > Sorry, but the argument of maintainability for loss of test coverage is simply a
> > fundamentally wrong methodology to go with. You need to solve the on device
> > & oop case before this even gets considered. Otherwise, we get a net loss in
> > test coverage from this bug argues for
> 
> > It is a concern for QA's UI Automation. We have many gecko regressions that
> > get caught by smoke tests & the associated UI automation. The goal of those
> > tests to automate manual smoketests on device. The Marionette JS framework
> > doesn't meet that need right now, so if we move tests over to that
> > framework, we'll end up losing coverage that hits gecko & lower by caveat.
> > We have a top objective in our QA organization that we have to get out of
> > business of doing manual smoketests & move towards an automation-driven
> > model. How exactly are we supposed hit that objective if we're losing device
> > coverage like what this bug proposes?
> 
> If it's a concern for you, fork gaia or don't depend on us!

This is a dead snake and you are wasting time arguing it. We've previously discussed extensively that there was more value in having the tests in tree with Gaia than in a separate repository. It's disappointing to see that you don't clearly understand the value here, and you end up pushing a methodology that risks developing poor-quality code in Gaia.

The dependencies here are established & are not up for negotiation. Please do not remove the dependencies again. If you don't understand the value here, then talk more with the UI Automation team & blackbox QA team to understand their perspectives. But don't be a troll.

I'm done arguing at this point though.
Depends on: 914584, 973351
FTR, I'm not trolling. I firmly believe that gonk tests should live with gonk, gecko tests should live with gecko, and gaia tests should live with gaia. As for what we agreed on in Oslo, I don't think I understood that the purpose of many of webqa's tests was to exercise gecko and gonk through gaia. I'm not saying you shouldn't use gaia for this; I'm saying it's not gaia's responsibility to maintain compatibility with tests you write for other components using it.
Also I totally understand the case for adding OOP to b2g-desktop. What I meant to argue was that it doesn't matter for clock right now and for the forseeable future.
(In reply to Gareth Aye [:gaye] from comment #11)
> FTR, I'm not trolling. I firmly believe that gonk tests should live with
> gonk, gecko tests should live with gecko, and gaia tests should live with
> gaia. As for what we agreed on in Oslo, I don't think I understood that the
> purpose of many of webqa's tests was to exercise gecko and gonk through
> gaia. I'm not saying you shouldn't use gaia for this; I'm saying it's not
> gaia's responsibility to maintain compatibility with tests you write for
> other components using it.

The Gaia UI automation's primary purpose was originally to automate the manual smoketests. They weren't meant to just test Gaia - they are aiming to test the basics of the whole picture.

Also - you can't just write tests that only exercise one area of the stack - you need tests that put the pieces together. There's quite a lot of danger in only having in-tree tests written by Gaia developers that don't do this - you can inevitably check in code that works in a Gaia context alone, but fails when integrated with gecko on device. We've been nailed badly by regressions previously because of this, where code worked fine in a Gaia context alone but failed when put together on device. A great example is when we tried to enable OOP keyboard support at the Gaia level - we had tests that did exercise the keyboard on a basic level in tree, but they ran in process, so they didn't catch that the keyboard was failing to come up in many places. The opposite applies too - someone in gecko could make a breaking change to an API that Gaia relies on, which can cause a piece of Gaia to break unexpectedly.

(In reply to Gareth Aye [:gaye] from comment #12)
> Also I totally understand the case for adding OOP to b2g-desktop. What I
> meant to argue was that it doesn't matter for clock right now and for the
> forseeable future.

I'd rather not take that risk - the cost exceeds the value here. We've already had blocking regressions show up on device for Clock that were not caught by in-tree automation, including a recent smoketest regression (bug 971162). Historically, the Gaia UI automation has sometimes saved us by giving early detection of a problem right after a build was created & has helped catch regressions in tree before checkin (I can dig up the bugs, but I definitely remember this happening with Clock specifically). I get that there's a developer productivity challenge here, but trunk quality is unstable right now (i.e. we aren't maintaining a consistently usable build), so I don't think it's wise to reduce overall test coverage for the sole benefit of faster checkin processes. If we really want to move tests over to JS, then I need to see a path forward that ensures we don't lose test coverage in the process.
Though thinking about this more - we wouldn't need to be dependent on b2g desktop OOP, as we don't have that coverage right now anyway. We really only need on-device support to give parity between the frameworks and allow the Python tests to be converted to JS with equivalent coverage.
No longer depends on: 914584
I have some questions about device coverage. We do have https://github.com/mozilla-b2g/marionette-device-host which forwards all of the marionette commands to a device. In that sense, we do have device support for marionette js. However, I also know that webqa has a cluster of devices somewhere in the world and perhaps also some software to do continuous testing on those devices. Is what you want from bug 973351 for the marionette js tests to be run continuously (by humans or automation) on devices, or just for it to be possible to run them on devices?
Flags: needinfo?(jsmith)
See Also: → 972688, 961319, 972226, 971585
See Also: 972688, 972226, 971585
From my standpoint:

1) Ideally, tests should strive to provide the best (thorough, appropriate, reliable) coverage, and be run in the environment where they do that (any one of those things, on its own, or a combination thereof) - that doesn't always mean on-device, though for automating the smoketests, at present, that does mean the Python Gaia UI Tests, because of all the capabilities we get, on-device
2) As a real-world example, we often strive, in our team's other Web-testing work, to remove or bolster Selenium WebDriver tests (run against real websites, with real browsers), and replace/augment them with other libraries, like Python requests for URL/status-code checking, BeautifulSoup for XML parsing, etc.  We always aim to pick the best tool for the type of test, in light of coverage that's already in place from developers -- Django unit-tests, for example
3) I would be absolutely elated to see MarionetteJS become a 1st-tier, on-device testrunner, because then we -- the project -- could finally scale to the breadth of coverage and developer buy-in necessary to reach that point
4) I'm all for reducing redundant coverage too, esp. as it could potentially help us focus our efforts on areas that -- at present -- aren't being/can't be addressed by a JS runner without the ability to run on-device
5) As Jason mentions, we definitely have found quite a few bugs (Gonk/Gaia/Gecko) that neither unittests nor MarionetteJS have found, but I can't honestly say that they are OOP or device-dependent when it comes to the Clock app (in fact, we're quite impressed with the ability and breadth of the clock tests)
6) To answer comment 15 specifically: I believe that we should strive for MarionetteJS, on-device, running in a CI, yes.  It'll have to have the Right Features (tm) and be reliable, but it could very well be one of the chief ways the project can effectively scale + mature from a testing standpoint, going forward.  (The corollary to the early efforts by Web QA + A-team in the Python gaiatest runner/Gaia UI Tests is that there are now MTBF, Endurance, Performance, and even fuzzing test-suites taking full advantage of being on-device.  We obviously can't lose those efforts, though we can and should continue talking about and working towards how we can get those abilities through JS.)

The rest of this discussion probably is best had outside of Bugzilla, in a meeting where we can hash out the concerns and needs from the various stakeholders, and come to a decision we can agree on (hopefully) :-)
Flags: needinfo?(stephen.donner)
I agree with Stephend that Bugzilla isn't the place for resolving this argument; I'm going to set up a meeting next week where we can discuss it.

Basically, I see two distinct needs:  app developers need to be able to write and run tests of their apps, and QA needs as large of a suite of end-to-end tests as possible, of which Gaia is a critical piece.  Let's figure out how to satisfy both of those goals.
Clear needinfo - as the above comments suggest, this should be taken offline in a meeting.
Flags: needinfo?(jsmith)
Summary: [Clock] Port all Clock integration tests from Python to JavaScript when MarionetteJS supports on-device testing → [Clock] Port all Clock integration tests from Python to JavaScript in conjunction with MarionetteJS / Gaia UI testing work
Assignee: nobody → m
Target Milestone: --- → 1.4 S3 (14mar)
Based on the discussion in the meeting about Gaia UI testing (results summarized below), one action item for me is to port the Clock integration tests to JS. Since this bug was largely filed for that purpose, I'm assigning it to myself for this upcoming sprint. This will NOT involve any deletion of tests, only porting the Clock tests to JS to reach parity with the Python suite. When they have been ported, I'll mark this fixed, as part of the larger exploration into Gaia's longer-term testing plan.


Summary of testing plan (in the voice of :jgriffin):

1 - We're going to stand up MarionetteJS tests on-device in order to see what pain points we hit with the harness and whether engagement with the Gaia developers is easier with the JS tests when they break.

2 - Gareth will work to supply an on-device make target (ala 'make test-integration') that can be used to drive these.  I'll file a bug for this.

3 - Marcus will port the existing Clock tests in Python to JS to help facilitate the comparison.  The Clock tests will not be removed from the Python set for now.

4 - I will contact owners of the Dialer app and see if they'd be willing to contribute a few Dialer on-device tests in JS, so that we have some device-only JS tests as well as the existing set that work well on desktop builds.

5 - We'll meet again in 1 month to discuss where we want to go next.
See Also: → 979077
Target Milestone: 1.4 S3 (14mar) → 1.4 S4 (28mar)
Comment on attachment 8393085 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/17300/commits

Added some comments on GitHub. It's looking really good, but some points can still be improved - there are also some intermittent failures on Travis (which I know mcav is already trying to fix).
Attachment #8393085 - Flags: review?(mmedeiros)
Thanks for the thorough review! Agree with nearly all of your comments, will re-flag when ready.
Comment on attachment 8393085 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/17300/commits

Okay Miller, I'm ready for the second pass.

I refactored the old "clock.js" file into "actions.js", "timer_actions.js", "stopwatch_actions.js", and "alarm_actions.js" and in general tried to clean up a bunch of stuff, in addition to your prior feedback.
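
For anyone skimming the bug without opening the pull request, the *_actions.js modules follow roughly the shape sketched below. This is a hand-written illustration, not an excerpt from the patch: the selectors and method names are placeholder guesses, and it assumes the marionette-helper plugin is available as client.helper.

  'use strict';

  // Sketch of the actions-object pattern behind the Clock integration tests.
  // The real alarm_actions.js lives in the pull request above.
  function AlarmActions(client) {
    this.client = client;
  }

  AlarmActions.prototype = {
    // Tap the "+" button and wait for the new-alarm form to appear.
    openNewAlarmForm: function() {
      this.client.findElement('#alarm-new').tap();
      this.client.helper.waitForElement('#edit-alarm');
      return this;
    },

    // Save the form and wait until the alarm list is visible again.
    saveAlarm: function() {
      this.client.findElement('#alarm-done').tap();
      this.client.helper.waitForElement('#alarms');
      return this;
    },

    // Alarm entries currently shown in the list.
    list: function() {
      return this.client.findElements('#alarms .alarm-cell');
    }
  };

  module.exports = AlarmActions;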
Attachment #8393085 - Flags: review?(mmedeiros)
Comment on attachment 8393085 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/17300/commits

this looks great! extremely clean code, so easy to understand the tests.. it took me way less time to review it than last time. Just had to open the `*_actions.js` and `*_test.js` side-by-side.. good work!
Attachment #8393085 - Flags: review?(mmedeiros) → review+
Thanks, Miller. Landed:

master: https://github.com/mozilla-b2g/gaia/commit/30ec580bc8d63ee1cc7c8bd7ee857fbf58cd5d3e

For those keeping track of the Python->JS work, this commit includes a comment above each test noting which Python test it mirrors. All tests (that I'm aware of) written in Python have been ported and included in this commit.
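
To illustrate the convention (an invented example, not an excerpt from the commit - the Python path and the action calls are placeholders, in the spirit of the sketches earlier in this bug):

  // Mirrors the Python test:
  //   gaiatest/tests/functional/clock/test_clock_create_new_alarm.py
  test('create a new alarm with the default values', function() {
    alarmActions.openNewAlarmForm().saveAlarm();
    assert.equal(alarmActions.list().length, 1);
  });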
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Whiteboard: [p=13]
This inadvertently got vaporized during Gaia tree breakage. Relanded:

https://github.com/mozilla-b2g/gaia/commit/f44dde98e8fe285c53b08b799ca07da1c40ed8fc
Depends on: 990798