Closed Bug 1441610 Opened 6 years ago Closed 6 years ago

Begin tracking navigator.webdriver to help detect telemetry coming from test automation

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: sphilp, Assigned: janerik)

Details

Attachments

(2 files, 1 obsolete file)

Following the landing of https://bugzilla.mozilla.org/show_bug.cgi?id=1169290 it may now be possible to collect telemetry around which pings are coming from test automation via monitoring and reporting on the navigation.webdriver interface. This interface is currently on nightly and will ride the train for 60.
I don't understand what you mean by "monitoring and reporting on the navigation.webdriver interface"? Near as I can tell it is controlled by the pref dom.webdriver.enabled which is, according to [1], essentially always true.

Is it "report the value of navigator.webdriver"? That, according to [2], is controlled by marionette.enabled.

I ask because reporting a new pref is pretty straightforward. We can add it to the list[3].

[1]: https://searchfox.org/mozilla-central/rev/056a4057575029f58eb69ae89bc7b6c3c11e2e28/modules/libpref/init/all.js#584
[2]: https://searchfox.org/mozilla-central/rev/056a4057575029f58eb69ae89bc7b6c3c11e2e28/dom/base/Navigator.cpp#1904
[3]: https://searchfox.org/mozilla-central/rev/056a4057575029f58eb69ae89bc7b6c3c11e2e28/toolkit/components/telemetry/TelemetryEnvironment.jsm#184
Flags: needinfo?(sphilp)
Yeah exactly, the latter, whether it's the interface or the pref sgtm :) 

Longer version so it's explicit:

The marionette.enabled and dom.webdriver.enabled prefs were added in the same patch from 1169290 that added the navigation.webdriver interface

dom.webdriver.enabled determines whether the capability of tracking webdriver status is enabled as a whole (in case we needed to remotely disable this functionality)

whereas marionette.enabled and navigation.webdriver determine whether the browser is actively under webdriver control (and as you said they are the same thing, just exposed differently)

Whether the pref or the interface is easier from a telemetry collection pov, I'm good with either.

For this bug: basically the idea is whenever webdriver-based automation is controlling the browser, navigation.webdriver and marionette.enabled will return true. Including either in our telemetry data would allow us to exclude (or include as the case may be) some set of test automation clients from any analysis.
Flags: needinfo?(sphilp)
When Marionette is active, data reporting to Telemetry is disabled
as to not taint data submitted by real users with that from Firefoxen
used in automation:

	https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/testing/marionette/server.js#173-185

Automation environments typically have widely different characteristics
than those of real users, and the fear is that reporting telemetry
from these entirely fictious sessions not operated by humans will
skew the data we use to make informed decisions about Firefox from.

We would like to have telemetry on instrumented Firefox sessions,
but we’d first have to establish a way for the data originating
from these sessions to be marked or collected in a separate bucket:

	https://bugzilla.mozilla.org/show_bug.cgi?id=1401172#c3

I also want to point out that reporting use/invocation of
navigator.webdriver is quite different to reporting whether
marionette.enabled is true.  The intention of navigator.webdriver
is to infer if the user agent is controlled by automation so that
documents/web applications can take alternate code paths when under
test.

See my original post to dev-platform@ about the use cases:

	https://groups.google.com/d/msg/mozilla.dev.platform/GRNpjC4MuH8/lTO4rEx9AgAJ

It is certainly useful for us to know how often navigator.webdriver
is used, but this is orthogonal from when the browser is under
automation.

I think it needs to be clarified in this bug exactly what we want
to measure and collect, and for what purpose.

In summary:

  - Use/invocation of navigator.webdriver can tell us something
    about how many websites have alternate code paths.

  - marionette.enabled can tell us how often Gecko is used in
    automation, but we should probably be careful about mixing
    the overall data submitted from these sessions.

  - dom.webdriver.enabled, like most other dom.*.enabled preferences,
    just controls whether navigator.webdriver is _exposed_ and is not
    IMO worth measuring.
Thanks for the clarification :ato!

It sounds like what we want is just adding marionette.enabled pref to the environment pref list :chutten linked, and then make sure our existing queries/datasets currently being used are excluding pings when marionette.enabled=true. Then we can create a separate bucket/dataset for marionette.enabled=true only data that will be useful particularly for the webdriver project. Sound reasonable?
If automated usage of Firefox is what you want to track, then yes.

We also need to enable Telemetry reporting from Marionette, and I
suspect we need Telemetry peer for this.  I am happy to review any
code changes related to this.
Assignee: nobody → jrediger
Priority: -- → P1
(In reply to Jan-Erik Rediger [:janerik] from comment #6)

> Created attachment 8956840 [details] [diff] [review]
> Track marionette.enabled in default telemetry environment

Without any extensive knowledge of Telemetry, it looks like this
will start monitoring the marionette.enabled preference, but because
of the reasons I gave in comment #3 the data will not be submitted
because Marionette disables Telemetry.  For this to work you also
need to enable Telemetry for Marionette and ensure the submitted
data from Marionette sessions doesn’t taint real user sessions.
Flags: needinfo?(jrediger)
Right, we wanted to first start tracking the preference and then in a second step enable actual data submission and ensure the datasets exclude the test data.
I'll check back with :chutten how to do that.
Comment on attachment 8956840 [details] [diff] [review]
Track marionette.enabled in default telemetry environment

Review of attachment 8956840 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. You'll need Data Collection Review for this new data collection. The form's over here: https://github.com/mozilla/data-review/blob/master/request.md

You can copy that into a comment or a text editor or something and then answer the questions as presented. Then needinfo a Data Steward (I'm a trainee Data Steward at the moment, so you can ni?chutten and I'll take it from there) so that they'll fill out the other form and give the final ok.

The final format of the commit message summary will then be "Bug 1441610 - Track marionette.enabled in default telemetry environment r=chutten data-r=<the data steward>"
Attachment #8956840 - Flags: review?(chutten) → review+
I guess I worry that this will taint data if Telemetry is not
properly disabled in Marionette.  So before you push this
patch, it would be a good idea to evaluate the preferences in
https://searchfox.org/mozilla-central/source/testing/marionette/components/marionette.js#55.
Those preferences were evaluated at the time and found to be adequate (for Desktop Firefox).

This patch will just report the value of the pref if it's set to a non-default value. For now I expect this to have precisely zero effect (since turning marionette on turns data upload off) except maybe for edge cases. And this is fine and intended yak shaving before anyone gets down to the real effort of trying to figure out what Marionette Telemetry should look like.
Flags: needinfo?(jrediger) → needinfo?(chutten)
    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate? 

This is covered by the usual Telemetry documentation in the Firefox Source Documentation.

    Is there a control mechanism that allows the user to turn the data collection on and off?

This collection is covered by the usual Telemetry preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

It is permanent, to be monitored by Jan-Erik.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1

    Is the data collection request for default-on or default-off?

default-on

    Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This will be an ongoing collection

===
ni?liuche to check my work, as this is my first Data Collection Review.
Flags: needinfo?(chutten) → needinfo?(liuche)
chutten, looks good - good clarifications in thread, and this looks like a good first step in differentiating between test environments.
Flags: needinfo?(liuche)
Attachment #8956840 - Attachment is obsolete: true
Attachment #8958092 - Flags: review?(chutten) → review+
Keywords: checkin-needed
I don't see when this was data-reviewed+. Could someone please take a look?
Flags: needinfo?(chutten)
Keywords: checkin-needed
(In reply to Jan-Erik Rediger [:janerik] from comment #8)
> Right, we wanted to first start tracking the preference and then in a second
> step enable actual data submission and ensure the datasets exclude the test
> data.

Please note that in our CI automation test jobs we do not want to submit Telemetry because that would reach external network which we also want to try to disallow for Marionette. See bug 1272255.
I'm the (trainee) Data Steward who provided (provisional) data-review+ in Comment#13, which :liuche confirmed in Comment#14

So we're good! (( I should've been more clear in Comment#13 that the filled-in-form constituted an r+ here ))
Flags: needinfo?(chutten)
Keywords: checkin-needed
Pushed by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/09fef9f36f4e
Track marionette.enabled in default telemetry environment r=chutten data-r=chutten
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/09fef9f36f4e
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: