Closed Bug 1174937 Opened 9 years ago Closed 9 years ago

Telemetry experiments: share of searches recorded in Places that resolve to major web SRPs

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

39 Branch
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wmaggs, Assigned: jhirsch)

References

Details

Attachments

(4 files, 19 obsolete files)

8.25 KB, patch
mak
: review+
benjamin
: feedback+
Details | Diff | Splinter Review
14.76 KB, patch
Details | Diff | Splinter Review
3.99 KB, application/zip
Details
7.49 KB, application/x-xpinstall
Details
As part of deciding how to evolve the navigation and search experience in Firefox, it is important to know how much of a user's use of Search Access Points (SAPs) and other in-browser mechanisms for finding web pages actually leads immediately to a search results page (SRP).

A good, if not perfect, approximation of the share of user search activity fulfilled by search engines is the percentage of entries in the moz_places table of the Places database that resolve to an SRP.

A sample SQL query to calculate this number for the Google search engine would be:

SELECT COUNT(*) FROM moz_historyvisits INNER JOIN moz_places ON moz_historyvisits.place_id = moz_places.id WHERE moz_places.url LIKE '%google.com/search?q%'

and Yahoo: 

SELECT COUNT(*) FROM moz_historyvisits INNER JOIN moz_places ON moz_historyvisits.place_id = moz_places.id WHERE moz_places.url LIKE '%/search.yahoo.com/yhs/search?p%'

for Bing:

SELECT COUNT(*) FROM moz_historyvisits INNER JOIN moz_places ON moz_historyvisits.place_id = moz_places.id WHERE moz_places.url LIKE '%bing.com/search?q%'

and finally Amazon: 

SELECT COUNT(*) FROM moz_historyvisits INNER JOIN moz_places ON moz_historyvisits.place_id = moz_places.id WHERE moz_places.url LIKE '%amazon.com/s?%'


The four values from these queries would each then be divided by the total number of records in moz_places for that user to calculate the percentage of searches recorded in Places that lead to a search results page (SRP).
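For illustration, the division described above can be sketched in a few lines of JavaScript (the function name and the count values are hypothetical examples, not part of any proposed add-on code):

```javascript
// Hypothetical sketch: turn per-engine SRP counts into percentages of
// the user's total history records.
function srpShares(counts, total) {
  // counts: e.g. { google: 120, yahoo: 10, bing: 15, amazon: 5 }
  const shares = {};
  for (const [engine, n] of Object.entries(counts)) {
    shares[engine] = total > 0 ? (n / total) * 100 : 0;
  }
  return shares;
}

// Example: 120 of 1000 history entries are Google SRPs -> 12 (percent).
const shares = srpShares({ google: 120, yahoo: 10 }, 1000);
```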

We would like these numbers to be exported to a server to be set up by Dev Ops. We've considered the privacy implications, and there don't appear to be any concerns.

The telemetry add-on needed for this data should be created, tested, and ultimately deployed via the process described in:

https://wiki.mozilla.org/Telemetry/Experiments

Scope of Experiment: Ideally we would like to collect 50-100K data points to make this interesting, either from Beta (probably en-US) or from the share of the population of FF Release users that have opted into Telemetry. Deploying the add-on to both Telemetry and release would be best.

We would like to get this done and deployed as soon as possible, so we can surface some of the results at Whistler.
I can work on this, I'll work on it tomorrow morning.
Assignee: nobody → 6a68
Attached patch WIP patch (obsolete) — Splinter Review
WIP patch: the basic Places code is there, but still needs error handling and I'm not sure exactly what the xhr endpoint should be.

I've used git with bugzilla in the past; this is an attempt to paste in a diff from an hg commit. Hopefully it's legible.
Comment on attachment 8623307 [details] [diff] [review]
WIP patch

Review of attachment 8623307 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Benjamin - Mind taking a look at this WIP patch? I've called out TODOs throughout. Any feedback would be awesome. Thanks! - Jared
Attachment #8623307 - Flags: feedback?(benjamin)
Benjamin, we're open to your suggestions about the right way to test this on a short schedule, with the proviso that we don't want to break anything for users. Put it out on beta en-US to start, then move to release for a few days or a week?
Comment on attachment 8623307 [details] [diff] [review]
WIP patch

I will provide feedback on the experiment system, extension bootstrapping, and data collection. The right person to provide feedback about the core places issues is somebody else: maybe Marco Bonardo.


>diff --git a/experiments/serp-fraction-counts/code/bootstrap.js b/experiments/serp-fraction-counts/code/bootstrap.js
>+XPCOMUtils.defineLazyModuleGetter(this, "TelemetryLog",
>+                                  "resource://gre/modules/TelemetryLog.jsm");

I don't see TelemetryLog used anywhere in this file. Please remove this.

>+XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
>+                                  "resource://gre/modules/PlacesUtils.jsm");
>+XPCOMUtils.defineLazyModuleGetter(this, "Promise",
>+                                  "resource://gre/modules/Promise.jsm");

>+function startup() {}
>+function shutdown() {}
>+function install() {
>+  try {
>+    runExperiment();
>+  } catch(ex) {
>+    // TODO: best way to report back on failure?
>+    console.log(ex);
>+  }
>+}

Install is only ever called once. As I mentioned on IRC, I'm concerned that if the code fails (because the users shuts down Firefox while runExperiment is still collecting or sending data), that you'll never run it again. I strongly recommend doing nothing on install() and doing your work on startup().
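The lifecycle bsmedberg recommends can be sketched as follows (a hedged illustration with a stubbed runExperiment, not the actual patch):

```javascript
// Sketch of the recommended bootstrap lifecycle: install() is a no-op,
// and the experiment work runs on every startup(), so an interrupted or
// failed run gets another chance on the next browser launch.
// runExperiment here is a placeholder for the real collect-and-send logic.
function runExperiment() {
  return Promise.resolve("sent");
}

function install() {
  // Intentionally empty: install() fires only once per profile, so any
  // failure here would never be retried.
}

function startup() {
  return runExperiment().catch(ex => {
    // Swallow and log; an experiment must never break browser startup.
    console.error("experiment failed:", ex);
  });
}

function shutdown() {}
function uninstall() {}
```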

>+function uninstall() {
>+  // TODO: do we want uninstall to wait for the experiment to finish, or do we
>+  //       want uninstall to abort the experiment?

Immediately abort any pending data collection or sending please.


>+var getGoogleCount = getCount.bind(null, '%google.com/search?q%');
>+var getYahooCount = getCount.bind(null, '%/search.yahoo.com/yhs/search?p%');
>+var getBingCount = getCount.bind(null, '%bing.com/search?q%');
>+var getAmazonCount = getCount.bind(null, '%amazon.com/s?%');

Does this amazon query match every search somebody does on amazon, or only searches through the Firefox search interface? I'm concerned about collecting history data for amazon, since it is not a general web search engine and user expectations and user identification/profiling concerns apply more to this.

>+// TODO
>+var send = function(data) {
>+  return Promise.reject('send method not yet implemented');
>+  // JSONify the data, if necessary (does telemetry do this?)

Yes, the data should be JSON.

>+  // ship down via XHR (what endpoint?)

Please ask mreid for the endpoint and instructions. You need to decide what data you need to include in the payload for proper correlation: presumably at least the channel, locale, and geo.
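For example, a payload carrying those correlation fields could be shaped like the following sketch (field names and values are illustrative; the actual schema would come from mreid's instructions):

```json
{
  "channel": "beta",
  "locale": "en-US",
  "geo": "US",
  "counts": {
    "google": 120,
    "yahoo": 10,
    "bing": 15,
    "amazon": 5,
    "total": 1000
  }
}
```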

>+var runExperiment = function() {
>+  return PlacesUtils.promiseDBConnection()
>+    // google bits
>+    .then(getGoogleCount, onError)
>+    .then(saveCount.bind(null, 'google'), onError)
>+    // yahoo bits
>+    .then(PlacesUtils.promiseDBConnection, onError)
>+    .then(getYahooCount, onError)
>+    .then(saveCount.bind(null, 'yahoo'), onError)
>+    // bing bits
>+    .then(PlacesUtils.promiseDBConnection, onError)
>+    .then(getBingCount, onError)
>+    .then(saveCount.bind(null, 'bing'), onError)
>+    // amzn bits
>+    .then(PlacesUtils.promiseDBConnection, onError)
>+    .then(getAmazonCount, onError)
>+    .then(saveCount.bind(null, 'amazon'), onError)
>+    // total
>+    .then(PlacesUtils.promiseDBConnection, onError)
>+    .then(getTotalCount, onError)
>+    .then(saveCount.bind(null, 'total'), onError)
>+    // xhr
>+    .then(send, onError);

The experiment is not uninstalling itself after successfully completing. I think that is a requirement for this kind of experiment.

>diff --git a/experiments/serp-fraction-counts/manifest.json b/experiments/serp-fraction-counts/manifest.json
>new file mode 100644
>--- /dev/null
>+++ b/experiments/serp-fraction-counts/manifest.json
>@@ -0,0 +1,17 @@
>+{
>+  "publish": true,
>+  "priority": 5,
>+  "name": "Invisible test of the SERP fraction in user history.",

Here and in install.rdf: I'm not sure what SERP means: I'm sure most users won't be able to parse this. Since this is a user-visible experiment name, please give it a name that users can understand what the experiment does.

>+  "description": "Invisible test of the fraction of search pages in the Places mozhistoryvisits DB.",

Similarly here and in install.rdf, this description is user-visible. Let's make it less jargony.

>+  "manifest": {
>+    "id": "serp-fraction-counts@experiments.mozilla.org",
>+    "startTime": 1434585600,
>+    "endTime": 1434758400,

This is not enough time for an experiment. Clients check for a new manifest every 24 hours, but since only 15% of our users use Firefox every day, if you want an unbiased sample an experiment needs to run for at least 7 days, and preferably 14.

>+    "maxActiveSeconds": 86400,
>+    "appName": ["Firefox"],
>+    "channel": ["beta", "release"],
>+    "sample": 1.0

Experiment IDs typically have a version # and channel in them. If you want to deploy this to beta and release separately, you should use two different experiments with different IDs: "serp-fraction-counts-beta39@experiments.mozilla.org". That way we can control deployment to beta and release separately.

We never deploy an experiment to 100% of users on a channel. That would make it difficult to deploy other experiments to the same channel, and it's rarely necessary. Can you do this with a 10% sample?
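Concretely, the channel-specific ID and 10% sample might look like the following manifest sketch (the ID, name, description, dates, and sample rate are illustrative placeholders, not the shipped values):

```json
{
  "publish": true,
  "priority": 5,
  "name": "Firefox search usage study",
  "description": "Measures what fraction of your browsing history consists of search result pages. No URLs or personal data are collected.",
  "manifest": {
    "id": "serp-fraction-counts-beta39@experiments.mozilla.org",
    "startTime": 1435622400,
    "endTime": 1436832000,
    "maxActiveSeconds": 1209600,
    "appName": ["Firefox"],
    "channel": ["beta"],
    "sample": 0.1
  }
}
```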
Attachment #8623307 - Flags: feedback?(benjamin)
Hi Benjamin,

Thanks very much for the quick and detailed review, this is super helpful feedback.

Since the sample period is 1-2 weeks, I'm going to work on stuff I'd like to get done before Whistler; I'll pick this up either at Whistler or the week after.

Thanks again,

Jared
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Comment on attachment 8623307 [details] [diff] [review]
> WIP patch
> 
> I will provide feedback on the experiment system, extension bootstrapping,
> and data collection. The right person to provide feedback about the core
> places issues is somebody else: maybe Marco Bonardo.
> 
> 
> >diff --git a/experiments/serp-fraction-counts/code/bootstrap.js b/experiments/serp-fraction-counts/code/bootstrap.js
> >+XPCOMUtils.defineLazyModuleGetter(this, "TelemetryLog",
> >+                                  "resource://gre/modules/TelemetryLog.jsm");
> 
> I don't see TelemetryLog used anywhere in this file. Please remove this.
> 
> >+XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> >+                                  "resource://gre/modules/PlacesUtils.jsm");
> >+XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> >+                                  "resource://gre/modules/Promise.jsm");
> 
> >+function startup() {}
> >+function shutdown() {}
> >+function install() {
> >+  try {
> >+    runExperiment();
> >+  } catch(ex) {
> >+    // TODO: best way to report back on failure?
> >+    console.log(ex);
> >+  }
> >+}
> 
> Install is only ever called once. As I mentioned on IRC, I'm concerned that
> if the code fails (because the users shuts down Firefox while runExperiment
> is still collecting or sending data), that you'll never run it again. I
> strongly recommend doing nothing on install() and doing your work on
> startup().
> 
> >+function uninstall() {
> >+  // TODO: do we want uninstall to wait for the experiment to finish, or do we
> >+  //       want uninstall to abort the experiment?
> 
> Immediately abort any pending data collection or sending please.
> 
> 
> >+var getGoogleCount = getCount.bind(null, '%google.com/search?q%');
> >+var getYahooCount = getCount.bind(null, '%/search.yahoo.com/yhs/search?p%');
> >+var getBingCount = getCount.bind(null, '%bing.com/search?q%');
> >+var getAmazonCount = getCount.bind(null, '%amazon.com/s?%');
> 
> Does this amazon query match every search somebody does on amazon, or only
> searches through the Firefox search interface? I'm concerned about
> collecting history data for amazon, since it is not a general web search
> engine and user expectations and user identification/profiling concerns
> apply more to this.

According to the metrics team, after the big three search engines, Amazon search from FF is really popular, and I think the most popular "one-off" search engine choice from the search bar.

> 
> >+// TODO
> >+var send = function(data) {
> >+  return Promise.reject('send method not yet implemented');
> >+  // JSONify the data, if necessary (does telemetry do this?)
> 
> Yes, the data should be JSON.
> 
> >+  // ship down via XHR (what endpoint?)

I think we will set up a microserver for after Whistler. 
> 
> Please ask mreid for the endpoint and instructions. You need to decide what
> data you need to include in the payload for proper correlation: presumably
> at least the channel, locale, and geo.
> 
> >+var runExperiment = function() {
> >+  return PlacesUtils.promiseDBConnection()
> >+    // google bits
> >+    .then(getGoogleCount, onError)
> >+    .then(saveCount.bind(null, 'google'), onError)
> >+    // yahoo bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getYahooCount, onError)
> >+    .then(saveCount.bind(null, 'yahoo'), onError)
> >+    // bing bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getBingCount, onError)
> >+    .then(saveCount.bind(null, 'bing'), onError)
> >+    // amzn bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getAmazonCount, onError)
> >+    .then(saveCount.bind(null, 'amazon'), onError)
> >+    // total
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getTotalCount, onError)
> >+    .then(saveCount.bind(null, 'total'), onError)
> >+    // xhr
> >+    .then(send, onError);
> 
> The experiment is not uninstalling itself after successfully completing. I
> think that is a requirement for this kind of experiment.
> 
> >diff --git a/experiments/serp-fraction-counts/manifest.json b/experiments/serp-fraction-counts/manifest.json
> >new file mode 100644
> >--- /dev/null
> >+++ b/experiments/serp-fraction-counts/manifest.json
> >@@ -0,0 +1,17 @@
> >+{
> >+  "publish": true,
> >+  "priority": 5,
> >+  "name": "Invisible test of the SERP fraction in user history.",
> 
> Here and in install.rdf: I'm not sure what SERP means: I'm sure most users
> won't be able to parse this. Since this is a user-visible experiment name,
> please give it a name that users can understand what the experiment does.
> 
> >+  "description": "Invisible test of the fraction of search pages in the Places mozhistoryvisits DB.",
> 
> Similarly here and ininstall.rdf, this description is user-visible. Let's
> make it less jargony.
> 
> >+  "manifest": {
> >+    "id": "serp-fraction-counts@experiments.mozilla.org",
> >+    "startTime": 1434585600,
> >+    "endTime": 1434758400,
> 
> This is not enough time for an experiment. Clients check for a new manifest
> every 24 hours, but since only 15% of our users use Firefox every day, if
> you want an unbiased sample an experiment needs to run for at least 7 days,
> and preferably 14.
> 

We can go for two weeks.

> >+    "maxActiveSeconds": 86400,
> >+    "appName": ["Firefox"],
> >+    "channel": ["beta", "release"],
> >+    "sample": 1.0
> 
> Experiment IDs typically have a version # and channel in them. If you want
> to deploy this to beta and release separately, you should use two different
> experiments with different IDs:
> "serp-fraction-counts-beta39@experiments.mozilla.org". That way we can
> control deployment to beta and release separately.
> 
> We never deploy an experiment to 100% of users on a channel. That would make
> it difficult to deploy other experiments to the same channel, and it's
> rarely necessary. Can you do this with a 10% sample?

Can't we just plan the 7-14 days so no other experiments are running then? Are there other reasons you would not want it to go to everyone?
* Our general principle is to minimize data collection, and so we encourage using sampling whenever possible.
* I don't think you need 3.7M data points to form good statistical conclusions here.
* Again, we really don't ever deploy experiments to everyone, since that precludes other experiments during the same time frame.
OK, let's plan to start June 30 and deploy on beta and at 10% on release. Would a week at 10% get us up to 50K users? I think 10% would be a total of about 150K-200K users, most of whom would presumably be pretty heavy users of Firefox.
A week at 10% should get you well over 50k reports.
Following a question from bsmedberg, data (Katie) and ops (Travis, Benson) are determining whether a separate endpoint to collect the data is consistent with our privacy policy and security guidelines; otherwise we should use heka.
Flags: needinfo?(kparlante)
Flags: needinfo?(bwong)
Comment on attachment 8623307 [details] [diff] [review]
WIP patch

Hi Marco,

Would you mind taking a look at a WIP telemetry experiment that queries the Places DB?

I'm particularly interested in feedback on any/all of the following:

1. whether the queries are using the right async library (there seem to be many ways to query Places)
2. we want to measure the number of searches that originate from the searchbar or the urlbar. Is there a field in Places that tracks this?
3. any thoughts on the best way to immediately abort an inflight query if the user uninstalls the experiment addon or shuts down the browser
4. whether the queries themselves seem well-formed to you

I'm currently working on addressing bsmedberg's comments, so this code isn't final, but I'd like to get your input as early as possible (the goal is to ship this experiment this week or next).

Thanks very much,

Jared
Attachment #8623307 - Flags: feedback?(mak77)
(In reply to Bill Maggs from comment #11)
> Following question from bsmedberg, data (Katie) and ops (Travis, Benson) are
> determining if a separate endpoint to collect the data is consistent with
> our privacy policy and security guidelines, otherwise we should use heka.

After talking to Travis, sounds like we're good here. This is being treated as essentially statsd->datadog. As long as non-sensitive data is sent (no PII), there should be no problems from a privacy/security standpoint.
Flags: needinfo?(kparlante)
Thanks, Katie!

Benjamin, is it possible to test that this code works inside my own copy of nightly? Say, by installing this inside a user profile using a proxy file?
Flags: needinfo?(benjamin)
Kamil has written a bunch of docs at https://wiki.mozilla.org/QA/Telemetry which describe how to test experiments.
Flags: needinfo?(benjamin)
Comment on attachment 8623307 [details] [diff] [review]
WIP patch

Review of attachment 8623307 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jared Hirsch [:_6a68] [NEEDINFO please!] from comment #12)
> 1. whether the queries are using the right async library (there seem to be
> many ways to query Places)

Yes, using PlacesUtils.promiseDBConnection is the right thing to do. It's an asynchronous connection and it's wrapped by Sqlite.jsm API.

> 2. we want to measure the number of searches that originate from the
> searchbar or the urlbar. Is there a field in Places that tracks this?

Nope, we never instrumented Places to specifically track searches, it tracks history as a generic entity.
On the other side, installed search engines have "purposes" (http://mxr.mozilla.org/mozilla-central/search?string=%22purpose%22&find=browser%2Flocales%2Fen-US%2F); based on the purpose we append different params, and through those params you might be able to recognize the source of the search. This adds complication, so it's up to you to decide whether it's worth it.

> 3. any thoughts on the best way to immediately abort an inflight query if
> the user uninstalls the experiment addon or shuts down the browser

I think the statements will be fast enough that you don't need to care; if the browser is being shut down, async shutdown will take care of it. The only cancellation API we have in Sqlite.jsm is throwing StopIteration from the onRow handler (see http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/Sqlite.jsm#1250)

> 4. whether the queries themselves seem well-formed to you

Comments following (note, I didn't check previous review comments, sorry if something is dupe)

::: experiments/serp-fraction-counts/code/bootstrap.js
@@ +11,5 @@
> +                                  "resource://gre/modules/TelemetryLog.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> +                                  "resource://gre/modules/PlacesUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> +                                  "resource://gre/modules/Promise.jsm");

I don't think you need this; DOM promises work quite fine. If you need a deferred, just use PromiseUtils.jsm

@@ +45,5 @@
> +
> +var serpCountQuery = 'SELECT COUNT(*) AS count FROM moz_historyvisits ' +
> +                     'INNER JOIN moz_places ' +
> +                     'ON moz_historyvisits.place_id = moz_places.id ' +
> +                     'WHERE moz_places.url LIKE ":providerURL" ;';

So, I'd like to understand why you are counting from moz_historyvisits when you could just use visit_count from moz_places. You can use SQLite aggregate functions like SUM(); see https://sqlite.org/lang_aggfunc.html
For example: SELECT SUM(visit_count) AS count FROM moz_places...
The join is expensive, and the query is expensive enough even without it.

Also, LIKE is a very expensive operation, and here you are applying it to each entry in moz_places, that is, hundreds of thousands of entries. I have a suggestion about this: you might limit your search based on rev_host:

SELECT url FROM moz_places 
WHERE rev_host BETWEEN "moc.nozama." AND "moc.nozama." || X'FFFF'
AND url LIKE "%amazon.com/s?%" ESCAPE '\'

this means where host is _anything_.amazon.com (|| is the concat operator, X'FFFF' is the highest char).
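The rev_host trick is easier to see with a small helper that mirrors the reversal convention Places uses (this is a hypothetical illustration, not the actual Places implementation):

```javascript
// rev_host in moz_places stores the host reversed, with a trailing
// period; this hypothetical helper mirrors that convention.
function revHost(host) {
  return host.split("").reverse().join("") + ".";
}

// "amazon.com"     -> "moc.nozama."
// "www.amazon.com" -> "moc.nozama.www."
// Every subdomain of amazon.com therefore falls in the sorted range
// ["moc.nozama.", "moc.nozama." + "\uFFFF"], which is exactly what the
// BETWEEN clause above exploits, so the index on rev_host can be used.
```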

I actually just figured out that we don't have good support for LIKE in Sqlite.jsm: ideally, to properly bind a value for LIKE, you should use stmt.escapeStringForLike, but we don't expose the statement to do that.
Is that the reason you didn't bind the statement properly and instead used string replacement? Ideally one should never do string manipulation to bind statements; it's one of the best attack vectors and is bug-prone. You can pass binding parameters to execute.
In case you have issues binding the LIKE param (I think we have an assertion somewhere, but I am not sure at the moment), let me know; we can fix the underlying issue in Sqlite.jsm, and temporarily you could just work around it.
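The escaping itself is simple enough to hand-roll as a stopgap; a sketch of an equivalent (an assumption on my part, not the Sqlite.jsm API) might be:

```javascript
// Escape %, _ and the escape character itself so arbitrary text can be
// safely embedded in a LIKE pattern used with ESCAPE '\'.
function escapeForLike(str, escapeChar = "\\") {
  return str.replace(/[%_\\]/g, c => escapeChar + c);
}

// The pattern is then bound as a normal parameter rather than spliced
// into the SQL string, e.g. (hypothetical usage):
//   db.execute(
//     "SELECT SUM(visit_count) AS count FROM moz_places " +
//     "WHERE url LIKE :pattern ESCAPE '\\'",
//     { pattern: "%" + escapeForLike("amazon.com/s?") + "%" });
```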

@@ +56,5 @@
> +
> +var getCount = function(url, db) {
> +  try {
> +    var query = serpCountQuery.replace(':providerURL', url);
> +    return db.execute(query);

execute doesn't throw unless you pass broken arguments; there should be no need to wrap it

@@ +76,5 @@
> +
> +var totalCount = function(db) {
> +  try {
> +    var query = 'SELECT COUNT(*) AS count FROM moz_historyvisits;';
> +    return db.execute(query);

ditto

@@ +116,5 @@
> +    .then(PlacesUtils.promiseDBConnection, onError)
> +    .then(getTotalCount, onError)
> +    .then(saveCount.bind(null, 'total'), onError)
> +    // xhr
> +    .then(send, onError);

I'm pretty sure you want a Task here...
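Task.jsm drives a generator that yields promises; a minimal stand-in spawn() (an illustration of the idea, not the Task.jsm source) shows the straight-line shape the chain above would take:

```javascript
// Minimal generator driver in the spirit of Task.jsm's Task.spawn:
// each yielded promise is awaited before the generator resumes, and
// rejections are re-thrown into the generator.
function spawn(generatorFn) {
  const iter = generatorFn();
  function step(result) {
    if (result.done) {
      return Promise.resolve(result.value);
    }
    return Promise.resolve(result.value).then(
      value => step(iter.next(value)),
      err => step(iter.throw(err)));
  }
  return step(iter.next());
}

// The promise chain then flattens into sequential code, e.g.:
// spawn(function* () {
//   const db = yield PlacesUtils.promiseDBConnection();
//   const google = yield getGoogleCount(db);
//   // ... remaining engines, then:
//   yield send(results);
// });
```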
Attachment #8623307 - Flags: feedback?(mak77)
I sent this email with details on using the new statsd => datadog service.

===
Hi, 

This is the endpoint: <omitted>
This is the repo: https://github.com/mozilla-services/datadog-bridge

I created a *super* simple API to send metrics to datadog. I namespaced everything with "experimental.", since I mostly (geek) hacked this together and deployed it.

The API looks like this: 
curl -v -d '123.45' https://<endpoint>/gauge/<metric name>
curl -v -d '123' https://<endpoint>/count/<metric name>
curl -v -d '123.45' https://<endpoint>/histogram/<metric name>
curl -v -d 'some string' https://<endpoint>/set/<metric name>
Once you POST data it should be visible in datadog within a minute. 

Some caveats: 

- I skipped much of our ops-awesome-stable-service work and just got it up.
- the first goal is to determine if this is Good Enough and iterate on feedback. 
- It will make/accept any statsd metric for now. Later, it will be whitelist only.

Let me know if you guys have any troubles getting it to work. If there are no major feature changes *and* if it is useful, I'll do the ops work to stabilize it into a real and supported service. 

p.s: this is experimental! It might crash, but I got it in a tmux, while/true loop to restart it. That should tell you everything about how well supported this currently is :)

Thanks!
Ben
===
Flags: needinfo?(bwong)
What happened to this experiment? Are we still planning to run it? If so, what is blocking us?
I'm the blocker; I've been more focused on getting the addon built and working. I've got this ready to submit; I just need to verify we are sending valid data to Benson's endpoint. I'll get a patch out today.
Attached patch Updated patch based on feedback (obsolete) — Splinter Review
Attachment #8623307 - Attachment is obsolete: true
Comment on attachment 8640179 [details] [diff] [review]
Updated patch based on feedback

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Comment on attachment 8623307 [details] [diff] [review]
> WIP patch
>
> I will provide feedback on the experiment system, extension bootstrapping,
> and data collection. The right person to provide feedback about the core
> places issues is somebody else: maybe Marco Bonardo.

Awesome. Thanks for this feedback and for the Places feedback pointer.

>
>
> >diff --git a/experiments/serp-fraction-counts/code/bootstrap.js b/experiments/serp-fraction-counts/code/bootstrap.js
> >+XPCOMUtils.defineLazyModuleGetter(this, "TelemetryLog",
> >+                                  "resource://gre/modules/TelemetryLog.jsm");
>
> I don't see TelemetryLog used anywhere in this file. Please remove this.

Yup, removed. That was left over from initially thinking we'd use it for logging results.

>
> >+XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> >+                                  "resource://gre/modules/PlacesUtils.jsm");
> >+XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> >+                                  "resource://gre/modules/Promise.jsm");
>
> >+function startup() {}
> >+function shutdown() {}
> >+function install() {
> >+  try {
> >+    runExperiment();
> >+  } catch(ex) {
> >+    // TODO: best way to report back on failure?
> >+    console.log(ex);
> >+  }
> >+}
>
> Install is only ever called once. As I mentioned on IRC, I'm concerned that
> if the code fails (because the users shuts down Firefox while runExperiment
> is still collecting or sending data), that you'll never run it again. I
> strongly recommend doing nothing on install() and doing your work on
> startup().

Noted, thanks. I've moved the code into startup.

>
> >+function uninstall() {
> >+  // TODO: do we want uninstall to wait for the experiment to finish, or do we
> >+  //       want uninstall to abort the experiment?
>
> Immediately abort any pending data collection or sending please.

Sounds great. I've updated the code to check a global isExiting boolean before
issuing any DB queries or sending any data. PlacesUtils.promiseDBConnection
doesn't expose the raw connection, so it's not easy to abort a DB query in
flight. Per Marco's feedback, waiting for a given read query to return seems
like a good enough solution.

We're using navigator.sendBeacon instead of a raw xhr, so there are no raw
network connections that need to be manually torn down.
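The guard described here can be sketched as a wrapper that checks the flag between steps (the names are hypothetical; the real patch is attached separately):

```javascript
// Hypothetical sketch of the isExiting guard: each async step checks
// the flag before starting, so uninstall/shutdown aborts the pipeline
// at the next step boundary rather than mid-query.
let isExiting = false;

async function runGuarded(steps) {
  const results = [];
  for (const step of steps) {
    if (isExiting) {
      throw new Error("aborted: experiment is shutting down");
    }
    results.push(await step());
  }
  return results;
}
```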

>
>
> >+var getGoogleCount = getCount.bind(null, '%google.com/search?q%');
> >+var getYahooCount = getCount.bind(null, '%/search.yahoo.com/yhs/search?p%');
> >+var getBingCount = getCount.bind(null, '%bing.com/search?q%');
> >+var getAmazonCount = getCount.bind(null, '%amazon.com/s?%');
>
> Does this amazon query match every search somebody does on amazon, or only
> searches through the Firefox search interface? I'm concerned about
> collecting history data for amazon, since it is not a general web search
> engine and user expectations and user identification/profiling concerns
> apply more to this.

I've dug into this a bit. While it looks[1] like the base amazon search URL is

  http://www.amazon.com/exec/obidos/external-search

the actual URL to which I'm taken after doing an amazon search bar search seems
to always be of the form

  http://www.amazon.com/s?...

In contrast, any searches that originate on amazon's website are of the form

  http://www.amazon.com/s/...

so it seems like the existing regex will indeed only capture searches that
originate from the Firefox UI.

[1] https://dxr.mozilla.org/mozilla-central/source/browser/locales/en-US/searchplugins/amazondotcom.xml#13

>
> >+// TODO
> >+var send = function(data) {
> >+  return Promise.reject('send method not yet implemented');
> >+  // JSONify the data, if necessary (does telemetry do this?)
>
> Yes, the data should be JSON.
>
> >+  // ship down via XHR (what endpoint?)
>
> Please ask mreid for the endpoint and instructions. You need to decide what
> data you need to include in the payload for proper correlation: presumably
> at least the channel, locale, and geo.

We'll be using a microservice :mostlygeek has put together that will ship our
(non-sensitive) data to datadog.

Per comment 13, it seems like we are ok from a privacy standpoint. Happy to
elaborate if you want to get into the details, though.

>
> >+var runExperiment = function() {
> >+  return PlacesUtils.promiseDBConnection()
> >+    // google bits
> >+    .then(getGoogleCount, onError)
> >+    .then(saveCount.bind(null, 'google'), onError)
> >+    // yahoo bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getYahooCount, onError)
> >+    .then(saveCount.bind(null, 'yahoo'), onError)
> >+    // bing bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getBingCount, onError)
> > >+    .then(saveCount.bind(null, 'bing'), onError)
> >+    // amzn bits
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getAmazonCount, onError)
> >+    .then(saveCount.bind(null, 'amazon'), onError)
> >+    // total
> >+    .then(PlacesUtils.promiseDBConnection, onError)
> >+    .then(getTotalCount, onError)
> >+    .then(saveCount.bind(null, 'total'), onError)
> >+    // xhr
> >+    .then(send, onError);
>
> The experiment is not uninstalling itself after successfully completing. I
> think that is a requirement for this kind of experiment.

Ah, great idea. I'm curious how much of the uninstall process is handled by the
parent telemetry experiment code, since it looks like none of the existing
experiments actually uninstall themselves. Is the uninstall process triggered
by calling uninstall() inside my addon code, even if my uninstall function
doesn't do much?

>
> >diff --git a/experiments/serp-fraction-counts/manifest.json b/experiments/serp-fraction-counts/manifest.json
> >new file mode 100644
> >--- /dev/null
> >+++ b/experiments/serp-fraction-counts/manifest.json
> >@@ -0,0 +1,17 @@
> >+{
> >+  "publish": true,
> >+  "priority": 5,
> >+  "name": "Invisible test of the SERP fraction in user history.",
>
> Here and in install.rdf: I'm not sure what SERP means: I'm sure most users
> won't be able to parse this. Since this is a user-visible experiment name,
> please give it a name that users can understand what the experiment does.

Cool, updated the title and description in both places.

I don't want to freak users out by phrasing the description so that it sounds
like we're trying to get away with something creepy, like every other company
on the internet. Feedback very welcome on improvements to the wording.

>
> >+  "description": "Invisible test of the fraction of search pages in the Places mozhistoryvisits DB.",
>
> Similarly here and ininstall.rdf, this description is user-visible. Let's
> make it less jargony.
>
> >+  "manifest": {
> >+    "id": "serp-fraction-counts@experiments.mozilla.org",
> >+    "startTime": 1434585600,
> >+    "endTime": 1434758400,
>
> This is not enough time for an experiment. Clients check for a new manifest
> every 24 hours, but since only 15% of our users use Firefox every day, if
> you want an unbiased sample an experiment needs to run for at least 7 days,
> and preferably 14.

Thanks. I've inserted a dummy string until we figure out the final start date.

>
> >+    "maxActiveSeconds": 86400,
> >+    "appName": ["Firefox"],
> >+    "channel": ["beta", "release"],
> >+    "sample": 1.0
>
> Experiment IDs typically have a version # and channel in them. If you want
> to deploy this to beta and release separately, you should use two different
> experiments with different IDs:
> "serp-fraction-counts-beta39@experiments.mozilla.org". That way we can
> control deployment to beta and release separately.

Sounds good, updated to start by running on beta.

>
> We never deploy an experiment to 100% of users on a channel. That would make
> it difficult to deploy other experiments to the same channel, and it's
> rarely necessary. Can you do this with a 10% sample?

Yes, I'm sure 10% would be a more than adequate sample size to draw some
interesting and statistically significant conclusions.
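
Putting the two suggestions together (channel-specific ID, 10% sample), the revised manifest block would end up looking something like this — the id, startTime, and endTime shown here are illustrative placeholders for a 14-day window, not the final values:

```json
{
  "manifest": {
    "id": "serp-fraction-counts-beta39@experiments.mozilla.org",
    "startTime": 1441065600,
    "endTime": 1442275200,
    "maxActiveSeconds": 86400,
    "appName": ["Firefox"],
    "channel": ["beta"],
    "sample": 0.1
  }
}
```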

Thanks again for the feedback! Sorry it took me a while to get the response turned around.
Attachment #8640179 - Flags: review?(benjamin)
Comment on attachment 8640179 [details] [diff] [review]
Updated patch based on feedback

Thanks for the great feedback, Marco. I've updated the patch. Responses inline below.

(In reply to Marco Bonardo [::mak] from comment #16)
> (In reply to Jared Hirsch [:_6a68] [NEEDINFO please!] from comment #12)
> > 2. we want to measure the number of searches that originate from the
> > searchbar or the urlbar. Is there a field in Places that tracks this?
> 
> Nope, we never instrumented Places to specifically track searches, it tracks
> history as a generic entity.
> On the other side, installed search engines have "purposes"
> (http://mxr.mozilla.org/mozilla-central/
> search?string=%22purpose%22&find=browser%2Flocales%2Fen-US%2F), based on the
> purpose we append different params, and through those params you might be
> able to recognize the source of the search.  This will add complication so
> it's up to you to figure if it's worth it.

This is great, thanks for pointing it out. We'll probably just do the simplest
thing for now, and possibly do another pass later with more detailed data.

> > 3. any thoughts on the best way to immediately abort an inflight query if
> > the user uninstalls the experiment addon or shuts down the browser
> 
> I think the statements will be fast enough that you don't care, if the
> browser is being shutdown, async shutdown will take care of it. The only
> cancellation API we have in Sqlite.jsm is throwing StopIteration from the
> onRow handler (see
> http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/Sqlite.
> jsm#1250)

Sounds good. Thanks for the confirmation on this point.

> ::: experiments/serp-fraction-counts/code/bootstrap.js
> @@ +11,5 @@
> > +                                  "resource://gre/modules/TelemetryLog.jsm");
> > +XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> > +                                  "resource://gre/modules/PlacesUtils.jsm");
> > +XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> > +                                  "resource://gre/modules/Promise.jsm");
> 
> I don't think you need this, DOM promises work quite fine. if you need a
> deferred just use PromiseUtils.jsm

Cool, I'll keep that in mind for the next experiment. For now, this code seems
to work, so I'm just going to leave it as-is.

> @@ +45,5 @@
> > +
> > +var serpCountQuery = 'SELECT COUNT(*) AS count FROM moz_historyvisits ' +
> > +                     'INNER JOIN moz_places ' +
> > +                     'ON moz_historyvisits.place_id = moz_places.id ' +
> > +                     'WHERE moz_places.url LIKE ":providerURL" ;';
> 
> So, I'd like to understand why you are counting from moz_historyvisits, when
> you could just use visit_count from moz_places. You can use Sqlite aggregate
> functions like SUM(), see https://sqlite.org/lang_aggfunc.html
> For example SELECT SUM(visit_count) AS count FROM moz_places...
> The join is expensive and the query is expensive enough even without it.

There's no reason for the particular choices I made: this query was literally
the first thing I tried that worked. I've switched to using visit_count instead;
thanks very much for the suggestion.

> 
> Also LIKE is a very expensive operation, and here you are applying it to
> each entry in moz_places, that is hundred thousands entries. I have a
> suggestion about this, you might limit your search based on rev_host
> 
> SELECT url FROM moz_places
> WHERE rev_host BETWEEN "moc.nozama." AND "moc.nozama." || X'FFFF'
> AND url LIKE "%amazon.com/s?%" ESCAPE '\'
> 
> this means where host is _anything_.amazon.com (|| is the concat operator,
> X'FFFF' is the highest char).

Wow. That BETWEEN clause is something else! I've updated the queries to use it.

I'm not sure escapeStringForLIKE or "ESCAPE '\'" is needed here, since I'm
not putting any user input into the query, nor am I matching on URL fragments
that include underscores or percent-encoded values. Am I missing any other
reason to include the ESCAPE clause?

> 
> I actually just figured out that we don't have good support for LIKE in
> Sqlite.jsm, ideally to properly bind a value for LIKE, you should use
> stmt.escapeStringForLike, but we don't expose the statement to do that.
> Is that the reason you didn't bind the statement properly and instead used
> string replacement? Ideally one should never do string manipulation to bind
> statements, it's one of the best attack vectors + bug-prone. you can pass
> binding parameters to execute.

Ha, nope.  I hacked that query together at lightning speed, there wasn't deep
thought behind the choices. Or rather, the deep thought was to keep it as
simple as possible, given that there's no user input / injection attacks to
worry about here.

I've updated the queries to bind params following the Sqlite.jsm docs on MDN.

> In case you have issues binding the LIKE param (I think we have an assertion
> somewhere but I am not sure atm) let me know, we can fix the underlying
> issue in Sqlite.jsm, and temporarily you could just workaround it.

Binding the LIKE argument seems to be working, thanks.
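
For anyone following along, the bound-parameter query ended up shaped roughly like the sketch below. The `reverseHost` and `providerParams` helpers are hypothetical stand-ins (not the patch's actual code) that show how a host maps onto the `rev_host` column — Places stores the host reversed character-by-character with a trailing period, e.g. "moc.nozama." for amazon.com, which is what makes the BETWEEN range scan work:

```javascript
// Illustrative sketch of the bound-parameter query; helper names are
// hypothetical, not taken from the actual patch.
const COUNT_QUERY = `SELECT SUM(visit_count) AS count FROM moz_places
                     WHERE rev_host BETWEEN :reversed AND :reversed || X'FFFF'
                     AND url LIKE :fuzzy`;

// rev_host is the host reversed character-by-character, plus a trailing '.',
// so "amazon.com" is stored as "moc.nozama.".
function reverseHost(host) {
  return host.split("").reverse().join("") + ".";
}

// Build the named bindings for one provider; this object would be passed as
// the second argument to db.execute(COUNT_QUERY, params) in Sqlite.jsm.
function providerParams(host, fragment) {
  return { reversed: reverseHost(host), fuzzy: `%${host}${fragment}%` };
}
```

The BETWEEN clause restricts the scan to rows whose host is `_anything_.amazon.com` before the expensive LIKE ever runs, which is the optimization mak described in comment 16.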

> 
> @@ +56,5 @@
> > +
> > +var getCount = function(url, db) {
> > +  try {
> > +    var query = serpCountQuery.replace(':providerURL', url);
> > +    return db.execute(query);
> 
> execute doesn't throw unless you pass broken arguments there should be no
> need to wrap it

Great, thanks.

> 
> @@ +76,5 @@
> > +
> > +var totalCount = function(db) {
> > +  try {
> > +    var query = 'SELECT COUNT(*) AS count FROM moz_historyvisits;';
> > +    return db.execute(query);
> 
> ditto

Fixed.

> 
> @@ +116,5 @@
> > +    .then(PlacesUtils.promiseDBConnection, onError)
> > +    .then(getTotalCount, onError)
> > +    .then(saveCount.bind(null, 'total'), onError)
> > +    // xhr
> > +    .then(send, onError);
> 
> I'm pretty sure you want a Task here...

Thanks for the tip; I've run across Task, but haven't dug into the docs much.
I'll probably leave this in its current barely-works-but-works form, and look
into Task for the next one of these.
Attachment #8640179 - Flags: feedback?(mak77)
Attachment #8640179 - Attachment is patch: true
Comment on attachment 8640179 [details] [diff] [review]
Updated patch based on feedback

Review of attachment 8640179 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like you might have \r\n line endings instead of \n, which is the Mozilla coding style; you should be able to set up your editor to stick to Unix line endings.

Btw, the queries look good, let me give you some more hints:

::: experiments/serp-fraction-counts/code/bootstrap.js
@@ +8,5 @@
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> +                                  "resource://gre/modules/PlacesUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> +                                  "resource://gre/modules/Promise.jsm");

nit: you should likely use DOM promises instead of Promise.jsm, I will post some suggestions below regarding that

@@ +16,5 @@
> + * experiment code
> + */
> +
> +// if true, abort any remaining DB requests or beacons
> +var isExiting = false;

nit: in new code we prefer to use "let" instead of var

@@ +21,5 @@
> +
> +// see https://bugzil.la/1174937#c16 for explanation of query optimizations
> +var query = "SELECT SUM(visit_count) AS count, url FROM moz_places " +
> +                     "WHERE rev_host BETWEEN :reversed AND :reversed || X'FFFF' " +
> +                     "AND url LIKE :fuzzy";

nit: Here you might use a template string to avoid the many quote marks and concatenations (plus it allows you to use double quotes in queries, when needed). You may also use const.
const COUNT_QUERY = `SELECT ...
                     ...
                     AND url LIKE :fuzzy`;

@@ +63,5 @@
> +  // query returns undefined if there are no visits to the specified page; replace with 0
> +  var count = results && results[0] && results[0].getResultByName('count') || 0;
> +  if (Number.isInteger(count)) {
> +    counts[providerName] = count;
> +    return Promise.resolve(count);

let saveCount = function(providerName, results) {
  return new Promise((resolve, reject) => {
    ...
    resolve(count);
    ...
    reject(...);
  });
}

Btw, I don't see why this must return a promise; it's synchronous, so it could just return and throw normally.

@@ +72,5 @@
> +
> +var getTotalCount = function(db) {
> +  if (isExiting) { return Promise.reject(new Error('aborting because isExiting is true')); }
> +  var totalQuery = 'SELECT COUNT(*) AS count FROM moz_historyvisits;';
> +  return db.execute(totalQuery);

let getTotalCount = Task.async(function* (db) {
  if (isExiting) {
    throw new Error...
  }
  yield db.execute(`
    SELECT count(*)...
  `);
});

@@ +83,5 @@
> +  return isFinite(result) ? Math.round(result * 100) : null;
> +};
> +
> +var sendBeacon = function(url, data) {
> +  if (isExiting) { return; }

Per the coding style, you should never write one-line ifs.

@@ +140,5 @@
> +
> +var runExperiment = function() {
> +  if (isExiting) { return; }
> +  // get a window, or wait till a window is opened, then continue.
> +  var win = Services.wm.getMostRecentWindow('navigator:browser');

Use RecentWindow.jsm::getMostRecentBrowserWindow

@@ +150,5 @@
> +};
> +
> +var _runExperiment = function() {
> +  if (isExiting) { return; }
> +  return PlacesUtils.promiseDBConnection()

Returning a promise is pointless since you are invoking this as a normal function; you might want to just close the chain with a .catch(error => { /* do something with the error */ })

but first, let me clarify why I was suggesting to use Task.jsm...

@@ +160,5 @@
> +    // yahoo bits
> +    .then(PlacesUtils.promiseDBConnection, onError.bind(null, 'promiseDBConnection.yahoo'))
> +    .then(function(db) {
> +      return db.execute(query, searchProviders['yahoo']);
> +    }, onError.bind(null, 'getYahooCount'))

Basically this would become

let _runExperiment = Task.async(function* () {
  if (isExiting) {
    return;
  }

  let db = yield PlacesUtils.promiseDBConnection();

  for (let providerName in searchProviders) {
    try {
      yield db.execute(query, searchProviders[providerName]);
      saveCount(providerName);
    } catch (ex) {
      onError(...);
    }
  }
  try {
    ...totalCount
  } catch (ex) {
    ...
  }
});

Then you would invoke this like:

_runExperiment().catch(ex => {
  // do something with the exception.
}).then(()=> {
  uninstall();
});

You see this is far more readable than a long chain of promises and error handlers since it's like normal synchronous code, and can be expanded to new engines very easily (nothing to do since there's a for loop)
If you want to interrupt at the first error, just remove the try/catches, the exception will be handled by the external catch.
Attachment #8640179 - Flags: feedback?(mak77) → feedback+
Thanks so much for the helpful and thorough feedback, Marco.
Comment on attachment 8640179 [details] [diff] [review]
Updated patch based on feedback

I got a ping about this from bmaggs. I was waiting for mak's review before making any more comments. In case it's not clear, mak's feedback was not a review and he does expect to re-review this once you've addressed his concerns.

Without waiting, though, this is still strange:

+// XXX update these URLs for beta vs release populations
+const countUrl = 'https://statsd-bridge.services.mozilla.com/count/beta39.1174937.serpfraction.';
+const gaugeUrl = 'https://statsd-bridge.services.mozilla.com/gauge/beta39.1174937.serpfraction.';

What does this XXX mean? Typically to divide an experiment, we include the channel either in the URL or in the payload (or both), for example:

telemetry.mozilla.org/<pingtype>/<channel>

Does this not work with datadog? In any case, we shouldn't have XXX comments present in final code.

Does statsd-bridge.services.mozilla.com have SSL pinning enabled in Firefox so that we're sure this data doesn't get intercepted?

Please do not record any amazon search URLs as part of this experiment. We already record search counts via telemetry, so recording only URLs that come from our builtin search providers is redundant. And otherwise, we promise not to record user history as part of telemetry. Generic search providers is different from visiting specific websites.

There is no documentation included in this patch for the data format. Please include at a minimum a DOCUMENTATION file which documents the URL endpoints and data/data format being sent by this experiment.

This experiment still does not uninstall itself. It appears to be calling the bootstrap uninstall() method at various points, but that doesn't actually uninstall the experiment. You need to use Experiments.disableExperiment("FROM_API") to finish the experiment after it has done its work. Otherwise it will continue to be active for the next 24 hours and submit additional pings if the user launches Firefox again.

Again, this is just for the bootstrap/data-collection pieces. You still need a code review from mak for the main part of the code.
Attachment #8640179 - Flags: review?(benjamin) → review-
note I forgot to point out that clearly from a task you can return a value like...

let getTotalCount = Task.async(function* (db) {
  ...
  return count
});
Thanks for the feedback, Benjamin!

Yup, I'm aware I'll need an r+ from Marco (and from you) before this can land; I'll resubmit when I've addressed the latest round of feedback.
Attached patch Updated patchSplinter Review
Uploading the revised telemetry experiment diff. I'll insert review responses next.
Attachment #8640179 - Attachment is obsolete: true
Attachment #8654408 - Flags: review?(mak77)
Attachment #8654408 - Flags: review?(benjamin)
(In reply to Marco Bonardo [::mak] (spotty available until 24 Aug) from comment #23)

> Comment on attachment 8640179 [details] [diff] [review]
> Updated patch based on feedback
> 
> Review of attachment 8640179 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks like you might have \r\n line endings instead of \n that is the
> mozilla coding style, you should be able to setup your editor to stick to
> unix line endings.

I think this might have been caused by copy-pasting the patch, rather than
actually uploading a file. I've tried having Bugzilla pick the file this time, 
and the diff file should (according to vim) have unix line endings. Happy to
try again if not.

> 
> Btw, the queries look good, let me give you some more hints:
> 
> ::: experiments/serp-fraction-counts/code/bootstrap.js
> @@ +8,5 @@
> > +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> > +XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
> > +                                  "resource://gre/modules/PlacesUtils.jsm");
> > +XPCOMUtils.defineLazyModuleGetter(this, "Promise",
> > +                                  "resource://gre/modules/Promise.jsm");
> 
> nit: you should likely use DOM promises instead of Promise.jsm, I will post
> some suggestions below regarding that

Awesome, Promise.jsm removed. I also updated MDN to suggest not using Promise.jsm
for Gecko 29+.

> 
> @@ +16,5 @@
> > + * experiment code
> > + */
> > +
> > +// if true, abort any remaining DB requests or beacons
> > +var isExiting = false;
> 
> nit: in new code we prefer to use "let" instead of var

I've replaced var with let or const throughout.

> 
> @@ +21,5 @@
> > +
> > +// see https://bugzil.la/1174937#c16 for explanation of query optimizations
> > +var query = "SELECT SUM(visit_count) AS count, url FROM moz_places " +
> > +                     "WHERE rev_host BETWEEN :reversed AND :reversed || X'FFFF' " +
> > +                     "AND url LIKE :fuzzy";
> 
> nit: Here you might use a template string to avoid the many apices and
> concats (plus it allows you to use double apices in queries, when needed).
> may also use const.
> const COUNT_QUERY = `SELECT ...
>                      ...
>                      AND url LIKE :fuzzy`;

I've switched to 'const' and a template string.

> 
> @@ +63,5 @@
> > +  // query returns undefined if there are no visits to the specified page; replace with 0
> > +  var count = results && results[0] && results[0].getResultByName('count') || 0;
> > +  if (Number.isInteger(count)) {
> > +    counts[providerName] = count;
> > +    return Promise.resolve(count);
> 
> let saveCount = function(providerName, results) {
>   return new Promise((resolve, reject) => {
>     ...
>     resolve(count);
>     ...
>     reject(...);
>   });
> }
> 
> Btw, I don't see why this must return a promise, it's synchronous, could
> just return and throw normally.

The only reason I was using a promise was to be able to chain promises
down at the bottom. No longer needed, since we will just use a Task, so
this function becomes very simple.

> 
> @@ +72,5 @@
> > +
> > +var getTotalCount = function(db) {
> > +  if (isExiting) { return Promise.reject(new Error('aborting because isExiting is true')); }
> > +  var totalQuery = 'SELECT COUNT(*) AS count FROM moz_historyvisits;';
> > +  return db.execute(totalQuery);
> 
> let getTotalCount = Task.async(function* (db) {
>   if (isExiting) {
>     throw new Error...
>   }
>   yield db.execute(`
>     SELECT count(*)...
>   `);
> });

Thanks very much for spelling this out. I've replaced my version with yours.

> 
> @@ +83,5 @@
> > +  return isFinite(result) ? Math.round(result * 100) : null;
> > +};
> > +
> > +var sendBeacon = function(url, data) {
> > +  if (isExiting) { return; }
> 
> per coding style you should never oneline ifs

Thanks. I've expanded those one-liners throughout.
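
One aside on the percent helper visible in the hunk quoted above (`isFinite(result) ? Math.round(result * 100) : null`): written out as a standalone function — the name and surrounding lines here are assumed, only the quoted return line comes from the patch — it amounts to:

```javascript
// Sketch of the percent helper quoted above. The division coerces
// non-numeric operands to NaN (and a zero total to Infinity), both of
// which fail isFinite(), so bad inputs collapse to null.
function toPercent(count, total) {
  const result = count / total;
  return isFinite(result) ? Math.round(result * 100) : null;
}
```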

> 
> @@ +140,5 @@
> > +
> > +var runExperiment = function() {
> > +  if (isExiting) { return; }
> > +  // get a window, or wait till a window is opened, then continue.
> > +  var win = Services.wm.getMostRecentWindow('navigator:browser');
> 
> Use RecentWindow.jsm::getMostRecentBrowserWindow

Fantastic. Thanks for the tip.

> 
> @@ +150,5 @@
> > +};
> > +
> > +var _runExperiment = function() {
> > +  if (isExiting) { return; }
> > +  return PlacesUtils.promiseDBConnection()
> 
> returning a promise is pointless since you are invoking this as a normal
> function, you might want to just close the chain with a .catch(error => { //
> do something with error })

Yeah, using a Task makes this all so much simpler :-)

> 
> but first, let me clarify why I was suggesting to use Task.jsm...
> 
> @@ +160,5 @@
> > +    // yahoo bits
> > +    .then(PlacesUtils.promiseDBConnection, onError.bind(null, 'promiseDBConnection.yahoo'))
> > +    .then(function(db) {
> > +      return db.execute(query, searchProviders['yahoo']);
> > +    }, onError.bind(null, 'getYahooCount'))
> 
> Basically this would become
> 
> let _runExperiment = Task.async(function* () {
>   if (isExiting) {
>     return;
>   }
> 
>   let db = yield PlacesUtils.promiseDBConnection();
> 
>   for (let providerName in searchProviders) {
>     try {
>       yield db.execute(query, searchProviders['google'];
>       saveCount('google');
>     } catch (ex) {
>       onError(...);
>     }
>   }
>   try {
>     ...totalCount
>   } catch (ex) {
>     ...
>   }
> });
> 
> Then you would invoke this like:
> 
> _runExperiment.catch(ex => {
>   // do something with the exception.
> }).then(()=> {
>   uninstall();
> });
> 
> You see this is far more readable than a long chain of promises and error
> handlers since it's like normal synchronous code, and can be expanded to new
> engines very easily (nothing to do since there's a for loop)
> If you want to interrupt at the first error, just remove the try/catches,
> the exception will be handled by the external catch.

OK, I spent some quality time with Task.jsm, and it finally dawned on me:
tasks give you all the readability and niceness of generator code internal
to the task function, while external consumers of the task enjoy the simplicity
of managing a promise, rather than an iterator.

Thanks for the very detailed example. I've updated the code to use tasks.
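
(A side note for future readers: Task.async generators map almost one-to-one onto the async/await syntax that later JavaScript versions support. The same flow in that syntax would look roughly like the sketch below, where the query strings are placeholders and `db.execute(sql, params)` is assumed to resolve to an array of row objects with a `count` property:)

```javascript
// Rough async/await equivalent of the Task.async flow discussed above.
// Query strings are placeholders, not the exact patch SQL.
const COUNT_QUERY = "SELECT SUM(visit_count) AS count FROM moz_places /* ... */";
const TOTAL_QUERY = "SELECT COUNT(*) AS count FROM moz_historyvisits";

async function runCounts(db, searchProviders) {
  const counts = {};
  // One per-provider count query, awaited in sequence, as in the Task loop.
  for (const name of Object.keys(searchProviders)) {
    const rows = await db.execute(COUNT_QUERY, searchProviders[name]);
    counts[name] = rows[0].count;
  }
  // Then the denominator: total visit count.
  counts.total = (await db.execute(TOTAL_QUERY))[0].count;
  return counts;
}
```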
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> Comment on attachment 8640179 [details] [diff] [review]
> Updated patch based on feedback
> 
> I got a ping about this from bmaggs. I was waiting for mak's review before
> making any more comments. In case it's not clear, mak's feedback was not a
> review and he does expect to re-review this once you've addressed his
> concerns.
> 
> Without waiting, though, this is still strange:
> 
> +// XXX update these URLs for beta vs release populations
...
> In any case, we shouldn't have XXX comments
> present in final code.

Sounds good, I've removed the comment.

> Does statsd-bridge.services.mozilla.com have SSL pinning enabled in Firefox
> so that we're sure this data doesn't get intercepted?

I looked into this, and pinning is indeed enabled.

More detail, in case you're interested: ulfr pointed me at this config file,
which shows that all subdomains of services.m.c are pinned to Digicert:

https://mxr.mozilla.org/mozilla-central/source/security/manager/tools/PreloadedHPKPins.json#215

The cert for statsd-bridge is issued by Digicert, which I verified by visiting
the page and examining the cert. So, we're all set there.

> Please do not record any amazon search URLs as part of this experiment. We
> already record search counts via telemetry, so recording only URLs that come
> from our builtin search providers is redundant. And otherwise, we promise
> not to record user history as part of telemetry. Generic search providers is
> different from visiting specific websites.

I'm not sure I understand how recording search counts via telemetry is relevant
here: the goal of this experiment is to determine the *fraction* of visits that
correspond to search result pages, and I don't see how the existing telemetry
data answers that question.

As I mentioned in my earlier comment (comment 21), the amazon URL we're matching
seems to be specific to amazon searches initiated in the browser. We count up the
number of visits to that URL, then divide by the total number of visits. I don't
quite see how this amounts to recording user history, could you maybe explain
your concern in a bit more detail?

> There is no documentation included in this patch for the data format. Please
> include at a minimum a DOCUMENTATION file which documents the URL endpoints
> and data/data format being sent by this experiment.
> 
> This experiment still does not uninstall itself. It appears to be calling
> the botostrap uninstall() method at various points, but that doesn't
> actually uninstall the experiment. You need to use
> Experiments.disableExperiment("FROM_API") to finish the experiment after it
> has done its work. Otherwise it will continue to be active for the next 24
> hours and submit additional pings if the user launches Firefox again.

I noticed that these two requirements (DOCUMENTATION file and needing to call
Experiments.disableExperiment) don't seem to be documented anywhere, don't show
up in any of the past experiments in the telemetry-experiments-server repo, and
weren't mentioned in the first review.

To help move reviews along more quickly in the future, I've documented these points
in the Engineering section of the Telemetry Experiments wiki page[1]. I'm sure I've
introduced some errors there, but hopefully it's a step in the right direction, and
you can fill in gaps / make corrections.

I've also updated the MDN page for install.rdf[2] to document that type=128
corresponds to telemetry experiments.

[1] https://wiki.mozilla.org/Telemetry/Experiments
[2] https://developer.mozilla.org/en-US/Add-ons/Install_Manifests#type

> Again, this is just for the bootstrap/data-collection pieces. You still need
> a code review from mak for the main part of the code.

Sounds good! I've r?'d you and mak on this round, and I will for future rounds, too.
Benjamin, I heard from Jared you want Marco to look at this again. NI to you, Marco, so you hopefully can review before the European weekend. We would really like to get this deployed, and the long Labor Day weekend in the US will slow things down.
Flags: needinfo?(mak77)
Yes, the code hasn't been reviewed yet, which is a basic requirement.
Comment on attachment 8654408 [details] [diff] [review]
Updated patch

Review of attachment 8654408 [details] [diff] [review]:
-----------------------------------------------------------------

yes, this is much better. From a code and Places points of view the patch looks good.
Since I never wrote or reviewed experiments, and I don't know that API very well, I'm leaving experiment-related details to bsmedberg.

::: experiments/serp-fraction-counts/code/bootstrap.js
@@ +23,5 @@
> +// if true, abort any remaining DB requests or beacons
> +let isExiting = false;
> +
> +// see https://bugzil.la/1174937#c16 for explanation of query optimizations
> +const query = `SELECT SUM(visit_count) AS count, url FROM moz_places

nit: it's not important at all here, but if you should ever touch Firefox internal code, the coding style for constants is uppercase with underscores (like EXAMPLE_CONST_VALUE)

@@ +68,5 @@
> +  counts[providerName] = count;
> +}
> +
> +// returns an integer percentage or null if either operand was invalid
> +// division operator handles type coercion for us

nit: I guess the first line needs a period, or this is unreadable.

@@ +115,5 @@
> +// implements nsIWindowMediatorListener
> +// pulled from MDN: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIWindowMediator#addListener()
> +let windowListener = {
> +  onOpenWindow: function(aWindow) {
> +    Services.wm.removeListener(windowListener);

I assume you don't want *any* window, you should probably skip the notification if this is not a browser window.

@@ +154,5 @@
> +    return;
> +  }
> +  yield db.execute(`
> +    SELECT COUNT(*) AS count FROM moz_historyvisits;
> +  `);

nit: making the return explicit would improve readability.
Attachment #8654408 - Flags: review?(mak77) → review+
Flags: needinfo?(mak77)
> > Please do not record any amazon search URLs as part of this experiment. We
> > already record search counts via telemetry, so recording only URLs that come
> > from our builtin search providers is redundant. And otherwise, we promise
> > not to record user history as part of telemetry. Generic search providers is
> > different from visiting specific websites.
> 
> I'm not sure I understand how recording search counts via telemetry is
> relevant
> here: the goal of this experiment is to determine the *fraction* of visits
> that
> correspond to search result pages, and I don't see how the existing telemetry
> data answers that question.

Amazon is not a general-purpose search engine, and I expect users to be surprised that we would send data about their search habits. Since "No Surprises" is one of our core privacy principles, please exclude Amazon from this experiment.

As I understand this experiment, the reason we're doing this experiment at all is that you care about all searches, not just searches from our builtin providers. So e.g. if you go to google.com and search, this experiment will count that even though our SAP counts don't. Given what you've said about the URL scheme, that's not true of the Amazon searches, and then I don't understand why the data is valuable. More technically the fact that there's a distinction between s? and s/ in the URL is not a guarantee, and your "it seems to be" is clearly insufficient guarantee of desired behavior.

If all you care about is the fraction of builtin searches, you can just count the total number of pageloads and we already count the total number of searches and you have your fraction. No need to go diving through the history database.

> I noticed that these two requirements (DOCUMENTATION file and needing to call
> Experiments.disableExperiment) don't seem to be documented anywhere, don't
> show
> up in any of the past experiments in the telemetry-experiments-server repo,
> and
> weren't mentioned in the first review.

The documentation requirement isn't specific to experiments, it's all product code that collects data. This is the first experiment that collects custom data, so that's why it didn't come up before.

.disableExperiment is also unique to this experiment, because all previous experiments expire after a certain period of time; they don't uninstall themselves. Because this experiment is a one-shot thing, the uninstall is different.
Comment on attachment 8654408 [details] [diff] [review]
Updated patch

drive-by code question:

Services.wm.removeListener(windowListener);
This can be called even if addListener was never called. Is that safe?

I still don't see docs for this in the patch, but I read enough of the code to be comfortable with the data you're collecting.

So, data-review+ from me *if* you remove Amazon and add a correct doc file describing the statsd data.
Attachment #8654408 - Flags: review?(benjamin) → feedback+
Hi Bill - Do we want result pages originating from Firefox UI, or from web navigation as well? I thought it was the former, but :bsmedberg seems to think it's the latter, see below.

If it is the latter, it sounds like there is a simpler way to get the search percentage.

> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #34)
> 
> As I understand this experiment, the reason we're doing this experiment at
> all is that you care about all searches, not just searches from our builtin
> providers. So e.g. if you go to google.com and search, this experiment will
> count that even though our SAP counts don't. Given what you've said about
> the URL scheme, that's not true of the Amazon searches, and then I don't
> understand why the data is valuable. More technically the fact that there's
> a distinction between s? and s/ in the URL is not a guarantee, and your "it
> seems to be" is clearly insufficient guarantee of desired behavior.
> 
> If all you care about is the fraction of builtin searches, you can just
> count the total number of pageloads and we already count the total number of
> searches and you have your fraction. No need to go diving through the
> history database.
Flags: needinfo?(wmaggs)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #35)
> Comment on attachment 8654408 [details] [diff] [review]
> Updated patch
> 
> drive-by code question:
> 
> Services.wm.removeListener(windowListener);
> This can be called even if addListener was never called. Is that safe?

Good question. Running `Services.wm.removeListener({})` in the
Browser Toolbox console doesn't throw, though passing it a
primitive value (nothing, null, or a string) does throw.

The MDN page for nsIWindowMediator doesn't document how the function
handles invalid arguments, and I'm not familiar enough with the
Mozilla C++ code to understand the function's source:

https://dxr.mozilla.org/mozilla-central/source/xpfe/appshell/nsWindowMediator.cpp?offset=0#758-766

So, since windowListener will always be an object, this should be
safe, but it would take someone with deeper C++ Gecko knowledge
to give a definitive answer.
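One way to sidestep the question entirely is to track whether the listener was ever registered. A minimal sketch (the gListening flag and the mock mediator are mine, not from the patch; real code would call Services.wm):

```javascript
// Sketch only: track whether addListener ever ran, so shutdown never
// relies on nsIWindowMediator tolerating an unknown listener.
let gListening = false;

const windowListener = {
  onOpenWindow(xulWindow) { /* per-window setup would go here */ },
};

// Stand-in for Services.wm so the sketch runs outside Gecko.
const windowMediator = {
  _listeners: new Set(),
  addListener(listener) { this._listeners.add(listener); },
  removeListener(listener) { this._listeners.delete(listener); },
};

function startListening() {
  windowMediator.addListener(windowListener);
  gListening = true;
}

function exit() {
  // Only unregister if we actually registered.
  if (gListening) {
    windowMediator.removeListener(windowListener);
    gListening = false;
  }
}

exit(); // safe: startListening() was never called, so nothing is removed
console.log(windowMediator._listeners.size); // → 0
```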

> 
> I still don't see docs for this in the patch, but I read enough of the code
> to be comfortable with the data you're collecting.
> 
> So, data-review+ from me *if* you remove Amazon and add a correct doc file
> describing the statsd data.

Great, I'll remove Amazon.

Could you point me to the doc file format you'd like me to use?
Flags: needinfo?(benjamin)
(In reply to Jared Hirsch [:_6a68] [NEEDINFO please!] from comment #36)
> Hi Bill - Do we want result pages originating from Firefox UI, or from web
> navigation as well? I thought it was the former, but :bsmedberg seems to
> think it's the latter, see below.
> 
> If it is the latter, it sounds like there is a simpler way to get the search
> percentage.
> 
> > (In reply to Benjamin Smedberg  [:bsmedberg] from comment #34)
> > 
> > As I understand this experiment, the reason we're doing this experiment at
> > all is that you care about all searches, not just searches from our builtin
> > providers. So e.g. if you go to google.com and search, this experiment will
> > count that even though our SAP counts don't. Given what you've said about
> > the URL scheme, that's not true of the Amazon searches, and then I don't
> > understand why the data is valuable. More technically the fact that there's
> > a distinction between s? and s/ in the URL is not a guarantee, and your "it
> > seems to be" is clearly insufficient guarantee of desired behavior.
> > 
> > If all you care about is the fraction of builtin searches, you can just
> > count the total number of pageloads and we already count the total number of
> > searches and you have your fraction. No need to go diving through the
> > history database.

I care about all searches from Firefox for the providers we can practically get data about.
Flags: needinfo?(wmaggs)
Hi Benjamin,

I've got another question for you. Per Bill's input in comment 38, I guess we want all searches, not just searches that originated with Firefox UI.

Would you mind explaining this simpler approach in a bit more detail:

> If all you care about is the fraction of builtin searches, you can just
> count the total number of pageloads and we already count the total number of
> searches and you have your fraction. No need to go diving through the
> history database.

It sounds like this patch is about to get much simpler ^_^

Thanks, Jared
Firefox already records all the search counts from the browser access points in the SEARCH_COUNTS histogram. If all you need is those counts, they already exist and you can go get them.

If you need to compare those to the total number of pageloads, I think you'll need to also count the total number of pageloads, but that isn't especially hard. You'll also need to decide things like whether you care about toplevel pageloads, or all pageloads, and whether in-page navigation (history.pushState) should be counted.

But since those are counters, they are very easy to measure and report on using the existing infrastructure.
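As a rough sketch of the arithmetic (the histogram snapshot below is mocked; real code would read Services.telemetry.getKeyedHistogramById("SEARCH_COUNTS"), and the pageload counter is something you'd have to add yourself):

```javascript
// Mocked SEARCH_COUNTS snapshot: keys are "<engine>.<access point>" and
// each bucket's `sum` is the number of searches recorded for that key.
const searchCounts = {
  "google.urlbar": { sum: 12 },
  "google.searchbar": { sum: 8 },
  "yahoo.urlbar": { sum: 5 },
};
const totalPageloads = 500; // hypothetical counter you'd maintain yourself

const totalSearches = Object.values(searchCounts)
  .reduce((acc, bucket) => acc + bucket.sum, 0);
const searchFraction = totalSearches / totalPageloads;
console.log(searchFraction); // → 0.05
```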
Flags: needinfo?(benjamin)
Attached patch Final tweaks per comment 35 (obsolete) — Splinter Review
Here's a slightly updated patch (added DOCUMENTATION, removed amazon). I didn't want to lose marco's r+, so I've uploaded this iteration as a separate attachment. If that's bad form, I'm happy to adjust the bug; let me know.

Bill and I decided to just ship the code we've got, since it has gotten this far already. We'll explore the existing telemetry data when designing followup experiments.

I've guessed at a start date of 2 October 2015 in the manifest. Happy to change it as needed.

What are the next steps from here?

Thanks very much for your help throughout this process.

Jared
Flags: needinfo?(benjamin)
Attached patch Final tweaks per comment 35 (obsolete) — Splinter Review
Forgot the DOCUMENTATION file when generating the patch. It's now included.
Attachment #8667022 - Attachment is obsolete: true
Comment on attachment 8667023 [details] [diff] [review]
Final tweaks per comment 35

Yes this is fine.
Flags: needinfo?(benjamin)
Comment on attachment 8667023 [details] [diff] [review]
Final tweaks per comment 35

Review of attachment 8667023 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifest.json
@@ +6,5 @@
> +  "info": "https://bugzilla.mozilla.org/show_bug.cgi?id=1174937",
> +  "manifest": {
> +    "id": "serp-fraction-counts-beta42@experiments.mozilla.org",
> +    "startTime": "1443830400",
> +    "endTime": "1444435200",

need to update this endTime which is Oct 9.

::: code/bootstrap.js
@@ +183,5 @@
> +/*
> + *  bootstrapped addon code
> + */
> +
> +function startup() {

please use this gStarted pattern here:
http://hg.mozilla.org/webtools/telemetry-experiment-server/file/59365ce5cabe/experiments/flash-protectedmode-beta/code/bootstrap.js#l14

@@ +197,5 @@
> +function install() {
> +}
> +function uninstall() {
> +  exit();
> +  Experiments.disableExperiment("FROM_API");

Does this work?  .disableExperiment is defined in Experiments.instance(). But I don't think it's necessary to explicitly call it.

And why was this "FROM_API" here? Is it just for debugging purposes, or does this end up tagged somewhere in the data analysis that you'll do? Note that this uninstall() function is also called when the experiment is disabled by the user, or at the end of the experiment. So virtually everyone will be tagged as "FROM_API".
Blocks: 1214897
Attached patch Final tweaks per comment 35 (obsolete) — Splinter Review
Updating patch to move start date to 9 Nov 2015
Attachment #8667023 - Attachment is obsolete: true
Waiting for my mercurial account to be reactivated before I can push to hg.m.o.

MOC bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1216376
Hi Felipe,

Sorry, I missed your comments yesterday. Responses below:

(In reply to :Felipe Gomes from comment #44)
> Comment on attachment 8667023 [details] [diff] [review]
> Final tweaks per comment 35
> 
> Review of attachment 8667023 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: manifest.json
> @@ +6,5 @@
> > +  "info": "https://bugzilla.mozilla.org/show_bug.cgi?id=1174937",
> > +  "manifest": {
> > +    "id": "serp-fraction-counts-beta42@experiments.mozilla.org",
> > +    "startTime": "1443830400",
> > +    "endTime": "1444435200",
> 
> need to update this endTime which is Oct 9.

Yup, I've moved the start date to Nov 9, and the end date to Nov 16.

> 
> ::: code/bootstrap.js
> @@ +183,5 @@
> > +/*
> > + *  bootstrapped addon code
> > + */
> > +
> > +function startup() {
> 
> please use this gStarted pattern here:
> http://hg.mozilla.org/webtools/telemetry-experiment-server/file/59365ce5cabe/
> experiments/flash-protectedmode-beta/code/bootstrap.js#l14

OK, added.
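For reference, the guard I added looks roughly like this (a minimal sketch; names only loosely follow the linked flash-protectedmode-beta bootstrap.js, and setupCount is just for illustration):

```javascript
// The experiment system can invoke startup() more than once; bail out on
// re-entry so listeners and timers are only registered a single time.
let gStarted = false;
let setupCount = 0;

function startup(data, reason) {
  if (gStarted) {
    return;
  }
  gStarted = true;
  setupCount++; // stand-in for the real setup work (window listeners, etc.)
}

startup();
startup(); // no-op on the second call
console.log(setupCount); // → 1
```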

Is there a bug open to track the fact that the startup method runs more than once?
If not, I can file one.

Also, it would be great to add this pattern to the telemetry experiment wiki;
I've been trying to capture undocumented requirements there as I encounter them,
as in comment 30, for instance.


> 
> @@ +197,5 @@
> > +function install() {
> > +}
> > +function uninstall() {
> > +  exit();
> > +  Experiments.disableExperiment("FROM_API");
> 
> Does this work?  .disableExperiment is defined in Experiments.instance().
> But I don't think it's necessary to explicitly call it.

The .disableExperiment call was inserted at bsmedberg's request, see comment 25.

Given the number of rounds of feedback that have gone into this bug already,
I'd prefer to keep it unless it's absolutely necessary to remove it.

> 
> And why was this "FROM_API" here? Is it just for debugging purposes, or does
> this end up tagged somewhere in the data analysis that you'll do? Note that
> this uninstall() function is also called when the experiment is disabled by
> the user, or at the end of the experiment. So virtually everyone will be
> tagged as "FROM_API".

We aren't using the telemetry backend for our data, we'll be sending our results
to datadog. This is discussed in earlier comments in the bug, in case you're
interested in learning more.

Thanks for the feedback; I'll upload an updated patch shortly.
Attachment #8675968 - Attachment is obsolete: true
Hi Felipe,

I don't have any experience pushing to hg.m.o, and I seem to be hitting some kind of error (see https://6a68.pastebin.mozilla.org/8849933).

Would you mind pushing the code for me? The newer patch attached to this bug is up-to-date and includes fixes for the issues you pointed out.

Once it's pushed, :kjozwiak is going to start the QA process for the experiment.

Thanks!

Jared
Flags: needinfo?(felipc)
Attached patch 1174937-serp-fraction-counts (obsolete) — Splinter Review
Hi Jared,

yeah, I can push the code for you. But before doing that, since we are working on another experiment at the same time, I'd prefer if we test it locally to sort out any problems before pushing (and then afterwards we do some sanity checking on staging after it's pushed). Kamil is able to do that, since he is doing the same in bug 1193089, so I'm just posting a slightly modified patch here.

What I changed is that the uninstall() method is part of the bootstrap.js API, so the code should be calling something else when it wants to uninstall the experiment by itself.  This is to avoid .disableExperiment being called when the experiment system is automatically disabling it (which will implicitly call disableExperiment in that situation, with a different termination reason).
Flags: needinfo?(felipc)
Attachment #8677045 - Attachment filename: 1174937-serp-fraction-counts → 1174937-serp-fraction-counts -- patch for testing
Take a look at the test plan that was prepared for the e10s experiment: https://public.etherpad-mozilla.org/p/e10s_experiment

You could write something similar to that to specify the main things you want to have tested with this experiment
Thanks very much, Felipe! I'm running out of time tonight, but I'll generate a test plan tomorrow and attach the link to this bug.
Thanks Jared! In the past, I've had to go through the entire bug and try to create my own test cases by reading all the comments, which took a while depending on how long the bug was. With yesterday's e10s experiment in bug # 1193089, the test scenarios that were provided definitely made things a lot faster. I would have mentioned this earlier in our IRC chat so you could start on them sooner, but we never did it that way in the past.
(In reply to Kamil Jozwiak [:kjozwiak] from comment #53)
> Thanks Jared! In the past, I've had to go through the entire bug and try
> creating my own test cases by reading all the comments in the bug which took
> a bit of time depending on how long the bug was. With yesterdays e10s
> experiment in bug # 1193089, the test scenarios that were provided
> definitely made things a lot faster. I would have mentioned something
> earlier in our IRC chat so you could start them earlier but we never did it
> that way in the past.

No worries, Kamil! It seems like asking for a test plan simplifies your job quite a bit; what would you think about making the test plan a requirement for future experiments? You could document the requirement on the telemetry experiment wiki, and you could even link to the e10s test plan as an example. Might make your job a whole lot nicer :-)
Hi Kamil - Here's a first pass at a test plan. Comments welcome, here or in the etherpad:

https://public.etherpad-mozilla.org/p/bug_1174937_search_experiment_qa_plan

Let me know if I need to do anything else for you to get started with testing.

Thanks again!
Flags: needinfo?(kjozwiak)
> No worries, Kamil! It seems like asking for a test plan simplifies your job
> quite a bit; what would you think about making the test plan a requirement
> for future experiments? You could document the requirement on the telemetry
> experiment wiki, and you could even link to the e10s test plan as an
> example. Might make your job a whole lot nicer :-)

Yup, good point! I'll add something into the wiki in a few minutes. Should make things a lot easier in the future.

The test plan looks great; it will give me a really good idea of where I should start and what areas I should be looking at. Has the experiment been pushed yet? I don't see it under:

* http://hg.mozilla.org/webtools/telemetry-experiment-server/
* https://telemetry-experiment-dev.allizom.org/
Flags: needinfo?(kjozwiak)
Kamil: no, it hasn't been pushed yet. I was hoping you could do some local testing before we push
> Kamil: no, it hasn't been pushed yet. I was hoping you could do some local
> testing before we push

Where can I pull/get the experiment from? I usually pull the experiment from the repo [1] and build/test locally. Sometimes the experiment gets pushed to the staging server [2], so I don't bother pulling/building and just use the firefox-manifest [3] from the staging server.

Should I pull the patches from this bug and apply them into the telemetry experiment repo [1]? (I haven't done it this way before so I'm just making sure)

* [1] http://hg.mozilla.org/webtools/telemetry-experiment-server/
* [2] https://telemetry-experiment-dev.allizom.org/
* [3] https://telemetry-experiment-dev.allizom.org/firefox-manifest.json
(In reply to Kamil Jozwiak [:kjozwiak] from comment #58)
> Should I pull the patches from this bug and apply them into the telemetry
> experiment repo [1]? (I haven't done it this way before so I'm just making
> sure)

Yep! You can just download the attachment, and then `patch -p1 < attachment`

or

> hg qimport -n bug1174937 https://bug1174937.bmoattachments.org/attachment.cgi?id=8677045
> hg qpush
Hi Kamil - Have you been able to get the code for this experiment running locally? If not, I'm happy to (try to) help you get running :-)
Flags: needinfo?(kjozwiak)
> Hi Kamil - Have you been able to get the code for this experiment running
> locally? If not, I'm happy to (try to) help you get running :-)

Apologies Jared! There's currently an AsyncShutdown crash affecting Aurora due to the e10s experiment, so I was helping Vladan debug the problem yesterday. However, I'm getting the following error when I try to build the experiment after pulling/applying the patch using the instructions in comment # 59:

>> Traceback (most recent call last):
>>  File "build.py", line 117, in <module>
>>    build(os.path.normpath(os.path.abspath(path)), baseurl)
>>  File "build.py", line 75, in build
>>    experiments = get_experiments()
>>  File "build.py", line 44, in get_experiments
>>    e = Experiment(experiment_path)
>>  File "/home/pony/telemetry-experiment-server/experiment.py", line 107, in __init__
>>    validate(self.manifest[k])
>>  File "/home/pony/telemetry-experiment-server/experiment.py", line 14, in validate_time
>>    validate_int(d)
>>  File "/home/pony/telemetry-experiment-server/experiment.py", line 21, in validate_int
>>    raise ValueError("Value %s is not an integer" % (d,))
>> ValueError: Value 1447113600 is not an integer

Steps I'm currently using:

* hg clone http://hg.mozilla.org/webtools/telemetry-experiment-server/
* cd into the telemetry-experiment-server directory
* hg qimport -n bug1174937 https://bug1174937.bmoattachments.org/attachment.cgi?id=8677045
* hg qpush
* check to make sure the patch is applied via hg qapplied
* python build.py _out "http://localhost:8080"

This usually works and creates a _out folder with all the needed files that are used to launch the experiment via localhost or my personal server, which lets me test on VMs that can't reach the localhost server. I'm not sure if I'm doing something incorrectly; perhaps the patch isn't being applied cleanly?
Flags: needinfo?(kjozwiak) → needinfo?(6a68)
Felipe, do you know if we are working around bug 1217282 with a jsfilter for this experiment?
Is anyone making sure that that happens?
Flags: needinfo?(felipc)
Yeah I am aware, will make sure it happens 

Also posting here the other pref that you mentioned to me to not forget it: "datareporting.policy.dataSubmissionEnabled"
Flags: needinfo?(felipc)
Hey Kamil,

Looks like the startTime and endTime in manifest.json must be integers, and my patch has them wrapped in quotes. This is a great example of what happens when a config file's accepted format is undocumented.

It also looks like there's an undocumented requirement to either check in the xpi, or check in a build.py file. I discovered this from reading the main build.py source, after running the experiment locally failed for me.

I'll work on updating the patch and let you know when it's ready.
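For reference, the times need to be bare JSON numbers rather than strings, e.g. (using the startTime value from your traceback; the endTime here is illustrative, one week later):

```json
{
  "startTime": 1447113600,
  "endTime": 1447718400
}
```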
Flags: needinfo?(6a68)
In fact because of the new signing requirements we have to check in all the .xpis.
Here's the updated patch with integer dates in the manifest. Otherwise it's unmodified from attachment 8677045 [details] [diff] [review] that Felipe uploaded.
Attachment #8676541 - Attachment is obsolete: true
Attachment #8677045 - Attachment is obsolete: true
Attached file experiment.xpi (obsolete) —
OK, here's the unsigned addon. Kamil, I'm able to get build.py to run without errors using this xpi and the patch I just updated. Hopefully it'll work for you, too.

However, the addon signing requirement might be another hurdle: the AMO validator doesn't recognize the telemetry experiment type in install.rdf:

> Error: The only valid values for <em:type> are 2, 4, 8, and 32.
> Any other values are either invalid or deprecated.

Is that AMO validation error going to block deploying this addon? If so, I can work with AMO to get a fix landed for the validator code right away. Benjamin, what do you think?
Flags: needinfo?(kjozwiak)
Flags: needinfo?(benjamin)
Based on IRC conversation with :andym, seems like the right next step is to file a bug under Addon Validation, so I'll do that now and update this bug when the signed experiment is ready.
Flags: needinfo?(benjamin)
Comment on attachment 8680973 [details]
experiment.xpi

Jason, can we get this XPI signed by the full review AMO root please?
Flags: needinfo?(jthomas)
Attached file experiment.xpi (obsolete) —
Please see attached.
Attachment #8680973 - Attachment is obsolete: true
Flags: needinfo?(jthomas)
OK, we've got a signed experiment, thanks :jason and :Mossop!

Hi Kamil (relocating the ni down here), let me know if you need anything else to test this puppy.
Flags: needinfo?(kjozwiak)
Whoops, lost the ni. Trying again
Flags: needinfo?(kjozwiak)
The current start date is Monday 9 November. I'm really hoping we can stick with that start date, as other folks in the org are anticipating the data being available the following week.

As soon as Kamil signs off, the remaining steps are:
- I'll ping Felipe to land the patch
- I'll email release-drivers@ with the info mentioned in the telex wiki[1]
- Once release-drivers sign off, I'll reopen bug 1214897 to get the code deployed.

Fingers crossed that this comes together in the next 2 days ^_^
Just a quick update relating to QA efforts. I started testing this yesterday and went through most of the basic tests [1] without any problems. I had to change the startTime to something earlier [2] for testing purposes. After the experiment gets installed, I didn't see any information being pinged back to DataDog. I tried using HTTP logging [3] to see if there was any contact with "https://statsd-bridge.services.mozilla.com" but couldn't find anything.

Jared and I spent a few hours debugging, trying to figure out why data isn't being sent to the DataDog server once the experiment is installed. Using "./firefox-bin -no-remote > out.txt 2>&1" with the prefs below, we get the following error:

1446833191535 addons.xpi-utils WARN Error: XPI database modified after shutdown began (resource://gre/modules/addons/XPIProvider.jsm -> resource://gre/modules/addons/XPIProviderUtils.js:451:17) JS Stack trace: this.XPIDatabase.saveChanges@XPIProviderUtils.js:451:17 < XPIDB_updateActiveAddons@XPIProviderUtils.js:1398:9 < XPI_shutdown@XPIProvider.jsm:2639:7 < _startProvider/AMProviderShutdown/<@AddonManager.jsm:843:21 < Promise@Promise-backend.js:385:5 < _startProvider/AMProviderShutdown@AddonManager.jsm:841:16 < Barrier/this.client.addBlocker/promise</trigger@AsyncShutdown.jsm:686:23 < Barrier.prototype<._wait@AsyncShutdown.jsm:830:7 < Barrier.prototype<.wait@AsyncShutdown.jsm:814:28 < AddonManagerInternal.shutdownManager<@AddonManager.jsm:1233:15

- [1] https://public.etherpad-mozilla.org/p/ShareSearchesTelemetryExperimentQA
- [2] http://kamiljozwiak.io/firefox-manifest.json
- [3] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Mac_OS_X

Pref's Used:

>> Services.prefs.setBoolPref("experiments.logging.dump", true);
>> Services.prefs.setCharPref("experiments.logging.level", "Trace");
>> Services.prefs.setCharPref("experiments.force-sample-value", ".1");
>> Services.prefs.setBoolPref("xpinstall.signatures.required", false);
>> experiments.manifest.uri;http://kamiljozwiak.io/firefox-manifest.json
Flags: needinfo?(kjozwiak)
Jared, I've started testing the experiment and ran into an issue when it comes to shutting down fx during installation. I've also run into some minor things that I'm not 100% sure about, so I've added those as well. I figured I would start posting the issues as I find them so you can start looking into them, rather than giving you a list at the end. I tried to make issue # 3 as readable as possible.

Potential Issue #1:
===================

While testing scenario 4 from the test plan in comment # 55, the following data was uploaded into DataDog at 00:51:00 on Nov. 9th:

* experimental.beta42.1174937.serpfraction.clients = 0.5
* experimental.beta42.1174937.serpfraction.total = 1.3

I'm not sure if serpfraction.clients = 0.5 is a valid value (it's always been 1 while I've been testing), but serpfraction.total = 1.3 definitely looks strange; I'm not sure how you could end up with a total of 1.3. Is this a bug in the calculation, or a legitimate value that's somehow been derived from the experiment?

I couldn't reproduce this a second time :/ But if this is a bug, I imagine we'll see it with the larger population.

STR:

* install a fresh fx and change all the appropriate values under about:config
* launch the experiment (I believe there's two bookmarks in total for Windows 10)
* checked datadog expecting clients = 1 & serpfraction.total = 2 but received serpfraction.clients = 0.5 & serpfraction.total = 1.3

Potential Issue #2:
===================

When there's a completely empty profile and the percentage calculation is "null" [Telex.percentage of 0 / 0 is null], the data is being placed into the .error containers under DataDog:

* Telex: sendBeacon: url, "https://statsd-bridge.services.mozilla.com/count/beta42.1174937.serpfraction.google.error" data: 1
* Telex: sendBeacon: url, "https://statsd-bridge.services.mozilla.com/count/beta42.1174937.serpfraction.bing.error" data: 1
* Telex: sendBeacon: url, "https://statsd-bridge.services.mozilla.com/count/beta42.1174937.serpfraction.yahoo.error" data: 1

Is this the correct behaviour? Or should the null be a 0 and the entries placed into the non-error containers under DataDog?

Issue #3 [shutdown issue]
=========================

Sometimes the experiment will stay installed and never send any data back to DataDog when it's interrupted during installation. This doesn't happen every single time, but I've run into it twice already while testing scenario 5 from the test plan in comment # 55. I'm guessing it's timing related, and it all depends on when you press "command + q" during the experiment installation process.

STR:

- while the experiment is being installed, quickly "command + q" and exit FX

After pressing "command + q" during the installation process, I received the following errors:

<===================================LOG==============================================>
1447051520599 Browser.Experiments.Experiments DEBUG Experiments #0::evaluateExperiments() - activating experiment serp-fraction-counts-beta42@experiments.mozilla.org
1447051520600 Browser.Experiments.Experiments TRACE ExperimentEntry #0::start() for serp-fraction-counts-beta42@experiments.mozilla.org
1447051520600 Browser.Experiments.Experiments TRACE ExperimentEntry #0::reconcileAddonState()
1447051520600 Browser.Experiments.Experiments INFO ExperimentEntry #0::reconcileAddonState() - Installing add-on.

Handler function threw an exception: TypeError: threadActor is null
Stack: TabActor.prototype._windowReady@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webbrowser.js:1657:1
TabActor.prototype._changeTopLevelDocument/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webbrowser.js:1594:7
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:86:14
Line: 1657, column: 1
Handler function NRL_getSecurityInfo threw an exception: TypeError: this.transport is null
Stack: DSC_send@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1344:5
NEA_addSecurityInfo@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webconsole.js:2005:5
NRL_getSecurityInfo@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/toolkit/webconsole/network-monitor.js:265:5
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:86:14
NRL_onStartRequest@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/toolkit/webconsole/network-monitor.js:219:5
Line: 1344, column: 5

1447051521294 addons.repository WARN Search failed when adding add-ons to cache
1447051521295 Browser.Experiments.Experiments TRACE ExperimentEntry #0::_installAddon() - onDownloadEnded for serp-fraction-counts-beta42@experiments.mozilla.org
1447051521295 Browser.Experiments.Experiments TRACE   ExperimentEntry #0::_installAddon() - onInstallStarted for serp-fraction-counts-beta42@experiments.mozilla.org
<====================================================================================>

Once I restarted fx after shutting it down via "command + q", I received the following errors:

<===================================LOG==============================================>
1447051541721 Browser.Experiments.Experiments TRACE ExperimentEntry #0::reconcileAddonState()
1447051541724 Browser.Experiments.Experiments	INFO ExperimentEntry #0::Activating add-on: serp-fraction-counts-beta42@experiments.mozilla.org
console.log: Telex: startup
console.log: Telex: runExperiment
console.log: Telex: did runExperiment find a window?  false
1447051541740 Browser.Experiments.Experiments INFO ExperimentEntry #0::Add-on has been enabled: serp-fraction-counts-beta42@experiments.mozilla.org
Telex: startup
Telex: runExperiment
Telex: did runExperiment find a window? false
Telex: onOpenWindow bootstrap.js:149
Telex.onOpenWindow: does domWindow exist? ChromeWindow → chrome://browser/content/devtools/webconsole.xul bootstrap.js:154
Telex: onDomWindowReady bootstrap.js:119
Telex.onDomWindowReady failed:  TypeError: domWindow.removeEventListener is not a function
Stack trace:
onDomWindowReady@resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///Users/kjozwiak/Library/Application%20Support/Firefox/Profiles/ziww90o6.default/extensions/serp-fraction-counts-beta42@experiments.mozilla.org.xpi!/bootstrap.js:121:5 bootstrap.js

console.log: Telex: onOpenWindow
console.log: Telex.onOpenWindow: does domWindow exist? Window {close:..,stop:..,focus:..,blur:..,open:..,alert:..,confirm:..,prompt:..,print:..,showModalDialog:..,postMessage:..,captureEvents:..,releaseEvents:..,getSelection:..,getComputedStyle:..,matchMedia:..,moveTo:..,moveBy:..,resizeTo:..,resizeBy:..,scroll:..,scrollTo:..,scrollBy:..,requestAnimationFrame:..,cancelAnimationFrame:..,getDefaultComputedStyle:..,scrollByLines:..,scrollByPages:..,sizeToContent:..,updateCommands:..,find:..,dump:..,setResizable:..,getAttention:..,getAttentionWithCycleCount:..,setCursor:..,maximize:..,minimize:..,restore:..,notifyDefaultButtonLoaded:..,getGroupMessageManager:..,beginWindowMove:..,fetch:..,createImageBitmap:..,btoa:..,atob:..,setTimeout:..,clearTimeout:..,setInterval:..,clearInterval:..,self:..,name:..,history:..,locationbar:..,menubar:..,personalbar:..,scrollbars:..,statusbar:..,toolbar:..,status:..,closed:..,frames:..,length:..,opener:..,parent:..,frameElement:..,navigator:..,external:..,applicationCache:..,screen:..,innerWidth:..,innerHeight:..,scrollX:..,pageXOffset:..,scrollY:..,pageYOffset:..,screenX:..,screenY:..,outerWidth:..,outerHeight:..,performance:..,caches:..,mozInnerScreenX:..,mozInnerScreenY:..,devicePixelRatio:..,scrollMaxX:..,scrollMaxY:..,fullScreen:..,mozPaintCount:..,onwheel:..,ondevicemotion:..,ondeviceorientation:..,ondeviceproximity:..,onuserproximity:..,ondevicelight:..,content:..,console:..,sidebar:..,windowState:..,browserDOMWindow:..,messageManager:..,crypto:..,onabort:..,onblur:..,onfocus:..,oncanplay:..,oncanplaythrough:..,onchange:..,onclick:..,oncontextmenu:..,ondblclick:..,ondrag:..,ondragend:..,ondragenter:..,ondragleave:..,ondragover:..,ondragstart:..,ondrop:..,ondurationchange:..,onemptied:..,onended:..,oninput:..,oninvalid:..,onkeydown:..,onkeypress:..,onkeyup:..,onload:..,onloadeddata:..,onloadedmetadata:..,onloadstart:..,onmousedown:..,onmouseenter:..,onmouseleave:..,onmousemove:..,onmouseout:..,onmouseover:..,onmouseup:..,onpause:.
.,onplay:..,onplaying:..,onprogress:..,onratechange:..,onreset:..,onresize:..,onscroll:..,onseeked:..,onseeking:..,onselect:..,onshow:..,onstalled:..,onsubmit:..,onsuspend:..,ontimeupdate:..,onvolumechange:..,onwaiting:..,onmozfullscreenchange:..,onmozfullscreenerror:..,onmozpointerlockchange:..,onmozpointerlockerror:..,indexedDB:..,onerror:..,onafterprint:..,onbeforeprint:..,onbeforeunload:..,onhashchange:..,onlanguagechange:..,onmessage:..,onoffline:..,ononline:..,onpagehide:..,onpageshow:..,onpopstate:..,onunload:..,localStorage:..,sessionStorage:..,STATE_MAXIMIZED:..,STATE_MINIMIZED:..,STATE_NORMAL:..,STATE_FULLSCREEN:..,mozScrollSnap:..,mozRequestOverfill:..,back:..,forward:..,home:..,openDialog:..,getInterface:..,controllers:..,realFrameElement:..,MozSelfSupport:..,_content:..,windowRoot:..,window:..,document:..,location:..,top:..,InstallTrigger:..,Application:.., }

console.log: Telex: onDomWindowReady
console.error: 
  Telex.onDomWindowReady failed: 
  Message: TypeError: domWindow.removeEventListener is not a function
  Stack:
    onDomWindowReady@resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///Users/kjozwiak/Library/Application%20Support/Firefox/Profiles/ziww90o6.default/extensions/serp-fraction-counts-beta42@experiments.mozilla.org.xpi!/bootstrap.js:121:5

console.log: Telex: shutdown
console.log: Telex: exit

1447051747193 addons.xpi-utils WARN Error: XPI database modified after shutdown began (resource://gre/modules/addons/XPIProvider.jsm -> resource://gre/modules/addons/XPIProviderUtils.js:451:17) JS Stack trace: this.XPIDatabase.saveChanges@XPIProviderUtils.js:451:17 < XPIDB_updateActiveAddons@XPIProviderUtils.js:1398:9 < XPI_shutdown@XPIProvider.jsm:2639:7 < _startProvider/AMProviderShutdown/<@AddonManager.jsm:843:21 < Promise@Promise-backend.js:385:5 < _startProvider/AMProviderShutdown@AddonManager.jsm:841:16 < Barrier/this.client.addBlocker/promise</trigger@AsyncShutdown.jsm:686:23 < Barrier.prototype<._wait@AsyncShutdown.jsm:830:7 < Barrier.prototype<.wait@AsyncShutdown.jsm:814:28 < AddonManagerInternal.shutdownManager<@AddonManager.jsm:1233:15
<====================================================================================>
Flags: needinfo?(6a68)
Hi Kamil!

Issue #1 isn't a problem.

The values submitted by Firefox are whole numbers, but Datadog's default graph averages over data in a given time range, rather than summing. If you zoom the graph on the fractional counts (click and drag around the single bar in the graph), you'll see that they actually split out into discrete whole number values.


Issue #2 is not ideal, but it's ok.

We want to exclude empty profiles, and the current error path technically does that. The important thing is to separate users with no visits from users with lots of visits but, say, 0.4% of searches at a given provider: that percentage rounds down to 0, so a reported 0 really means [0, 0.5). Low-resolution data is funny ^_^
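The rounding behavior described above could be sketched like this (a hypothetical helper, not the experiment's actual code): a user with many visits but a tiny share of searches reports the same 0 that an empty profile would.

```javascript
// Hypothetical illustration of the low-resolution rounding: percentages
// below 0.5% report as 0, indistinguishable from "no searches at all".
function searchPercent(searchVisits, totalVisits) {
  if (totalVisits === 0) {
    return 0; // empty profile: nothing to divide by
  }
  return Math.round((searchVisits / totalVisits) * 100);
}
```

For example, 4 search visits out of 1000 total (0.4%) reports 0, while 6 out of 1000 (0.6%) reports 1.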

I'll be able to compare the error count and the number of clients reporting total = 0 to identify any other errors that occur.


Issue #3 does include a small bug in the experiment code; I'll attach a fixed experiment in a second.

The first traceback only seems to refer to devtools and addons code. It looks like you did indeed time it so that FF quits after the download finishes, while the install process is happening. Nice work ^_^

The second traceback (after restart) contains an error in the experiment code, where we assume a window has `removeEventListener`. I've added a check for that function & I'll attach the updated unsigned experiment momentarily.
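The shape of that check might look something like this (my sketch, not the actual patch; the function name is hypothetical and the real code lives in bootstrap.js):

```javascript
// Hypothetical sketch of the fix: only detach the listener when the
// object actually implements removeEventListener, since some windows
// reaching onDomWindowReady during shutdown do not.
function safelyRemoveLoadListener(domWindow, onLoad) {
  if (domWindow && typeof domWindow.removeEventListener === "function") {
    domWindow.removeEventListener("load", onLoad, false);
    return true;
  }
  return false; // nothing we can detach from; skip quietly
}
```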

It looks like the second traceback also includes an addon manager error ("XPI database modified after shutdown began"). This looks to me like a legit bug in AddonManager, which you might want to file as a separate issue (AddonManager should detect shutdown and abort any in-flight installs).

Thanks very much! Updated experiment incoming.
Flags: needinfo?(6a68)
Attached file unsigned experiment for testing (obsolete) —
Here's the experiment with a fix for issue #3 from comment 75
Flags: needinfo?(kjozwiak)
Seems like we should push the start date back by a week, so I'll update the manifest to start on 16 November instead of 9 November.
(In reply to Jared Hirsch [:_6a68] [NEEDINFO please!] from comment #78)
> Seems like we should push the start date by a week, so, I'll update the
> manifest to start on 16 November, instead of 9 November.

Hey, no need to update the start date: in practice we use a startDate that has already elapsed when we push to production (so that the experiment can start immediately when pushed).


But while you're at it, could you add to the experiment code a check for these three prefs, and make sure that the experiment doesn't do anything (and, if possible, uninstalls itself -- but if that's complicated, not doing anything is enough):

datareporting.healthreport.service.enabled
datareporting.healthreport.uploadEnabled
toolkit.telemetry.enabled

(if any of these are false)
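That check might be sketched like this (my assumption of the shape; the real code would live in bootstrap.js and read Services.prefs, which isn't available outside Firefox, so the pref reader is injected here):

```javascript
// Hypothetical sketch of the requested guard: do nothing (and ideally
// uninstall) unless all three data-reporting prefs are true.
const REQUIRED_PREFS = [
  "datareporting.healthreport.service.enabled",
  "datareporting.healthreport.uploadEnabled",
  "toolkit.telemetry.enabled",
];

// getBoolPref: (name) => boolean. Reporting is allowed only if every
// required pref is true.
function reportingAllowed(getBoolPref) {
  return REQUIRED_PREFS.every((name) => getBoolPref(name) === true);
}
```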


One last thing, can you drop a file called filter.js in the experiment folder, just like this one:
http://hg.mozilla.org/webtools/telemetry-experiment-server/rev/2286e26b93cc
Seems like the new XPI from comment # 77 is failing. It gets installed, but it never sends anything back to DataDog and fails to uninstall itself. Attached the log below via pastebin.

- OSX 10.11.1 x64 -> Reproduced
- Win 10 x64 (VM) -> Reproduced

* https://pastebin.mozilla.org/8851741
** the error appears on line #30
Flags: needinfo?(kjozwiak) → needinfo?(6a68)
I'm going to use the previous XPI to continue testing while Jared takes a look at the issue in comment #80.
Attached file unsigned experiment for testing, again (obsolete) —
Ouch, typoed `domWindow` as `donWindow`. Let's try that again...
Attachment #8685056 - Attachment is obsolete: true
Flags: needinfo?(6a68) → needinfo?(kjozwiak)
I'm planning to move the start date back a week, from 11/9/15 to 11/16/15.

This should give us time to finish the remaining testing and release steps.
Still having issues with the XPI from comment # 80, I've added the browser console dump below:

* https://pastebin.mozilla.org/8851751
Flags: needinfo?(kjozwiak) → needinfo?(6a68)
Apologies, meant comment # 82 above. The error occurs on line # 42 in the pastebin.
Here's an xpi with the `donWindow` typo _actually_ fixed
Attachment #8685112 - Attachment is obsolete: true
Flags: needinfo?(6a68)
Attachment #8685168 - Attachment description: unsigned experiment for texting (1447113505) → unsigned experiment for testing (1447113505)
Please take a look at comment 79 too
Flags: needinfo?(6a68)
Thanks, Felipe - I indeed missed comment 79.

I'll update the code to:
- uninstall itself if those prefs are set (not hard at all)
- include the filter.js file
- push the end date out a week, but leave the start date at Nov 9

I'll attach the updated patch (without logging), and I'll also attach the updated xpi (with logging, to ease testing).

I've added a test case to the testing etherpad (link in comment 55) to verify that the experiment doesn't run if any of those prefs are set to false.

Once Kamil has finished testing, I'll upload the xpi (without logging) and ping Jason to re-sign it with the AMO keys.
Flags: needinfo?(6a68)
Removing some single quotes that crept into the patch.

Sorry for the relentless bugspam!
Attachment #8685196 - Attachment is obsolete: true
Add missing filter.js
Attachment #8685199 - Attachment is obsolete: true
Attachment #8685168 - Attachment is obsolete: true
When disabling the prefs listed below, the experiment is still installed but doesn't ping back any data according to the debugging messages. The main issue that I see with this approach is that the experiment will leave a trace under about:addons and about:support. Users that have telemetry disabled might get the wrong impression and assume that we've installed an experiment and collected some data at some point, even though the experiment didn't ping anything back.

* datareporting.healthreport.service.enabled;false
* datareporting.healthreport.uploadEnabled;false
* toolkit.telemetry.enabled;false

In this situation, I think not installing the experiment would be a better approach. Felipe, thoughts?
Flags: needinfo?(felipc)
(In reply to Kamil Jozwiak [:kjozwiak] from comment #93)
> In this situation, I think not installing the experiment would be a better
> approach. Felipe, thoughts?

That would be ideal, but it's not doable at the moment: to avoid installing the experiment we need to use the filter.js function, and the datareporting.* prefs are not available there. We intend to improve this in the future.
(the toolkit.telemetry.enabled pref is doable and will be filtered by the latest addition in comment 91)


Just checking, the trace that you mean in about:addons and about:support is of a disabled experiment, right? Because according to the code in there, if any of these prefs are false, the experiment will deactivate itself.
Flags: needinfo?(felipc)
> the toolkit.telemetry.enabled pref is doable and will be filtered by the
> latest addition in comment 91

I had the patch from comment #91 applied, used the XPI from comment #92, and still managed to install the experiment even with toolkit.telemetry.enabled;false. The default behaviour of the experiment is to install itself, ping data back to DataDog, and uninstall itself quickly. With toolkit.telemetry.enabled;false you get the same behaviour; the only difference is that the data isn't sent back, yet the experiment is still installed.

STR:
===

- install a clean fx
- go into about:config and change the following prefs:

* toolkit.telemetry.enabled;false
* devtools.chrome.enabled;true
* Services.prefs.setBoolPref("experiments.logging.dump", true);
* Services.prefs.setCharPref("experiments.logging.level", "Trace");
* Services.prefs.setCharPref("experiments.force-sample-value", ".1");
* Services.prefs.setBoolPref("xpinstall.signatures.required", false);

- once the above prefs are set, change "experiments.manifest.uri;http://kamiljozwiak.io/firefox-manifest.json" and you'll receive the following log in the browser console and the experiment will NOT be installed:

* 1447171984163	Browser.Experiments.Experiments	TRACE Experiments #0::updateManifest()

- close fx and re-open it (the experiment will still not be installed)
- open the browser console and paste the following:

Cu.import("resource:///modules/experiments/Experiments.jsm");
Experiments.instance().updateManifest();

The experiment will get installed, however it won't ping data back.

> Just checking, the trace that you mean in about:addons and about:support are
> of a disabled experiment, right? Because according to the code in there, if
> any these prefs are false, the experiment will deactivate itself.

Yup, the experiment appears as "disabled" in both about:addons and about:support. The only difference when those prefs are set to "false" is that, once the experiment is installed, it won't ping any data back.

It sounds like that's expected with "datareporting.healthreport.service.enabled;false" and "datareporting.healthreport.uploadEnabled;false". Is the above case that I mentioned also expected with toolkit.telemetry.enabled;false?
Flags: needinfo?(felipc)
Kamil and I figured out the case about toolkit.telemetry.enabled;false not preventing it from being installed. It was because the patch in comment 91 probably missed `hg add`ing the filter.js function. With it added it should work as expected.
Flags: needinfo?(felipc)
Jared, I'm almost done testing and only have a few things left. I also found an issue with the experiment counting the "https://www.bing.com/search?q={query}" hits. I'm assuming this will skew the results from Bing users?

Issue #1:

* https://www.bing.com/search?q={query} <-- these are currently being counted

These two search queries were counted and sent back to DataDog:

* https://www.bing.com/search?q=kamil&qs=n&form=QBLH&pq=kamil&sc=8-5&sp=-1&sk=&cvid=6722E79C95864CC69BCBFE83EC8CA398
* https://www.bing.com/search?q=firefox&qs=n&form=QBLH&pq=firefox&sc=9-7&sp=-1&sk=&ghc=1&cvid=D96C249A2CCE45BAAB00B7D815D56D98
** Telex: send:  Object { google: 0, yahoo: 0, bing: 2, total: 9 }

Items left to complete:

* need to investigate and create a new bug about the AddonManager detecting shutdown and aborting any in-flight installs

* will need to retest the below prefs once the filter.js function is added as per comment # 96
** datareporting.healthreport.service.enabled;false
** datareporting.healthreport.uploadEnabled;false
** toolkit.telemetry.enabled;false
Flags: needinfo?(6a68)
While investigating the shutdown issue with the experiment mentioned in comment # 75 & comment # 76, I ran into the following crash:

* https://crash-stats.mozilla.com/report/index/daeb70e0-0f48-4bee-bb99-c00702151110
* https://crash-stats.mozilla.com/report/index/c23a7162-4c38-4698-9f13-d8ea52151110
* https://crash-stats.mozilla.com/report/index/ab616dca-75ea-4cdb-87b4-04ba32151110

STR (time sensitive... need to "command + q" at the right time during the installation process)

- install fx beta
- change all the correct prefs to get the experiment installed
- once the installation process for the experiment begins, quickly "command + q"

At this point, FX will become "non-responsive" and will end up crashing after about a minute. Is this related to the experiment code or is this an add-on manager/experiment architecture issue?
Hi Marco - This telemetry experiment aims to count the number of searches originating from the Firefox UI (excluding search sessions that start with a visit to a search web page). It turns out that Bing search result pages have the same initial URL part in both cases, and FF-based searches can only be identified by the `pc=MOZI` query string, which comes *after* the search term in the URL.

Is there a simple way to identify whether a given Places visit started from typing in the awesomebar or search bar? If not, can you think of an efficient way to modify this query to scan for a substring in the query string (I can't):

> SELECT url FROM moz_places
> WHERE rev_host BETWEEN "moc.gnib." AND "moc.gnib." || X'FFFF'
> AND url LIKE "%bing.com/search?q%" ESCAPE '\'
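One possible modification (my assumption, not a confirmed approach): a second LIKE with leading and trailing wildcards is already a substring scan, and the rev_host range keeps the index lookup cheap, so the extra filter only runs over the bing.com rows. Sketched here as the query string plus an equivalent in-memory predicate; the predicate name is hypothetical.

```javascript
// Candidate query: add a second LIKE to require the pc=MOZI marker
// anywhere in the URL, after the rev_host range has narrowed the rows.
const BING_SERP_QUERY = `
  SELECT url FROM moz_places
  WHERE rev_host BETWEEN 'moc.gnib.' AND 'moc.gnib.' || X'FFFF'
  AND url LIKE '%bing.com/search?q%'
  AND url LIKE '%pc=MOZI%'
`;

// The same test the added LIKE expresses, as a plain predicate:
function isFirefoxBingSerp(url) {
  return url.includes("bing.com/search?q") && url.includes("pc=MOZI");
}
```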
Flags: needinfo?(mak77)
Hey Kamil - If there's no easy, efficient way to differentiate the types of Bing visits, I'll have to decide whether we just exclude Bing from our experiment or further delay this nearly six-month-old project to hunt for inefficient or hard solutions, which would probably mean pausing this until after Mozlando. Let's see what Marco has to say; hopefully there's a nice hack available :-)

As far as those crash reports, they exceed my Firefox neckbeard level:

The 'processor notes' say something about "non-integer value of SecondsSinceLastCrash", which suggests a bug in the actual crash reporter (as does the title of the page).

On the other hand, the "app notes" refers to a line in Experiments.jsm, and the Metadata tab's 'AsyncShutdownTimeout' trace seems to point to something hanging in the Experiment.jsm shutdown code. I don't see any reference to this search experiment in any of the traces, but again, I'm really not sure how to interpret the data.

Felipe, any thoughts on the meaning of the crash reports in comment 98?
Flags: needinfo?(6a68) → needinfo?(felipc)
I believe that non-responsive + crash is something to be investigated in another bug. There's ongoing work to reduce these async timeouts in bug 1218266, so maybe this STR will be fixed in that bug. Georg, do you want to take a look at comment 98?
Flags: needinfo?(felipc) → needinfo?(gfritzsche)
Attached file latest updated patch (obsolete) —
Hopefully adding filter.js successfully this time.

Kamil, let me know if this one works :-)
Attachment #8685218 - Attachment is obsolete: true
Flags: needinfo?(kjozwiak)
Regarding the Bing query: 

QA is nearly done, and tweaking the query would set us back even further.

I think we should just run the current experiment, being aware of the ambiguity in the query. Having some data will be better than having no data.

So, unsetting ni for Marco, and let's see if we can wrap up testing tomorrow, and aim to ship next week :-)
Flags: needinfo?(mak77)
> Hopefully adding filter.js successfully this time.

filter.js should go into "telemetry-experiment-server/experiments/serp-fraction-counts" rather than "telemetry-experiment-server/experiments/serp-fraction-counts/code"

I'll move it manually for testing but we should fix this before pushing the experiment into production.
Flags: needinfo?(kjozwiak)
I went through "toolkit.telemetry.enabled;false" with the filter.js function and everything worked correctly. Received the following error message when attempting to install the experiment:

1447346295335 Browser.Experiments.Experiments TRACE Experiments #0::evaluateExperiments() - added EXPERIMENT_ACTIVATION to TelemetryLog: ["REJECTED","serp-fraction-counts-beta42@experiments.mozilla.org","jsfilter-false"]

- ensured the experiment continued to be rejected when moving the machines date/time ahead by one day
- ensured you couldn't install the experiment by forcing a manifest.json refresh via browser console

I also ensured that while the experiment is being rejected, once "endTime" has been reached, the following error is displayed in the browser console:

1448470277055 Browser.Experiments.Experiments TRACE Experiments #0::evaluateExperiments() - added EXPERIMENT_ACTIVATION to TelemetryLog: ["REJECTED","serp-fraction-counts-beta42@experiments.mozilla.org","endTime"]

That's pretty much it for the QA portion. I think I've covered all the bases. If someone could go through the results listed below to make sure nothing obvious was missed, it would be much appreciated!

Quick Summary:
==============

* created bug # 1224309 to address the issue mentioned in comment # 75. When we run into this situation, the experiment gets installed but never ends up pinging the database. The experiment doesn't disable itself until "endTime" has been reached.
* It would be a good idea if someone could take a look at the crashes in comment #98 and confirm that this isn't related to this specific experiment.
* Bing queries are being counted into the results as per comment # 97 (we're moving ahead either way as per comment # 103)
* filter.js needs to be moved into the correct directory as per comment # 104.

QA Results:
* https://public.etherpad-mozilla.org/p/ShareSearchesTelemetryExperimentQA
This looks great!

> That's pretty much it for the QA portion. I think I've covered all the
> bases. If someone could go through the results listed below to make sure
> nothing obvious was missed, it would be much appreciated!
> 
> Quick Summary:
> ==============
> 
> * created bug # 1224309 to address the issue mentioned in comment # 75. When
> we run into this situation, the experiment get's installed but never ends up
> pinging the database. The experiment doesn't disable itself until "endTime"
> has been reached.
> * It would be a good idea if someone could take a look at the crashes in
> comment #98 and confirm that this isn't related to this specific experiment.

The crash reports seem very closely related to the crashes uncovered by a different experiment (bug 1218266, mentioned in comment 101).

Georg hasn't responded to the ni? in a couple of days, I'm inclined to just move forward.

> * Bing queries are being counted into the results as per comment # 97 (we're
> moving ahead either way as per comment # 103)

Great.

> * filter.js needs to be moved into the correct directory as per comment #
> 104.

Cool, I'll fix that shortly.

Next steps:
- I'll upload a patch with filter.js in the right place, and remove the console.log calls
- I'll email release-drivers@ per the wiki[1]
- I'll upload an updated xpi, and ping Jason to sign it
- Once it's signed, I'll ping Felipe to merge the patch
- Once the patch is landed, and we have final approval for release, I'll reopen the release bug (bug 1214897)

[1] https://wiki.mozilla.org/Telemetry/Experiments#Final_Deployment
Based on the xpi files for the e10s-enabled-aurora and tile-switcher-nightly experiments, it looks like filter.js doesn't actually need to be in the xpi.

This xpi just contains bootstrap.js and install.rdf.

Kamil, mind taking one last look, to be sure everything works properly?
Attachment #8685223 - Attachment is obsolete: true
Flags: needinfo?(kjozwiak)
> Kamil, mind taking one last look, to be sure everything works properly?

I went through a quick spot check and it looks like everything is working correctly. The only concern is that I noticed the following error messages in the browser console when testing "toolkit.telemetry.enabled;false" on OSX (didn't see these errors under the Win 10 VM):

>> IndexedDB Unable to open IndexedDB database, schema is too high!: ActorsParent.cpp:4466 <unknown>
>> UnknownError <unknown>
>> UnknownError aboutHome.js:113:20

I'm not sure if these are related to the experiment.
Flags: needinfo?(kjozwiak) → needinfo?(6a68)
I don't think it's related to the experiment, as I can't reproduce it on either platform now. I'll let Jared take a look at the errors in comment # 109 and see what he thinks.
Ah yeah, I've run across those errors when doing totally unrelated work (though I'm guilty of ignoring them and not filing a bug...).

Looks like the affected line in aboutHome.js[1] is an indexedDB.open call (that passes in a schema version), which I'm guessing is the source of the IndexedDB error.

Good looking out, but I think we can safely cross that error off the list of concerns.

I'll update this bug with timelines once I hear back from release-drivers@.

[1] https://dxr.mozilla.org/mozilla-central/source/browser/base/content/abouthome/aboutHome.js#113-114
Flags: needinfo?(6a68)
In case it might be relevant: here are the bugs with telemetry-related patches aimed to land for beta 4 next Monday/Tuesday:  bug 1213780, bug 1211411, bug 1215540, and bug 1219751.
Attached file experiment.xpi (obsolete) —
relman has signed off, moving ahead for a release Monday/Tuesday with Beta 4.

Jason, could you please sign this updated version of the search telemetry experiment?
Attachment #8680985 - Attachment is obsolete: true
Flags: needinfo?(jthomas)
Hey Felipe - once the final experiment.xpi is signed, would you mind landing the patch and creating any needed release tags? I'm a mercurial novice and would hate to break something.

Also, bug 1214897 is the release bug. When should I reopen it? After the patch lands?

Thanks! - Jared
Flags: needinfo?(felipc)
Yeah, will do. So, when that patch lands, let's mark this bug as fixed and reopen the deployment bug.
Flags: needinfo?(felipc)
Attached file experiment.xpi
Please see attached.
Attachment #8687448 - Attachment is obsolete: true
Flags: needinfo?(jthomas)
landed:            https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/ce604a104dc2
added release_tag: https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/f8c39c544bf3

FWIW, I added in the manifest
> "minBuildID": "20151109145326",
to ensure that everyone who receives the experiment got the fix for bug 1220198.

Jared, you can mark the bug as resolved fixed now and reopen the one for deployment.

btw, just wanted to confirm that your desired "Maximum runtime" is just 1 day.
Thanks very much for landing the patch!

It seems like the experiment should run to completion within a few minutes, which is why I set the max runtime to a day. If that's not a great idea, I'm happy to change it to whatever you think would be more reasonable--or if it's fine, we'll just leave it as-is.

What's a good approach here?
Flags: needinfo?(felipc)
That sounds good. I didn't know if there's any situation in which this experiment needed to wait longer before completing, but it sounds like it doesn't. You can keep it as is.
Flags: needinfo?(felipc)
Awesome! In that case, closing this bug, and reopening the release bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to :Felipe Gomes from comment #101)
> I believe that non-responsive + crash is something to be investigated in
> another bug. There's ongoing work to reduce this async timeouts in bug
> 1218266, so maybe this STR will be fixed in that bug. Georg, do you want to
> take a look in comment 98?

Bug 1218266 is about those kinds of issues - i commented with the STR from comment 98 over there.
Flags: needinfo?(gfritzsche)
Product: Firefox Health Report → Firefox Health Report Graveyard