Closed Bug 1388130 Opened 7 years ago Closed 7 years ago

unify crontabber.jobs configuration across environments

Categories

(Socorro :: General, task)


Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(1 file)

-prod, -stage, and DEFAULT_JOBS in the code (which is what we use in the docker-based dev environment) all have different values for crontabber.jobs.

This bug covers unifying them, probably by setting DEFAULT_JOBS to the appropriate value and then removing the configuration value from -prod and -stage.
Here's what we currently have:

    crontabber.jobs in -prod (22 lines):
     
    socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|7d,
    socorro.cron.jobs.matviews.ProductVersionsCronApp|1d|05:00,
    socorro.cron.jobs.matviews.SignaturesCronApp|1d|05:00,
    socorro.cron.jobs.matviews.RawUpdateChannelCronApp|1d|05:00,
    socorro.cron.jobs.matviews.ADUCronApp|1d|08:30,
    socorro.cron.jobs.matviews.DuplicatesCronApp|1h,
    socorro.cron.jobs.matviews.ReportsCleanCronApp|1h,
    socorro.cron.jobs.bugzilla.BugzillaCronApp|1h,
    socorro.cron.jobs.matviews.BuildADUCronApp|1d|08:30,
    socorro.cron.jobs.matviews.AndroidDevicesCronApp|1d|05:00,
    socorro.cron.jobs.matviews.GraphicsDeviceCronApp|1d|05:00,
    socorro.cron.jobs.matviews.CrashAduByBuildSignatureCronApp|1d|08:30,
    socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h,
    socorro.cron.jobs.elasticsearch_cleanup.ElasticsearchCleanupCronApp|7d,
    socorro.cron.jobs.drop_old_partitions.DropOldPartitionsCronApp|7d,
    socorro.cron.jobs.truncate_partitions.TruncatePartitionsCronApp|7d,
    socorro.cron.jobs.clean_raw_adi_logs.CleanRawADILogsCronApp|1d,
    socorro.cron.jobs.clean_raw_adi.CleanRawADICronApp|1d,
    socorro.cron.jobs.clean_missing_symbols.CleanMissingSymbolsCronApp|1d,
    socorro.cron.jobs.missingsymbols.MissingSymbolsCronApp|1d,
    socorro.cron.jobs.featured_versions_automatic.FeaturedVersionsAutomaticCronApp|1h,
    socorro.cron.jobs.upload_crash_report_json_schema.UploadCrashReportJSONSchemaCronApp|1h
     
     
    crontabber.jobs in -stage (23 lines):
     
    socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|7d,
    socorro.cron.jobs.matviews.ProductVersionsCronApp|1d|05:00,
    socorro.cron.jobs.matviews.SignaturesCronApp|1d|05:00,
    socorro.cron.jobs.matviews.RawUpdateChannelCronApp|1d|05:00,
    socorro.cron.jobs.matviews.ADUCronApp|1d|08:30,
    socorro.cron.jobs.matviews.DuplicatesCronApp|1h,
    socorro.cron.jobs.matviews.ReportsCleanCronApp|1h,
    socorro.cron.jobs.bugzilla.BugzillaCronApp|1h,
    socorro.cron.jobs.matviews.BuildADUCronApp|1d|08:30,
    socorro.cron.jobs.matviews.AndroidDevicesCronApp|1d|05:00,
    socorro.cron.jobs.matviews.GraphicsDeviceCronApp|1d|05:00,
    socorro.cron.jobs.matviews.CrashAduByBuildSignatureCronApp|1d|08:30,
    socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h,
    socorro.cron.jobs.elasticsearch_cleanup.ElasticsearchCleanupCronApp|7d,
    socorro.cron.jobs.drop_old_partitions.DropOldPartitionsCronApp|7d,
    socorro.cron.jobs.truncate_partitions.TruncatePartitionsCronApp|7d,
    socorro.cron.jobs.featured_versions_automatic.FeaturedVersionsAutomaticCronApp|1h,
    socorro.cron.jobs.clean_raw_adi_logs.CleanRawADILogsCronApp|1d,
    socorro.cron.jobs.clean_raw_adi.CleanRawADICronApp|1d,
    socorro.cron.jobs.fetch_adi_from_hive.FAKEFetchADIFromHiveCronApp|1d,
    socorro.cron.jobs.clean_missing_symbols.CleanMissingSymbolsCronApp|1d,
    socorro.cron.jobs.missingsymbols.MissingSymbolsCronApp|1d,
    socorro.cron.jobs.upload_crash_report_json_schema.UploadCrashReportJSONSchemaCronApp|1h
     
     
     
    DEFAULT_JOBS (17 lines):
     
    socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|7d
    socorro.cron.jobs.matviews.ProductVersionsCronApp|1d|05:00
    socorro.cron.jobs.matviews.SignaturesCronApp|1d|05:00
    socorro.cron.jobs.matviews.RawUpdateChannelCronApp|1d|05:00
    socorro.cron.jobs.matviews.ADUCronApp|1d|07:30
    socorro.cron.jobs.fetch_adi_from_hive.FetchADIFromHiveCronApp|1d|07:00
    socorro.cron.jobs.matviews.DuplicatesCronApp|1h
    socorro.cron.jobs.matviews.ReportsCleanCronApp|1h
    socorro.cron.jobs.bugzilla.BugzillaCronApp|1h
    socorro.cron.jobs.matviews.BuildADUCronApp|1d|07:30
    socorro.cron.jobs.matviews.AndroidDevicesCronApp|1d|05:00
    socorro.cron.jobs.matviews.GraphicsDeviceCronApp|1d|05:00
    socorro.cron.jobs.matviews.CrashAduByBuildSignatureCronApp|1d|07:30
    socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h
    socorro.cron.jobs.elasticsearch_cleanup.ElasticsearchCleanupCronApp|7d
    socorro.cron.jobs.drop_old_partitions.DropOldPartitionsCronApp|7d
    socorro.cron.jobs.truncate_partitions.TruncatePartitionsCronApp|7d 
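
(Aside, for readers skimming the lists above: each entry is a "python.dotted.path.CronApp|frequency" spec with an optional "|HH:MM" run time. A minimal, purely illustrative parser to show how a spec breaks apart, not crontabber's actual one:)

    def parse_job_spec(spec):
        """Split 'dotted.path.CronApp|frequency[|HH:MM]' into its parts."""
        parts = spec.strip().rstrip(',').split('|')
        app_path = parts[0]   # e.g. socorro.cron.jobs.matviews.ADUCronApp
        frequency = parts[1]  # e.g. 1h, 1d, 7d
        run_time = parts[2] if len(parts) > 2 else None  # e.g. 08:30
        return app_path, frequency, run_time

    # parse_job_spec('socorro.cron.jobs.matviews.ADUCronApp|1d|08:30')
    # -> ('socorro.cron.jobs.matviews.ADUCronApp', '1d', '08:30')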


We want to codify the crontabber.jobs value from -prod into DEFAULT_JOBS.
The whole hack-around with fetching ADI from Hive is the root of the difference. There used to be others, but I believe those are gone now (we used to pull "featured versions" from prod's PG back when it was manually maintained by superusers on crash-stats.mozilla.com).

I propose we leave that (the fact that `fetch_adi_from_hive.FetchADIFromHiveCronApp` only runs in stage) as is.

The alternative solution is to run two crontabber instances on SCL3: one that writes to prod's PG and one that writes to stage's PG. Then there'd be no reason for a difference.
But this solution is kinda scary since we have so little "control" over how crontabber is run there. (I say "control" in quotation marks because we do have sudo root, but we don't have our consulate, monitoring, or our regular deployment infra bits. It's a mess.)

Also, if we rein in all of prod's jobs and put them back into the source code (aka DEFAULT_JOBS [0]), then
we lose the ability to differentiate between stage and prod. Noble cause, but technically not something we can do until we completely do away with ADI.

Another option might be to extend crontabber with a new config flag. Call it "additional_jobs". (Internally, crontabber would just do `jobs = jobs UPDATE additional_jobs` or something.) Adding features to crontabber isn't hard. We maintain it. It has test coverage and stuff.
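
As a very rough sketch (not crontabber's real configuration machinery, and "additional_jobs" is just a hypothetical option name), the merge itself would be trivial:

    def merge_jobs(jobs, additional_jobs):
        """Append additional_jobs specs to jobs, skipping duplicates."""
        combined = [spec.strip() for spec in jobs.split(',') if spec.strip()]
        for spec in additional_jobs.split(','):
            spec = spec.strip()
            if spec and spec not in combined:
                combined.append(spec)
        return ',\n'.join(combined)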


[0] https://github.com/mozilla-services/socorro/blob/3acddab7b57522a1b1c19fe5c712d66b62ce3c18/socorro/cron/crontabber_app.py#L39
Mmm... None of those options are easy.

I'll leave this as is for now and have it block on the ADI bug because if we ditch ADI, then the causes for complexity here go away.
Blocks: 1369498
I copied the -prod value to DEFAULT_JOBS because we need it for a functional local -dev environment. That doesn't fix the issue here, but alleviates my current blocking problem where we can't run jobs that aren't in the list.
Commit pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/59acd8ab4594c4b5d7fcc34afa0f2dd6143e8ac2
bug 1388130 - copy -prod crontabber jobs to DEFAULT_JOBS (#3894)

This syncs the crontabber.jobs value that we have in -prod with DEFAULT_JOBS
enabling us to run all the jobs we're running in -prod in local dev
environments.
Switching the blocks/depends-on. I keep doing the wrong one.
No longer blocks: 1369498
Depends on: 1369498
Grabbing this to work on again.

-prod and the local dev environment are identical now.

-prod and -stage differ in that -stage has this additional line:

    socorro.cron.jobs.fetch_adi_from_hive.FAKEFetchADIFromHiveCronApp|1d,

Our current issues all go away when we drop ADI, so I think it's ok to do a "temporary hack solution".

We have a new -stage-new environment we need to think about. -stage-new needs to be like -stage, but instead of the FAKEFetchADIFromHiveCronApp, it'll need to run the app that bug #1407655 results in.


The jobs configuration is defined in the crontabber library and (more importantly) it's heavily used. We either need a solution that conforms to that usage or we need to fork/vendor the project and bend it to our ways.

Definition is here:

https://github.com/mozilla/crontabber/blob/57c463a521cdd2f9db4ae3c1104559abf88d1607/crontabber/app.py#L679-L688

Usage is throughout that file as "config.crontabber.jobs".


Given all that, I propose something like the following:

1. keep the DEFAULT_JOBS configuration as it is which matches -prod
2. remove the configuration variable in -prod -- it's the default so we don't need it anymore
3. override the "jobs" config variable and change it to take a Python dotted path as a value

Items 1 and 2 are straightforward, so let's talk about 3 a bit.

The Python dotted path would point to a list of jobs to run. Possible values would be something like:

* "socorro.crontabber.DEFAULT_JOBS"
* "socorro.crontabber.STAGE_JOBS"
* "socorro.crontabber.STAGE_NEW_JOBS"

Each of those would be a list of the jobs for that environment. In -stage, we'd have this configuration variable (mixing consulate prefixes with variable keys and such):

   socorro/crontabber/crontabber.jobs=socorro.crontabber.STAGE_JOBS

I think that'll work: it's minimally messy, it's flexible, and we can remove a bunch of it when we drop ADI.
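
For item 3, the dotted-path lookup itself is tiny. A sketch (names illustrative; the real code is whatever ends up in crontabber_app.py):

    import importlib

    def load_jobs(dotted_path):
        """Resolve e.g. 'socorro.cron.crontabber_app.STAGE_JOBS' to its value."""
        module_path, attr_name = dotted_path.rsplit('.', 1)
        module = importlib.import_module(module_path)
        return getattr(module, attr_name)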

I'm going to work on this today because it blocks -stage-new progress.
Assignee: nobody → willkg
Blocks: 1406019
Status: NEW → ASSIGNED
Commit pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/97fee909bbe463d1cbfd559dd9b9cc3726970af2
fixes bug 1388130 - redo crontabber jobs configuration (#4207)

This redoes crontabber jobs configuration such that it can take a Python dotted
path to a string with the configuration.

In order to do that, we also had to redo things so that CronTabberApp was a
valid socorro app. This means it now works with the socorro script.

This updates the related docker scripts to use the new way to run it.

In doing all that, I discovered the tests weren't consistent in how they set up
a config manager. This fixes tests, too.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
PR #4207 landed. So next steps are these:

1. Wait for it to deploy to -stage and make sure crontabber still works.

2. Change the -stage configuration:

   consulate kv set socorro/crontabber/crontabber.jobs socorro.cron.crontabber_app.STAGE_JOBS

3. Wait for that to kick in and make sure crontabber still works.

4. Deploy to -prod.

5. Verify crontabber works.

6. Remove the crontabber.jobs configuration:

   consulate kv rm socorro/crontabber/crontabber.jobs


That should be it.
I did steps 1 through 3 and crontabber is working on stage with the new configuration setting. Yay!

I'll do the rest when the changes get to -prod.
We deployed to -prod.

I updated the admin node, verified that crontabber still works fine, removed the crontabber.jobs key, and verified that crontabber continues to work with the same set of jobs to run, notably without FAKEFetchADIFromHiveCronApp.

Everything looks super!
Status: RESOLVED → VERIFIED