Closed Bug 1141217 Opened 9 years ago Closed 9 years ago

nagios alerts for unassigned blocker bugs in all releng bugzilla components

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: Usul)

Details

A blocker bug got filed on Friday morning, and the reporter didn't ping anyone outside of bugzilla (e.g. IRC) for many hours after that. 

We should have nagios alerts for unassigned blocker bugs in all Release Engineering components. These alerts should be sent to the #buildduty channel. buildduty can't always action each blocker, but we can probably find someone who can.

Alerting cadence should be 15 minutes.
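
A minimal sketch of the check being requested, written in Python rather than the Perl of the production checker. It assumes each bug is a dict with hypothetical `id`, `severity`, and `assigned_to` keys, and that an unassigned bug carries the placeholder assignee `nobody@mozilla.org`. Both are illustrative assumptions, not the real script's data model. The return value follows the standard Nagios plugin exit-code convention.

```python
# Sketch only: the field names and the placeholder assignee are assumptions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

UNASSIGNED = "nobody@mozilla.org"  # assumed placeholder for "no assignee"


def unassigned_blockers(bugs):
    """Return the bugs that are blockers and still have no real assignee."""
    return [
        bug for bug in bugs
        if bug["severity"] == "blocker" and bug["assigned_to"] == UNASSIGNED
    ]


def check(bugs):
    """Return (exit_code, status_line) in the shape a Nagios plugin reports."""
    hits = unassigned_blockers(bugs)
    if not hits:
        return OK, "OK: no unassigned blocker bugs"
    ids = ", ".join(str(bug["id"]) for bug in hits)
    return CRITICAL, f"CRITICAL: {len(hits)} unassigned blocker bug(s): {ids}"
```

A wrapper script would print the status line and exit with the code, which is what Nagios reads to decide whether to alert.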

Product: Release Engineering has the following components:
* Balrog: Backend
* Balrog: Frontend
* Buildduty
* General Automation
* Loan Requests
* Other
* Platform Support
* Release Automation
* Releases
* Release: Custom Builds
* Tools
This is what we currently do for releng:
nagios_blocker_checker.pl --product 'Release Engineering' --component 'Loan Requests' --severity any --any_warn 24 --any_alarm 24
Assignee: nobody → ludovic
So changing the check to cover all components is pretty easy.

Changing where it alerts is more complicated (at least for me; ashish, can you shed some light?)
Flags: needinfo?(ashish)
Dave, I might need your nagios-fu to send the alerts to the proper channel.
To be clear: we want the existing "severity any" alert to stay in place, and we want the new alert to scream on blocker bugs in addition.
<ashish> Usul: for https://bugzilla.mozilla.org/show_bug.cgi?id=1141217 - put in a new nrpe command without --component
<ashish> just --product 'Release Engineering' and whatever severities
<ashish> also, setting contact_groups to "build" will alert #buildduty (see https://mana.mozilla.org/wiki/display/SYSADMIN/Nagios#Nagios-Selectingcontact_groups)
Flags: needinfo?(ashish)
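
The advice above (same checker, `--product` only, no `--component`) can be sketched as a small command builder. The helper `checker_command` is hypothetical and exists only for illustration. The script path and the `--product`, `--component`, and `--severity` flags are the ones that appear in this bug; nothing else about the checker's interface is assumed.

```python
import shlex

# Path as it appears in the nrpe template in this bug; the helper is hypothetical.
CHECKER = "/data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl"


def checker_command(product, severity=None, component=None):
    """Build the checker command line; omit component to cover every component."""
    argv = ["sudo", CHECKER, "--product", product]
    if component is not None:
        argv += ["--component", component]
    if severity is not None:
        argv += ["--severity", severity]
    # shlex.join quotes arguments containing spaces, e.g. 'Release Engineering'.
    return shlex.join(argv)


# Product-wide check for blockers only, with no --component restriction.
print(checker_command("Release Engineering", severity="blocker"))
```

Leaving `component` unset is what widens the check from 'Loan Requests' to the whole product, while the existing per-component command stays untouched.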
[ludo@Oulanl modules]$ svn diff
Index: nagios/manifests/mozilla/checkcommands.pp
===================================================================
--- nagios/manifests/mozilla/checkcommands.pp	(revision 102047)
+++ nagios/manifests/mozilla/checkcommands.pp	(working copy)
@@ -314,6 +314,7 @@
         check_releaseslag                        => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 120 -c check_releaseslag',
         check_releasesrsynclag                   => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 120 -c check_releasesrsynclag',
         check_releng_loaner_bugs                 => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -u -c check_releng_loaner_bugs',
+        check_releng_bugs                 => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -u -c check_releng_bugs',
         check_ro_mounts                          => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_ro_mounts -a $ARG1$',
         check_ro_mounts_exclude                  => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_ro_mounts -a $ARG1$ $ARG2$',
         check_ro_mounts_exclude_types            => '$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_ro_mounts -a $ARG1$ $ARG2$',
Index: nagios/manifests/mozilla/services.pp
===================================================================
--- nagios/manifests/mozilla/services.pp	(revision 102047)
+++ nagios/manifests/mozilla/services.pp	(working copy)
@@ -4624,6 +4624,26 @@
                 ]
             }
         },
+        'releng_bugs' => {
+            service_description => 'releng_bugs',
+            check_command => 'check_releng_bugs',
+            notification_options => 'w,c',
+            contact_groups => 'build',
+            notification_period  => 'usworkinghours',
+            normal_check_interval => '1',
+            retry_check_interval => '1',
+            max_check_attempts => '2',
+            flap_detection_enabled => '0',
+            notification_interval => '30',
+            stalking_options => 'w,c',
+            hostgroups => $::fqdn ? {
+                'nagios1.private.scl3.mozilla.com' => [
+                    'bug-checks'
+                ],
+                default => [
+                ]
+            }
+         },
 ## End bugs checks
         "desktop_servicenow-p1" => {
             service_description => "SN P1",
Index: nrpe/templates/nrpe.d/bug_queues.cfg.erb
===================================================================
--- nrpe/templates/nrpe.d/bug_queues.cfg.erb	(revision 102047)
+++ nrpe/templates/nrpe.d/bug_queues.cfg.erb	(working copy)
@@ -10,3 +10,4 @@
 command[check_serverops_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl server-ops@mozilla-org.bugs
 command[check_telecom_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl telecom@infra.bugs
 command[check_webops_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl --assignee server-ops-webops@mozilla-org.bugs --severity blocker
+command[check_releng_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl --product 'Release Engineering'
[ludo@Oulanl modules]$ svn commit -m 'changing bug alerts for releng per bug 1141217'
Sending        nagios/manifests/mozilla/checkcommands.pp
Sending        nagios/manifests/mozilla/services.pp
Sending        nrpe/templates/nrpe.d/bug_queues.cfg.erb
Transmitting file data ...
Committed revision 102048.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Of course you only want blockers, so I changed:

[ludo@Oulanl modules]$ svn diff
Index: nrpe/templates/nrpe.d/bug_queues.cfg.erb
===================================================================
--- nrpe/templates/nrpe.d/bug_queues.cfg.erb	(revision 102048)
+++ nrpe/templates/nrpe.d/bug_queues.cfg.erb	(working copy)
@@ -10,4 +10,4 @@
 command[check_serverops_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl server-ops@mozilla-org.bugs
 command[check_telecom_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl telecom@infra.bugs
 command[check_webops_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl --assignee server-ops-webops@mozilla-org.bugs --severity blocker
-command[check_releng_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl --product 'Release Engineering'
+command[check_releng_bugs]=sudo /data/bugzilla/www/bugzilla.mozilla.org/contrib/nagios_blocker_checker.pl --product 'Release Engineering' --severity blocker
[ludo@Oulanl modules]$ svn commit -m 'changing bug alerts for releng per bug 1141217'
Sending        nrpe/templates/nrpe.d/bug_queues.cfg.erb
Transmitting file data .
Committed revision 102050.
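
For reference, a rough sketch of what the interval settings in the 'releng_bugs' service definition from revision 102048 work out to, for comparison with the 15-minute cadence requested in the description. It assumes the Nagios default `interval_length` of 60 seconds, which is not shown in the diff, and it ignores the `usworkinghours` notification period. The function names are hypothetical.

```python
INTERVAL_LENGTH_SECONDS = 60  # Nagios default; assumed, not shown in the diff


def minutes(units, interval_length=INTERVAL_LENGTH_SECONDS):
    """Convert Nagios interval units to minutes."""
    return units * interval_length / 60


def timing(normal_check_interval, retry_check_interval, max_check_attempts,
           notification_interval):
    """Summarise how quickly a problem is confirmed and how often it re-alerts."""
    # After the first failing check, Nagios rechecks at the retry interval until
    # max_check_attempts checks have failed; only then is the state HARD.
    retries = max_check_attempts - 1
    return {
        "check_every_min": minutes(normal_check_interval),
        "confirm_after_min": minutes(retries * retry_check_interval),
        "renotify_every_min": minutes(notification_interval),
    }


# Values copied from the 'releng_bugs' service definition.
releng_bugs = timing(1, 1, 2, 30)
print(releng_bugs)
```

Under those assumptions the check runs every minute, a failure is confirmed one minute after it is first seen, and a standing problem is re-announced every 30 minutes.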