Closed
Bug 1130242
Opened 9 years ago
Closed 9 years ago
request for throughput data on the SCL3 ZLBs for the past 12 hours
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
x86
macOS
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: dcurado, Assigned: cliang)
References
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/550] )
Hi, this evening releng had trouble downloading images from ftp.mozilla.org and ftp-ssl.mozilla.org. They got in touch with netops as they thought perhaps it was a network problem. But they were seeing problems from releng slaves in SCL3 as well as slaves in AWS. I checked all network links within the data center and could not see a problem. This got me wondering about the ZLBs, as I know we have a license limitation. Can we get the throughput data? Thanks very much.
Assignee
Comment 1•9 years ago
I don't know that we have throughput data in the format that you want. I usually end up looking at the ZLB graphs:
* current activity: https://zlb1.ops.scl3.mozilla.com:9090/apps/zxtm/index.fcgi?section=Monitoring
* historical activity: https://zlb1.ops.scl3.mozilla.com:9090/apps/zxtm/index.fcgi?section=Statd
The current activity graph gives you some idea of whether we're (currently) running into the license limitation. The historical data doesn't give as clean an overview. That graph is drawn from statd logs; if you have something that can crunch through those, they can be found in zlb1.ops.scl3.mozilla.com:/usr/local/zeus/zxtm/log/statd/
Otherwise, the load balancers are supposed to write to the logs if the bandwidth limit is hit. I went to each of the load balancers serving ftp.mozilla.org traffic and searched for "bandwidth" or "license" in the errors log (kept in /usr/local/zeus/zxtm/log/). I couldn't see any evidence that it had been triggered. (I did see some adjustments made to the ftp protection class on Dec. 4th.)
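For anyone repeating the log search described in this comment, here is a minimal sketch of the same idea in Python. It only looks for the two keywords mentioned above; the function names are hypothetical, and the exact file name and line format of the errors log are assumptions, not something confirmed in this bug.

```python
import re
from pathlib import Path

# Keywords taken from the comment above; matching is case-insensitive.
KEYWORDS = re.compile(r"bandwidth|license", re.IGNORECASE)


def find_limit_lines(lines):
    """Return (line_number, text) pairs for lines mentioning either keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if KEYWORDS.search(line)
    ]


def scan_log(path):
    """Scan one log file, replacing any bytes that do not decode as UTF-8."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_limit_lines(handle)
```

Calling scan_log() on a copy of the errors log from each load balancer would list any matching lines with their line numbers. An empty result matches the "no evidence" outcome reported above, but it does not prove the limit was never hit.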
Reporter
Comment 2•9 years ago
Thanks for getting back to me about this. Maybe we can take a step back on my request...
It does not happen often, but now and then there is a lot of traffic hitting the ZLBs, and we may run into our license cap. People experience this as a "network problem" and come to the netops team. It would be quite helpful if we could look at the ZLB throughput and say, "ah ha, we're hitting our license cap."
Is that possible? Thanks again.
Assignee
Comment 3•9 years ago
Right now, the only way to catch this is to look at the current activity graph and do the maths and/or see if the event has been logged. I experimented briefly with having the ZLBs send email when a bandwidth event gets triggered, but the email never came through. I'll try to resume testing next week (since it is a No Change Friday (tm)). If I'm successful, we can look at the best way of looping in NetOps / MOC.
Assignee
Comment 4•9 years ago
I've done some tweaking and I think the load balancers can now be configured to alert if certain traffic license limits are reached. If you go to one of the load balancers and hit "System" -> "Alerting" -> "Manage Actions", you can see what alerting options are available. Feel free to add an alerting method that works for you / netops; if you have more questions, etc. about trying to implement an alert, ask away. =)
The event "group" <Traffic License Limit Problem> includes the following events:
* tpslimited: License key transactions-per-second limit has been hit
* ssltpslimited: License key SSL transactions-per-second limit has been hit
* bwlimited: License key bandwidth limit has been hit
Right now, the only alert that is sent out is an email to me (the alert <E-mail C>).
Reporter
Comment 5•9 years ago
Could I ask you for a favor: can you add netops@mozilla.com to that email alert?
Assignee
Comment 6•9 years ago
I set up a separate alert called <Notify Netops>. The <Traffic License Limit Problem> "group" should now trigger the <Notify Netops> alert in addition to <E-mail C>.
Reporter
Comment 7•9 years ago
Closing this, as we'll now get email when we're hitting the ZLB throughput thresholds. Thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 8•9 years ago
thanks! this is amazing.
Comment 9•9 years ago
<cyliang> nthomas, dcurado: Ugh. So, yes, confirmation that the log event isn't getting generated when we hit 2Gbps out.
<cyliang> nthomas, dcurado: will look at seeing if we can trigger this from somewhere else (maybe graphite)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•9 years ago
If we can get the alert piped into #buildduty in IRC in addition to wherever it needs to go for IT, it will head off a lot of digging when things are timing out in CI.
Assignee
Comment 11•9 years ago
Closing this bug (which is a request for throughput logs) in favor of bug 1140489, which is explicitly about creating a Nagios check for bandwidth.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → INCOMPLETE
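As an illustration of the kind of Nagios check for bandwidth mentioned in comment 11, here is a rough sketch of the threshold logic. It is not the check from bug 1140489. The function name, the warning and critical fractions, and the way the current throughput would be obtained (for example from graphite, as suggested in comment 9) are all assumptions; only the 2Gbps figure comes from this bug, and the exit codes follow the usual Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_bandwidth(current_bps, cap_bps, warn=0.80, crit=0.95):
    """Return (exit_code, status_line) for throughput measured against a cap.

    warn and crit are fractions of the cap; both defaults are assumptions.
    """
    if cap_bps <= 0 or current_bps < 0:
        return UNKNOWN, "UNKNOWN - invalid throughput or cap value"
    used = current_bps / cap_bps
    detail = f"{used:.0%} of cap ({current_bps / 1e9:.2f} of {cap_bps / 1e9:.2f} Gbps)"
    if used >= crit:
        return CRITICAL, f"CRITICAL - {detail}"
    if used >= warn:
        return WARNING, f"WARNING - {detail}"
    return OK, f"OK - {detail}"
```

A wrapper script would print the status line and exit with the returned code, e.g. check_bandwidth(1.7e9, 2.0e9) for 1.7 Gbps of outbound traffic against a 2 Gbps cap. Checking throughput directly avoids relying on the ZLB log event that comment 9 found was not being generated.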
Updated•8 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard