Closed
Bug 1252248
Opened 8 years ago
Closed 8 years ago
Add more tst-linux64, tst-emulator64 capacity in AWS
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: vciobancai)
References
Details
Attachments
(9 files, 3 obsolete files)
50.38 KB, text/plain (kmoir: review+, vciobancai: checked-in+)
26.06 KB, text/plain (kmoir: review+, vciobancai: checked-in+)
703 bytes, patch (kmoir: review+, vciobancai: checked-in+)
53 bytes, text/x-github-pull-request (kmoir: review+)
4.62 KB, patch (kmoir: review+, vciobancai: checked-in+)
1.02 KB, patch (kmoir: review+, vciobancai: checked-in+)
259 bytes, text/plain (kmoir: review+, vciobancai: checked-in+)
1017 bytes, patch (dustin: review+, kmoir: checked-in+)
1.80 KB, patch (aselagea: review+)
We're seeing significant wait times on the tst-linux64 and tst-emulator64 platforms. We should add more machines, masters, and subnets as appropriate to increase each pool by about 25%. This translates into 500 more tst-linux64 machines and 250 more tst-emulator64 machines.
See also:
https://bugzilla.mozilla.org/show_bug.cgi?id=1090568
https://bugzilla.mozilla.org/show_bug.cgi?id=1143901
https://bugzilla.mozilla.org/show_bug.cgi?id=1204756
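For scale, a quick back-of-the-envelope check of what the 25% bump implies (a sketch only; the "current" pool sizes are derived from the numbers above, not read from slavealloc):

INCREASE = 0.25
ADDED = {"tst-linux64": 500, "tst-emulator64": 250}

for pool, extra in ADDED.items():
    current = extra / INCREASE  # the extra machines are 25% of the current pool
    print("%s: ~%d now -> ~%d after" % (pool, current, current + extra))
# tst-linux64: ~2000 now -> ~2500 after
# tst-emulator64: ~1000 now -> ~1250 after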
Assignee
Updated•8 years ago
Assignee: nobody → vlad.ciobancai
Assignee
Comment 1•8 years ago
Attached is the CSV file for tst-linux64, where I added 250 spot instances in us-east-1 and the remaining 250 in us-west-2.
Attachment #8725176 - Flags: review?(kmoir)
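A rough illustration of how such a CSV could be generated; the hostname pattern, numbering, and column layout here are invented for the sketch and are not the real slavealloc schema:

import csv

# Hypothetical: one row per new tst-linux64 spot slave, split 250/250
# between us-east-1 and us-west-2 as described in comment 1.
REGIONS = {"us-east-1": range(1, 251), "us-west-2": range(251, 501)}

with open("tst-linux64-spot.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["name", "region"])  # assumed columns
    for region, ids in REGIONS.items():
        for i in ids:
            writer.writerow(["tst-linux64-spot-%04d" % i, region])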
Assignee
Comment 2•8 years ago
Created the following pull request, which updates watch_pending.cfg to increase the tst-linux64 capacity: https://github.com/mozilla/build-cloud-tools/pull/181. The checks failed; can somebody help me figure out why?
Assignee
Comment 3•8 years ago
Updated the CSV file.
Attachment #8725176 - Attachment is obsolete: true
Attachment #8725176 - Flags: review?(kmoir)
Attachment #8725186 - Flags: review?(kmoir)
Assignee
Comment 4•8 years ago
Attached is the CSV file for tst-emulator64-spot, where I added 125 spot instances in us-east-1 and the remaining 125 in us-west-2.
Attachment #8725187 - Flags: review?(kmoir)
Assignee
Comment 5•8 years ago
Attached is the updated production_config.py for both slave types (tst-emulator64 and tst-linux64).
Attachment #8725189 - Flags: review?(kmoir)
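As a sketch of the kind of edit comment 5 describes; the dict name, slave-name patterns, and counts here are assumptions about production_config.py, not its actual contents:

# Hypothetical excerpt: grow each pool's slave list by the amounts
# requested in this bug (500 and 250 per the description).
SLAVES = {
    "tst-linux64-spot": ["tst-linux64-spot-%04d" % i for i in range(1, 2501)],
    "tst-emulator64-spot": ["tst-emulator64-spot-%04d" % i for i in range(1, 1251)],
}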
Comment 6•8 years ago
For the cloud-tools GitHub pull request, there seems to be an issue with travis/tox that is causing the failure. If I clone your repo into my local docker instance and run tox, the tests pass. I'm not sure what is happening there; I'm looking into it.

As for that pull request, you will need to add additional subnets to account for the increased number of machines, so we don't run out of IP addresses; bug 1165432 has an example of this change. You will also need to look at the load on the existing masters that serve these instance types and determine whether we need to add more masters, since we are significantly increasing the number of instances that will attach to them.
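To make the subnet arithmetic concrete, a quick capacity check (this relies only on the standard AWS rule that five addresses are reserved per subnet; the /24 size is an assumption for illustration):

import math
from ipaddress import ip_network

def subnets_needed(hosts, prefix=24):
    # AWS reserves 5 addresses per subnet (network, VPC router, DNS,
    # future use, broadcast), so a /24 offers 256 - 5 = 251 usable IPs.
    usable = ip_network("10.0.0.0/%d" % prefix).num_addresses - 5
    return math.ceil(hosts / usable)

print(subnets_needed(500))  # 2 -> for the 500 new tst-linux64 instances
print(subnets_needed(750))  # 3 -> for all 750 new instances combined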
Updated•8 years ago
Attachment #8725189 - Flags: review?(kmoir) → review+
Updated•8 years ago
Attachment #8725187 - Flags: review?(kmoir) → review+
Updated•8 years ago
Attachment #8725186 - Flags: review?(kmoir) → review+
Assignee
Comment 7•8 years ago
Created the following pull request to add the new subnets: https://github.com/mozilla/build-cloud-tools/pull/182
Comment 8•8 years ago
You also need to update configs/tst-linux64 and configs/tst-emulator64 to include the new subnets as appropriate; see https://bugzilla.mozilla.org/show_bug.cgi?id=1165432#c1
Assignee
Comment 9•8 years ago
(In reply to Kim Moir [:kmoir] from comment #8)
> You also need to update configs/tst-linux64 and configs/tst-emulator64 to
> include the new subnets as appropriate
>
> see https://bugzilla.mozilla.org/show_bug.cgi?id=1165432#c1

From what I understood, subnets.yml needs to be updated first, and after that the script needs to be run. I wanted to be sure the new entries I added in subnets.yml are OK before running it. @Kim, please let me know if the entries are OK so I can run the script.
Flags: needinfo?(kmoir)
Comment 10•8 years ago
Answered the questions on the GitHub pull request.
Assignee
Comment 11•8 years ago
Added pull request https://github.com/mozilla/build-cloud-tools/pull/183 with the subnets.yml entries.
Assignee
Updated•8 years ago
Flags: needinfo?(kmoir)
Assignee
Comment 12•8 years ago
The following subnets have been created in us-west-2 without any issue: subnet-47a8b830, subnet-797f5120, subnet-330bf457 and subnet-5fa8b828. But when the script tried to create the subnets in us-east-1, I received the following error:

2016-03-03 00:12:37,486 - 10.132.60.0/22 - IPSet(['10.132.60.0/22']) isn't covered by any subnets
2016-03-03 00:12:37,487 - creating subnet 10.132.63.0/24 in us-east-1a/vpc-b42100df (y/N) y
2016-03-03 00:12:48,157 - creating subnet
2016-03-03 00:12:48,306 - 400 Bad Request
2016-03-03 00:12:48,306 - <?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidSubnet.Range</Code><Message>The CIDR '10.132.63.0/24' is invalid.</Message></Error></Errors><RequestID>8cc8c5da-a3f2-42e5-8d42-1d38321034c6</RequestID></Response>
Traceback (most recent call last):
  File "scripts/aws_manage_subnets.py", line 121, in <module>
    main()
  File "scripts/aws_manage_subnets.py", line 117, in main
    sync_subnets(conn, config[region])
  File "scripts/aws_manage_subnets.py", line 94, in sync_subnets
    s = conn.create_subnet(vpc_id, c, z.name)
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/vpc/__init__.py", line 1166, in create_subnet
    return self.get_object('CreateSubnet', params, Subnet)
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/connection.py", line 1177, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidSubnet.Range</Code><Message>The CIDR '10.132.63.0/24' is invalid.</Message></Error></Errors><RequestID>8cc8c5da-a3f2-42e5-8d42-1d38321034c6</RequestID></Response>
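The InvalidSubnet.Range error means EC2 rejected a subnet CIDR that does not fall inside the VPC's address block. A local pre-check along these lines would catch it; the us-east-1 VPC range below (10.134.0.0/16) is an assumption based on comment 25, not read from the actual VPC:

from ipaddress import ip_network

vpc = ip_network("10.134.0.0/16")  # assumed us-east-1 VPC CIDR
for cidr in ("10.132.63.0/24", "10.134.63.0/24"):
    inside = ip_network(cidr).subnet_of(vpc)
    print(cidr, "OK" if inside else "outside VPC -> InvalidSubnet.Range")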
Assignee
Comment 13•8 years ago
Found the issue and created another pull request to resolve it: https://github.com/mozilla/build-cloud-tools/pull/184
Assignee
Comment 14•8 years ago
Created the following subnets:
- us-east-1: subnet-2e49df58, subnet-68882f42, subnet-8dd818e8 and subnet-519b3209
- us-west-2: subnet-47a8b830, subnet-797f5120, subnet-330bf457 and subnet-5fa8b828
Attachment #8726233 - Flags: review?(kmoir)
Updated•8 years ago
Attachment #8726233 - Flags: review?(kmoir) → review+
Assignee
Updated•8 years ago
Attachment #8725189 - Flags: checked-in+
Assignee
Updated•8 years ago
Attachment #8725186 - Flags: checked-in+
Assignee
Updated•8 years ago
Attachment #8725187 - Flags: checked-in+
Assignee
Comment 15•8 years ago
Slaves added to the slavealloc DB.
Assignee
Comment 16•8 years ago
Attached is the production masters JSON file with the two new buildbot masters.
Attachment #8726262 - Flags: review?(kmoir)
Assignee
Comment 17•8 years ago
Attached is the moco-nodes file with the two new buildbot masters.
Attachment #8726263 - Flags: review?(kmoir)
Updated•8 years ago
Attachment #8726263 - Flags: review?(kmoir) → review+
Updated•8 years ago
Attachment #8726262 - Flags: review?(kmoir) → review+
Assignee
Updated•8 years ago
Attachment #8726262 - Flags: checked-in+
Assignee
Updated•8 years ago
Attachment #8726263 - Flags: checked-in+
Assignee
Comment 18•8 years ago
Created two new buildbot masters:
- buildbot-master130.bb.releng.use1.mozilla.com
- buildbot-master131.bb.releng.usw2.mozilla.com
Both have been added to inventory.
Assignee
Comment 19•8 years ago
Attached is the CSV for both buildbot masters, to be inserted into the slavealloc DB.
Attachment #8726644 - Flags: review?(kmoir)
Assignee
Comment 20•8 years ago
At this step of https://wiki.mozilla.org/ReleaseEngineering/How_To/Setup_buildbot_masters_in_AWS#IT, a bug needs to be filed to add both masters to Nagios. The example there is deprecated and I was not able to find the right component for the bug. Amy, can you please help me with the details so I can file it?
Flags: needinfo?(arich)
Updated•8 years ago
Attachment #8726644 - Flags: review?(kmoir) → review+
Assignee
Updated•8 years ago
Attachment #8726644 - Flags: checked-in+
Assignee
Updated•8 years ago
Flags: needinfo?(arich)
Assignee
Comment 21•8 years ago
(In reply to Vlad Ciobancai [:vladC] from comment #20)
> At this step https://wiki.mozilla.org/ReleaseEngineering/How_To/Setup_buildbot_masters_in_AWS#IT
> a bug needs to be created to add both masters in nagios. The example is
> deprecated and I was not able to find the component to create the bug.
>
> Amy can you please help me with the details in order to create a new bug ?

kmoir helped with a recent example, bug 1207411. I also updated the wiki page.
Comment 22•8 years ago
I reverted the subnet changes because netops changes are needed before devices there can connect to our scl3: https://github.com/mozilla/build-cloud-tools/pull/187

From #releng:
2:20 PM <catlee> philor, kmoir: pretty sure now that instances in the new subnets can't talk to all the things they need to in scl3
2:21 PM <dustin> catlee: yep
2:21 PM <catlee> https://irccloud.mozilla.com/pastebin/0kV1V7Aj
2:21 PM <catlee> dustin: is that a quick change, or no? should we backout cloud-tools for now?
2:21 PM <kmoir> huh I didn't see that we needed that change in the older bugs for adding new subnets
2:22 PM <dustin> it's not quick
2:22 PM <dustin> bug for netops, update SG's
2:22 PM <kmoir> I can backout new subnets in cloud tools
2:23 PM <catlee> thanks
2:23 PM <catlee> and I guess kill off instances in those regions
2:23 PM <catlee> I can try that
2:24 PM <dustin> kmoir: is this just adding more test subnets?
2:24 PM <dustin> or are they different than other subnets?
2:25 PM <kmoir> just new test subnets
2:28 PM <dustin> ok, so the firewall-tests change should just be in network.py to add the new subnets
2:29 PM <dustin> huh, I appear not to have fw access anymore
2:29 PM <dustin> anyway, if you file a netops bug asking them to add <new IP ranges> to the address sets containing <old IP ranges> on both fw1.scl3 and fw1.releng.scl3, that should do the trick
2:30 PM <kmoir> okay
2:30 PM <dustin> you'll also need to add the new subnets to https://github.com/mozilla/build-cloud-tools/blob/master/configs/securitygroups.yml
2:30 PM <dustin> and commit that
Comment 23•8 years ago
Vladc: looking at the changes I reverted, I'm unsure why we changed an IP range instead of adding one, i.e. replacing 10.132.60.0/22 with 10.134.60.0/22:
https://github.com/mozilla/build-cloud-tools/commit/88e7a12efa5994e4c6d3b846aaed55d72648e325
Shouldn't we be adding new IP ranges?
Flags: needinfo?(vlad.ciobancai)
Comment 24•8 years ago
I started a doc that describes the steps to add new subnets, because I couldn't find one and this is something that has tripped us up, since it occurs rarely: https://wiki.mozilla.org/ReleaseEngineering/How_To/Add_New_AWS_Subnets
Please update it with the final steps as needed.
Assignee
Comment 25•8 years ago
(In reply to Kim Moir [:kmoir] from comment #23)
> Vladc: looking at the changes I reverted, I'm unsure why we changed an ip
> range instead of adding one
>
> ie. replacing 10.132.60.0/22 with 10.134.60.0/22:
> https://github.com/mozilla/build-cloud-tools/commit/88e7a12efa5994e4c6d3b846aaed55d72648e325
>
> shouldn't we be adding new ip ranges?

Kim, when I created https://github.com/mozilla/build-cloud-tools/pull/183 I made a mistake in subnets.yml: I added the CIDR 10.132.60.0/22 in us-east-1. That IP range is wrong because us-east-1 uses the 10.134.x.x range. To resolve it I created another pull request, https://github.com/mozilla/build-cloud-tools/pull/184, which changes 10.132.60.0/22 to 10.134.60.0/22. For more details please see comments #12 and #13.
Flags: needinfo?(vlad.ciobancai)
Comment 26•8 years ago
Patch for firewall-tests, to land once the netops bug is resolved.
Assignee
Comment 27•8 years ago
I tested the patch that Kim wrote on fwunit1 and the tests are still failing.
Assignee
Comment 28•8 years ago
Created the following pull request to add the new subnets to the security groups file: https://github.com/mozilla/build-cloud-tools/pull/189
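A small sanity check one could run after that pull request lands, confirming the new CIDRs actually appear in configs/securitygroups.yml (the flat string walk is deliberate so no particular YAML structure is assumed):

import yaml  # PyYAML

NEW_CIDRS = {"10.132.60.0/22", "10.134.60.0/22"}  # from comments 12 and 25

def strings(node):
    # Recursively yield every string in the parsed YAML document.
    if isinstance(node, dict):
        for value in node.values():
            for s in strings(value):
                yield s
    elif isinstance(node, list):
        for value in node:
            for s in strings(value):
                yield s
    elif isinstance(node, str):
        yield node

with open("configs/securitygroups.yml") as fh:
    found = set(strings(yaml.safe_load(fh)))

print("missing:", NEW_CIDRS - found or "none")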
Comment 29•8 years ago
Merged the pull request.
Assignee
Comment 30•8 years ago
I ran the tests again and they are still failing.
Assignee
Comment 31•8 years ago
The issue was that the security groups had not been updated. I ran the script to create the security groups, Dustin ran a manual firewall update, and the test that Kim wrote then passed without any issue. I also updated the wiki page with the steps.
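For reference, updating a security group with the new ranges boils down to something like the boto (v2) calls below; the group name, port, and protocol are placeholders, not what the real cloud-tools script uses:

import boto.ec2

# Sketch only: authorize the new us-east-1 test range on an existing
# group. The real cloud-tools script drives this from securitygroups.yml.
conn = boto.ec2.connect_to_region("us-east-1")
(sg,) = conn.get_all_security_groups(groupnames=["tests"])  # assumed name
sg.authorize(ip_protocol="tcp", from_port=22, to_port=22,
             cidr_ip="10.134.60.0/22")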
Assignee
Comment 32•8 years ago
Created pull request https://github.com/mozilla/build-cloud-tools/pull/193 to add the new subnets for tst-linux64 and tst-emulator64. It will be pushed to production only once the masters are OK.
Comment 33•8 years ago
Attachment #8727546 - Attachment is obsolete: true
Attachment #8729197 - Flags: review?(dustin)
Updated•8 years ago
Attachment #8729197 - Flags: review?(dustin) → review+
Updated•8 years ago
Attachment #8729197 - Flags: checked-in+
Comment 34•8 years ago
Patch to enable the new masters.
Attachment #8730699 - Flags: review?(alin.selagea)
Comment 35•8 years ago
Comment on attachment 8730699 [details] [diff] [review]
bug1252248tools.patch

Looks good. Noticed that you disabled 'linux64-tsan' for bm130 but left it unchanged for bm131. Is this intended?
Attachment #8730699 - Flags: review?(alin.selagea) → review+
Comment 36•8 years ago
No, that was not my intention; my Eclipse editor was dirty and the other change was not saved.
Attachment #8730699 - Attachment is obsolete: true
Attachment #8730730 - Flags: review?(alin.selagea)
Updated•8 years ago
Attachment #8730730 - Flags: review?(alin.selagea) → review+
Updated•8 years ago
Attachment #8730699 - Flags: checked-in+
Comment 37•8 years ago
I've enabled the two new masters and merged the patch with the new subnets. If this looks okay, we can enable the new machines in slavealloc.
Comment 38•8 years ago
The new machines are enabled in slavealloc.
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard