Closed Bug 1469013 Opened 6 years ago Closed 6 years ago

t-yosemite-r7-187.test.releng.mdc2.mozilla.com. is unreachable

Product/Component: Infrastructure & Operations :: DCOps
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: dhouse
Assigned: van

Reboot t-yosemite-r7-187.test.releng.mdc2.mozilla.com. 10.51.56.78
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-06-15T17:36:43.676991 ipmi ipmi_reset KeyError
2018-06-15T17:36:43.679659 ipmi ipmi_cycle KeyError
2018-06-15T17:36:49.708402 snmp_reboot pdu1.gc132.ops.releng.mdc2.mozilla.com ba2 CalledProcessError
I have manually confirmed this machine is not responding to ping or ssh.

I think we need someone in the data center to physically reboot this machine.
Van, do you know how I can request that this mdc2 machine be rebooted? (Or do I need you or your team to file a request with the DC?)
Flags: needinfo?(vle)
:dhouse, sorry for the delay. I'll be on site next week to work on these minis.

We can ask QTS to reboot machines, but I need to give the mdc2 techs training first (planned for this trip). Do you have an account to access the QTS portal?
Flags: needinfo?(vle)
Reboot t-yosemite-r7-187.test.releng.mdc2.mozilla.com. 10.51.56.78
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-06-25T21:16:41.482740 ssh_reboot -l roller -i ssh.key CommandError
2018-06-25T21:16:41.484879 ipmi ipmi_reset KeyError
2018-06-25T21:16:41.486415 ipmi ipmi_cycle KeyError
2018-06-25T21:16:47.504605 snmp_reboot pdu1.gc132.ops.releng.mdc2.mozilla.com ba2 CalledProcessError
Reboot t-yosemite-r7-187.test.releng.mdc2.mozilla.com. 10.51.56.78
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-06-25T21:37:43.749849 ssh_reboot -l roller -i ssh.key CommandError
2018-06-25T21:37:43.754436 ipmi ipmi_reset KeyError
2018-06-25T21:37:43.761974 ipmi ipmi_cycle KeyError
2018-06-25T21:37:49.792862 snmp_reboot pdu1.gc132.ops.releng.mdc2.mozilla.com ba2 CalledProcessError
:van thank you.

I haven't seen or heard of the QTS portal before, but I'll ask my team; maybe others have used it and can show it to me.

I was seeing about 20 Macs in mdc2 that were not taking taskcluster jobs, but many of those rebooted via SNMP and look okay now. Here are the problem ones in mdc2 that I think need to be physically kicked (I've tried SNMP power cycling):
```
t-yosemite-r7-077.test.releng.mdc2.mozilla.com
t-yosemite-r7-119.test.releng.mdc2.mozilla.com
t-yosemite-r7-165.test.releng.mdc2.mozilla.com
t-yosemite-r7-187.test.releng.mdc2.mozilla.com
t-yosemite-r7-192.test.releng.mdc2.mozilla.com
t-yosemite-r7-196.test.releng.mdc2.mozilla.com
t-yosemite-r7-200.test.releng.mdc2.mozilla.com
```
CC'ing Danut so ciduty is aware of these 7 "bad" Macs in mdc2.
Reboot t-yosemite-r7-187.test.releng.mdc2.mozilla.com. 10.51.56.78
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-06-25T22:06:03.695626 ssh_reboot -l roller -i ssh.key CommandError
2018-06-25T22:06:03.698277 ipmi ipmi_reset KeyError
2018-06-25T22:06:03.700583 ipmi ipmi_cycle KeyError
2018-06-25T22:06:09.727417 snmp_reboot pdu1.gc132.ops.releng.mdc2.mozilla.com ba2 CalledProcessError
Reimaged.
Assignee: server-ops-dcops → vle
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED