Closed
Bug 1066765
Opened 10 years ago
Closed 10 years ago
please run hardware diagnostics on foopy64 and reimage
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Unassigned)
References
Details
(Whiteboard: iX Systems RMA Case ID #AMA-717-88012)
Attachments
(1 file)
733.43 KB, image/jpeg
This host is having a bunch of problems that carry over to the attached pandas; let's run diags first.
Updated•10 years ago
colo-trip: --- → scl3
Comment 2•10 years ago
memtest found 0 errors after 13 passes. Now running Western Digital's full media scan on the HDD.
Comment 4•10 years ago
Host back up.
sals-MacBook-Pro-3:~ sal$ sudo fping 10.26.19.126
10.26.19.126 is alive
sals-MacBook-Pro-3:~ sal$ sudo fping 10.26.131.21
10.26.131.21 is alive
sals-MacBook-Pro-3:~ sal$ ssh !$
ssh 10.26.131.21
The authenticity of host '10.26.131.21 (10.26.131.21)' can't be established.
RSA key fingerprint is 0f:fb:74:e6:23:32:2b:30:ca:e6:4c:b2:f7:97:f5:26.
Are you sure you want to continue connecting (yes/no)?
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 5•10 years ago
It's still showing load issues. Do we have more invasive disk diags we can run (or other diags)?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•10 years ago
Oh, heck yeah, we definitely have drive issues. There are multiple:
Sep 30 18:27:53 foopy64 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 30 18:27:53 foopy64 kernel: ata1.00: irq_stat 0x40000001
Sep 30 18:27:53 foopy64 kernel: ata1.00: failed command: FLUSH CACHE EXT
Sep 30 18:27:53 foopy64 kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Sep 30 18:27:53 foopy64 kernel: res 51/04:00:38:df:f7/00:00:00:00:00/a7 Emask 0x1 (device error)
Sep 30 18:27:53 foopy64 kernel: ata1.00: status: { DRDY ERR }
Sep 30 18:27:53 foopy64 kernel: ata1.00: error: { ABRT }
Sep 30 18:27:53 foopy64 kernel: ata1.00: configured for UDMA/33
Sep 30 18:27:53 foopy64 kernel: ata1: EH complete
Please contact iX for a drive replacement.
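For anyone wanting to count these across a whole log rather than eyeballing it, here is a minimal sketch that pulls the "failed command" entries out of syslog-style libata lines like the ones above. The function name `summarize_ata_errors` and the regular expression are illustrative assumptions, not an existing tool, and the pattern only covers the line format shown in this comment.

```python
import re

# Matches syslog-style libata lines such as:
#   "Sep 30 18:27:53 foopy64 kernel: ata1.00: failed command: FLUSH CACHE EXT"
# (assumed format, based only on the excerpt in this bug)
ATA_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"(?P<port>ata\d+(?:\.\d+)?):\s(?P<msg>.*)$"
)


def summarize_ata_errors(lines):
    """Return {port: [failed command names]} for 'failed command' lines."""
    failures = {}
    for line in lines:
        m = ATA_LINE.match(line.strip())
        if not m:
            continue  # not a libata line in the expected format
        msg = m.group("msg")
        if msg.startswith("failed command:"):
            cmd = msg.split(":", 1)[1].strip()
            failures.setdefault(m.group("port"), []).append(cmd)
    return failures


sample = [
    "Sep 30 18:27:53 foopy64 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "Sep 30 18:27:53 foopy64 kernel: ata1.00: failed command: FLUSH CACHE EXT",
    "Sep 30 18:27:53 foopy64 kernel: ata1.00: status: { DRDY ERR }",
]
print(summarize_ata_errors(sample))
```

Feeding it the lines of the system log (wherever that lives on the host) would give a per-port list of failed commands to hand to the vendor.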
Comment 7•10 years ago
It may be the controller, too, come to think of it. They'll probably ask for some diagnostic data we can pass on to them to help them diagnose the issue.
Comment 8•10 years ago
iX Systems RMA Case ID #AMA-717-88012
Whiteboard: reimaging → iX Systems RMA Case ID #AMA-717-88012
Comment 9•10 years ago
I've run HDD diags but no errors were detected. While we wait for iX to respond, I'll reimage the host.
Comment 10•10 years ago
Reimaging won't help; we've already done that. There are most definitely issues with the disk or controller if you look at the system logs.
Comment 11•10 years ago
The host has been dropped off at iX for 48 hours of burn-in diags.
Comment 12•10 years ago
Feedback from iX support: The node itself has passed our burn-in test. Have you run any additional tests on the drives associated with it?
Comment 14•10 years ago
Node came back from iX with no errors detected. iX has loaned me a temp HDD as a replacement. Upon reimaging, I get the following error; see attachment.
Comment 15•10 years ago
Reporter | ||
Comment 16•10 years ago
fwiw, we're again getting:
[19:48:59] nagios-releng Tue 16:49:02 PDT [4219] foopy64.p4.releng.scl3.mozilla.com:load is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/load)
Comment 17•10 years ago
(In reply to Vinh Hua [:vinh] from comment #15)
> Created attachment 8501434 [details]
> foopy64.JPG

Vinh: according to dustin and lsscsi, foopies shouldn't have RAID. I think the disk needs to be zeroed to wipe the metadata that is hanging up anaconda.
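A rough sketch of what that zeroing could look like, for illustration only. It assumes the stale RAID metadata lives in the first or last few MiB of the disk, which is a guess and not something confirmed in this bug; `zero_ends` and its `span` default are hypothetical names. It is destructive, so it should only ever be pointed at a disk that is about to be reimaged anyway.

```python
import os


def zero_ends(path, span=1024 * 1024):
    """Overwrite the first and last `span` bytes of `path` with zeros.

    `path` may be a regular file or, on Linux, a block device node.
    Returns the total size in bytes that was detected.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new offset
        span = min(span, size)         # never write past a small target
        zeros = b"\x00" * span
        f.seek(0)
        f.write(zeros)                 # wipe the start of the device
        f.seek(size - span)
        f.write(zeros)                 # wipe the end of the device
        f.flush()
        os.fsync(f.fileno())           # make sure the zeros hit the disk
    return size
```

Zeroing the entire disk is slower but makes no assumption about where the metadata sits.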
Comment 18•10 years ago
Host is online now with a loaner HDD. Let me know if issues persist.
Comment 19•10 years ago
:callek - How's foopy64 holding up after the HDD replacement?
Reporter | ||
Comment 20•10 years ago
I'm not sure if it's been put back into rotation; passing the question off to coop.
Flags: needinfo?(coop)
Comment 21•10 years ago
It's back in rotation, and seems to be working correctly thus far. I'll reopen if anything goes wrong.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Flags: needinfo?(coop)
Resolution: --- → FIXED
Comment 22•10 years ago
:coop - The replacement hard disk came in. Can you take foopy64 down for the replacement?
Status: RESOLVED → REOPENED
Flags: needinfo?(coop)
Resolution: FIXED → ---
Comment 23•10 years ago
(In reply to Vinh Hua [:vinh] from comment #22)
> :coop - The replacement hard disk came in. Can you take foopy64 down for
> the replacement?

I've disabled all the pandas on foopy64 and shut down the foopy. It's all yours.
Flags: needinfo?(coop)
Comment 24•10 years ago
:coop - Foopy64 has been reimaged with the replacement drive.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations