Closed Bug 768489 Opened 12 years ago Closed 12 years ago

talos-r3-w7-038 isn't processing minidumps properly / is crashing extremely frequently during tests

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86, Windows 7
Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: billm)

Details

(Whiteboard: [buildduty])

Bug 768156, bug 767906 & bug 768483 lack proper stacks & are all from the same slave (talos-r3-w7-038).
Summary: talos-r3-w7-038 isn't processing minidumps properly → talos-r3-w7-038 isn't processing minidumps properly / is crashing extremely frequently during tests
Blocks: 767908
Blocks: 767932
Blocks: 767937
Blocks: 767970
Blocks: 768006
Blocks: 768036
Blocks: 768160
Blocks: 768397
Blocks: 768402
Blocks: 768415
Blocks: 768418
Blocks: 768425
Blocks: 768433
Blocks: 768434
Blocks: 768439
Blocks: 768453
Blocks: 768486
Please may we remove this slave from production.
Severity: major → critical
Actually, we see users who have crashes like this, and we can never solve them. Could we try to figure out what's wrong with this slave so we can help our users?
Armen, please may you give billm access to the slave after taking it out of production. Thanks! :-)
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Whiteboard: [buildduty]
I disabled the slave on slavealloc and added a note.

billm, would you be able to look into what is wrong with this slave?
Blocks: 768424
Blocks: 768009
Blocks: 767925
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #8)
> I disabled the slave on slavealloc and added a note.
> 
> billm, would you be able to look into what is wrong with this slave?

Yes, I'd like to try. At the very least, it would be interesting to run a memory test. It may just be a hardware malfunction. Can you please send me a login?
I asked IT to get you access.
You should have the machine ready tomorrow morning.
Assignee: nobody → armenzg
Depends on: 769046
I have sent the credentials to Bill.
Assignee: armenzg → wmccloskey
Lowering the severity from critical to major, as the slave is now in the developer's hands to work with.
Severity: critical → major
Bill: any update on the status of this slave? Found anything? Are you done with it?
I ran a memory test and didn't find anything. I tried running a few tests and none of them failed.

However, I don't think we can add this slave back to the pool. It will likely just cause more failures.
Depends on: 776924
(In reply to Bill McCloskey (:billm) from comment #14)
> I ran a memory test and didn't find anything. I tried running a few tests
> and none of them failed.
> 
> However, I don't think we can add this slave back to the pool. It will
> likely just cause more failures.

OK, thanks for trying. I'll get IT to re-educate the machine in bug 776924.
This slave has been fixed, at least in theory. If this recurs, please re-open and we'll just decommission this slave.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard