Closed Bug 943932 (t-w864-ix-025) Opened 11 years ago Closed 7 years ago

t-w864-ix-025 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86_64
Windows 8

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Usual cloning issues:
{
========= Started 'c:\mozilla-build\hg\hg clone ...' failed (results: 5, elapsed: 0 secs) (at 2013-11-27 08:38:35.842883) =========
'c:\\mozilla-build\\hg\\hg' 'clone' 'http://hg.mozilla.org/build/mozharness' 'scripts'
 in dir C:\slave\test\. (timeout 1320 secs)
 watching logfiles {}
 argv: ['c:\\mozilla-build\\hg\\hg', 'clone', 'http://hg.mozilla.org/build/mozharness', 'scripts']
 environment:
  ALLUSERSPROFILE=C:\ProgramData
  APPDATA=C:\Users\cltbld.T-W864-IX-025\AppData\Roaming
  COMMONPROGRAMFILES=C:\Program Files (x86)\Common Files
  COMMONPROGRAMFILES(X86)=C:\Program Files (x86)\Common Files
  COMMONPROGRAMW6432=C:\Program Files\Common Files
  COMPUTERNAME=T-W864-IX-025
  COMSPEC=C:\windows\system32\cmd.exe
  FP_NO_HOST_CHECK=NO
  HOMEDRIVE=C:
  HOMEPATH=\Users\cltbld.T-W864-IX-025
  KTS_HOME=C:\Program Files\KTS
  KTS_VERSION=1.19c
  LOCALAPPDATA=C:\Users\cltbld.T-W864-IX-025\AppData\Local
  LOGONSERVER=\\T-W864-IX-025
  NUMBER_OF_PROCESSORS=8
  OS=Windows_NT
  PATH=C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\mozilla-build\python27;C:\mozilla-build\python27\Scripts;C:\mozilla-build\msys\bin;C:\mozilla-build\vim\vim72;C:\mozilla-build\wget;C:\mozilla-build\info-zip;C:\CoreUtils\bin;C:\mozilla-build\buildbotve\scripts;C:\mozilla-build\hg
  PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
  PROCESSOR_ARCHITECTURE=x86
  PROCESSOR_ARCHITEW6432=AMD64
  PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
  PROCESSOR_LEVEL=6
  PROCESSOR_REVISION=1e05
  PROGRAMDATA=C:\ProgramData
  PROGRAMFILES=C:\Program Files (x86)
  PROGRAMFILES(X86)=C:\Program Files (x86)
  PROGRAMW6432=C:\Program Files
  PROMPT=$P$G
  PSMODULEPATH=C:\windows\system32\WindowsPowerShell\v1.0\Modules\
  PUBLIC=C:\Users\Public
  PWD=C:\slave\test
  SYSTEMDRIVE=C:
  SYSTEMROOT=C:\windows
  TEMP=C:\Users\CLTBLD~1.T-W\AppData\Local\Temp
  TMP=C:\Users\CLTBLD~1.T-W\AppData\Local\Temp
  USERDOMAIN=T-W864-IX-025
  USERDOMAIN_ROAMINGPROFILE=T-W864-IX-025
  USERNAME=cltbld
  USERPROFILE=C:\Users\cltbld.T-W864-IX-025
  WINDIR=C:\windows
 using PTY: False
program finished with exit code -1073741800
elapsedTime=0.109000
========= Finished 'c:\mozilla-build\hg\hg clone ...' failed (results: 5, elapsed: 0 secs) (at 2013-11-27 08:38:36.020720) =========
}
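
The exit code in the log above looks like a signed 32-bit rendering of a Windows NTSTATUS value. A minimal sketch for converting it to the unsigned hex form used in Microsoft's NTSTATUS tables, so it can be looked up there. The helper name `to_ntstatus_hex` is hypothetical and not part of buildbot or mozharness:

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their two's-complement
    # unsigned equivalent, which is how NTSTATUS codes are normally written.
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


# Exit code reported by the failed 'hg clone' step in the log.
print(to_ntstatus_hex(-1073741800))  # prints 0xC0000018
```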

Disabled in slavealloc.
Re-enabled and rebooted.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Throwing up errors like this:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33406188&tree=Services-Central

I attempted a manual reboot, but that didn't help. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like it is having problems rm'ing C:\slave\test\build during clobber.

Seems tied to: Bug 692715 - Windows slaves often get permission denied errors while rm'ing files

I am surprised as this should be using mozharness's _rmtree_windows() which, at a glance, takes care of many edge cases. 

Although in saying that, I tried
'rm -rf build'
'rmdir /s /q build'
'del /s build\*' then 
'rmdir /s /q build' again.
'shutdown -r'
'rmdir /s /q build' again.

but no matter what awful permutations I tried, I kept getting:
build\tools\HG8B6C~1\store\data\BUILDF~1\utils\GENERA~1\places - The directory is not empty

Finally I VNC'd in and tried to delete it. I got a prompt that gave the same message with an error code:
Error 0x80070091: The directory is not empty

Microsoft thinks it could be bad sectors and the disk needs to be checked: http://answers.microsoft.com/en-us/windows/forum/windows_7-windows_programs/error-0x80070091-the-directory-is-not-empty-while/8c58e1d9-b6f2-4164-8fa6-19f35c09ea3d
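
For reference, the error code from the prompt can be split into its HRESULT fields. This is only an illustrative sketch; `decode_hresult` and `HResult` are hypothetical names, not an existing API. Facility 7 is FACILITY_WIN32 and Win32 error 145 is ERROR_DIR_NOT_EMPTY, which matches the message text:

```python
from typing import NamedTuple


class HResult(NamedTuple):
    failure: bool   # severity bit (bit 31)
    facility: int   # facility code (bits 16-26)
    code: int       # facility-specific error code (bits 0-15)


def decode_hresult(value: int) -> HResult:
    """Split a 32-bit HRESULT into severity, facility and code."""
    value &= 0xFFFFFFFF
    return HResult(
        failure=bool(value >> 31),
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


# Error shown by the Explorer prompt on t-w864-ix-025.
print(decode_hresult(0x80070091))
# prints HResult(failure=True, facility=7, code=145)
```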
Depends on: 968046
Back in production after a reimage. Hopefully that helps, given that it passed hardware diagnostics.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
http://buildbot-master110.srv.releng.scl3.mozilla.com:8201/builders/WINNT%206.2%20try%20debug%20test%20crashtest/builds/1981/steps/clone_scripts/logs/stdio

reports:

'c:\\mozilla-build\\hg\\hg' 'clone' 'https://hg.mozilla.org/build/mozharness' 'scripts'
 in dir C:\slave\test\. (timeout 1320 secs)
 watching logfiles {}
 argv: ['c:\\mozilla-build\\hg\\hg', 'clone', 'https://hg.mozilla.org/build/mozharness', 'scripts']
 environment:
  ALLUSERSPROFILE=C:\ProgramData
  APPDATA=C:\Users\cltbld.T-W864-IX-025\AppData\Roaming
  COMMONPROGRAMFILES=C:\Program Files (x86)\Common Files
  COMMONPROGRAMFILES(X86)=C:\Program Files (x86)\Common Files
  COMMONPROGRAMW6432=C:\Program Files\Common Files
  COMPUTERNAME=T-W864-IX-025
  COMSPEC=C:\windows\system32\cmd.exe
  DCLOCATION=SCL3
  DNSSUFFIX=wintest.releng.scl3.mozilla.com
  FP_NO_HOST_CHECK=NO
  HOMEDRIVE=C:
  HOMEPATH=\Users\cltbld.T-W864-IX-025
  KTS_HOME=C:\Program Files\KTS
  KTS_VERSION=1.19c
  LOCALAPPDATA=C:\Users\cltbld.T-W864-IX-025\AppData\Local
  LOGONSERVER=\\T-W864-IX-025
  MONDIR=C:\Monitor_config\
  MOZBUILDDIR=C:\mozilla-build\
  NUMBER_OF_PROCESSORS=8
  OS=Windows_NT
  OURDRIVE=C:
  PATH=C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\mozilla-build\python27;C:\mozilla-build\python27\Scripts;C:\mozilla-build\msys\bin;C:\mozilla-build\vim\vim72;C:\mozilla-build\wget;C:\mozilla-build\info-zip;C:\CoreUtils\bin;C:\mozilla-build\buildbotve\scripts;C:\mozilla-build\hg
  PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
  PROCESSOR_ARCHITECTURE=x86
  PROCESSOR_ARCHITEW6432=AMD64
  PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
  PROCESSOR_LEVEL=6
  PROCESSOR_REVISION=1e05
  PROGRAMDATA=C:\ProgramData
  PROGRAMFILES=C:\Program Files (x86)
  PROGRAMFILES(X86)=C:\Program Files (x86)
  PROGRAMW6432=C:\Program Files
  PROMPT=$P$G
  PSMODULEPATH=C:\windows\system32\WindowsPowerShell\v1.0\Modules\
  PUBLIC=C:\Users\Public
  PWD=C:\slave\test
  RUNLOGFILE=C:\slave\\runslave.log
  SLAVEDIR=C:\slave\
  SYSTEMDRIVE=C:
  SYSTEMROOT=C:\windows
  TEMP=C:\Users\CLTBLD~1.T-W\AppData\Local\Temp
  TEST1=testie
  TMP=C:\Users\CLTBLD~1.T-W\AppData\Local\Temp
  USERDOMAIN=T-W864-IX-025
  USERDOMAIN_ROAMINGPROFILE=T-W864-IX-025
  USERNAME=cltbld
  USERPROFILE=C:\Users\cltbld.T-W864-IX-025
  WINDIR=C:\windows
 using PTY: False
program finished with exit code -1073741800
elapsedTime=0.109000



So I logged onto the machine:

cltbld@T-W864-IX-025 ~
$ cd 'C:\slave\test'

cltbld@T-W864-IX-025 /c/slave/test
$ 'c:\\mozilla-build\\hg\\hg' 'clone' 'https://hg.mozilla.org/build/mozharness' 'scripts'
abort: destination 'scripts' is not empty

cltbld@T-W864-IX-025 /c/slave/test
$ rm -rf scripts/

cltbld@T-W864-IX-025 /c/slave/test
$ 'c:\\mozilla-build\\hg\\hg' 'clone' 'https://hg.mozilla.org/build/mozharness' 'scripts'
requesting all changes
adding changesets
adding manifests
adding file changes
added 3225 changesets with 5689 changes to 499 files (+1 heads)
updating to branch default
318 files updated, 0 files merged, 0 files removed, 0 files unresolved

cltbld@T-W864-IX-025 /c/slave/test
$


This is strange: if the directory was not empty, I would expect an attempt to be made to remove it first. Either that did not happen, or it failed for some reason, even though it did not fail when I removed it by hand.

This might be a disk problem as suggested by Jordan in comment 4.
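
To illustrate the behaviour I would expect before the clone step, here is a minimal sketch that removes a leftover destination (with a few retries) and only then clones. It is not the buildbot or mozharness implementation; `make_writable`, `remove_tree` and `fresh_clone` are hypothetical helpers, and the retry count and delay are arbitrary assumptions:

```python
import os
import shutil
import stat
import subprocess
import time


def make_writable(path):
    """Best-effort: give the owner full access to everything under path."""
    targets = [path]
    for root, dirs, files in os.walk(path):
        targets.extend(os.path.join(root, name) for name in dirs + files)
    for target in targets:
        try:
            os.chmod(target, stat.S_IRWXU)
        except OSError:
            pass  # leave it; the next rmtree attempt will report the failure


def remove_tree(path, attempts=3, delay=1.0):
    """Try to delete path a few times; return True if it is gone afterwards."""
    for _ in range(attempts):
        if not os.path.lexists(path):
            return True
        try:
            shutil.rmtree(path)
        except OSError:
            # Read-only files are a common cause on Windows; clear and retry.
            make_writable(path)
            time.sleep(delay)
    return not os.path.lexists(path)


def fresh_clone(repo_url, dest, hg="hg"):
    """Remove any leftover destination, then run 'hg clone'."""
    if not remove_tree(dest):
        raise RuntimeError("could not remove %r before cloning" % dest)
    subprocess.check_call([hg, "clone", repo_url, dest])
```

On this slave the call would look something like `fresh_clone('https://hg.mozilla.org/build/mozharness', 'scripts', hg=r'c:\mozilla-build\hg\hg')`. If the disk really is bad, `remove_tree` would return False and the failure would at least be reported at the removal step rather than as a failed clone.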
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Green job - closing for now...
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting IPMI reboot...Failed.
Filed IT bug for reboot (bug 1180915)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
False alarm.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
QA Contact: armenzg → bugspam.Callek
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting IPMI reboot...Failed.
Filed IT bug for reboot (bug 1181002)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Not so false, turns out to have been broken by this morning's bug 1180877 event. Disabled.
Depends on: 1180877
Re-enabled after switch fix in bug 1181615, and rebooted back into production.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Stuck setting retry on everything after "rm: cannot remove directory `scripts/configs/b2g': Directory not empty"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1200180
Re-imaged and enabled in slavealloc
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting IPMI reboot...Failed.
Filed IT bug for reboot (bug 1232933)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re-enabled in slavealloc.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Stuck doing infinite retries with "RemoveDirectory "C:\slave\test\scripts\mozharness\mozilla\updates": The file or directory is corrupted and unreadable."

Note that it has been prone to this for two years now. Whether or not diagnostics show it, that disk might just be bad.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1246098
Disk diagnostics didn't reveal any issues. The slave has been re-imaged so I enabled it in slavealloc. Let's see how it behaves this time.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
"Unable to remove C:\slave\test\build!"

Two and a quarter years now.

Disabled.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re-enabled in slavealloc and is taking jobs.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
"Unable to remove C:\slave\test\build!"

Two and five-twelfths years now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Slave was re-imaged and re-enabled in slavealloc. Will keep monitoring to see whether it is now able to remove C:\slave\test\build.
Machine is working as intended: the same job that failed on 07-07-2016 (Windows 8 64-bit autoland debug test mochitest-clipboard) worked fine on this machine on 10-07-2016.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting IPMI reboot...Failed.
Filed IT bug for reboot (bug 1307572)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back online and taking jobs
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Seems to have problems again:

 Unable to remove C:\slave\test\build! 

https://treeherder.mozilla.org/logviewer.html#?job_id=89495230&repo=autoland

Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1355450
Back online after the disk was repaired and the host was reimaged by the DCOps team.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED
Unable to remove C:\slave\test\build!

Disabled.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back online after DCOps ran memtests; no issues were found during disk diags and the re-image.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Unable to remove C:\slave\test\build!

*sigh* Wish we knew why this slave keeps breaking when hardware diagnostics say everything is OK...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back online after the DCOps team swapped out the video card and reimaged. If the problem persists, we'll have to look into decommissioning this node.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)
> Unable to remove C:\slave\test\build!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Drive was replaced and the host re-imaged by the DCOps team. Back online.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard