Closed Bug 1109862 Opened 10 years ago Closed 9 years ago

Distribute updated dbghelp.dll to all Windows XP talos machines for more usable profiler pseudostacks

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
Windows XP
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mconley, Assigned: q)

Please see bug 900524 for context, but basically, in order for us to get usable stacks when profiling on our talos machines on XP, we need those machines to have an updated dbghelp.dll file, which I can provide (it is also available by following these instructions: https://bugzilla.mozilla.org/show_bug.cgi?id=900524#c9).
OS: Windows 7 → Windows XP
Blocks: 1121571
Who can help me with this? The Talos profiling infrastructure is now ready to be used in earnest, but Windows XP profiles will be almost useless without this fix.
Flags: needinfo?(rail)
302 coop to prioritize this.
Flags: needinfo?(rail) → needinfo?(coop)
(In reply to Rail Aliiev [:rail] from comment #2)
> 302 coop to prioritize  this.

Is it the single DLL that is required, or the entirety of the Debugging Tools as mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=900524#c9 ? Where should the DLL be installed on each machine? How can I test that it is installed correctly, i.e. is there a failure case I can trigger to see it work?

Assuming answers to the above, I can deploy the DLL to a handful (5) of XP machines and see if there are unexpected effects. That part is easy. Afterwards, I would need a timeslice from either :Q or :markco to help with a GPO or puppet deployment to put the DLL everywhere.
Flags: needinfo?(coop) → needinfo?(mconley)
The lone dbghelp.dll is all that's required. I believe you have two choices: you can drop it into Windows/System32 of each XP machine, or throw it into every directory that firefox.exe executes in.

You'll probably want the former.
Flags: needinfo?(mconley)
(In reply to Chris Cooper [:coop] from comment #3)
> How can I test that it
> is installed correctly, i.e. is there a failure case I can trigger to see it
> work?

The easiest way is to run a Talos job (e.g. tpaint, which is fairly small) with --spsProfile, as described here: https://wiki.mozilla.org/Buildbot/Talos/Profiling#On_TryServer

Then open the profile for e.g. tpaint in cleopatra and look at the stacks. Here's a tpaint profile without the fix:
http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/0af57006c0795d13f863581963e9c428fac78f3114103c25fa43f8270d589b1ca2ae1c48b227af07b96f1d7550ec23f4e654651982c953d47ad4b843f510241e
The stack you get by expanding along the top of the tree starts with
> (root)
> Startup::XRE_Main
> Timer::Fire
> js::RunScript
> openWindow() @ tpaint.html?auto=1&sps_profile_entries=2000000&sps_profile_dir=c%3A%5Cdocume%7E1%5Ccltbld%7E1.t-x%5Clocals%7E1%5Ctemp%5Ctmpg2qcuy&sps_profile_interval=1&sps_profile_threads=GeckoMain%2CCompositor:103
> BaseProcessStart
> __tmainCRTStartup
> wmain
> NS_internal_main(int,char * *)
> do_main
> XRE_main

The problem here is that these pseudostack entries are too close to the root:
> Startup::XRE_Main
> Timer::Fire
> js::RunScript
> openWindow() @ tpaint.html?auto=1&sps_profile_entries=2000000&sps_profile_dir=c%3A%5Cdocume%7E1%5Ccltbld%7E1.t-x%5Clocals%7E1%5Ctemp%5Ctmpg2qcuy&sps_profile_interval=1&sps_profile_threads=GeckoMain%2CCompositor:103

With the good DLL, the stack should look like this:
> (root)
> BaseProcessStart <-- Needs to start with a few C++ stack frames
> __tmainCRTStartup
> wmain
> NS_internal_main(int,char * *)
> do_main
> XRE_main
> Startup::XRE_Main <-- The first pseudostack entry here
with the other pseudostack entries intermingled with the C++ stack further down.
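
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Cleopatra or Talos: the function name is hypothetical, the frame lists are copied from the two example stacks in this comment (with the long openWindow() entry left out), and it assumes that comparing the positions of the native XRE_main frame and the Startup::XRE_Main pseudostack entry is enough to tell a good profile from a bad one.

```python
# Frame labels copied from the example stacks above, root first.
# The long openWindow() pseudostack entry is omitted for brevity.
BROKEN = [
    "(root)",
    "Startup::XRE_Main",
    "Timer::Fire",
    "js::RunScript",
    "BaseProcessStart",
    "__tmainCRTStartup",
    "wmain",
    "NS_internal_main(int,char * *)",
    "do_main",
    "XRE_main",
]

FIXED = [
    "(root)",
    "BaseProcessStart",
    "__tmainCRTStartup",
    "wmain",
    "NS_internal_main(int,char * *)",
    "do_main",
    "XRE_main",
    "Startup::XRE_Main",
]


def pseudostack_is_interleaved(frames, native="XRE_main", pseudo="Startup::XRE_Main"):
    """Return True if the native frame sits closer to the root than the
    first pseudostack entry, which is what the good DLL should produce."""
    if native not in frames or pseudo not in frames:
        return False
    return frames.index(native) < frames.index(pseudo)


print(pseudostack_is_interleaved(BROKEN))  # False
print(pseudostack_is_interleaved(FIXED))   # True
```

A real check would need to read the frames out of the profile itself; how to do that is not covered here.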
I installed the Debugging Tools for Windows on t-xp32-ix-006. To be sure I was running in the correct config, I tried to replicate a talos run using the same artifacts from mstange's run listed in the wiki:

https://wiki.mozilla.org/Buildbot/Talos/Profiling#On_TryServer
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec8bf3a470d1
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-ec8bf3a470d1/try-win32/try_xp-ix_test-g1-bm109-tests1-windows-build284.txt.gz

The specific talos command I ran was:
C:\slave\test\build\venv\Scripts\talos --noisy --debug -v --executablePath C:\slave\test\build\application\firefox\firefox --title t-xp32-ix-006 --symbolsPath https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/mstange@themasta.com-ec8bf3a470d1/try-win32/firefox-38.0a1.en-US.win32.crashreporter-symbols.zip --activeTests tp5o_scroll:glterrain --output talos.yml --branchName Try-Non-PGO --authfile C:\slave\test\oauth.txt --spsProfile --webServer localhost

The only changes were the machine name and removal of the automatic upload steps.

I ran the test twice with the DLLs in different places:

1) No dlls copied, dbghelp.dll installed in the default dir = C:\Program Files\Debugging Tools for Windows (x86)
2) dbghelp.dll copied into the Firefox application dir

The profile zips for both are here:

1) http://people.mozilla.org/~coop/dbghelp/default_install_dir/
2) http://people.mozilla.org/~coop/dbghelp/application_dir/

I'm not familiar with Cleopatra, so if someone can verify that either profile set displays the desired stack behavior, that would be helpful. (added needinfo to :mstange for this)

If #1 works, great. It should be a simple install via GPO or even puppet.

If #2 works, this might be better solved by adding dbghelp.dll to tooltool and pulling it down into the application dir only on XP tests with spsProfile enabled.

If we think *all* XP tests might benefit from having this extra debugging enabled, we can try to replace the existing dbghelp.dll in C:\Windows\System32 with the one from the Debugging Tools. That will require more Windows-fu than I currently have, since the stock dbghelp.dll is one of the files that Windows protects religiously. I did try this a couple of different ways, because, man, would that be simpler, but I was unsuccessful.

The machine (t-xp32-ix-006) is still on loan to me if we need further investigation here.
Flags: needinfo?(mstange)
#1 did not work, #2 worked.
Flags: needinfo?(mstange)
(In reply to Chris Cooper [:coop] from comment #6)
> If we think *all* XP tests might benefit from having this extra debugging
> enabled, we can try to replace the existing dbghelp.dll in
> C:\Windows\System32 with the one from the Debugging Tools.

At the moment there wouldn't be any benefit from using the new DLL in XP tests other than profiled Talos runs, but there's no real reason to keep the old DLL around either (except if that's simpler).
We might conceivably want to add a way to run normal unit tests with profiling at some point in the future, so at that point having the new DLL around would be useful. But at the moment that's not a concern.
OK, passing this over to our Windows deployment specialists, :markco and :Q.

Mark, Q: we'd like to replace the stock version of dbghelp.dll that lives in C:\Windows\system32 on the XP slaves with the updated dll from the Debugging Tools.

The solitary dll can be found here:

http://people.mozilla.org/~coop/dbghelp/dbghelp.dll
MD5 (dbghelp.dll) = 4003e34416ebd25e4c115d49dc15e1a7

I had trouble trying to replace the dll in C:\Windows\system32, even as root (comment #6). Hopefully you have more tricks up your sleeves.
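
For anyone who wants to confirm that a downloaded copy matches the checksum above before deploying it, here is a minimal Python sketch. The function names and the example path in the comment are hypothetical; only the MD5 value comes from this comment.

```python
import hashlib

# MD5 quoted in this comment for the dll from the Debugging Tools.
EXPECTED_MD5 = "4003e34416ebd25e4c115d49dc15e1a7"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5."""
    return md5_of_file(path) == expected.lower()


# Example (hypothetical download location):
# matches_expected(r"C:\temp\dbghelp.dll")
```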
Assignee: nobody → relops
Component: Platform Support → RelOps
Product: Release Engineering → Infrastructure & Operations
QA Contact: coop → arich
Version: unspecified → other
Assignee: relops → mcornmesser
Disabled t-xp32-ix-007 to set up and test the GPO.
Just an update. This is proving to be a little tricky. Seems like this dll file is permanently being held open by a process.
Assignee: mcornmesser → q
I am able to get this to install by resetting Windows File Protection with SFCDisable on next boot, but there is a potential race condition. I am testing a script to disable protection, check the file version on the next boot and do the replace, then re-enable protection on the boot after that. I am rolling this into a GPO to test on a pool of candidates.
The gpo version is working:

C:\Documents and Settings\Administrator>wmic datafile where name='c:\\windows\\system32\\dbghelp.dll'
AccessMask  Archive  Caption                          Compressed  CompressionMethod  CreationClassName  CreationDate               CSCreationClassName
   CSName         Description                      Drive  EightDotThreeFileName            Encrypted  EncryptionMethod  Extension  FileName  FileSize
 FileType               FSCreationClassName  FSName  Hidden  InstallDate                InUseCount  LastAccessed               LastModified
    Manufacturer           Name                             Path                Readable  Status  System  Version                           Writeable

18809343    TRUE     c:\windows\system32\dbghelp.dll  FALSE                          CIM_LogicalFile    20130430093753.129823-480  Win32_ComputerSyste
m c:\windows\system32\dbghelp.dll  c:     c:\windows\system32\dbghelp.dll  FALSE                        dll        dbghelp   640000
 Application Extension  Win32_FileSystem     NTFS    FALSE   20130430093753.129823-480              20150218230047.390625-480  20080414040000.000000-4
80  Microsoft Corporation  c:\windows\system32\dbghelp.dll  \windows\system32\  TRUE      OK      FALSE   5.1.2600.5512 (xpsp.080413-2105)  TRUE



C:\Documents and Settings\Administrator>gpupdate
Refreshing Policy...

User Policy Refresh has completed.
Computer Policy Refresh has completed.


C:\Documents and Settings\Administrator>wmic datafile where name='c:\\windows\\system32\\dbghelp.dll'
AccessMask  Archive  Caption                          Compressed  CompressionMethod  CreationClassName  CreationDate               CSCreationClassName
   CSName         Description                      Drive  EightDotThreeFileName            Encrypted  EncryptionMethod  Extension  FileName  FileSize
 FileType               FSCreationClassName  FSName  Hidden  InstallDate                InUseCount  LastAccessed               LastModified
    Manufacturer           Name                             Path                Readable  Status  System  Version
Writeable
18809343    TRUE     c:\windows\system32\dbghelp.dll  FALSE                          CIM_LogicalFile    20130430093753.129823-480  Win32_ComputerSyste
m  c:\windows\system32\dbghelp.dll  c:     c:\windows\system32\dbghelp.dll  FALSE                        dll        dbghelp   1213200
 Application Extension  Win32_FileSystem     NTFS    FALSE   20130430093753.129823-480              20150218230331.985330-480  20150218214830.930545-4
80  Microsoft Corporation  c:\windows\system32\dbghelp.dll  \windows\system32\  TRUE      OK      FALSE   6.12.0002.633 (debuggers(dbg).100201-1203)
TRUE
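
The wide wmic tables above are hard to read once wrapped; the fields that change are FileSize (640000 to 1213200) and Version (5.1.2600.5512 to 6.12.0002.633). As a rough sketch, the same check could be narrowed to those two fields with wmic's `get ... /value` form and parsed in Python. This is an assumption about how one might script it, not what the GPO does: the function names are hypothetical, the query only works on a Windows host that has wmic, and it needs a much newer Python than the XP slaves are likely to have.

```python
import subprocess


def parse_wmic_value_output(text):
    """Parse the 'Key=Value' lines that wmic prints with the /value switch."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


def query_dll(path=r"c:\\windows\\system32\\dbghelp.dll"):
    """Ask wmic for the size and version of one file (Windows only).

    The doubled backslashes are intentional; they match the where-clause
    used in the session above.
    """
    cmd = ["wmic", "datafile", "where", f"name='{path}'",
           "get", "FileSize,Version", "/value"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_wmic_value_output(result.stdout)


# Sample text built from the values in the second dump above.
sample = (
    "\r\n\r\nFileSize=1213200\r\n"
    "Version=6.12.0002.633 (debuggers(dbg).100201-1203)\r\n"
)
print(parse_wmic_value_output(sample))
```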
Coop,


Is there a pool of test machines I should use, or should I just pick a few XP hosts at random to try this out on over the weekend?

Q
Flags: needinfo?(coop)
(In reply to Q from comment #14)
> Is there a pool of test machines I should use or just pick a few xp hosts
> at random to try this out on over the weekend?

Just grab a host or two (or three) in slavealloc, and link them here.
Flags: needinfo?(coop)
t-xp32-ix-006 & t-xp32-ix-007 are already set aside for this, I see.
Using xp 001 - 007 and the replace is taking. Let's let them run over the weekend and see what happens. If it works it will go out pool-wide.
(In reply to Q from comment #17)
> Using xp 001 - 007 and the replace is taking. Let's let them run over the
> weekend and see what happens. If it works it will go out pool-wide.

The results from the 4 machines from that batch that were enabled are sufficiently green that we can deploy this pool-wide.
Great News!
Very cool! Thank you very much for working on this.
(In reply to Q from comment #19)
> Great News!

Q: is this deployed everywhere now?
Flags: needinfo?(q)
As machines reboot they should be getting the new dll.
Flags: needinfo?(q)
Notes for the future:
This wound up needing a multi-step process due to not being able to easily disable SFC in XP SP3:

First I had to create a compressed cab (.dl_) file for the new 6.x dll by running "makecab dbghelp.dll", which produces a dbghelp.dl_ file.

The GPO distributes the dbghelp.dl_ file to c:\windows\source\i386

The GPO then replaces dbghelp.dll in c:\windows\system32\dllcache

The GPO then replaces dbghelp.dll in c:\windows\system32\

All steps are required in order to keep the dll file from being replaced by SFC or detected as malware.
All GPO criteria for file replacement are based on version checks, with 5.1.2600.5512 being the old dll and 6.12.0002.633 being the new dll.
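
The version check can be illustrated with a short Python sketch. This is not the GPO's actual logic, which is not shown in this bug: the function names are hypothetical, and treating "installed version lower than the new version" as the replacement condition is an assumption (the real criteria may be an exact match on the old version). The two version numbers are the ones given above, and the parser tolerates the parenthetical suffix that wmic reports.

```python
import re

OLD_VERSION = "5.1.2600.5512"  # stock XP SP3 dll, per the notes above
NEW_VERSION = "6.12.0002.633"  # dll from the Debugging Tools, per the notes above


def version_tuple(version_string):
    """Turn '6.12.0002.633 (debuggers(dbg).100201-1203)' into (6, 12, 2, 633)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version_string)
    if not match:
        raise ValueError(f"no version number found in {version_string!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_replacement(installed, target=NEW_VERSION):
    """Assumed rule: replace when the installed dll is older than the target."""
    return version_tuple(installed) < version_tuple(target)


print(needs_replacement("5.1.2600.5512 (xpsp.080413-2105)"))            # True
print(needs_replacement("6.12.0002.633 (debuggers(dbg).100201-1203)"))  # False
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "5.10" sorting before "5.9".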
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED