Closed Bug 1512706 Opened 6 years ago Closed 5 years ago

Update minidump_stackwalk used for tests

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(firefox65 fixed, firefox66 fixed)

RESOLVED FIXED
Tracking Status
firefox65 --- fixed
firefox66 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(1 file)

It turns out the one we currently use currently, at least on windows, is outdated, and it produces bad stack traces in some cases, such as in bug 1510328.
The minidump_stackwalk used for tests ifs picked up from tooltool, and
it looks like the tooltool manifests haven't been updated in 2 years.

As it turns out, this old version is not capable of, at least, stack
walk over crashes originated in LTOed rust code on Windows, while
breakpad trunk, as well as the slightly oldest version we have in-tree
do.

While ideally, we'd go with building minidump_stackwalk as a taskcluster
toolchain artifact, or during the build, but that involves significantly
more work, while we should fix the stack traces of crashes that  people
_do_ get as early as possible.

The tooltool artifacts linked in the updated manifests were generated
the following way:
- Revert the last two changes that happened to minidump_stackwalk.cc
  between 2016 and now (they don't matter for functionality and get in
  the way of the rest below).
- Apply the changes betwen the version of minidump_stackwalk.cc from
  2016 and
https://hg.mozilla.org/users/tmielczarek_mozilla.com/stackwalk-http/raw-file/51e4725ffad4/stackwalk.cc
- Pick the http_symbol_supplier.cc and http_symbol_supplier.h files from
  the same repo as stackwalk.cc above.
- Add the necessary build system bits to build it off the Gecko tree,
  along with the in-tree breakpad code.
- Build it for linux, linux64, macosx64, win32-mingw.

The patch doing all the above is:
https://hg.mozilla.org/try/rev/64491336dc7fca7a1c00ae8c66619b01563bbe4e
The push to build it on the aforementioned platforms:
https://hg.mozilla.org/try/rev/4b621a67ca0bd6cf8954566e180d23e1ba8a9f83

A win64-mingw variant was also built, but is not being used, keeping
things in line with what we currently are using.  We may want to follow
up with an update to use that win64 variant on 64-bits testers.
As mentioned in the commit message, I went with the light solution because:
- This is better fixed sooner than later.
- We currently install minidump_stackwalk in the ubuntu test images. We'd need to stop doing that.
- The existing code from ted's repo actually doesn't compile on non-mingw windows, because it uses some posixy things (e.g. sys/time.h)
- Similarly, that code has a build-time and runtime dependency on curl on other platforms, and we don't have the curl headers installed on linux. However, the rest of breakpad also uses curl, but doesn't need that (it has a copy of the headers, and dlopens libcurl)
- We can either build this with the rest of Firefox, on each push (and we already build most of it already anyways, so it's not like it would be a large overhead), but can't for the reasons mentioned above ; or build this as a toolchain task (and I actually did the work to get almost there, through a --enable-project=... configure flag). In any case, we should definitely put this code in tree. (The former would seem more natural to me, but requires changing the code)
- Then we need to do the right hooking to get the executable either from a test archive or from a toolchain artifact, whichever we choose to use.
Oh, and for completeness, I should mention that I did test the patch with an explicit crash in webrender which yields stack traces with symbols, including for windows system libraries (which wouldn't happen if the patching to add http_symbol_supplier hadn't worked): https://treeherder.mozilla.org/#/jobs?repo=try&revision=62741a343a4356385b2a6633e63c72827b9f4faf

Also, a try run with the patch, that shows the crash from bug 1510328, on an opt build (so, rust LTO), with a proper stack trace:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9f0fdbb8174ad3367656d3be5192185d5a772e3
(see how it looks in bug 1510328, it's completely different)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/3fb3977a609c
Update minidump_stackwalk used for tests. r=RyanVM
https://hg.mozilla.org/mozilla-central/rev/3fb3977a609c
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Please nominate this for Beta approval when you get a chance.
Flags: needinfo?(mh+mozilla)
Comment on attachment 9030046 [details]
Bug 1512706 - Update minidump_stackwalk used for tests.

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: Bad stack traces for crashes happening on automation.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This doesn't affect Firefox itself, only a program that is used to dump stack traces when crash happen during automated tests. The crash dumps are still untouched and can still be processed manually if something goes wrong with the stack walker.
It would be good to do this on esr60 too, but bug 1503756 gets in the way.

String changes made/needed:
Flags: needinfo?(mh+mozilla)
Attachment #9030046 - Flags: approval-mozilla-beta?
Comment on attachment 9030046 [details]
Bug 1512706 - Update minidump_stackwalk used for tests.

[Triage Comment]
Improves stack traces for crashes in automation. Approved for 65.0b5.
Attachment #9030046 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1514090
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: