Closed Bug 1746630 Opened 2 years ago Closed 1 year ago

Implement new threads.N.frames.N.unloaded_modules minidump-stackwalk schema

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Gankra, Assigned: willkg)

References

Details

Attachments

(2 files)

WIP PR with schema changes here: https://github.com/luser/rust-minidump/pull/367

Things that need to change:

  • Parse the new fields
  • Update "function signature" generation (see mdsw's --human output for my proposed format)
  • Update "crash signature" generation (same format?)

Note that the current proposal is to remove the unloaded module hack, so this would need to be deployed atomically with an update to mdsw.

But that's only one of the two options I'm still waffling between, so feedback is welcome.

(This is a problem for 2022, not today)

Grabbing it and making it a P2 to do soon.

Status: NEW → ASSIGNED
Priority: -- → P2

Copying thoughts over from bug #1757623:

Per that, we'll need to do a few things:

  1. make sure signature generation handles when module and module_offset are null (pretty sure Aria's PR does this)
  2. add support for unloaded modules
  3. figure out how to test this out with crash reports so we can ascertain whether what we're doing is good or terrible or whatever
  4. figure out how all this will work with the lists (prefix, irrelevant, etc)

Also, we need to look at crash storage and the webapp:

  1. telemetry schema changes?
  2. elasticsearch index changes?
  3. webapp display changes?

I'm probably missing stuff, too.

It'd help to have some crash reports we could test with that have unloaded module information.

I spent the last two weeks rewriting large chunks of Elasticsearch and TelemetryBotoS3CrashStorage code. The changes in the stackwalker output will no longer completely hose us there. Further, I think we can push out the new stackwalker now. I'll look into whether there are issues over the weekend creating the new Elasticsearch index and then look at deploying next week.

There's still work to be done. Aria did a PR for changes to signature generation. I need to review those. I need to look at how the webapp needs to change.

I spent some time looking for crash reports that have unloaded modules in the crashing thread. Here's a list:

  • 1f249db0-9cfc-4aae-8ed6-647de0220323
  • 3f843000-8f0e-4277-bf06-1773c0220324
  • 40892bc6-a3e2-41c8-8ad8-0018f0211209
  • 9e53fe80-8847-468c-8b36-60b6e0220323
  • 0be77c63-fda8-4217-8324-9fcc20220324
  • 1550a843-5619-4f0d-8260-f5a860220323
  • 1f249db0-9cfc-4aae-8ed6-647de0220323
  • 1f249db0-9cfc-4aae-8ed6-647de0220323
  • 664abe4b-d91e-4a74-b50b-f72dd0220324
  • 71ed9c5d-7080-49e9-96bb-124520220324
  • 9e53fe80-8847-468c-8b36-60b6e0220323
  • a6b591fe-a6f2-429c-b889-9bd710220324
  • ac699a1f-b451-4124-9b92-955980220324
  • b9837772-ee83-4cd3-b2d0-7d0380220324
  • b9837772-ee83-4cd3-b2d0-7d0380220324
  • d4b9c3af-5206-47db-ad21-627da0220323
  • ff915f6a-71e6-49b4-b8bc-ef7de0220323

I'm working on building out a schema for the processed crash that includes the output of the stackwalker. That makes it possible for me to reason about changes in rust-minidump output over time while minimizing risk.

Given that, I'm making this dependent on me finishing up that work.

Depends on: 1764395

willkg merged PR #6155: "bug 1746630: add unloaded modules to the processed crash schema" in b5cb303.

Next, I need to:

  1. Figure out what to do with the webapp.
  2. Look at Aria's PR for signature generation changes: https://github.com/mozilla-services/socorro/pull/6021/

Will could you prioritize this? I've noticed that we have crashes where two different signatures have collapsed into one because we have two different stack traces, one with an unloaded module (which currently is ignored) and one with. Regarding the signature generation I prepared a simpler patch than Aria's PR which essentially restores the old behavior:

https://github.com/gabrielesvelto/socorro/commit/198d669a0620ee3b86a8662e6fbec0ef92f18dad

Similarly for the webapp I'd show the name of the first unloaded module in the list matching a frame as well as the first matching address. This would basically make Socorro print out crashes with unloaded modules like we did with Breakpad. I'll leave the complexity of dealing with multiple unloaded modules at a later stage if we find crashes where it is relevant; ATM it's not worth spending time on it IMHO.

Flags: needinfo?(willkg)

I'll try to work through this today.

Example crash reports (all the previous examples have expired):

  • edcb4dbc-5516-4b15-9464-26adc0221001
  • 1e8a0180-dd82-4051-b1ac-1006c0221001
  • 9cc1cb90-c92a-4a31-b69c-cebfc0221001
Flags: needinfo?(willkg)

I implemented support for unloaded modules in signature generation basing it off of what Aria (pr #6021) and Gabriele suggested (commit from comment #9). I used Aria's notation because I thought it was clearer.

I also added support for unloaded modules to the report view in the webapp.

I didn't implement Aria's suggestions for advanced support in this third commit:

https://github.com/mozilla-services/socorro/pull/6021/commits/8d0762c8b71096d6fb790bf8ec84a2a31516cf71

From 8d0762c8b71096d6fb790bf8ec84a2a31516cf71 Mon Sep 17 00:00:00 2001
From: Aria Beingessner <a.beingessner@gmail.com>
Date: Tue, 1 Mar 2022 15:03:18 -0500
Subject: [PATCH] Implement advanced signature for unloaded modules

---
 socorro/signature/rules.py | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/socorro/signature/rules.py b/socorro/signature/rules.py
index a73cbac820..11e5ebeb85 100644
--- a/socorro/signature/rules.py
+++ b/socorro/signature/rules.py
@@ -238,13 +238,26 @@ def normalize_frame(
             #
             # Both arrays are pre-sorted to keep the output stable.
 
+            # We want to generate an output like this:
+            #
+            # @0xf800 (unloaded many.dll@0x460,0x700) (unloaded solo.dll@0x5e0)
+            #
+            # * prefixed with raw address
+            # * parenthetical unloaded modules
+            # * if multiple offsets, comma separated
+
+            prefix = ""
+            if offset:
+                prefix = f"@{strip_leading_zeros(offset)}"
+
+            parts = []
             for unloaded in unloaded_modules:
                 if unloaded.module and unloaded.offsets:
-                    for unloaded_offset in unloaded.offsets:
-                        # For now, just grab the first entry and return it.
-                        return "unloaded {}@{}".format(
-                            unloaded.module, strip_leading_zeros(unloaded_offset)
-                        )
+                    stripped = map(strip_leading_zeros, unloaded.offsets)
+                    offsets = ",".join(stripped)
+                    part = "(unloaded {}@{})".format(unloaded.module, offsets)
+                    parts.append(part)
+            return "{} {}".format(prefix, " ".join(parts))
 
         # If there's an offset and no module/module_offset, use that
         if not module and not module_offset and offset:

Do we want the advanced version in signature generation? I'm concerned it'll potentially make signatures long. Maybe we truncate it? For example:

@0xf800 (unloaded many.dll@0x460,0x700) (unloaded solo.dll@0x5e0) (+5)

willkg merged PR #6222: "bug 1746630: support unloaded modules in signature generation and report view" in c6e3987.

This does the following:

  • adds support for unloaded modules to signature generation
  • adds support for unloaded modules to the webapp in the stack and create-a-bug links in the Crash Stats report view

When this goes to production, we need to notify stability, dev-platform, and crashreporting-wg that signatures are changing.

This was pushed to prod just now in bug #1797179. I sent a notification email to the stability, crash-reporting-wg, and dev-platform lists about the changes including that signatures may change. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: