Closed Bug 1636194 Opened 4 years ago Closed 2 years ago

symbolicate inlined functions correctly

Categories

(Eliot :: General, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: mstange)

References

Details

Attachments

(2 files)

The symbolication API symbolicates stacks, but can't symbolicate inlined functions in those stacks correctly.

This bug covers fixing that in the symbolication API.

Bug #524410 covers some of the problems with inlined functions and symbolication and Breakpad symbols files. My understanding is that we need to fix Tecken to not use Breakpad symbols files which don't have the information we need and instead use the debug binaries that also get uploaded.

Bug #1621638 covers looking into switching the symbolicate API to use the Sentry Symbolic crate (https://github.com/getsentry/symbolic). That crate can parse many types of debug information files. I want to work on this after we switch Tecken to using the Symbolic crate.

Gabriele said on Matrix:

I believe the perf people would love 2, and plenty of staff and senior staff Gecko engineers would love to have inlined frames in crash stacks.

Markus (perf) said in https://bugzilla.mozilla.org/show_bug.cgi?id=1614928#c6 :

A related feature request is inline stacks, though those can be treated as orthogonal.

Seems like this is important to do if we can. If there are others who are impacted by this, I'd love to get a list of examples and use cases.

As a note, this bug is specifically about fixing the symbolicate API in Tecken and not about fixing symbolication of stacks in Socorro--bug #1328585 covers that.

One thing that'd be helpful to have is one or more crash reports where there are inlined functions in the stack and the symbolicated stack is wrong. I can use these to build test cases to move forward on this bug.

Summary: support symolicating inlined functions → symbolicate inlined functions correctly
Depends on: 1636210

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)

One thing that'd be helpful to have is one or more crash reports where there are inlined functions in the stack and the symbolicated stack is wrong. I can use these to build test cases to move forward on this bug.

From bug 1635693, https://crash-stats.mozilla.org/report/index/4bd770c5-2bcf-4d6a-bf50-f90cd0200506 points at mozilla::RunnableFunction::Run when it should really be pointing at https://searchfox.org/mozilla-beta/rev/cd865d2e353efcbf19a6188807f761ddcb95ee80/extensions/permissions/PermissionManager.cpp#2839 or thereabouts.

I'm sure examples can be multiplied. Any crash with the same signature as bug 1635693 will have the same problem:

all point at the same line for the crash and they are not all the same bug; the root causes are wildly different. (Maybe we should at least be separating those out in signature generation given that we know lambdas are involved.)

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)

One thing that'd be helpful to have is one or more crash reports where there are inlined functions in the stack and the symbolicated stack is wrong. I can use these to build test cases to move forward on this bug.

As for the profiler, having inline callstacks will allow us to have much higher fidelity call trees. Here's a comparison between the current Firefox profiler (which does not have inline call stacks) and Xcode Instruments (which does have inline callstacks). For some code, especially rust code, the difference is staggering!

From the Tecken API, the profiler will need, per address:

  • The name of the function symbol that covers the instruction at the given address (currently: function)
  • The offset between the function symbol and the given address (currently: function_offset), or, alternatively, just the module-relative address of the function symbol
  • The list of inline frames, if known. (This is only available if we have debug information for the library. For system libraries, we will have neither inline stacks nor file/line information.) For each inline frame:
    • The function name
    • The filename and line number

So, most importantly, this means that one address can expand to multiple functions, rather than just one.

Old response format:

{
  "frame": 0,
  "module_offset": "0x100cdad",
  "module": "XUL",
  "function": "mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs)",
  "function_offset": "0x4d",
  "file": "/Users/mstange/code/obj-m-opt/dom/bindings/AttrBinding.cpp",
  "line": 249
},

Proposed new format:

{
  "frame": 0,
  "module_offset": "0x100cdad",
  "module": "XUL",
  "function": "mozilla::dom::AttrBinding::get_specified(JSContext*, JS::Handle<JSObject*>, mozilla::dom::Attr*, JSJitGetterCallArgs)",
  "function_offset": "0x4d",
  "file": "/Users/mstange/code/obj-m-opt/dom/bindings/AttrBinding.cpp",
  "line": 249
  "inlines": [
    {
      "function": "std::__1::__atomic_base<char const*, false>::store(char const*, std::__1::memory_order)",
    },
    {
      "function": "mozilla::detail::IntrinsicMemoryOps<char const*, (mozilla::MemoryOrdering)1>::store(std::__1::atomic<char const*>&, char const*)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/mozilla/Atomics.h",
      "line": 201
    },
    {
      "function": "mozilla::detail::AtomicBase<char const*, (mozilla::MemoryOrdering)1>::operator=(char const*)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/mozilla/Atomics.h",
      "line": 324
    },
    {
      "function": "js::ProfileEntry::initLabelFrame(char const*, char const*, void*, unsigned int, js::ProfileEntry::Kind, js::ProfileEntry::Category)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/js/ProfilingStack.h",
      "line": 232
    },
    {
      "function": "PseudoStack::pushLabelFrame(char const*, char const*, void*, unsigned int, js::ProfileEntry::Kind, js::ProfileEntry::Category)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/js/ProfilingStack.h",
      "line": 350
    },
    {
      "function": "mozilla::AutoProfilerLabel::Push(PseudoStack*, char const*, char const*, unsigned int, js::ProfileEntry::Category)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/GeckoProfiler.h",
      "line": 710
    },
    {
      "function": "mozilla::AutoProfilerLabel::base object constructor(JSContext*, char const*, char const*, unsigned int, js::ProfileEntry::Category)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/GeckoProfiler.h",
      "line": 695
    },
    {
      "function": "mozilla::AutoProfilerLabel::complete object constructor(JSContext*, char const*, char const*, unsigned int, js::ProfileEntry::Category)",
      "file": "/Users/mstange/code/obj-m-opt/dist/include/GeckoProfiler.h",
      "line": 692
    }
  ]
},

One question would be whether the inlines should be from outside-to-inside or inside-to-outside. I'm currently leaning towards inside-to-outside, because that's what the API of the addr2line crate does.

This is incredibly helpful! Thank you!

Blocks: 1398533

I'm removing the block on bug #1398533. This bug doesn't necessarily rely on that one. That one is about fixing the crash reporting machinery so that minidump-stackwalk can show inlined functions correctly. This one is about fixing Tecken's symbolication API which doesn't need to rely on all our crash reporting machinery (e.g. SYM files).

No longer blocks: 1398533
See Also: → 1563318
Depends on: 1729698
Depends on: 1730286

Jeff: Eliot doesn't use rust-minidump. Is there a bug in Symbolic for supporting INLINE in sym files?

It's https://github.com/getsentry/symbolic/issues/432.

The proposed JSON response format from comment 6 is now implemented in https://github.com/mstange/profiler-get-symbols and is understood by the Firefox profiler front-end (https://github.com/firefox-devtools/profiler/pull/3556, https://github.com/firefox-devtools/profiler/pull/3556/commits/c18e99d1c562f987e97e820e63c2ab48f81f7ad5).

The next steps here are: (edit 2022-07-01: added current status)

  1. Output inline info in dump_syms: https://github.com/mozilla/dump_syms/issues/280 - patch is up, needs review
  2. Consume inline info in symbolic: https://github.com/getsentry/symbolic/issues/432 - done, awaiting release
  3. Make Tecken forward inline info to its response JSON: https://github.com/mozilla-services/tecken/pull/2579 - waiting for symbolic release

Markus: Thank you! This helps a ton!

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED

I'm removing bug 1729698 and bug 1730286 from the "depends on" list - we are pursuing the path of having inline info in the breakpad sym files. We are not switching to native debuginfo in the short term.

I've made a fair amount of progress on this bug over the last week. The current status is the following:

  • Upstream symbolic now supports parsing the new sym format. Once this change is released in a new symbolic version, we can update Tecken to that version and finish and test the Tecken PR for this bug.
  • rust-minidump is getting support for the new sym format in rust-minidump PR #599. The next step is to finish and deploy that change.
  • Once both Tecken and rust-minidump are deployed, it means every consumer is compatible with the new format. At that point we can start emitting the new sym format in dump_syms. To do this, we need to finish and deploy dump_syms PR #392. At that point we should have inline stacks in the profiler for any new Firefox builds that use the new dump_syms version!
  • The above dump_syms PR only supports Mac / Linux / Android for now. To emit inline info for Windows pdbs, some additional work in dump_syms is needed, tracked by dump_syms issue #280.
No longer depends on: 1729698, 1730286
No longer blocks: 1777925
Depends on: 1779630
Depends on: 1779631
Type: defect → enhancement
No longer depends on: 1779630

Current status:

  • rust-minidump is ready to deploy in bug 1779630.
  • dump_syms now has support for outputting inlines for macOS / Linux / Android, but this version isn't used in mozilla-central yet.
  • symbolic-debuginfo hasn't had a release yet, so updating Tecken + fix-stacks is still blocked, and we can't deploy the new dump_syms yet.
  • I've made some progress on Windows support and am continuing to work on it.

In the meantime, I've pushed a try build with an "inlineful" dump_syms, so we now have some example inline .sym files on the symbol server, at least for a while until these symbols get evicted:
macOS libmozglue.dylib: https://symbols.mozilla.org/try/libmozglue.dylib/E0BD475866D23370B4B61AE86A5ACFC00/libmozglue.dylib.sym
Android libxul.so: https://symbols.mozilla.org/try/libxul.so/2D1F0F4A92B5424CED68446CE38A0CC70/libxul.so.sym

Here's a profile captured from the try macOS build: https://share.firefox.dev/3PW6IED
Clicking "Profile Info" -> "Re-symbolicate profile" will re-run the request to the symbolication API. If the request is successful, inline frames and filenames will show up in the call tree. At the moment the request is unsuccessful (returns 500) so the symbols you see there are not from the server, they are from the downloaded binary's symbol table.

Here's a reduced example request:

curl -d '{"jobs":[{"memoryMap":[["XUL","EC1363211A96381FA0C96D6301E7574A0"]],"stacks":[[[0,84057094],[0,21540134],[0,16679105],[0,14627540],[0,7577387],[0,5445094],[0,3362680],[0,17360764],[0,25813110],[0,19264996],[0,21365876],[0,24330722],[0,14822445]]]},{"memoryMap":[["libmozglue.dylib","E0BD475866D23370B4B61AE86A5ACFC00"]],"stacks":[[[0,119167],[0,212778],[0,7104],[0,118708],[0,200461],[0,192641],[0,202842],[0,194597],[0,160032],[0,192681],[0,108535],[0,202333]]]}]}' https://symbolication.services.mozilla.com/symbolicate/v5

Patch is up at https://github.com/mozilla-services/tecken/pull/2579 , associated with bug 1779633. It needs to land before we can output inlines with an updated dump_syms. Let's leave this bug open until bug 1779631 lands.

Oops, sorry, the previous comment was meant for bug 1398533.

Current status:

All the code for this has been written and is now waiting for deploy. I consider my work on this project done.
The remaining action items are:

  1. This Tecken PR needs to be reviewed, landed, and deployed (bug 1779633).
  2. Socorro needs to update its rust-minidump version (bug 1779630).
  3. Then the patch in bug 1779631 can be landed, and we're done. We will get full inline information from the symbolication API.
Assignee: mstange.moz → nobody
Status: ASSIGNED → NEW

Bug 1779631 has landed, so this is now all done!

Assignee: nobody → mstange.moz
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Thank you for doing this!

Thank you for bringing this to completion, it's a huge milestone!

Component: Symbolication → General
Product: Tecken → Eliot
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: