Closed Bug 1757854 Opened 2 years ago Closed 2 years ago

upload_file_minidump contents are the multipart header

Categories

(Socorro :: Antenna, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)


We rewrote extract_payload to use Falcon's multipart/form-data handling in bug #1562641. I fixed a few issues that popped up in stage, but overall it looked good. Then we deployed that code to production. We fixed a few more things as they came up.

Now I'm seeing a number of incoming crashes where the upload_file_minidump content is something like this:

-----------------------------640A91CB49F2DAC2
Content-Disposition: form-data; name=CrashType

fatal native crash

Examples:

This bug covers figuring out what's going on.

Assignee: nobody → willkg
Status: NEW → ASSIGNED

Assuming HeaderMismatch in the signature is always an indicator of the issue, it looks like it only affects Fenix crash reports:

$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --period=daily \
    --_facets=product --signature='HeaderMismatch' --format=markdown
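| date | -- | Fenix | Firefox | Focus |
|---|---|---|---|---|
| 2022-02-23 00:00:00 | 0 | 0 | 0 | 0 |
| 2022-02-24 00:00:00 | 0 | 3 | 0 | 0 |
| 2022-02-25 00:00:00 | 0 | 1 | 0 | 0 |
| 2022-02-26 00:00:00 | 0 | 1 | 0 | 0 |
| 2022-02-27 00:00:00 | 0 | 1 | 0 | 0 |
| 2022-02-28 00:00:00 | 0 | 9 | 1 | 0 |
| 2022-03-01 00:00:00 | 0 | 5 | 0 | 1 |
| 2022-03-02 00:00:00 | 0 | 1800 | 0 | 57 |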

If we look at signatures that have EMPTY in them (denoting some problem with the minidump), we see this:

$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --period=daily \
    --_facets=product --signature='EMPTY' --format=markdown
| date | -- | Fenix | Firefox | Focus | ReferenceBrowser |
|---|---|---|---|---|---|
| 2022-02-23 00:00:00 | 0 | 7241 | 772 | 395 | 0 |
| 2022-02-24 00:00:00 | 0 | 6323 | 832 | 307 | 0 |
| 2022-02-25 00:00:00 | 0 | 5804 | 661 | 299 | 2 |
| 2022-02-26 00:00:00 | 0 | 6648 | 748 | 342 | 0 |
| 2022-02-27 00:00:00 | 0 | 6349 | 487 | 342 | 0 |
| 2022-02-28 00:00:00 | 0 | 6101 | 870 | 225 | 0 |
| 2022-03-01 00:00:00 | 0 | 6344 | 838 | 237 | 0 |
| 2022-03-02 00:00:00 | 0 | 5700 | 1019 | 245 | 0 |

If we look at EMPTY signatures for Fenix over the last 7 days, we see this:

$ supersearchfacet --start-date='2022-02-23' --end-date='2022-03-02' --_facets=signature \
    --signature='EMPTY' --product=Fenix --period=daily
| date | -- | EMPTY: no crashing thread identified | EMPTY: no crashing thread identified; EmptyMinidump | EMPTY: no crashing thread identified; HeaderMismatch | EMPTY: no crashing thread identified; MissingSystemInfo | EMPTY: no crashing thread identified; MissingThreadList | EMPTY: no crashing thread identified; unknown error | OOM large EMPTY: no crashing thread identified; EmptyMinidump |
|---|---|---|---|---|---|---|---|---|
| 2022-02-23 00:00:00 | 0 | 1184 | 5989 | 0 | 3 | 27 | 38 | 0 |
| 2022-02-24 00:00:00 | 0 | 965 | 5259 | 3 | 10 | 26 | 60 | 0 |
| 2022-02-25 00:00:00 | 0 | 943 | 4781 | 1 | 4 | 36 | 38 | 1 |
| 2022-02-26 00:00:00 | 0 | 841 | 5719 | 1 | 1 | 41 | 45 | 0 |
| 2022-02-27 00:00:00 | 0 | 832 | 5429 | 1 | 3 | 38 | 46 | 0 |
| 2022-02-28 00:00:00 | 0 | 816 | 5190 | 9 | 5 | 47 | 34 | 0 |
| 2022-03-01 00:00:00 | 0 | 955 | 5308 | 5 | 9 | 31 | 35 | 1 |
| 2022-03-02 00:00:00 | 0 | 785 | 3060 | 1800 | 2 | 24 | 29 | 0 |

Ergo, while I think either there is a bug in the extract_payload code or the Fenix crash reports in question are malformed, I think the affected crash reports have junk minidumps either way: rust-minidump would have kicked up an "EmptyMinidump" for them before and now kicks up a "HeaderMismatch".
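
To make that reasoning concrete, here is a minimal sketch of how the same junk upload could move from one bucket to the other. It relies only on the fact that a minidump file starts with the four bytes "MDMP"; the function name and the returned labels are hypothetical and are not rust-minidump's or the processor's actual logic.

```python
MINIDUMP_MAGIC = b"MDMP"  # a minidump file begins with these four bytes


def classify_dump(data: bytes) -> str:
    """Roughly bucket an upload_file_minidump payload (hypothetical labels)."""
    if not data:
        # Zero bytes: what would previously have surfaced as an empty minidump.
        return "empty"
    if data[:4] != MINIDUMP_MAGIC:
        if data.startswith(b"--"):
            # Looks like a multipart boundary line was stored as the dump body.
            return "bad header (multipart boundary)"
        return "bad header"
    return "plausible minidump"


swallowed_part = (
    b"-----------------------------640A91CB49F2DAC2\r\n"
    b"Content-Disposition: form-data; name=CrashType\r\n\r\n"
    b"fatal native crash"
)
print(classify_dump(b""))
print(classify_dump(swallowed_part))
```

Under this sketch, a zero-byte dump and a dump whose body is the next multipart part are both junk; only the label changes.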

That looks like a bug in Fenix. I know we've always seen more malformed minidumps on Fenix than on any other platform (see bug 1644486) but I haven't figured out why it's happening yet.

See Also: → 1644486

I tinkered with different variations of malformed payloads to see if I could reproduce what I'm seeing in the description. If I include a no-bytes upload_file_minidump and omit the \r\n after it, then extract_payload in the collector will slurp up the next multipart part as the upload_file_minidump body. That's exactly what I was seeing in the description.

Here's a raw form:

--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="DateStamp"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\n2022-03-03T17:05:31.476349\r\n--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="ProductName"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nFenix\r\n--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="upload_file_minidump"; filename="file.dump"\r\nContent-Type: application/octet-stream\r\n\r\nabcde--c503c85c950243ae83ecb53354be8c5b\r\nContent-Disposition: form-data; name="CrashType"\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nnative crash\r\n--c503c85c950243ae83ecb53354be8c5b--\r\n

Here's a (possibly) easier to read version where \r\n is replaced with newlines:

--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="DateStamp"
Content-Type: text/plain; charset=utf-8

2022-03-03T17:05:31.476349
--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="ProductName"
Content-Type: text/plain; charset=utf-8

Fenix
--c503c85c950243ae83ecb53354be8c5b
Content-Disposition: form-data; name="upload_file_minidump"; filename="file.dump"
Content-Type: application/octet-stream

--c503c85c950243ae83ecb53354be8c5b                                     <-- there should be an additional \r\n here
Content-Disposition: form-data; name="CrashType"
Content-Type: text/plain; charset=utf-8

native crash
--c503c85c950243ae83ecb53354be8c5b--

I'll look into this.
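
The slurping behavior can be sketched with a deliberately naive parser. This is not Falcon's or Antenna's code; build_payload and naive_parse are hypothetical helpers. It assumes only the RFC 2046 rule that the delimiter between parts is CRLF + "--" + boundary, so once the headers of a part are consumed, a body with no CRLF before the next boundary line runs on until the following delimiter.

```python
BOUNDARY = b"c503c85c950243ae83ecb53354be8c5b"
# RFC 2046: the delimiter between parts is CRLF + "--" + boundary.
DELIMITER = b"\r\n--" + BOUNDARY


def build_payload(crlf_after_file: bool) -> bytes:
    """Build a small crash report containing a zero-byte minidump part."""
    return b"".join([
        b"--", BOUNDARY, b"\r\n",
        b'Content-Disposition: form-data; name="ProductName"\r\n\r\n',
        b"Fenix\r\n",
        b"--", BOUNDARY, b"\r\n",
        b'Content-Disposition: form-data; name="upload_file_minidump"; '
        b'filename="file.dump"\r\n',
        b"Content-Type: application/octet-stream\r\n\r\n",
        # The CRLF that should close the (empty) file body.
        b"\r\n" if crlf_after_file else b"",
        b"--", BOUNDARY, b"\r\n",
        b'Content-Disposition: form-data; name="CrashType"\r\n\r\n',
        b"native crash\r\n",
        b"--", BOUNDARY, b"--\r\n",
    ])


def naive_parse(body: bytes) -> dict:
    """Read parts sequentially: headers first, then body up to the delimiter."""
    fields = {}
    first = b"--" + BOUNDARY
    pos = body.index(first) + len(first)
    # "--" directly after a boundary marks the closing delimiter.
    while not body.startswith(b"--", pos):
        headers_end = body.index(b"\r\n\r\n", pos)
        headers = body[pos:headers_end]
        name = headers.split(b'name="', 1)[1].split(b'"', 1)[0].decode()
        value_start = headers_end + 4
        value_end = body.index(DELIMITER, value_start)
        fields[name] = body[value_start:value_end]
        pos = value_end + len(DELIMITER)
    return fields


good = naive_parse(build_payload(crlf_after_file=True))
bad = naive_parse(build_payload(crlf_after_file=False))
print(sorted(good))
print(sorted(bad))
print(bad["upload_file_minidump"])
```

With the CRLF present, the sketch yields an empty upload_file_minidump and a separate CrashType field. Without it, CrashType disappears and its whole part, boundary line included, ends up as the minidump body.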

I looked at other Fenix crash reports that have minidumps and they have the CrashType annotation at the end of the dump contents. Ergo, I think sendFile needs to be sending a \r\n after the file contents.

I wrote this up:

https://github.com/mozilla-mobile/android-components/issues/11809

I'll keep tabs on it.

That fix in PR 11809 landed in the android-components repo. I looked at Fenix nightly crash reports where build id > 20220304000000 that have an upload_file_minidump:

All three of those have a CrashType annotation and the upload_file_minidump doesn't end with the multipart part.

But then I remembered that for Fenix, the build id is the geckoview build id and not the product build id. The application build id is inscrutable. There isn't a way to get a list of application build ids for Fenix nightly and know when the builds happened.

I ran supersearchfacet and expected EmptyMinidump to jump back up and HeaderMismatch to drop, but that hasn't happened:

$ supersearchfacet --start-date='2022-03-01' --end-date='2022-03-08' --_facets=signature \
    --signature='EMPTY' --product=Fenix --period=daily --format=markdown
| date | EMPTY: no crashing thread identified | EMPTY: no crashing thread identified; EmptyMinidump | EMPTY: no crashing thread identified; HeaderMismatch | EMPTY: no crashing thread identified; MissingSystemInfo | EMPTY: no crashing thread identified; MissingThreadList | EMPTY: no crashing thread identified; unknown error |
|---|---|---|---|---|---|---|
| 2022-03-01 00:00:00 | 955 | 5308 | 5 | 9 | 31 | 35 |
| 2022-03-02 00:00:00 | 785 | 3060 | 1800 | 2 | 24 | 29 |
| 2022-03-03 00:00:00 | 888 | 0 | 4888 | 3 | 42 | 28 |
| 2022-03-04 00:00:00 | 641 | 6 | 4674 | 2 | 29 | 33 |
| 2022-03-05 00:00:00 | 792 | 8 | 5165 | 10 | 63 | 43 |
| 2022-03-06 00:00:00 | 677 | 14 | 5765 | 2 | 35 | 133 |
| 2022-03-07 00:00:00 | 925 | 9 | 6038 | 7 | 42 | 153 |
| 2022-03-08 00:00:00 | 484 | 7 | 3320 | 1 | 42 | 94 |

I can't tell by looking at application build ids whether there's been enough uptake on Fenix nightly builds after 3/4/2022 that have the fix to show a change in numbers. I think I'm going to let it go a week and see what happens.
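
One way to watch for the change is to track the combined EmptyMinidump + HeaderMismatch count and the HeaderMismatch share per day. The sketch below uses the Fenix numbers from the 2022-03-01 to 2022-03-08 supersearchfacet output in this bug; the summarize helper is hypothetical.

```python
# (EmptyMinidump, HeaderMismatch) counts per day for Fenix, copied from the
# supersearchfacet output for 2022-03-01 through 2022-03-08.
COUNTS = {
    "2022-03-01": (5308, 5),
    "2022-03-02": (3060, 1800),
    "2022-03-03": (0, 4888),
    "2022-03-04": (6, 4674),
    "2022-03-05": (8, 5165),
    "2022-03-06": (14, 5765),
    "2022-03-07": (9, 6038),
    "2022-03-08": (7, 3320),
}


def summarize(counts):
    """Return (day, combined count, HeaderMismatch share) for each day."""
    rows = []
    for day, (empty, mismatch) in counts.items():
        combined = empty + mismatch
        share = mismatch / combined if combined else 0.0
        rows.append((day, combined, share))
    return rows


for day, combined, share in summarize(COUNTS):
    print(f"{day}  combined={combined:5d}  HeaderMismatch share={share:.1%}")
```

If the fix has enough uptake, the share should fall back toward zero while the combined count stays roughly level.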

Flags: needinfo?(willkg)
See Also: → 1757938

In https://bugzilla.mozilla.org/show_bug.cgi?id=1757938#c3 Kevin says:

That was fixed about a week ago. Looking at the recent 99.0a1 and 100.0a1 data this is no longer happening. So ideally the crash data is being processed out into the respective crashes.

Given that, I'm going to mark this as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(willkg)
Resolution: --- → FIXED