Closed Bug 1373934 Opened 7 years ago Closed 6 years ago

Crash in mozilla::detail::HashKnownLength<T>

Categories

(Core :: JavaScript Engine, defect, P2)

55 Branch
All
Windows
defect

Tracking


RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- wontfix
firefox56 + wontfix
firefox57 + wontfix
firefox58 - wontfix
firefox59 --- fixed
firefox60 --- fixed

People

(Reporter: philipp, Assigned: tcampbell)

References

Details

(Keywords: crash, regression, sec-moderate, Whiteboard: [adv-main59+])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-52121e4c-b5be-4b5d-b058-196920170617.
=============================================================
Crashing Thread (12)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::detail::HashKnownLength<unsigned char>(unsigned char const*, unsigned __int64) 	obj-firefox/dist/include/mozilla/HashFunctions.h:231
1 	xul.dll 	js::AtomizeChars<unsigned char>(JSContext*, unsigned char const*, unsigned __int64, js::PinningBehavior) 	js/src/jsatom.cpp:499
2 	xul.dll 	js::XDRAtom<1>(js::XDRState<1>*, JS::MutableHandle<JSAtom*>) 	js/src/jsatom.cpp:649
3 	xul.dll 	js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) 	js/src/jsscript.cpp:685
4 	xul.dll 	js::XDRInterpretedFunction<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::MutableHandle<JSFunction*>) 	js/src/jsfun.cpp:668
5 	xul.dll 	js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) 	js/src/jsscript.cpp:870
6 	xul.dll 	js::XDRInterpretedFunction<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::MutableHandle<JSFunction*>) 	js/src/jsfun.cpp:668
7 	xul.dll 	js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) 	js/src/jsscript.cpp:870
8 	xul.dll 	js::XDRState<1>::codeScript(JS::MutableHandle<JSScript*>) 	js/src/vm/Xdr.cpp:177
9 	xul.dll 	js::MultiScriptsDecodeTask::parse(JSContext*) 	js/src/vm/HelperThreads.cpp:492
10 	xul.dll 	js::HelperThread::handleParseWorkload(js::AutoLockHelperThreadState&) 	js/src/vm/HelperThreads.cpp:1857
11 	xul.dll 	js::HelperThread::threadLoop() 	js/src/vm/HelperThreads.cpp:2119
12 	xul.dll 	js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) 	js/src/threading/Thread.h:227
13 	ucrtbase.dll 	o__realloc_base 	
14 	kernel32.dll 	BaseThreadInitThunk 	
15 	ntdll.dll 	RtlUserThreadStart 	
16 	kernelbase.dll 	GetLegacyComposition

Reports with this signature and js::AtomizeChars in the stack are regressing in Firefox 55 (possibly related to bug 654190).

https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3Adetail%3A%3AHashKnownLength%3CT%3E&proto_signature=~js%3A%3AAtomizeChars%3CT%3E&date=%3E%3D2017-03-17T12%3A36%3A49.000Z#crash-reports
See Also: → 1371616
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>]
Shu, can you help find someone to investigate these crashes? Looking at just the [@ js::AtomizeChars<T> ] signature, this is the #11 topcrash on beta 55.
Flags: needinfo?(shu)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #1)
> Shu, can you help find someone to investigate these crashes? Looking at just
> the [@ js::AtomizeChars<T> ] signature, this is the #11 topcrash on beta 55.

nbp seems to know what's going on here.
Flags: needinfo?(shu) → needinfo?(nicolas.b.pierron)
(In reply to [:philipp] from comment #0)
> Reports with this signature and js::AtomizeChars in the stack are regressing
> in Firefox 55 (possibly related to bug 654190).

That's a good call; I will check if I can make an XDR test case based on the Bug 654190 optimization.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(evilpies)
Apparently evilpie did not manage to get the parser to generate any of the integer strings while working on Bug 654190.

The issue might be that we encode the bytecode after execution, which would record the mutated state of the JSOP_OBJECT objects of run-once scripts.

The compartment option singleton-as-templates [1] was made to avoid this issue: it converts JSOP_OBJECT into a cloning operation in order to keep the original object unmutated, so that the unmutated JSOP_OBJECT object is what gets saved.

What might happen is that we are encoding an object which got a computed string property formatted as an integer string, and this integer string gets serialized by XDR and then fails while decoding.

We should either:
 - encode the bytecode ahead of time.
 - use incremental encoding.
 - set the singleton-as-template flag on the compartments.

[1] http://searchfox.org/mozilla-central/source/js/src/jsapi.h#2499-2502
Flags: needinfo?(evilpies) → needinfo?(kmaglione+bmo)
This signature spiked when enabling the JS start-up bytecode cache (Bug 900784) on a small portion of Nightly users, i.e. without the MultiScriptsDecodeTask. Also, this happens both on the main thread and on the helper thread.
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> This signature spiked when enabling the JS start-up bytecode cache (Bug
> 900784) on a small portion of Nightly users, i.e. without the
> MultiScriptsDecodeTask. Also, this happens both on the main thread and on
> the helper thread.

This does not seem to be related to the JSBC at the moment, as there is only a single report from Nightly and most of the remaining crashes are on beta.
Some of these beta crashes are marked as high exploitability in crash-stats. Hiding the bug for now till someone can follow up.
Group: javascript-core-security
Depends on: 1382329
This is fixed on trunk now I believe? Are we planning to ask for backport to 55 at this point in the cycle (one beta & RC build left)?
This is still occurring in 55.0b13 after the patch from bug 1382329 landed.
Flags: needinfo?(nicolas.b.pierron)
I will note that HashKnownLength has many, many different callers across these crash reports, even if most of them seem to come from AtomizeChars called from XDR.

https://crash-stats.mozilla.com/report/index/578db4d4-4316-426c-86fd-d9dbf0170731
https://crash-stats.mozilla.com/report/index/6e082df6-7abd-474c-bf5d-002d50170728

Looking at the crash addresses, it sounds like we provide either a bad pointer or a bad length argument.
We then attempt to hash random memory and SEGV at a page boundary.

These could be explained by some corrupted data in the XDR buffer, or a non-reversible encoding.

In any case, given that we know very little about this issue and that it appears mostly on beta, investigating it will require multiple cycles.  I recommend asking :kmag to disable this feature for the last beta and starting to investigate these issues in 56.0b*.
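
To make the suspected failure mode concrete, here is a minimal, self-contained sketch (not SpiderMonkey's actual HashKnownLength) of hashing a buffer with a caller-supplied length. If a corrupted XDR stream overstates the length, the loop walks past the real allocation and eventually faults at a page boundary, which matches the crash addresses above.

#include <stddef.h>
#include <stdint.h>

// Sketch of a hash over a byte buffer of a claimed length. The mixing step
// is illustrative; what matters is the access pattern: one read per claimed
// byte. If `length` is larger than the buffer really is, these reads run off
// the end of the allocation and SEGV once they hit an unmapped or guard page.
static uint32_t HashBytesSketch(const unsigned char* chars, size_t length) {
  uint32_t hash = 0;
  for (size_t i = 0; i < length; i++) {
    hash = (hash << 5) - hash + chars[i];
  }
  return hash;
}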
Flags: needinfo?(nicolas.b.pierron)
Julien, is it too late to disable since we're in RC-land? (See Comment 10)
Flags: needinfo?(jcristau)
What feature would we disable?  The last beta for 55 was a week ago...
Flags: needinfo?(jcristau) → needinfo?(nicolas.b.pierron)
(In reply to Julien Cristau [:jcristau] (away until Aug 21) from comment #12)
> What feature would we disable?  The last beta for 55 was a week ago...

I have absolutely no idea, and I do not even know if there is a feature flag.
The feature is the MultiScripts decoder, which is currently used for loading Firefox resource files, from what I understand.
Flags: needinfo?(nicolas.b.pierron)
Crashes are still happening in builds that contain the fix for bug 1382329 :(
Track 56+/57+ as crashes are still happening.
Given that this is sec-moderate and we are only 1 week from RC, marking 56 as wontfix.
Commenting for Naveed.

Since Firefox 55 made it to release, we seem to have more than ~600 crashes per day, of which roughly 80% are start-up crashes, reproduced by the same users (same install time) over and over.  So this crash has the potential to prevent some users from running Firefox at all.

Unfortunately this signature does not sound actionable.

At the moment, our best option would be to fix Bug 1401939 and hope we can get rid of these crashes too.
Priority: -- → P2
Clearing out my old needinfos.

I don't really have anything to add to comment 17. This is mostly blocked on bug 1402167 at this point.
Flags: needinfo?(kmaglione+bmo)
Track 58- as there are no actions at this point. Feel free to nominate again if it's getting worse.
Assigning this to myself because it seems similar to XDR-related corruption I'm investigating.
Assignee: nobody → tcampbell
Depends on: 1418894
Some of these crashes remain after Bug 1418894. The crash is due to a bad length field from XDR, causing us to keep reading string data and walk off the end of the XDR buffer, triggering a fault.

If this crash rate is a problem, we can add further mitigations to XDRBuffer::read / XDRState::codeBytes to check bounds before calling memcpy.

Bug 1423616 is the work to address root-causes.
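
For illustration, here is a rough sketch of the kind of bounds check suggested above for the decode path; the struct shape and error handling are illustrative only, not the actual XDRBuffer/XDRState API.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Sketch only: a decoder cursor that refuses to read past the end of its
// buffer, so a corrupted length field fails the decode instead of walking
// into unrelated memory before the memcpy.
struct DecodeBufferSketch {
  const uint8_t* data;   // start of the XDR buffer
  size_t length;         // total buffer size in bytes
  size_t cursor = 0;     // current read offset (invariant: cursor <= length)

  bool readBytes(void* out, size_t nbytes) {
    if (nbytes > length - cursor) {
      return false;      // would run off the buffer: report a decode error
    }
    memcpy(out, data + cursor, nbytes);
    cursor += nbytes;
    return true;
  }
};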
Depends on: 1423616
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>]
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>] [@ js::XDRAtom<T>]
The volume of these signatures has gone down significantly in 59.0b12, presumably with the help of bug 1438645 \o/
I'm curious how any of these are still getting through..

e.g. https://crash-stats.mozilla.com/report/index/9a5e9883-f192-4f14-b58b-fb2730180304
In the remaining crashes, the XDR buffer is valid and in range but we are definitely sharing someone's memory.

In the above crash, the memory map is as follows:

// AllocationBase, BaseAddress, RegionSize, State+Protect
00000c9b0000 00000c9b0000 (000fd000) (RESERVED+RW)
             00000caad000 (00001000) (COMMITTED+RW+PAGE_GUARD)
             00000caae000 (00002000) (COMMITTED+RW)
00000cab0000 00000cab0000 (00484000) (COMMITTED+RO)

The XDR buffer is 0x0ccad7e4..0x0ccb3870, and the data we are about to XDR is in range. This seems to cross from one VirtualAlloc chunk into another. The violation we hit is a page-protection fault rather than the unmapped pages we saw in the majority of these crashes (that case is fixed).
Oops... I read that wrong. Those first three pagemap entries are unrelated; this is all contained in the last one. What is odd is that the AtomizeChars call is given a ptr of 0x0ccaffff (which should also be read-only), yet the fault happens on the second byte, 0x0ccb0000.
Actually, these are all disk faults during OS page-fault handling. The user has a bad disk, so when the OS tries to page in an mmapped page on demand from the preload cache file on disk, it triggers a fault and the browser goes down. Unfortunately they will keep crashing if the disk sector is bad.
Bug 1422087 will purge startup cache after crashes which should remove these remaining failures.
Depends on: 1422087
Just to confirm the disk-failure hypothesis, could we have a "commitXDRBufferPages" function which just forces the pages to be loaded into memory before resuming XDR decoding?

If this hypothesis is correct, all the signatures should move to this commitXDRBufferPages function.

Then we could potentially catch it with some RAII used only within commitXDRBufferPages, plus a SEGV handler, to fall back nicely to loading the missing files if we are unable to load everything from the XDR file.
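
As a minimal sketch of the pre-faulting idea: commitXDRBufferPages is only the name proposed in this comment, not an existing SpiderMonkey function, and the fixed page size below is an assumption.

#include <stddef.h>
#include <stdint.h>

// Sketch only: touch one byte in every page of the (mmapped) XDR buffer so
// the OS pages everything in up front. With a bad disk sector, the fault
// would then surface here, in one known place, instead of deep inside XDR
// decoding.
static void CommitXDRBufferPagesSketch(const uint8_t* buf, size_t length) {
  const size_t kPageSize = 4096;  // assumption; query the OS page size in real code
  volatile uint8_t sink = 0;
  for (size_t offset = 0; offset < length; offset += kPageSize) {
    sink ^= buf[offset];          // one read per page triggers the page-in
  }
  if (length > 0) {
    sink ^= buf[length - 1];      // make sure the final page is touched too
  }
  (void)sink;
}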
I think the EXCEPTION_IN_PAGE_ERROR_READ / STATUS_DEVICE_DATA_ERROR reason makes it pretty clear that this is what is happening. STATUS_DEVICE_DATA_ERROR is the file-system error that Breakpad extracted (see [1]).

What you propose sounds a little too heavyweight. I'm in favor of Bug 1422087 purging the startup cache so that the user doesn't get into a loop of these crashes; that will probably reduce the crash rate. If we still care after that, we should approach this as a general problem across the browser for anyone who mmaps files. It could be argued that once we hit disk corruption like this, some other important profile I/O would fail too even if we didn't crash here.

Query [2] shows there are a few other sources that run into these crashes too.

[1] https://searchfox.org/mozilla-central/rev/bffd3e0225b65943364be721881470590b9377c1/toolkit/crashreporter/google-breakpad/src/processor/minidump_processor.cc#1089-1097
[2] https://crash-stats.mozilla.com/search/?reason=~EXCEPTION_IN_PAGE_ERROR_READ&product=Firefox&date=%3E%3D2018-02-27T04%3A18%3A43.000Z&date=%3C2018-03-06T04%3A18%3A43.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Remaining crashes are disk problems. Tracking that general concern in Bug 1444442.

We can close out this security bug now :)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [adv-main59+]
Group: javascript-core-security → core-security-release
Group: core-security-release