Open Bug 803174 Opened 12 years ago Updated 5 months ago

Dispatch() should be infallible by default

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

People

(Reporter: ekr, Unassigned)

Details

Attachments

(1 file, 1 obsolete file)

If I am reading the code correctly, Dispatch() can fail. For instance,
if nsEventQueue::PutEvent() calls NewPage() and the internal calloc
runs out of memory.

A quick skim of the code finds a number of places where Dispatch is used
without error checking. I doubt most of these places really don't care if
the dispatch works or not.

Perhaps Dispatch should require a flag if you want fallible behavior.
(Added a semi-random set of CC's).

We'd probably make an alternate signature ('true' in an arg list isn't very obvious) like DispatchFallible().

Also note that a ton of things indirectly hit this via NS_DispatchToMainThread(), which returns an error that is often not checked.  See dom/devicestorage for example.

A related issue would be fallibility of PR_NewThread().  (Almost) all uses of PR_NewThread() check the result - great!  However, I suspect quite a few if not most of those leave the system in a moderately to severely borked state if PR_NewThread fails.
I agree, and I think that we probably don't even need a fallible version. Let's just make it infallible and see whether that breaks anything. I'd be happy to review the patch.
I assume we would keep the nsresult return value but just have it always return NS_OK?
Yes.
Assignee: nobody → ekr
Comment on attachment 673239 [details] [diff] [review]
Dispatch should be infallible by default

Review of attachment 673239 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Assertions.h
@@ +189,5 @@
> +
> +/* MOZ_CRASH_IF_FAILED(_expr) calls MOZ_CRASH() if NS_FAILED(_expr)  */
> +#define MOZ_CRASH_IF_FAILED(_expr) \
> +  do { \
> +    if (!NS_FAILED(_expr)) { \

Is the ! here just a copy/paste mistake?
Comment on attachment 673239 [details] [diff] [review]
Dispatch should be infallible by default

Review of attachment 673239 [details] [diff] [review]:
-----------------------------------------------------------------

Yes. And I'm going to be very upset if the tests don't catch it!
How are we going to deal with mEventsAreDoomed if we make nsThread::Dispatch infallible?
Ekr is going to post his Try run here showing some Doomed failures.  Most are from a couple of easily-fixed cases where I think it expects to hit Doomed occasionally, so we can provide a function that returns an error on Doomed for anyone who wants that.  There are a couple of others that are more worrisome, but likely the caller really didn't care if it succeeded.

A reasonable approach for deploying this may be to fix the ones shown by Try, then land temporarily on Nightly. IF we get any spike in crashes back out and fix them, and maybe wait to land it right after 20 opens and take the pain then.  (We can bounce a few times then if needed.)

Alternative is to force everyone using Dispatch to opt in or opt out of errors - change Dispatch() to not return an error, then work through all the callers to decide if they want to stick with infallible Dispatch or move to DispatchDoomed :-) (or DispatchFallible, or DispatchToVolatileThread, ...)  This is more work (for a bunch of people), but produces a 'cleaner' result.
Here is the relevant try push:

https://tbpl.mozilla.org/?tree=Try&rev=9d71ce5f9dfe
Attachment #673239 - Attachment is obsolete: true
Comment on attachment 673512 [details] [diff] [review]
Dispatch should be infallible by default

Review of attachment 673512 [details] [diff] [review]:
-----------------------------------------------------------------

This is a first draft

Bsmedberg: any comments on this approach and about how to handle the test crashes this creates (see Jesup's comment).
Attachment #673512 - Flags: feedback?(benjamin)
Can somebody summarize the known failures from try? My general reaction is "clients shuold be designed so that they never end up posting events to dead threads", but perhaps there is some reason that isn't possible.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Can somebody summarize the known failures from try? My general reaction is
> "clients shuold be designed so that they never end up posting events to dead
> threads", but perhaps there is some reason that isn't possible.

Things can definitely still call NS_DispatchToMainThread after the main thread has long since disappeared.
Attachment #673512 - Flags: feedback?(benjamin) → feedback+
That should really not ever be the case. It is a logic error to dispatch any events after the xpcom-shutdown-threads notifications has been dispatched, and all clients which dispatch events should watch for that or an earlier shutdown notification and cease activity by that point.
NS_DispatchToMainThread: That's a somewhat different issue, though one part of it might be the same (MainThread is 'doomed' but we dispatch to it anyways).  Certainly there's plenty of stuff that don't bother checking the result of that call!  (And often wouldn't have anything sensible to do it it failed.)  

You could: 
a) exit if someone does that; 
b) Let them think it succeeded even it it actually will fail; if they asked MainThread to do something it won't happen.
c) return errors for 'doomed' MainThread or non-existant MainThread with failures in Dispatch for allocations being fatal.  This means NS_DispatchToMainThread() is still fallible and needs to be error-checked.

Do callers who send to mainthread after it goes away handle the error and expect to do something sane with it?  How wide is the set of callers who can get bitten by MainThread?

[ revision based on bsmedberg's comment: ]

I'm good with trying to ferret out all those cases, but I'm guessing there are a fair number where they aren't already listening to shutdown events, and it's hard for them to do so without significant recoding.  But we can simply see, especially if we can have a good way to track them down (and I'm worried about that one).

Also, I'll note that release builds exit() before getting to xpcom-shutdown-threads!  So it's more an issue for our testing infrastructure to avoid spurious oranges/leaks.
Isn't that what NS_DispatchTo* checking and returning errors effectively does?  ISTR from working on bug 715376 that post-xpcom shutdown dispatches were exactly the sort of thing those checks catch.

We can try to make those complain more loudly, but I don't think fixing the fallout would be pretty.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Can somebody summarize the known failures from try? My general reaction is
> "clients shuold be designed so that they never end up posting events to dead
> threads", but perhaps there is some reason that isn't possible.

There are a number in nsSocketTransportService.  I suspect this actually expects 'doomed' to work and return an error:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16272442&tree=Try

This is another "Doomed" thread:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16271471&tree=Try
but it's hard from the log to know who sent it (other than it was a sync request).  However, I'll note that the current code doesn't check the result of Dispatch() (for sync returns) when inside of a sync Dispatch(), and this error implies the *caller* had gone away, not the callee.

And there's one in LazyIdleThread::ShutdownThread()

This is all from looking at the oranges in the Try run above.  Another try run might ferret out a couple more as they're likely timing issues, but I'm sure there are ones not hit in try!
(In reply to Nathan Froyd (:froydnj) from comment #17)
> Isn't that what NS_DispatchTo* checking and returning errors effectively
> does?  ISTR from working on bug 715376 that post-xpcom shutdown dispatches
> were exactly the sort of thing those checks catch.

Yes, pretty much as far as 'doomed' threads go.  Though per the exit() comment above, this is all about producing 'clean' shutdowns for measuring leaks and the like, not that that's a bad thing.

> We can try to make those complain more loudly, but I don't think fixing the
> fallout would be pretty.

It may not...

The advantage of infallible Dispatch() is a) we don't have to write error-handling code (which never gets tested!!!)  b) we don't once-in-a-blue-moon test the error-handling code in the field, for the first time c) this can noticeably simplify designs due to not needing to be concerned with the ripple effect of the error paths, shutdown watchers, etc.
(In reply to Randell Jesup [:jesup] from comment #19)

> The advantage of infallible Dispatch() is a) we don't have to write
> error-handling code (which never gets tested!!!)  b) we don't
> once-in-a-blue-moon test the error-handling code in the field, for the first
> time c) this can noticeably simplify designs due to not needing to be
> concerned with the ripple effect of the error paths, shutdown watchers, etc.

Corollary: we don't have to worry about someone provoking an untested error path and exploiting it for a security breach.

+dveditz
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: ekr → nobody
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: