Embedding leaks make it harder to assert shutdown correctness
Categories
(Core :: JavaScript: GC, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: jonco, Assigned: jonco)
References
Details
Attachments
(1 file, 1 obsolete file)
Sometimes the embedding still has live gray roots at shutdown because of a leak somewhere in the system. If this happens that we don't free everything and we get assertion failures for assertions that that everything is cleaned up at shutdown. This leads to frequent bug reports (e.g. bug 1407505, bug 1402473). The reason that we don't just clear all gray roots at shutdown like we do for internal roots is that this would cause some leaks to not get reported. However these assertion failures also hide the real issue. Ideally I think we should report leaks that involve JS sooner, and clear the roots on shutdown.
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
Andrew, I think I've mentioned this to you before. Do you think this is feasible and if so what's the best way to proceed? I don't really know how our leak checking works at the moment.
Comment 2•7 years ago
|
||
Here's how XPCOM leak checking works: - Classes opt into it by calling MOZ_COUNT_CTOR in their ctor and MOZ_COUNT_DTOR in their dtor, with a class name. (There's a slightly different mechanism for refcounted objects, but it doesn't matter for the present discussion.) nsTraceRefcnt.cpp keeps a table of the number of extant instances of each class. - Very late in shutdown, nsTraceRefcnt::Shutdown() dumps out the table of various classes it saw, how many it saw, and how many remain. There is one file per process, with the pid in the file name. - testing/mochitest/leaks.py is a script run by the test harness that checks to make sure there's a leak log for every process that we created, and it parses the table and prints out an error for every class where we had more than one instance remaining at shutdown. In theory, I think your proposal is reasonable. You'd have to make sure that the test harness registers it as an error if there's a leak, and you'd want it to be non-fatal, so we can do the rest of the checking we do at shut down. There's some loss from not giving information about what exactly leaked, but I think that in practice if you are leaking JS you're going to get a completely useless list of leaking objects anyways. In practice, you may need to debug a wide variety of "leaks" in your shutdown detector that we don't see with the XPCOM leak detector. The problem I am imagining is that some random place roots JS objects, but clears them out at some point in shutdown after when you want to add your check. But maybe it will be okay. Some of these leaks may be actual issues, so the work spent on debugging them won't be wasted, but some will likely be silly issues that you just have to fix to make your leak detector work.
Comment 3•7 years ago
|
||
Are there any next steps here? I am happy to disable the talos job on code coverage builds that is perma/failing to reduce the pressure to fix this bug.
Updated•7 years ago
|
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #2) Andrew, thanks for the comments, that's really useful. Joel, I sill want us to fix this. But do disable the talos job if it's causing a lot of noise.
Comment 5•7 years ago
|
||
The bug 1408403 is still failing even after Joel disabled g2 job on ccov builds. :jonco Any idea when this bug is going to be fixed?
Comment 6•7 years ago
|
||
:jonco any updates on this?
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Andrei Ciure[:andrei_ciure_] from comment #6) I'll work on this next.
Assignee | ||
Comment 8•7 years ago
|
||
Here's what I have so far: XPCOM logs any roots left at shutdown in a similar way to the existing leak tracing and then clears them. I extended the existing leak checking scripts to process these logs too, again in a very similar way. The JS engine now asserts that there are no roots left at shutdown. To test this I wrote a patch to force a leak of a JSString (via a Value in a ModuleScript), which produced the following output: TEST-INFO | Main app process: exit 0 runtests.py | Application ran for: 0:00:43.195848 zombiecheck | Reading PID log: /tmp/tmphtVpxxpidlog ==> process 24368 launched child process 24396 zombiecheck | Checking for orphan process with PID: 24396 Stopping web server Stopping web socket server Stopping ssltunnel TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes TEST-INFO | leakcheck | tab process: leak threshold set at 0 bytes TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 24368 |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| | | Per-Inst Leaked| Total Rem| 0 |TOTAL | 36 56| 1014320 1| 398 |ModuleScript | 56 56| 1 1| nsTraceRefcnt::DumpStatistics: 1394 entries TEST-INFO | leakcheck | default process: leaked 1 ModuleScript 19 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: 56 bytes leaked (ModuleScript) 20 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: found leak of 1 JS GC things String: 1 runtests.py | Running tests: end. Buffered messages finished SUITE-END | took 44s Try is looking good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c6d68bef5640d2ffe5be0ab8cd833ce4023529d&group_state=expanded How does this look to you?
Comment 9•7 years ago
|
||
Comment on attachment 8930152 [details] [diff] [review] bug1407593-check-js-leaks Review of attachment 8930152 [details] [diff] [review]: ----------------------------------------------------------------- This seems reasonable. Is there some reason to create a separate log rather than reporting the leaked objects to the leak checker through the normal means? Your loop could call MOZ_LOG_CTOR repeatedly instead. Adding an extra file seems like a lot of complexity. ::: dom/workers/RuntimeService.cpp @@ +1012,5 @@ > // The CC is shut down, and the superclass destructor will GC, so make sure > // we don't try to CC again. > mWorkerPrivate = nullptr; > + > + CycleCollectedJSRuntime::Shutdown(cx); Bug 1422316 is also filed on this missing shutdown. I don't know why it is like it is. ::: xpcom/base/nsTraceRefcnt.cpp @@ +1373,5 @@ > + aRuntime->TraceNativeGrayRoots(&tracer); > + > + AutoTraceLogLock lock; > + > + fprintf(gJSLeakLog, "Runtime %p leaked %" PRIu64 " JS GC things\n", It would be nice if this included a descriptive string, too. EG worker or main thread. I don't think we ever log the address of a runtime, so this will only be useful for somebody with a debugger.
Assignee | ||
Comment 10•5 years ago
|
||
Picking this up again as this continues to be a problem.
Using MOZ_LOG_CTOR makes things much easier, thanks.
What we want to happen is that GC things that are live at shutdown are reported but are finalized anyway. It might be nice to also clear embedding pointers to GC things at shutdown because otherwise these are left dangling pointing to freed memory.
My current issue is that finalizing preserved wrappers causes an assertion in the wrapper cache destructor.
Assignee | ||
Comment 11•5 years ago
|
||
Patch to report JS GC things that are live at shutdown as leaks by calling MOZ_LOG_CTOR for them. The JS engine now asserts there are no roots, and the CC clears any remaining JS holders (which will already have been reported). I used Unlink() for this because otherwise the wrapper cache asserts if we try and clear the contents of a preserved wrapper from a leaked GC thing's finalizer.
The only fly in the ointment is the test_ext_native_messaging.js xpcshell test, because this patch causes it to fail with an assertion here:
https://searchfox.org/mozilla-central/source/dom/events/EventListenerManager.cpp#138
I'm assuming this is connected to the fact that it leaks (bug 1570666).
Updated•5 years ago
|
Updated•5 years ago
|
Comment 12•5 years ago
|
||
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f2d3df5525bd Report embedding leaks of JS GC things r=sfink,sfink?
Assignee | ||
Comment 13•5 years ago
|
||
Lando seems to have messed up the commit messaged. Filed bug 1573398.
Assignee | ||
Updated•5 years ago
|
Comment 14•5 years ago
|
||
\o/
(In reply to Jon Coppeard (:jonco) from comment #13)
Lando seems to have messed up the commit messaged. Filed bug 1573398.
I think the trailing ?
in r=foo?
confuses it. I've noticed it also doesn't remove it when there's a single reviewer.
Comment 15•5 years ago
|
||
bugherder |
Description
•