Closed Bug 1459404 Opened 6 years ago Closed 6 years ago

Memory leak (via ChildAPIManager) when asynchronous extension API is used

Categories

(WebExtensions :: Storage, defect, P1)

60 Branch
defect

Tracking

(firefox61 wontfix, firefox62 verified, firefox63 verified)

VERIFIED FIXED
mozilla63
Tracking Status
firefox61 --- wontfix
firefox62 --- verified
firefox63 --- verified

People

(Reporter: robwu, Assigned: robwu)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(7 files)

Attached file leakjsonSandbox.zip
When calling browser.storage.sync.get (and set?), then the jsonSandbox (created via [1], [2] and [3]) is leaked when the page is closed:

   chrome.storage.sync.get({booleanValue: true}, function(items) {});

If a temporary add-on ID is used (i.e. manifest.json does not have applications.gecko.id), then the content script sandbox is leaked too, but only if a property of the resolved storage object is accessed:

  chrome.storage.sync.get({booleanValue: true}, function(items) { void items.booleanValue; });
  chrome.storage.sync.get({booleanValue: true}, function(items) {});
  // ^ The last does not leak the content script sandbox, the first does.

STR.
1. Put page.html (in leakjsonSandbox.zip) on a server, e.g. at localhost
2. Load the extension (leakjsonSandbox.zip).
3. Open about:memory
4. Open page.html
5. Follow the instructions at page.html (basically create 10 frames, then remove those frames).
   In about:memory, the leaked sandboxes can be found by searching for "leaky, moz".

To reproduce the content script sandbox leakage, unzip the extension, edit manifest.json and remove the "applications" section.

Reproduced in Firefox 59.0.2 and Firefox Nightly 20180504220115


[1]: https://searchfox.org/mozilla-central/rev/c0d81882c7941c4ff13a50603e37095cdab0d1ea/toolkit/components/extensions/ExtensionCommon.jsm#259-268
[2]: https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/toolkit/components/extensions/child/ext-storage.js#66-81
[3]: https://searchfox.org/mozilla-central/rev/53afcfdbabed96883126d0ebbcac499b358e32f2/toolkit/components/extensions/child/ext-storage.js#119-121
Memory report from about:debugging, generated on Firefox Nightly 20180504220115 on Linux as follows:

1. Unzip the attachment from https://bugzilla.mozilla.org/show_bug.cgi?id=1459404
2. Start a server at port 3000 at the directory of step 1.
3. web-ext run -f firefox-nightly -u about:memory -u http://localhost:3000/page.html
4. Click on "Leak me"
5. Click on "Clean me".
6. Switch to "about:memory" and click on "Minimize memory usage"
7. Click on "Measure and save"

See attachment for the result. If you load it at about:memory and use Ctrl-F "leaky, moz", then you will find 10 leaked compartments.
CC logs for the process containing localhost:3000/page.html, after running the STR from comment 1 and clicking "Save verbose" at about:memory.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
I've found the root cause, and will fix it as part of bug 1462121.

When the frames in the demo are unloaded, the pagehide event is triggered:
https://searchfox.org/mozilla-central/rev/00dd116638001772fe354b081353b73f1cad405d/toolkit/components/extensions/ExtensionCommon.jsm#164-173,176

The "// Put this off until the next tick." logic does not work, because the JS stack is empty when pagehide is triggered, and therefore immediately after dispatching "pagehide" the microtask is run, causing this.active to be set to false.

Since the frame is being removed, the inner window is eventually destroyed, which triggers the close handler:
https://searchfox.org/mozilla-central/rev/00dd116638001772fe354b081353b73f1cad405d/toolkit/components/extensions/ExtensionCommon.jsm#178,181-184
Since this.active was set to false before, the pageshow/pagehide listeners aren't removed.

When I suppress the pagehide handler in ExtensionCommon.jsm, then the leak does not occur.
Assignee: nobody → rob
Status: NEW → ASSIGNED
Depends on: 1462121
To show that the above analysis is correct, modify the test case (leakjsonSandbox.zip) as follows:

Edit manifest.json and change:
        "run_at": "document_start"
to
        "run_at": "document_end"

Edit page.html, and insert this JS snippet (e.g. before </script>):

  addEventListener("pagehide", (event) => {
    event.stopImmediatePropagation();
  });

Then run the STR from comment 0 (or comment 1).
When I search for "leaky," in about:memory, then I don't get any results, which means that the memory leak does not happen if the pagehide logic is suppressed: https://searchfox.org/mozilla-central/rev/00dd116638001772fe354b081353b73f1cad405d/toolkit/components/extensions/ExtensionCommon.jsm#164-173
Product: Toolkit → WebExtensions
This turns out to be unrelated to bug 1462121.
The test case that I used in comment 4 is bogus; Locally I changed the name to something else than "leaky", and then it is not surprising that "leaky," doesn't appear in about:memory any more.

I've found that the leak is caused by the unbalanced event listener in https://searchfox.org/mozilla-central/rev/6f86cc3479f80ace97f62634e2c82a483d1ede40/toolkit/components/extensions/ExtensionChild.jsm#883-884

When both of these listeners are unregistered at context closure, the object is GC'd.
I'm currently developing a patch with regression tests.
No longer depends on: 1462121
Explanation of bug for reviewers:

Every instance of BaseContext lazily initializes a ChildAPIManager when an extension API is accessed ("browser" / "chrome"):
https://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionContent.jsm#663,669
https://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionPageChild.jsm#271,277
https://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionPageChild.jsm#322,328

In the constructor of ChildAPIManager, event listeners are registered (on a message manager) but never unregistered:
https://searchfox.org/mozilla-central/rev/6f86cc3479f80ace97f62634e2c82a483d1ede40/toolkit/components/extensions/ExtensionChild.jsm#883-884

Because the ChildAPIManager cannot be GC'd, and the original context is reachable via the "context" member of the ChildAPIManager, the context is kept alive, and subsequently also any other member of the context, such as the jsonSandbox from the top of this bug.

This bug results in wasted memory that can only be reclaimed when the tab is closed (likely when the message manager is destroyed).
Priority: -- → P1
Summary: Memory leak when storage.sync API is used → Memory leak (via ChildAPIManager) when asynchronous extension API is used
Comment on attachment 8992686 [details]
Bug 1459404 - Ensure that ChildAPIManager can be garbage-collected.

https://reviewboard.mozilla.org/r/257544/#review264472

::: toolkit/components/extensions/test/xpcshell/test_ext_contexts_gc.js:31
(Diff revision 1)
> +  });
> +}
> +
> +async function assertContextReleased(contentPage, description) {
> +  await contentPage.spawn(description, async assertionDescription => {
> +    // Force GC, see https://searchfox.org/mozilla-central/rev/b0275bc977ad7fda615ef34b822bba938f2b16fd/testing/talos/talos/tests/devtools/addon/content/damp.js#84-98

please also reference the explanation directly since you have to dig through some files, and a non-perma-link to find the reference.  bug 610166#c12
Attachment #8992686 - Flags: review?(mixedpuppy) → review+
Comment on attachment 8992687 [details]
Bug 1459404 - Add regression test for GC of ExtensionPageContextChild

https://reviewboard.mozilla.org/r/257546/#review264476

I suppose its fine, but splitting this out was unecessary.
Attachment #8992687 - Flags: review?(mixedpuppy) → review+
I added a separate patch for the comment change so that there is a single commit with a useful commit message that links the two comments in the code together (good for blame).

Also, I decided to separate the first two patches because the second test-only patch feels a bit less reliable than the one in the first test due to the artificial wait with a magic number.

I will ask for an uplift of the first patch only, which has the fix and a regression test.
Comment on attachment 8992751 [details]
Bug 1459404 - Add permalink of explanation to garbageCollection logic

https://reviewboard.mozilla.org/r/257608/#review264478

Just a comment update, so I'll skip passing this to a peer for that section.
Attachment #8992751 - Flags: review?(mixedpuppy) → review+
Pushed by rob@robwu.nl:
https://hg.mozilla.org/integration/autoland/rev/0f5d5144aeee
Ensure that ChildAPIManager can be garbage-collected. r=mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/333c6bee4e9c
Add regression test for GC of ExtensionPageContextChild r=mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/0cf499a0fee9
Add permalink of explanation to garbageCollection logic r=mixedpuppy
In all cases (Windows), the tests themselves do pass, but the test fail anyway because the test runner fails at removing the temporary test extension:

E.g. https://treeherder.mozilla.org/logviewer.html#?job_id=188705841&repo=autoland&lineNumber=13335-13348
09:43:37     INFO -  (xpcshell/head.js) | test test_ExtensionPageContextChild_in_toplevel finished (2)
09:43:37     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "[Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove]"  nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)"  location: "JS frame :: resource://gre/modules/Extension.jsm :: cleanupGeneratedFile/< :: line 1815"  data: no]"]
09:43:37     INFO -  cleanupGeneratedFile/<@resource://gre/modules/Extension.jsm:1815:7
09:43:37     INFO -  _do_main@Z:\\task_1531904253\\build\\tests\\xpcshell\\head.js:219:3
09:43:37     INFO -  _execute_test@Z:\\task_1531904253\\build\\tests\\xpcshell\\head.js:533:5
09:43:37     INFO -  @-e:1:1
09:43:37     INFO -  "
09:43:37     INFO -  (xpcshell/head.js) | test run_next_test 5 finished (1)
09:43:37     INFO -  exiting test
09:43:37     INFO -  Error: Found unexpected files in temporary directory: generated-extension.xpi at resource://testing-common/AddonTestUtils.jsm:408
09:43:37     INFO -  init/<@resource://testing-common/AddonTestUtils.jsm:408:15
09:43:37     INFO -  _execute_test/<@Z:\task_1531904253\build\tests\xpcshell\head.js:597:28
09:43:37     INFO -  async*_execute_test@Z:\task_1531904253\build\tests\xpcshell\head.js:594:4
09:43:37     INFO -  @-e:1:1

Bah :(
Flags: needinfo?(rob)
This looks unexpected:

PID 8852 | --DOMWINDOW == 6 (0967C400) [pid = 4292] [serial = 11] [outer = 00000000] [url = about:neterror?e=fileNotFound&u=moz-extension%3A//f342662a-dbd8-4a43-9158-e264c64e7975/iframe.html&c=UTF-8&f=regular&d=Firefox%20can%E2%80%99t%20find%20the%20file%20at%20moz-extension%3A//f342662a-dbd8-4a43-9158-e264c64e7975/iframe.html.]

Apparently the last test has an unnecessary <iframe>. I'll remove it and push without further awaiting review (since this <iframe> does clearly not belong there).

diff --git a/toolkit/components/extensions/test/xpcshell/test_ext_contexts_gc.js b/toolkit/components/extensions/test/xpcshell/test_ext_contexts_gc.js
index 342496854950..7c84ab123fc6 100644
--- a/toolkit/components/extensions/test/xpcshell/test_ext_contexts_gc.js
+++ b/toolkit/components/extensions/test/xpcshell/test_ext_contexts_gc.js
@@ -180,7 +180,6 @@ add_task(async function test_ExtensionPageContextChild_in_toplevel() {
     files: {
       "toplevel.html": `
         <!DOCTYPE html><meta charset="utf8">
-        <iframe src="iframe.html"></iframe>
         <script src="script.js"></script>
       `,
       "script.js": "browser.test.sendMessage('extensionPageLoaded');",



If the test failure is fixed with this bug, then I think that we need to investigate why failure to load an extension page keeps the file locked.
Pushed by rob@robwu.nl:
https://hg.mozilla.org/integration/autoland/rev/47ec1288baef
Ensure that ChildAPIManager can be garbage-collected. r=mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/26ded4c81f03
Add regression test for GC of ExtensionPageContextChild r=mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/c028d93bbf3f
Add permalink of explanation to garbageCollection logic r=mixedpuppy
https://hg.mozilla.org/mozilla-central/rev/47ec1288baef
https://hg.mozilla.org/mozilla-central/rev/26ded4c81f03
https://hg.mozilla.org/mozilla-central/rev/c028d93bbf3f
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
I have measured the impact of the memory leak. The impact of the leak is 14 kB per frame/document (until the tab is closed).

180 kB leaked per frame without the patch.
166 kB leaked per frame with patch.
138 kB leaked per frame when the "browser" extension API object is not lazily initialized (regardless of patch) (so we might have another memleak).


Details:

For each scenario I did the following:
- Open 200 frames (by clicking on a button in a test page).
- Scenario-specific:
  * "nonexisting" property access: access the "nonexisting" property in the global namespace.
  * "browser" property access: access the "browser" property in the global namespace (which triggers lazy initialization of the object that caused the memory leak)
- Click "Minimize memory usage" three times at about:memory
- Take a heap snapshot with the Memory tab of the Browser toolbox and note the size.
- Repeat the above five times.
- Calculate the deltas, discard the two extreme values.
- Calculate the average of the three remaining values (average of medians).

Results (unless stated otherwise, tested with unpatched/buggy Firefox Nightly build ID 20180708100828):
Open 200 frames, access global property "nonexisting" and remove all frames:
27.44 MB per 200 moz-extension:-frames
27.81 MB per 200 file:-frames (for reference, with e10s disabled)
27.58 MB per 200 moz-extension:-frames (Firefox Nightly build ID 20180719100142, with patch)

Open 200 frames, access global property "browser" and remove all frames (this reproduces the root cause of the bug):
36.01 MB per 200 moz-extension:-frames
33.12 MB per 200 moz-extension:-frames (Firefox Nightly build ID 20180719100142, with patch)
Comment on attachment 8992686 [details]
Bug 1459404 - Ensure that ChildAPIManager can be garbage-collected.

Approval Request Comment
[Feature/Bug causing the regression]: API manager of WebExtensions framework does not remove event listeners upon context destruction (e.g. at document unload).
[User impact if declined]: Memory leak in extension pages and non-extension pages with a content script, at least 14 kB per frame until the tab is closed (see https://bugzilla.mozilla.org/show_bug.cgi?id=1459404#c22 )
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No, covered by automated tests (and manually verified).
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The fix itself consists of two extra lines.
[String changes made/needed]: N/A
Attachment #8992686 - Flags: approval-mozilla-beta?
Note, verification usually should not be done by the developer working on the fix or the reviewer. 
In this case, given the complexity of the verifying, I'll take it as pretty good evidence this fix works!
Comment on attachment 8992686 [details]
Bug 1459404 - Ensure that ChildAPIManager can be garbage-collected.

Fix for a memory leak, let's uplift for beta 11. 
Thanks for adding new tests.
Attachment #8992686 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25)
> Comment on attachment 8992686 [details]
> Bug 1459404 - Ensure that ChildAPIManager can be garbage-collected.
> 
> Fix for a memory leak, let's uplift for beta 11. 
> Thanks for adding new tests.

Hi, should i uplift just the approved part of this bug or wait for the other two to be approved for beta as well?
Flags: needinfo?(lhenry)
(In reply to Andreea Pavel [:apavel] from comment #26)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25)
> > Comment on attachment 8992686 [details]
> > Bug 1459404 - Ensure that ChildAPIManager can be garbage-collected.
> > 
> > Fix for a memory leak, let's uplift for beta 11. 
> > Thanks for adding new tests.
> 
> Hi, should i uplift just the approved part of this bug or wait for the other
> two to be approved for beta as well?

Just the approved one (this features the fix and tests).
The other two are a test-only patch (adding more extensive test coverage), and a comment update, respectively.
Flags: needinfo?(lhenry)
:robwu thanks for clarifying that for me.
Attachment #8992687 - Flags: approval-mozilla-beta+
Attachment #8992751 - Flags: approval-mozilla-beta+
Attachment #8992687 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Attachment #8992751 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Blocks: 1477688
Attached image Bug1459404.png
I was able to reproduce this issue on Firefox 61.0.1 (20180713213322) under Win 7 64-bit and Mac OS X 10.13.3.

This issue is verified as fixed on Firefox 63.0a1(20180704003137) and Firefox 62.0b15 (20180806191531) under Win 7 64-bit and Mac OS X 10.13.3.

Please see the attached screenshot.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: