1409115 - Ghost window on Twitter and YouTube (with a broken profile?)

Assignee

Description

•

7 years ago

STR:

- Open Twitter in my default profile (it does not seem to work when I'm logged into Twitter on a clean profile)
- Click on one of the "cards" of an individual Tweet. I clicked on an image generated from a Tweet containing an image.
- Close the tab.
- Go to about:memory, click on "minimize memory usage" a few times, then click on "measure".

In the resulting memory report, there should be no windows for Twitter still alive, but it seems like there are. Eventually, these turn into ghost windows.

This may be the same issue as bug 1398652, but I'll file it separately because that may be something else, and because it is marked MoCo confidential now.

Andrew McCreight [:mccr8]

Assignee

Comment 1

•

7 years ago

I bisected this, and got this regression range containing bug 1290481:
  https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7f1714325cb3e40427e40031a2191fe20cdc3241&tochange=6aa1cc819058deaa13f85278e2f5970b8ced0998

[Tracking Requested - why for this release]: This is a regression that causes a bad leak on a popular site (Twitter). The leak in turn can cause severe browser lagginess.

Blocks: 1290481

status-firefox56: --- → unaffected

status-firefox57: --- → affected

status-firefox58: --- → affected

tracking-firefox57: --- → ?

tracking-firefox58: --- → ?

Keywords: mlk

Whiteboard: [MemShrink]

Andrew McCreight [:mccr8]

Assignee

Comment 2

•

7 years ago

Tom, can you investigate this regression from bug 1290481? I can try to help with the leak investigation, but I'm not familiar with the quota manager.

Flags: needinfo?(ttung)

Ben Kelly [:bkelly, not reviewing]

Comment 3

•

7 years ago

I'm kind of skeptical it could be caused by bug 1290481, to be honest.  Almost all the changes in that bug are in code that runs in the parent process or on IO threads.  It has very little interaction with any DOM objects.

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

7 years ago

Yeah, I'll retry it a few times.

I skimmed through the patches, and this is the only particularly suspicious thing:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/8c60914e10e9#l3.235
That adds a QuotaInfo, which can contain a strong reference to an nsIFile, which I imagine could end up entraining a window, somehow.

Ben Kelly [:bkelly, not reviewing]

Comment 5

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #4)
> I skimmed through the patches, and this is the only particularly suspicious
> thing:
>   https://hg.mozilla.org/integration/mozilla-inbound/rev/8c60914e10e9#l3.235
> That adds a QuotaInfo, which can contain a strong reference to an nsIFile,
> which I imagine could end up entraining a window, somehow.

I certainly hope not.  We shouldn't be doing any nsIFile stuff on the main thread.  And that code explicitly runs in the parent process.  (Sandbox prevents opening files in the child, etc.)

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

7 years ago

I bisected a second time using this command (taken from an intermediate step from my prior bisection) and got the same result:
  ./mach mozregression --profile <my profile> --good 2017-09-05 --bad 2017-09-07

This doesn't quite match up with bug 1398652, which was initially reported against a build from 20170905220108, while bug 1290481 would have first shown up a few days later.

Andrew McCreight [:mccr8]

Assignee

Comment 7

•

7 years ago

I used git bisect across the 16 patches in the series and according to that this is the regressing change set:

Bug 1290481 - P6: Upgrade QuotaManager to v3.0 for adding directory padding file to existing DOM Cache directory. r=bkelly, janv
    
MozReview-Commit-ID: KlVsaGhpABk

(I'm on Linux in case that matters.)

Ben Kelly [:bkelly, not reviewing]

Comment 8

•

7 years ago

Are you running into an issue where you are trying to downgrade the disk schema and stuff is breaking?  Did you do the mozregression with the profile being copied or recreated each time?

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Assignee

Comment 9

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #8)
> Are you running into an issue where you are trying to downgrade the disk
> schema and stuff is breaking?  Did you do the mozregression with the profile
> being copied or recreated each time?

I think mozregression creates a fresh copy of the profile for each step of the bisection. My profile is on the latest version of Nightly.

Flags: needinfo?(continuation)

Ben Kelly [:bkelly, not reviewing]

Comment 10

•

7 years ago

Ok, then I have no explanation for how that rev could possibly create this kind of leak.  Also, I have been running FF57 beta which has that rev and I don't see this leak.  Also, the fact you can't reproduce this on a clean profile suggests there is some kind of schema corruption going on.

We may need a copy of your profile to diagnose the problem.

Ben Kelly [:bkelly, not reviewing]

Comment 11

•

7 years ago

Andrew, do you have any ideas here?

Flags: needinfo?(bugmail)

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Summary: Ghost window on Twitter → Ghost window on Twitter (with a broken profile?)

Andrew Sutherland [:asuth] (he/him)

Comment 12

•

7 years ago

As I understand the operation of mozregression based on the documentation of its "--profile-persistence" option[1] and its default of "clone", given a nightly profile specified like in comment 6, we expect that every time mozregression runs a build prior to the "P6" patch, QuotaManager and IndexedDB and ServiceWorkers will be broken.  This means Twitter can't do anything fancy with IndexedDB or ServiceWorkers and has to fallback to being ephemeral.  (It can use LocalStorage and cookies, however.)

I think the take-away from that is that whatever is causing the ghost window involves IndexedDB or ServiceWorkers.  (I don't see it using the Cache API directly.)

I fear the bisection needs to be re-done using a profile that is freshly created using Firefox 56, which should eliminate the schema change issue.  The default --profile-persistence=clone should work.  (Maybe "clone-first" would work as well if that first run happens at the oldest point of the bisection and there were no schema upgrade backouts involved.)

1: https://github.com/mozilla/mozregression/blob/1256293c1c27b2e7b619827e75ddf377566b2fff/mozregression/cli.py#L144

Flags: needinfo?(bugmail)

Andrew McCreight [:mccr8]

Assignee

Comment 13

•

7 years ago

I guess that makes sense. It sounds like the regression window is not going to be accurate then, as bkelly originally suggested, so I'll clear the various flags.

No longer blocks: 1290481

status-firefox56: unaffected → ---

tracking-firefox57: ? → ---

tracking-firefox58: ? → ---

Flags: needinfo?(ttung)

Keywords: regression

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Blocks: GhostWindows

Andrew Sutherland [:asuth] (he/him)

Comment 14

•

7 years ago

Related questions would be:
- Are web notifications turned on?
- Do you notice ServiceWorkers for twitter hanging out under "/workers(twitter.com)/worker(https://twitter.com/push_service_worker.js" in the memory reports?  (Maybe also under decommitted?)

Andrew McCreight [:mccr8]

Assignee

Comment 15

•

7 years ago

(In reply to Andrew Sutherland [:asuth] from comment #14)
> Related questions would be:
> - Are web notifications turned on?

I don't think so.

> - Do you notice ServiceWorkers for twitter hanging out under
> "/workers(twitter.com)/worker(https://twitter.com/push_service_worker.js" in
> the memory reports?  (Maybe also under decommitted?)

No.

David Durst [:ddurst]

Comment 16

•

7 years ago

STR wfm on my nightly 58.

David Durst [:ddurst]

Comment 17

•

7 years ago

(Sorry, that was not clear: using mccr8's STR, I also got the same ghost results.)

Ben Kelly [:bkelly, not reviewing]

Comment 18

•

7 years ago

(In reply to David Durst [:ddurst] from comment #17)
> (Sorry, that was not clear: using mccr8's STR, I also got the same ghost
> results.)

What if you create a new profile?

Flags: needinfo?(ddurst)

David Durst [:ddurst]

Comment 19

•

7 years ago

I'm trying to re-create it with the same one (because in theory I should be able to).

David Durst [:ddurst]

Comment 20

•

7 years ago

And now I can't re-create it. I'm on 20171016100113, which I *think* happened in the process of restarting to clean things up,  -- but tbh, I didn't record the build ID before (I updated in 1398652 from 20170915100121, but I didn't record to what before now). :\

Flags: needinfo?(ddurst)

David Durst [:ddurst]

Comment 21

•

7 years ago

Just happened again. Only 3 tabs open (twitter, wikipedia, pitchfork). After minimizing memory usage, there's already one ghost.

Ben Kelly [:bkelly, not reviewing]

Comment 22

•

7 years ago

David, Andrew, do either of you have any cookie or storage preferences changed from the default?  For example delete cookies on exit, etc?

I noticed I had a profile that could not do SW things and its because we have disabled SW in FF57 when nsContentUtils::StorageAllowed() returns eDeny.

Maybe we have a leak somewhere when things are denied this way?

Flags: needinfo?(ddurst)

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Assignee

Comment 23

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #22)
> David, Andrew, do either of you have any cookie or storage preferences
> changed from the default?  For example delete cookies on exit, etc?

Not as far as I know. (Right now I also have ghost windows for Facebook and Reddit, FWIW...)

Flags: needinfo?(continuation)

Ben Kelly [:bkelly, not reviewing]

Comment 24

•

7 years ago

We probably need one of the problem profiles zipped up and shared with me, asuth, or janv.

Or, maybe try running in a DEBUG build and see if any useful warnings pop out from dom/cache, dom/quota, etc?

David Durst [:ddurst]

Comment 25

•

7 years ago

Not as far as I know either. FWIW, I think my most recent was a ghost for wikipedia, and not twitter.

Flags: needinfo?(ddurst)

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

7 years ago

Priority: -- → P2

Ben Kelly [:bkelly, not reviewing]

Comment 26

•

7 years ago

Hmm, maybe now that :asuth has landed his backcompat schema hack for bug 1290481 we could try:

1. Upgrade to nightly with bug 1404344.  (May not be spun yet.)
2. See if the leak reproduces when you run on a build before bug 1290481.  (This should now avoid the schema bustage with the downgrade.)
3. If so, start a new bisect with the first bad being before bug 1290481.  (You want to avoid traversing revisions between bug 1290481 and bug 1404344.

What do you think?

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Assignee

Comment 27

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #26)
> What do you think?

The problem is that I don't remember seeing ghost windows back in the start of September, so I think maybe the new factor is something in my browser state and not Firefox itself.

David Durst [:ddurst]

Comment 28

•

7 years ago

Am I correct in assuming that "Refresh Nightly" would be sufficient to clean up the profile -- and then see if this recurs?

David Durst [:ddurst]

Comment 29

•

7 years ago

Just reproduced it with a cleaned profile using build 20171016220427, and not twitter -- in this case, it was from a forum (https://www.doityourself.com/) that has a lot of ads embedded.

I can try using other builds, since this was neither a lot of windows (8) and regular use. Should be able to reproduce in just a few days, but I can't start until later this week.

Summary: Ghost window on Twitter (with a broken profile?) → Ghost windows

Andrew McCreight [:mccr8]

Assignee

Comment 30

•

7 years ago

(In reply to David Durst [:ddurst] from comment #29)
> Just reproduced it with a cleaned profile using build 20171016220427, and
> not twitter -- in this case, it was from a forum
> (https://www.doityourself.com/) that has a lot of ads embedded.

Did that happen when the tab for the forum was still open or after you closed it?

David Durst [:ddurst]

Comment 31

•

7 years ago

Still open.

Andrew McCreight [:mccr8]

Assignee

Comment 32

•

7 years ago

(In reply to David Durst [:ddurst] from comment #31)
> Still open.

That's not really a leaking window then, most likely. Probably the page removed the iframe but it is still holding it alive through JS. It is a false positive of our current ghost window heuristic. See bug 1360310.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Flags: needinfo?(continuation)

David Durst [:ddurst]

Comment 33

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #32)
> (In reply to David Durst [:ddurst] from comment #31)
> > Still open.
> 
> That's not really a leaking window then, most likely. Probably the page
> removed the iframe but it is still holding it alive through JS. It is a
> false positive of our current ghost window heuristic. See bug 1360310.

Aha. Gotcha. Resetting.

Summary: Ghost windows → Ghost window on Twitter (with a broken profile?)

Cosmin Muntean [:cmuntean], Ecosystem QA

Comment 35

•

7 years ago

I have tested this issue using the provided STR from comment 0, but I haven't managed to reproduce it. Tried to reproduce this on latest Nightly (58.0a1) build, on a older Nightly build from 2017-10-16 and also on latest Beta 57.0b12 build.
Tried with multiple scenarios, but every time when I have closed the Twitter tab there were no ghost window in memory report. I have used a clean profile and also an older one for testing.

Andrew, in comment 7 you said that you are on Linux, what version of Linux did you use? I have tested this on Arch Linux and also on Windows 7.

Eric Rahm [:erahm]

Updated

•

7 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Andrew McCreight [:mccr8]

Assignee

Comment 36

•

7 years ago

I'm not really sure what to do with this.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Assignee

Comment 37

•

7 years ago

I've done a little more investigation into this, on my same profile. I added some code to run the Unlink method on the ghost window. This manages to get rid of most of the DOM stuff for the tab, but the GC stuff all remains. It looks like the JS reflector for the window is being leaked via a field called mScriptOwner. Based on SearchFox, this comes from IDB.

Component: DOM → DOM: IndexedDB

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

7 years ago

Flags: needinfo?(jvarga)

Andrew McCreight [:mccr8]

Assignee

Comment 38

•

7 years ago

Ok, maybe that's not fair. There's a Promise with one unknown reference that is keeping an IDBDatabase alive:

0x7f6d091dcd60 [Promise]
    --[mPromiseObj]--> 0x7f6cf4123f40 [JS Object (Promise)]
    --[group_global]--> 0x7f6d0579a100 [JS Object (Window)]
    --[globalIndexedDBs]--> 0x7f6cf412df20 [JS Object (Object)]
    --[dm_typeahead]--> 0x7f6d0384e4c0 [JS Object (Object)]
    --[database]--> 0x7f6cfd3d8730 [JS Object (IDBDatabase)]
    --[UnwrapDOMObject(obj)]--> 0x7f6d06076120 [IDBDatabase ]

I'm not sure why the Promise is alive.

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Component: DOM: IndexedDB → DOM

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Flags: needinfo?(jvarga)

Andrew McCreight [:mccr8]

Assignee

Comment 39

•

7 years ago

Ben is also seeing a leak on Twitter, in bug 1422912.

One thing we noticed is that for both of us, if we try to enable web notifications via the Twitter settings, clicking on the "Turn on" button does nothing. On my profile on my other machine, if I enable it and then block notifications, Twitter gives me an error message. Ben's serviceworker.txt was corrupted, and when he cleared it, the leak seemed to stop happening. My serviceworker.txt file looks okay at a glance, but maybe my service workers are also in a broken state.

Comment 40

•

7 years ago

When I get a verbose CC log, I see one or two nsGlobalWindows for twitter leaking, one for the card and one for twitter itself. For the card window, it is held alive by a promise, as seen above in comment 38.

I did refcount logging for nsGlobalWindow, and looked for the missing reference. The window is not held alive through shutdown. Looking from the end, there is a ton of refcount traffic associated with the shutdown CC, and before that, there's a bunch from when I ran the memory reporter, but in the middle there is this single release, which I think must be the missing reference:

<nsGlobalWindowInner> 0x7f232e679800 4 Release 2370 [thread 0x7f23524834c0]
    #01: operator delete(void*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/mozalloc.h:230)
    #02: nsTArray_Impl<RefPtr<mozilla::dom::workers::ServiceWorkerJob::Callback>, nsTArrayInfallibleAllocator>::DestructRange(unsigned long, unsigned long) (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:2025 (discriminator 32))
    #03: ~nsTArray_Impl (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:887)
    #04: ~RefPtr (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/RefPtr.h:78)
    #05: operator delete(void*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/mozalloc.h:230)
    #06: mozilla::dom::workers::ServiceWorkerJob::Release() (/media/ssd/mc/dom/workers/ServiceWorkerJob.h:144 (discriminator 10))
    #07: nsTArray_Impl<RefPtr<mozilla::dom::workers::ServiceWorkerJob>, nsTArrayInfallibleAllocator>::DestructRange(unsigned long, unsigned long) (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:2025 (discriminator 34))
    #08: mozilla::RefPtrTraits<mozilla::dom::workers::ServiceWorkerJobQueue>::Release(mozilla::dom::workers::ServiceWorkerJobQueue*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/RefPtr.h:41)
    #09: mozilla::dom::workers::ServiceWorkerManager::Observe(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/dom/workers/ServiceWorkerManager.cpp:3863)
    #10: non-virtual thunk to mozilla::dom::workers::ServiceWorkerManager::Observe(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/dom/workers/ServiceWorkerManager.cpp:3835)
    #11: nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/xpcom/ds/nsObserverList.cpp:111)
    #12: nsTHashtable<nsObserverList>::GetEntry(char const*) const (/media/ssd/mc/xpcom/ds/nsTHashtable.h:135 (discriminator 2))
    #13: mozilla::ShutdownXPCOM(nsIServiceManager*) (/media/ssd/mc/xpcom/build/XPCOMInit.cpp:865 (discriminator 2))

It looks like that on shutdown the ServiceWorkerManager purges some list of jobs, which causes an array ServiceWorkerJob::Callbacks to be destroyed. ServiceWorkerResolveWindowPromiseOnRegisterCallback is a subclass of Callback that holds a strong pointer to a window, so maybe that's what is at fault?

Component: DOM → DOM: Service Workers

Andrew McCreight [:mccr8]

Assignee

Comment 41

•

7 years ago

Attached patch Diagnostic patch. DO NOT LAND (obsolete) — Details — Splinter Review

I made this little diagnostic patch that sets a global flag when the service manger is shutting down, and then prints out a message in the ServiceWorkerResolveWindowPromiseOnRegisterCallback dtor that includes the window and the promise if we are doing that during shutdown. I confirmed that on my Twitter test case that the window matches the twitter window I am leaking, and the Promise is the same Promise that is rooting the card window.

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

status-firefox57: affected → wontfix

status-firefox59: --- → affected

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Blocks: 1420795

Andrew McCreight [:mccr8]

Assignee

Comment 42

•

7 years ago

I'm seeing this Twitter leak on my Linux machine's main profile. On my OSX machine's main profile, I can readily reproduce a YouTube ghost window, by shift-clicking to a particular YouTube video from a particular reddit page. Note that these STR don't work on my Linux machine, but they do work 100% of the time on my OSX profile! I ran my default profile in a debug build with the patch here, and I confirmed that the leaking window was also being held alive as the window field of a ServiceWorkerResolveWindowPromiseOnRegisterCallback. It is possible that bug 1420795 is the same thing.

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Summary: Ghost window on Twitter (with a broken profile?) → Ghost window on Twitter and YouTube (with a broken profile?)

Andrew McCreight [:mccr8]

Assignee

Comment 43

•

7 years ago

Also, I did a try push that fatally asserts when we leak a ServiceWorkerResolveWindowPromiseOnRegisterCallback through shutdown, and it didn't trigger, so this isn't a case that seems to be covered by our testing. https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bd087728a331bb8486bdd8e5642c31a66c17478

Ben Kelly [:bkelly, not reviewing]

Comment 44

•

7 years ago

There is some other bug causing the service worker job queue to get stuck.  We need to investigate that as well.  Fixing the leak when it gets stuck is worth doing separate, though.

Andrew McCreight [:mccr8]

Assignee

Comment 45

•

7 years ago

The way this service worker ownership works is like this: there's a per-process ServiceWorkerManager, which has a table where the values are strong references to RegistrationDataPerPrincipal. This has a table mJobQueues where the values are strong references to ServiceWorkerJobQueue. This contains an array of strong references to ServiceWorkerJob. This in turn contains various strong references to ServiceWorkerJob::Callback.

That class has a few subtypes:
a) ServiceWorkerJobQueue::Callback, which contains a strong reference back to a ServiceWorkerJobQueue.
b) ServiceWorkerResolveWindowPromiseOnRegisterCallback, which contains a strong reference to a window and to a Promise (which in turn will end up containing a strong reference to a window).
c) UnregisterJobCallback contains a strong reference to a nsIServiceWorkerUnregisterCallback, which can be implemented in JS or C++. One of the C++ implementations, UnregisterCallback, holds a strong reference to a promise.
d) UpdateJobCallback, which also has a subtype that holds a strong reference to a promise.

The keys for the various tables seem to be based on something more like a principal than a window, but they can keep alive per-window things. This whole setup seems prone to leaking, but I'm not sure how to improve it.

The strong window reference in (b) can be changed to a weak reference (nobody would expect these service worker things to be keep the window alive), but I don't know what to do about all of the promises, because those are probably only kept alive by the service worker stuff, so we can't make a weak reference to it.

At least for the case of ServiceWorkerResolveWindowPromiseOnRegisterCallback, it creates some kind of blank promise:
RefPtr<Promise> promise = Promise::Create(sgo, result);
...and then doesn't do anything until with it until it is ready to be resolved. It looks like it would be possible to keep a weak reference to the window in the callback, and then lazily create the actual promise from that window once we're ready to actually resolve it. I'll see if that's possible for the other cases.

Andrew McCreight [:mccr8]

Assignee

Comment 46

•

7 years ago

Oh I see, ServiceWorkerManager::Register() actually returns the promise. So I guess you do need to actually create the promise there. But on the other hand, I think that means it is okay to only have a weak reference to it in the callback, because presumably we don't actually need to do anything with it if content has thrown away any reference to it.

Ben Kelly [:bkelly, not reviewing]

Comment 47

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #46)
> Oh I see, ServiceWorkerManager::Register() actually returns the promise. So
> I guess you do need to actually create the promise there. But on the other
> hand, I think that means it is okay to only have a weak reference to it in
> the callback, because presumably we don't actually need to do anything with
> it if content has thrown away any reference to it.

This sounds correct to me.

Andrew McCreight [:mccr8]

Assignee

Comment 48

•

7 years ago

I have a patch that makes a bunch of callback references weak, and it seems to fix the Twitter leak for me. I haven't tried it on YouTube yet. One problem is that WorkerThreadUpdateCallback has a RefPtr<PromiseWorkerProxy>, which should really get the same treatment, but the proxy doesn't support weak references so I'm not sure how to do that.

Andrew McCreight [:mccr8]

Assignee

Comment 49

•

7 years ago

For (d), the PromiseResolverCallback held by UpdateJobCallback holds alive a promise, but it looks like it is some kind of MozPromise, which is some multithreaded thing that is not directly holding alive JS, so I'll leave that alone.

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Assignee: nobody → continuation

Ben Kelly [:bkelly, not reviewing]

Comment 50

•

7 years ago

The PromiseWorkerProxy is for a Promise created on a worker global, so it should not be holding any window references.

Comment hidden (mozreview-request)

ServiceWorkerJobs are held alive in queues via a per-process global
data structure. Under some unknown circumstance, these queues can get
stuck. Therefore, to avoid leaks we need to make these callbacks only
have weak references to content windows and promises.

Review commit: https://reviewboard.mozilla.org/r/205976/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/205976/

Andrew McCreight [:mccr8]

Assignee

Comment 52

•

7 years ago

This seems to break service workers enough that even the first test in dom/workers/test/serviceworkers/ times out, so maybe it isn't a real fix, but it just breaks service workers so they can't run and thus we can't leak.

Ben Kelly [:bkelly, not reviewing]

Comment 53

•

7 years ago

I guess the problem is when script chains off the promise, but doesn't hold it alive.

Andrew McCreight [:mccr8]

Assignee

Comment 54

•

7 years ago

Oh, right, that makes sense. Somehow the ownership of these promises has to be tied to the window and not the service worker manager. If the window owns them, then we can tell the CC about it and then we shouldn't have this leak.

Ben Kelly [:bkelly, not reviewing]

Comment 55

•

7 years ago

I wonder if we should have some kind of PromiseWindowProxy like we have the PromiseWorkerProxy.  The window would then have some mechanism of signaling the window is shutting down, etc.

This would be easier if we had something that provided a common shutdown hook for nsIGlobalObjects like bug 1379209 suggests.

Andrew McCreight [:mccr8]

Assignee

Comment 56

•

7 years ago

I implemented my idea by adding a multiset of strong references to promises to nsGlobalWindow. When service workers create a promise, they add it to this multiset. The CC traverses/unlink the multiset, so these promises can't keep the window alive, but the promises won't go away early. I encapsulated all of this in a PromiseWorkerProxy class as Ben suggested. This still fixes the leak, and passes at least the mochitest plain service worker tests.

Comment hidden (mozreview-request)

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/205976/diff/1-2/

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Attachment #8934749 - Attachment is obsolete: true

Ben Kelly [:bkelly, not reviewing]

Updated

•

7 years ago

Updated

•

7 years ago

Depends on: 1424300

Andrew McCreight [:mccr8]

Assignee

Comment 58

•

7 years ago

L64 try looks okay: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7c24a281451f9db3d78a62bac870538ef130a80

I have no idea how to test this, unfortunately.

Ben Kelly [:bkelly, not reviewing]

Comment 59

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #58)
> L64 try looks okay:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d7c24a281451f9db3d78a62bac870538ef130a80
> 
> I have no idea how to test this, unfortunately.

I think you would have to add some kind of xpidl or webidl interface that creates a promise, but never resolves it.  Not sure if its worth it or not...

Comment hidden (mozreview-request)

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/205976/diff/2-3/

Ben Kelly [:bkelly, not reviewing]

Comment 61

•

7 years ago

mozreview-review

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

https://reviewboard.mozilla.org/r/205976/#review212552

::: dom/promise/PromiseWindowProxy.h:40
(Diff revision 3)
> +// promise. It solves (2) by adding a strong reference to the promise
> +// on the window itself. This ensures that as long as both the
> +// PromiseWindowProxy and the window are still alive that the promise
> +// will remain alive. This strong reference is cycle collected, so it
> +// won't cause the window to leak.
> +class PromiseWindowProxy

final

::: dom/promise/PromiseWindowProxy.h:43
(Diff revision 3)
> +// will remain alive. This strong reference is cycle collected, so it
> +// won't cause the window to leak.
> +class PromiseWindowProxy
> +{
> +public:
> +  PromiseWindowProxy(nsPIDOMWindowInner* aWindow, mozilla::dom::Promise* aPromise);

Should we provide Move() constructors, etc?  I guess we can add them when we need to.

::: dom/promise/PromiseWindowProxy.cpp:21
(Diff revision 3)
> +
> +
> +PromiseWindowProxy::PromiseWindowProxy(nsPIDOMWindowInner* aWindow, Promise* aPromise)
> +  : mPromise(aPromise)
> +{
> +  auto* window = nsGlobalWindowInner::Cast(aWindow);

MOZ_DIAGNOSTIC_ASSERT() that aWindow and aPromise are non-nullptr here?

::: dom/promise/PromiseWindowProxy.cpp:37
(Diff revision 3)
> +}
> +
> +RefPtr<Promise>
> +PromiseWindowProxy::Get() const
> +{
> +  if (!mPromise) {

Please add MOZ_ASSERT(NS_IsMainThread()) to the various proxy methods.

Attachment #8935134 - Flags: review?(bkelly) → review+

Comment hidden (mozreview-request)

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/205976/diff/3-4/

Pulsebot

Comment 63

•

7 years ago

Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/70ab37e6d1ed
Fix leaks by adding promise window proxy. r=bkelly

Andrew McCreight [:mccr8]

Assignee

Comment 64

•

7 years ago

[Tracking Requested - why for this release]: This is a really bad leak that has started to affect YouTube, possibly as they started rolling out use of a new WebAPI. It only affects some Firefox profiles, for unknown reasons.

tracking-firefox58: --- → ?

tracking-firefox59: --- → ?

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Whiteboard: [MemShrink:P2] → [MemShrink:P1]

Ben Kelly [:bkelly, not reviewing]

Updated

•

7 years ago

Blocks: 1424374

Andrew McCreight [:mccr8]

Assignee

Updated

•

7 years ago

Updated

•

7 years ago

Blocks: 1421681

Liz Henry (:lizzard) (relman/hg->git project)

Comment 66

•

7 years ago

Tracking this for 58/59 since it's a memory leak happening with a popular site.

tracking-firefox58: ? → +

tracking-firefox59: ? → +

Andreea Pavel [:apavel]

Comment 67

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/70ab37e6d1ed

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox-esr52: --- → disabled

Andrew McCreight [:mccr8]

Assignee

Comment 68

•

6 years ago

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

Approval Request Comment
[Feature/Bug causing the regression]: service workers
[User impact if declined]: really bad memory leaks on Youtube and Twitter, for some unknown segment of users
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes, I checked locally
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: Not very
[Why is the change risky/not risky?]: It doesn't change too much, and it has been on Nightly for a few days without any apparent problems.
[String changes made/needed]: none

Attachment #8935134 - Flags: approval-mozilla-beta?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 69

•

6 years ago

Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

While this maybe a big code change, it seems like a good bug to fix. Beta58+

Gerry, fyi.

Flags: needinfo?(gchang)

Attachment #8935134 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Cosmin Sabou [:CosminS]

Comment 70

•

6 years ago

The patch does not apply to beta. Please take a look. Thanks!

Flags: needinfo?(continuation)

Ben Kelly [:bkelly, not reviewing]

Comment 71

•

6 years ago

Note, we uplifted the fix for the problem causing the SW job queue to stall.  So maybe we can get away without uplifting this if it's too hard.

Cosmin Sabou [:CosminS]

Comment 72

•

6 years ago

The patch cannot be landed. Here is the error message:

grafting 454683:70ab37e6d1ed "Bug 1409115 - Fix leaks by adding promise window proxy. r=bkelly"
merging dom/base/nsGlobalWindow.cpp and dom/base/nsGlobalWindowInner.cpp to dom/base/nsGlobalWindow.cpp
merging dom/promise/moz.build
merging dom/workers/ServiceWorkerManager.cpp
merging dom/workers/ServiceWorkerRegistration.cpp
warning: conflicts while merging dom/base/nsGlobalWindow.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')

Please take a look. Thanks!

Ryan VanderMeulen [:RyanVM]

Comment 73

•

6 years ago

Looks like this just needs rebasing around the Inner/Outer split. Should be a fairly mechanical change.

Andrew McCreight [:mccr8]

Assignee

Comment 74

•

6 years ago

Attached patch Fix leaks by adding promise window proxy. (rebased for beta) (obsolete) — Details — Splinter Review

The service worker job queue can get stuck sometimes, so we don't want
the per-process service worker data structure to hold strong
references to content. Instead, I add a proxy that is only a weak
reference. The wrinkle is that we need to keep the promise alive as
long as the job and the window are otherwise alive. I solve this by
putting a cycle collected reference to the promise on the window
itself.

MozReview-Commit-ID: 4qFaKQYB9mX

Andrew McCreight [:mccr8]

Assignee

Comment 75

•

6 years ago

I've rebased this for beta.

Flags: needinfo?(continuation)

Cristina Coroiu [:ccoroiu]

Comment 76

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/3db9f0dfe9aa

status-firefox58: affected → fixed

Cristina Coroiu [:ccoroiu]

Comment 77

•

6 years ago

Backed out changeset 3db9f0dfe9aa (bug 1409115) for multiple failures r=backout

Failure push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=3db9f0dfe9aafcd32fdb9803b685ebb49b39121d

Backout: https://hg.mozilla.org/releases/mozilla-beta/rev/641f30e026169ca825c5c479752a4931730ec6d1

Flags: needinfo?(continuation)

Cristina Coroiu [:ccoroiu]

Updated

•

6 years ago

status-firefox58: fixed → verified disabled

Cristina Coroiu [:ccoroiu]

Updated

•

6 years ago

status-firefox58: verified disabled → affected

Andrew McCreight [:mccr8]

Assignee

Comment 78

•

6 years ago

Sorry about that. In the middle of trying to rebase my patch, I changed two places that didn't need to be changed. I think I've fixed it, but I'll wait to upload a new patch until my try run finishes. https://treeherder.mozilla.org/#/jobs?repo=try&revision=970838864911270be00443a2f636c7cc4de27b82

Andrew McCreight [:mccr8]

Assignee

Comment 79

•

6 years ago

Attached patch Fix leaks by adding promise window proxy. (rebased for beta) — Details — Splinter Review

Andrew McCreight [:mccr8]

Assignee

Updated

•

6 years ago

Attachment #8937829 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Assignee

Updated

•

6 years ago

Flags: needinfo?(continuation)

Natalia Csoregi [:nataliaCs]

Comment 80

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/914ddbe4a5f0

status-firefox58: affected → fixed

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Flags: needinfo?(gchang)

Diagnostic patch. DO NOT LAND 7 years ago Andrew McCreight [:mccr8] 2.16 KB, patch		Details \| Diff \| Splinter Review
Bug 1409115 - Fix leaks by adding promise window proxy. 7 years ago Andrew McCreight [:mccr8] 59 bytes, text/x-review-board-request	bkelly : review+ ritu : approval-mozilla-beta+	Details
Fix leaks by adding promise window proxy. (rebased for beta) 6 years ago Andrew McCreight [:mccr8] 17.20 KB, patch		Details \| Diff \| Splinter Review
Fix leaks by adding promise window proxy. (rebased for beta) 6 years ago Andrew McCreight [:mccr8] 17.16 KB, patch		Details \| Diff \| Splinter Review