Closed Bug 1234973 Opened 8 years ago Closed 8 years ago

Very high memory usage - increasing about 1GB/minute

Categories: Core :: Networking, defect
Type: defect, Priority: Not set, Severity: normal
Status: RESOLVED WORKSFORME
Reporter: sfink, Assignee: Unassigned
Keywords: regressionwindow-wanted
Whiteboard: [MemShrink:P3][necko-would-take]
Attachments: 3 files, 1 obsolete file

My Nightly's memory usage is climbing very very rapidly. It's made it to 29GB before crashing. Going to about:memory and doing "Minimize memory usage" or (I think) just the "GC" button temporarily fixes it.

From the DMD output, it appears to be in some HttpChannel-ish stuff.

This is 2015-12-22. 2015-12-21 had the same problem.
Seems related to redirects.  Patrick, any ideas?
Component: DOM → Networking
Flags: needinfo?(mcmanus)
Whiteboard: [MemShrink]
> Going to about:memory and doing "Minimize memory usage" or (I think) just the "GC" button temporarily fixes it.

Oh, I missed this part.  I guess it's not a real leak, but it does seem odd to be using that much memory.
looks like new security model - christoph?
Flags: needinfo?(mcmanus) → needinfo?(mozilla)
Steve, can you tell if there are particular sites or addons causing this issue? I'm not personally seeing it on a more recent Nightly on OS X. It would be good to have steps to reproduce.
Flags: needinfo?(sphink)
Sorry, I know I should have put more in here, but I've switched to Chrome for the time being. If I leave Firefox open, it eventually starts to take over my machine, and by the time I notice it, it's a 3-5 minute wait before it lets me switch windows to either kill it or fix it.

Also, if I restart into safe mode, it fails to restore my tabs. (!?) Which makes it a bit harder.

I see the problem with almost nothing loaded. For a long time, I always had treeherder live in a tab, but in this current run I am most definitely seeing the problem and I haven't loaded that tab.

I will do some more experimentation today.
Flags: needinfo?(sphink)
Just updated nightly and still haven't seen any unusual memory usage, not with my normal profile which has ~400 tabs loaded, nor with another one with just 3 tabs.
Er, ok, upgrading to 2015-12-29 fixed this. I clearly saw it before upgrading, and haven't seen it all day since.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mozilla)
Resolution: --- → WORKSFORME
Ugh. It's back. I just upgraded my RAM from 16GB->32GB, and today I found my browser using 31GB (30GB of it in heap-unclassified). This is the 2016-01-06 nightly. I'm attempting to figure out how to reproduce it reliably so I can try safe mode etc.
I'm not sure. I restarted in safe mode, and it didn't seem to be happening. But at one point, I suddenly saw 30% unclassified (333MB, iirc). It went away after a little while. So I'm not sure if it's the same or not; my usual problem grows much more steadily.
This morning I have 900MB heap-unclassified in a content process using a total of 1700MB.  Seems overall sluggish as well.  No CPU usage as far as I can see.

46.0a1 (2016-01-07)

Has anyone seen this with e10s disabled?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Specifically, I see loading tabs take close to a minute in this current state.
(In reply to Patrick McManus [:mcmanus] from comment #3)
> looks like new security model - christoph?

Throughout the entire implementation phase of the new security model I haven't encountered any such memory leaks. Also, I don't quite understand why the new security model would leak so much memory. Anyway, we should keep an eye on it. As Andrew points out in Comment 4, it would be very helpful if we had a reliable way to reproduce.
I really shouldn't have used the word "leak" in the title, since mine is definitely not a leak. It's more of a GC/CC scheduling issue, though it also suggests that there's a situation where we're churning through more memory than seems healthy. (But perhaps that's only due to whatever it is that's happening, like something spinning on redirects or something.)

I have continued to attempt to reproduce. I'm going to claim that it does *not* happen to me in safe mode, even though I did once see >300MB heap-unclassified there.

I am now running with all of my addons disabled except for tree-style tabs. It doesn't seem to be happening. My current plan of action is to re-enable all addons, get it to happen again, and snoop on network traffic. Then I'll start bisecting through the addons.
Summary: Massive memory "leak" -- about 1GB/minute → Very high memory usage - increasing about 1GB/minute
I need to go look into something else, so I'll dump my current state.

I re-enabled all my addons, waited for the memory usage to start shooting up, and then dumped out network traffic:

sudo tcpdump -s 4000 -c 1000 -i enp0s25 | perl -lne '$d{$1}++ if / > (\S+?):/; END { while (($h, $c) = each %d) { print "$c $h" } }' | sort -n

That showed lots and lots of https packets going to and from 104.25.219.4. (940 out of 1000 packets). Going to https://104.25.219.4, I see that it's a cert for a whole bunch of domains, one of which I have a tab for: cookthestory.com.
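
For readers who don't read Perl, the counting step of the one-liner above can be sketched in JavaScript. This is only an illustration: `countDestinations` is a hypothetical helper, and the sample lines are made up to show the general shape of tcpdump text output, not captured traffic. The regex is the same one the Perl snippet uses, so the captured name still carries the port or service suffix.

```javascript
// Hypothetical helper: count packets per destination, mirroring the
// Perl regex / > (\S+?):/ from the one-liner.
function countDestinations(lines) {
  const counts = new Map();
  for (const line of lines) {
    const match = / > (\S+?):/.exec(line);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) || 0) + 1);
    }
  }
  // Ascending by count, like piping through `sort -n`.
  return [...counts.entries()].sort((a, b) => a[1] - b[1]);
}

// Made-up sample lines (assumption: typical tcpdump text layout).
const sampleLines = [
  "11:57:20.452000 IP 192.168.1.5.51000 > 104.25.219.4.https: Flags [P.], length 100",
  "11:57:20.453000 IP 104.25.219.4.https > 192.168.1.5.51000: Flags [.], length 0",
  "11:57:20.454000 IP 192.168.1.5.51000 > 104.25.219.4.https: Flags [P.], length 200",
];

for (const [host, count] of countDestinations(sampleLines)) {
  console.log(`${count} ${host}`);
}
```

The busiest destination ends up on the last line, which is how the 104.25.219.4 address stood out in the real capture.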

But I wouldn't expect that tab to be loaded, since I recently restarted and haven't visited it. If I grep for it in the about:memory dump, it does not show up.

If I open the browser console, I see an endless stream of error messages:

NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIInterfaceRequestor.getInterface] network-monitor.js:84:0
too much recursion network-monitor.js:84:14

  getInterface(iid) {
    if (iid.equals(Ci.nsIProgressEventSink)) {
      return this;
    }
    if (this._wrappedNotificationCallbacks) {
      return this._wrappedNotificationCallbacks.getInterface(iid);
    }
    throw Cr.NS_ERROR_NO_INTERFACE;
  },

Line 84 is the return this._wrappedNotificationCallbacks.getInterface(iid) line.
The plot sickens...

With all addons but TST disabled, I'm getting something different. I see a quite large heap-unclassified (up to 1.25GB) that does *not* go away when I click the GC or minimize memory usage buttons in about:memory. I did start this session with the addons enabled, though. Perhaps something got "stuck" due to an addon?

Anyway, it seems like the (probably) infinite recursion thing should probably be figured out before I worry too much about the heap-unclassified.
The memory usage and heap-unclassified are really bad, but without steps to reproduce it's hard to do anything here.
Whiteboard: [MemShrink] → [MemShrink:P3]
New record! 62.4GB heap-unclassified

It seems to be associated with Adblock Plus. I was running without it for a day and a half with no problems, then re-enabled it this morning and immediately started seeing the problem. I'm going to disable it again now.

I just opened up the browser console, and I am still seeing the "too much recursion" errors.

Also, I'm getting a new error mixed in:

11:57:20.452 NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "too much recursion" {file: "resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/webconsole/network-monitor.js" line: 84}]'[JavaScript Error: "too much recursion" {file: "resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/webconsole/network-monitor.js" line: 84}]' when calling method: [nsIInterfaceRequestor::getInterface]1 <unknown>
I looked at the code pointed to by that DMD log, and although I may be fooling myself, it looks pretty clear what's happening: AdBlock Plus is somehow getting into a redirect loop through resource://devtools/shared/webconsole/network-monitor.js line 84, and every time through the loop it appends a principals record onto mRedirectChain. Presumably, it'll do this until the infinite recursion is caught.
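
The growth pattern described here can be illustrated with a toy model. This is not Gecko code: `ToyChannel`, `followRedirect`, `simulateLoop`, and the `maxRedirects` cap are all hypothetical names, and the cap value of 20 is arbitrary. Only the idea of appending one record per trip through the loop is taken from the comment above.

```javascript
// Toy model (assumption, not Gecko code): each redirect appends one
// record to a chain that is never trimmed.
class ToyChannel {
  constructor(maxRedirects) {
    this.redirectChain = [];
    this.maxRedirects = maxRedirects;
  }

  followRedirect(principal) {
    // Hypothetical guard: refuse to grow the chain past the cap.
    if (this.redirectChain.length >= this.maxRedirects) {
      throw new Error("redirect limit exceeded");
    }
    this.redirectChain.push(principal);
  }
}

// Simulate a loop that keeps redirecting; returns the final chain length.
function simulateLoop(channel, iterations) {
  for (let i = 0; i < iterations; i++) {
    channel.followRedirect({ origin: "https://example.invalid", hop: i });
  }
  return channel.redirectChain.length;
}

// Without a cap, the chain grows by one record per iteration.
console.log(simulateLoop(new ToyChannel(Infinity), 10000));

// With a cap, the loop is cut off after a bounded number of records.
let stopped = false;
try {
  simulateLoop(new ToyChannel(20), 10000);
} catch (e) {
  stopped = true;
}
console.log(stopped);
```

In the uncapped case the memory held by the chain is proportional to how many times the loop has run, which would be consistent with usage that climbs steadily and then drops once the objects are collected.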

I'm not sure who to needinfo, but I'll try vporof since he's reviewed network-monitor.js. I don't know if it's the fault of that code, or that just happens to be watching things. Note that I don't have the devtools open while this is happening, afaik. When the DMD log was captured, I didn't have the browser console open either. But I did have ABP installed.
Flags: needinfo?(vporof)
The code in network-monitor.js fuels the actors used by the webconsole and the network monitor tools used in devtools. They should only be tracking what's going on, and if there's requests/redirects happening, the netmonitor simply records them. I'm not sure if it's the watcher's responsibility to know when to stop watching.
Flags: needinfo?(vporof)
Ok, I'll pick another victim, based on the other file mentioned in the recursion loop. Mossop, can you take a look at comments 17 and 18? I'm hoping this is obvious to someone familiar with this stuff.
Flags: needinfo?(dtownsend)
The other file is just a commonjs module loader, it doesn't really do anything other than load other js files so they can run. I'd be very surprised if that was related.
Flags: needinfo?(dtownsend)
Ok, I spent a couple of hours on this today because it was killing me. Which is sad, because I already knew the problematic URL.

So I built my own Firefox from 5760541fd240. I modified network-monitor.js to check whether this == this._wrappedNotificationCallbacks, since that would explain an immediate infinite recursion, but fwict this is not the case.
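
A direct self-reference is only the shortest possible cycle. A longer chain of wrappers that eventually loops back would recurse in the same way while passing the equality check. A minimal sketch of a more general check follows, assuming a hypothetical `hasWrapperCycle` helper and plain objects standing in for the real callback objects. It also assumes object identity is preserved along the chain, which may not hold for wrapped XPCOM objects, and nothing in this bug establishes that such a cycle is what actually happens.

```javascript
// Hypothetical helper: walk the _wrappedNotificationCallbacks chain and
// report whether any object is reached twice.
function hasWrapperCycle(start) {
  const seen = new Set();
  let current = start;
  while (current) {
    if (seen.has(current)) {
      return true;
    }
    seen.add(current);
    current = current._wrappedNotificationCallbacks;
  }
  return false;
}

// Stand-in objects (assumption): a wraps b, and b wraps a.
const a = { name: "a" };
const b = { name: "b", _wrappedNotificationCallbacks: a };
a._wrappedNotificationCallbacks = b;

// A finite chain with no loop.
const c = { name: "c", _wrappedNotificationCallbacks: { name: "d" } };

console.log(a === a._wrappedNotificationCallbacks); // the simple check sees nothing wrong
console.log(hasWrapperCycle(a));
console.log(hasWrapperCycle(c));
```

The Set-based walk costs one entry per wrapper in the chain, so it is only suitable as a debugging aid, not something to leave in a hot path.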

Anyway, I managed to get DumpJSStack() output, which agrees with what I've been saying above: the site www.cookthestory.com leads into an infinite recursion involving resource://devtools/shared/webconsole/network-monitor.js.

The specific url I was using is http://www.cookthestory.com/2014/04/14/how-to-cook-fish-from-frozen/

When I close the tab containing that URL, I continue seeing the error messages in the web console. Then if I restart, I am no longer able to load the URL because it will crash.

It is notable that I get the recursion problem even though the tab is not loaded. Just having it dormant in my tab list is enough.

Also, this is without AdBlock Plus. My enabled addons are Mass Password Reset, Tree Style Tab, and the gecko profiler. The gecko profiler does not seem to be relevant, though; I saw most of these problems before turning it on.

I'm seeing some heap-unclassified with ABP disabled (never much more than 50%), but it's relatively small. It does *not* go away when I force a GC, though. Strangeness.

Anyway, in summary:
 - if cookthestory.com is in my tab list, even unloaded, I get miserable performance overall and the web console shows thousands of "too much recursion" errors
 - if ABP is enabled, that also seems to translate into huge amounts of heap-unclassified memory that will go away if I GC
 - when I load the URL in my own build, it will try to load for a while then crash
Attachment #8712976 - Attachment is obsolete: true
Do you have devtools open when you see this?
I sure hope we don't do anything with network-monitor.js when devtools aren't shown.

I tested on Chrome and also there cpu and memusage go way up with the site when its network monitor thingie is visible.
If devtools is disabled, I see both FF and Chrome taking ~4% CPU time when only that page is loaded.
And memory usage stays about the same.
Closing the tab in FF does drop memory and brings cpu usage down to ~0%.


But anyhow, if you see network-monitor.js usage even when devtools are closed, that is a rather major issue. We should not be slowing down normal browsing with devtools.
None of these cases had the devtools open, but I *do* have the Web Console open to see the errors, which probably amounts to the same thing?

Without devtools or the web console, if I try to load the page I get 50-75% CPU usage in the chrome process, 25% CPU in the content process. That's while it still says "Connecting...", which it now sticks at for a very long time (perhaps until something times out? I'm trying it now, but it is still in that state after 2 minutes or so.) about:memory shows no heap-unclassified during this time, and I saw the memory usage go from 1.2GB->1.5GB->1.4GB->1.4GB, so nothing too suspicious there.

My release Nightly does *not* crash loading the page. The one I built (with no nondefault mozconfig) does.

But in short, no, I don't think I see network-monitor.js usage unless the web console is open. (I'm not sure how I could tell, though. But I have no reason to suspect it.) Which I guess means this bug should be forked into 2 or 3 bugs: 1 for the memory usage, 1 for the crash, 1 for the network-monitor.js loop with the console open. All of these seem to be related to redirects, I think?
Whiteboard: [MemShrink:P3] → [MemShrink:P3][necko-would-take]
See Also: → 1255826
This is just FYI.

Been struggling with these red "explicit/gfx (error in memory reporter)" entries in about:memory and tried:
- updating graphic drivers
- turning off hardware acceleration in Firefox:Options->Advanced
- Creating a blank profile did help but after re-installing some plugins it came back
- tried to isolate the plugin and eventually found AdBlock Plus to be somehow involved

It takes a while (1-2 hrs) before it happens, but it always seems to happen when I'm watching a YouTube video (I'm also using YouTubeCenter Dev #538, which I also disabled completely with no change).

Finally, I removed AdBlock and things worked well, so I re-enabled all plugins, hw acceleration, YouTubeCenter, etc., and replaced AdBlock with uBlock Origin, and my system hasn't failed in 24 hrs.
(In reply to Maxim from comment #28)
Disregard. It just happened again.
(In reply to Maxim from comment #28)
> Been struggling with these red "explicit/gfx (error in memory reporter)"
> entries in about:memory

That sounds like bug 1234343, which has been fixed in Firefox 46. If you still see that in version 46 or newer, please file a new bug and put [MemShrink] in the whiteboard tag.
Is this still happening?
Flags: needinfo?(sphink)
Hi Steve,

I'm unable to reproduce the issue you are having. Given that the issue seems to be specific to your setup, could you please try to find a regression range using the mozregression tool?
Information on the tool is available at http://mozilla.github.io/mozregression/. Please don't hesitate to contact us if you encounter any problems.

Also, is this still reproducible on your end? If yes, can you please retest this using the latest Firefox release and the latest Nightly build (https://nightly.mozilla.org/) and report back the results? When doing this, please use a new clean Firefox profile, maybe even safe mode, to eliminate custom settings as a possible cause (https://goo.gl/PNe90E).

Thanks,
Paul.
I haven't seen this in quite a while. (Which may be partly because other problems are kicking in first, but still.)
Status: REOPENED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(sphink)
Resolution: --- → WORKSFORME