508128 - httpd.js running in xpcshell GC's too infrequently

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Description

•

15 years ago

When repeatedly running mochitest on content/media/test/test_seek.html, I can see in Task Manager or by using `top` that xpcshell.exe's memory usage rapidly increases with every run. It expands by a few hundred megabytes every reload of test_seek.html. If you wait long enough, (I assume) a garbage collection is done, and the memory shrinks back down to a few hundred megabytes. When running this many times, it's easy to expand the memory usage to over a gigabyte.

Usually the first run expands the memory usage to a few hundred magabytes, but if I add a call to gc() at the end of _connectionClosed in httpd.js:719, memory always stays at around 9MB.

Note that test_seek is disabled on Windows because as it fails intermittently. This is because either necko or httpd.js can't handle suspending or resuming connections, this is a separate issue, and I'm looking into this. If test_seek.html appears to not terminate, that's not related to this gc issue. If I set the media code to not try to suspend/resume connections, this memory bloat still happens, so that's not the cause.

This bug happens on at least Linux and Windows. You have to edit content/media/test/Makefile.in to enable test_seek.html on Windows if you want to test on Windows.

STR:
1. Run mochitest on content/media/test_seek.html
2. Observe memory usage of xpcshell.exe in `top`/Task Manager.
3. Repeatedly reload test_seek.html page, note ever increasing memory usage.

Andreas Gal :gal

Comment 1

•

15 years ago

Please try the patch in bug 506125

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 2

•

15 years ago

(In reply to comment #1)
> Please try the patch in bug 506125

It may help, but not enough; it's still only take a few runs to get over 1GB on Linux running seek_test.html.

Andreas Gal :gal

Comment 3

•

15 years ago

We should investigate whether the MaybeGC hook is called properly. It should trigger the GC as your working set size increases (for every 18.5% increase you get a GC).

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 4

•

15 years ago

Does xpcshell do any calling of MaybeGC by default?  It may be that we need to run it with a -b argument to get steadier memory behaviour here, or just add the gc() call to httpd.js as alluded above.

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 5

•

15 years ago

(In reply to comment #4)
> Does xpcshell do any calling of MaybeGC by default?

I tested this with Anreas' patch from bug 506125, and it looks like xpcshell doesn't ever call MaybeGC(). During a mochitest run the browser does, but xpcshell doesn't. It looks like xpcshell only GCs on exit. I've run through a /content mochitest run on Windows with a break point in xpcshell's instance of js_GC() and it's not hit. Memory use for xpcshell.exe peaked at about 885MB, firefox.exe at about 170MB. Remember this was only for the content mochitests!

> It may be that we need to run it with a -b argument to get steadier memory 
> behaviour here, or just add the gc() call to httpd.js as alluded above.

I looks like xpcshell doesn't have a -b argument, could you elaborate please?

Andreas Gal :gal

Comment 6

•

15 years ago

That explains it. I will look at xpcshell and see if we can easily add MaybeGC calls.

Jeff Walden [:Waldo]

Updated

•

15 years ago

Assignee: general → nobody

Component: JavaScript Engine → XPConnect

QA Contact: general → xpconnect

Hardware: x86 → All

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 7

•

15 years ago

Attached patch Patch httpd.js to gc() in _connectionClosed [Backout: Comment 28] — Details — Splinter Review

Patch adds a gc() call, keeps xpcshell memory down to about 15MB or so.

<gal> cpearce: so bent said xpcshell is mostly used for testing
<gal> and if we do an automated gc heuristics, you couldn't test GC stuff easily any more
<gal> so maybe we should simply add that manual gc() call
<cpearce> gal: that's a shame, but ok.
<gal> yeah, I agree
<gal> the regular shell triggers the GC as expected
<cpearce> I'll put a patch up for gc() in httpd.js then.
<cpearce> Are there any other scripts run by xpcshell that we need to do something similar in?
<gal> I wouldn't know
<gal> I know we have tests that hit last ditch gc
<brendan> we shouldn't do too much gc() hacking

Attachment #392624 - Flags: review?

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Updated

•

15 years ago

Attachment #392624 - Flags: review? → review?(jwalden+bmo)

Jeff Walden [:Waldo]

Updated

•

15 years ago

Attachment #392624 - Flags: review?(jwalden+bmo) → review-

Jeff Walden [:Waldo]

Comment 8

•

15 years ago

Comment on attachment 392624 [details] [diff] [review]
Patch httpd.js to gc() in _connectionClosed
[Backout: Comment 28]

I'm sympathetic, but no.  xpcshell should have some way -- I believe by default -- for GC to occur naturally over time, without requiring explicit calls to a gc function.  If there is some reason to need more precise control over GC, that functionality should be opt-in; the common case is not to be testing precise GC behavior.  It is not the duty of every xpcshell user to work around this lack.

As a side effect, your patch probably causes the reftest harness to fail.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

15 years ago

Brendan, see comment #8. Can you, Jeff and Andreas reach an agreement on what needs to be done here, and do it? This is a pretty huge problem --- I don't know how we've struggled on this long without noticing it. (OK, I have noticed huge httpd.js memory usage before, but I didn't consider that GC might just not be happening.)

Jeff Walden [:Waldo]

Comment 10

•

15 years ago

GC will happen, but only at the extreme of $hugemem, not optimistically before then, I wasn't thinking of that, sorry.

If you want to minimize the time it takes to gc, adding |data.length = 0| after |output.writeByteArray(data, data.length)| in httpd.js will probably make a large difference for this particular case; I'd be fine with that addition.

Brendan Eich [:brendan]

Comment 11

•

15 years ago

Can we do something expedient in httpd.js ASAP, with a followup bug for the right fix to xpcshell filed and patched soon enough?

/be

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 12

•

15 years ago

(In reply to comment #8)
> (From update of attachment 392624 [details] [diff] [review])
> I'm sympathetic, but no. [...]
> As a side effect, your patch probably causes the reftest harness to fail.

I pushed the patch to tryserver, there were a few random-looking failures but all the tests ran at least, and all tests also run locally. I've pushed to tryserver again to see if the random-looking-failures happen again. How could this change cause the reftest harness to fail?

Andreas Gal :gal

Comment 13

•

15 years ago

Looks like I am going to be fixing this. My memory pressure-based GC heuristic patch removes triggering GC from the allocator path and since xpcshell never calls MaybeGC, nochrome talos fails on some platforms now.

Assignee: nobody → gal

Andreas Gal :gal

Updated

•

15 years ago

Blocks: 506125

Andreas Gal :gal

Comment 14

•

15 years ago

Chris, can you get the quick fix in? I will provide a generalized fix.

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 15

•

15 years ago

(In reply to comment #14)
> Chris, can you get the quick fix in? I will provide a generalized fix.

Sure, who can will review+ it?

Andreas Gal :gal

Comment 16

•

15 years ago

My first pick would be jwalden, followed by sayrer. Both are cc'd.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

15 years ago

jwalden just refused to r+ it.

Andreas Gal :gal

Comment 18

•

15 years ago

Well he knows that corner a lot better me. I can't overrule him. Waldo, whats up with the review? Are you ok with a temporary stop-gap measure until I have the MaybeGC path in place? We can back it out later. It sounds like this issue is biting people right now.

Robert Sayre

Comment 19

•

15 years ago

if this is turning tinderboxen orange as I write, we should take the httpd.js bandaid until we get a real fix.

bandaids are ok as long as they get removed properly.

Andreas Gal :gal

Comment 20

•

15 years ago

is that an r=sayrer?

Jeff Walden [:Waldo]

Comment 21

•

15 years ago

I expect this would break reftests because there wouldn't be a gc() function to call in reftest; the server is run there as a component, and I wouldn't expect components to have such a function in scope.  Thinking a little more, it's possible that the method is called late enough that the thrown ReferenceError happens to be lucky and doesn't break anything.  It's also possible my non-fanatical attempts to make httpd.js code exception-safe (or, rather, don't-hang-if-an-exception-is-thrown-safe) were successful in this case; hard to say without investigating, which I'm not inclined to do.

There are bandaids that change the way existing APIs are used to work around problems in them.  There are also gross, awful hacks that perform some completely random change unmotivated by what the surrounding code does.  I think this is an instance of the latter and not an instance of the former.

Jeff Walden [:Waldo]

Comment 22

•

15 years ago

Here's a hack I'd accept, because it doesn't touch httpd.js itself: if we added testing/mochitest/tests/run-gc.sjs with contents |function handleRequest(request, response) { response.setHeader("Cache-Control", "no-cache", false); Components.utils.forceGC(); }| and occasionally loaded that file, would that also do a JS GC and ameliorate the problem?

Robert Sayre

Comment 23

•

15 years ago

(In reply to comment #22)
> Here's a hack I'd accept, because it doesn't touch httpd.js itself: if we added
> testing/mochitest/tests/run-gc.sjs with contents |function
> handleRequest(request, response) { response.setHeader("Cache-Control",
> "no-cache", false); Components.utils.forceGC(); }| and occasionally loaded that
> file, would that also do a JS GC and ameliorate the problem?

gah. way, way too much ink spilled on a trivial, temporary bandaid patch. 

Making an sjs file and calling is more complicated. We should just call Components.utils.forceGC() on connection close until Gal gets us a better fix, if you ask me.

Jeff Walden [:Waldo]

Comment 24

•

15 years ago

Attached patch Periodically gc in the server process (obsolete) — Details — Splinter Review

Actually, easier would be to just do it in the server directly, like so.  Rebuilding and testing this now...

Jeff Walden [:Waldo]

Comment 25

•

15 years ago

Comment on attachment 392624 [details] [diff] [review]
Patch httpd.js to gc() in _connectionClosed
[Backout: Comment 28]

Please guard this with |if (typeof gc === "function)|.  Also, I hate myself for doing this.

My patch didn't work, I'm guessing because by the time it's called the garbage your gc collects is entrained by XPCOM gunk, and I don't see a way to force a cycle collection in xpcshell.  (The usual way requires nsIDOMWindowUtils, which exists in the browser.)

Attachment #392624 - Flags: review- → review+

Robert Sayre

Comment 26

•

15 years ago

let's try this on the tracemonkey tree:

http://hg.mozilla.org/tracemonkey/rev/1391e8b39c73

Chris Pearce [:cpearce (Not reading bugmail)]

Reporter

Comment 27

•

15 years ago

(In reply to comment #26)
> let's try this on the tracemonkey tree:
> 
> http://hg.mozilla.org/tracemonkey/rev/1391e8b39c73

This seems to have made the tracemonkey test boxes consistently orange across the board. :(

Robert Sayre

Comment 28

•

15 years ago

yup, already backed out.

http://hg.mozilla.org/tracemonkey/rev/b0a52a773828

Andreas Gal :gal

Comment 29

•

15 years ago

Fixed by 506117 as a byproduct.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → DUPLICATE

Andreas Gal :gal

Comment 31

•

15 years ago

Fixed by 506125, not 506117. Apologies.

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

No longer blocks: 506125

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Attachment #392786 - Attachment is obsolete: true

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Attachment #392624 - Attachment is obsolete: true

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Attachment #392624 - Attachment description: Patch httpd.js to gc() in _connectionClosed → Patch httpd.js to gc() in _connectionClosed [Backout: Comment 28]

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 32

•

15 years ago

Un-duping. There's been no progress on bug 506125 for two months, and we need this fixed ASAP, and surely we can do something simple in httpd.js that doesn't require us to implement intelligent memory-pressure-based GC across the board.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

15 years ago

I pushed the comment #26 patch to the try-servers. It seems have caused a hang in extension manager tests:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257114322.1257122789.7235.gz#err0

TEST-PASS | /builds/slave/sendchange-linux-unittest/mozilla/objdir/_tests/xpcshell/test_extensionmanager/unit/test_LightweightThemeManager.js | test passed
TEST-PASS | /builds/slave/sendchange-linux-unittest/mozilla/objdir/_tests/xpcshell/test_extensionmanager/unit/test_bug257155.js | test passed

command timed out: 300 seconds without output, killing pid 1083

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

15 years ago

Summary: xpcshell.exe gc's too infrequently → httpd.js running in xpcshell GC's too infrequently

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

15 years ago

Windows failed in unit tests, too, but at a different test, so I suppose the hang is probably at a random place:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257114322.1257123928.19656.gz#err0

TEST-PASS | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_dm\unit\test_bug_382825.js | test passed
TEST-PASS | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_dm\unit\test_bug_384744.js | test passed
TEST-PASS | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_dm\unit\test_bug_395092.js | test passed

command timed out: 300 seconds without output

On both Linux and Windows (Mac hasn't reported yet), we have a hang at the end of chrometests, after they've passed:

TEST-PASS | chrome://mochikit/content/browser/xpinstall/tests/browser_whitelist6.js | Install should have been triggered by the right uri

Browser Chrome Test Summary
	Passed: 4000
	Failed: 0
	Todo: 5

*** End BrowserChrome Test Results ***
INFO | automation.py | Application ran for: 0:03:00.187000
SUCCESS: The process with PID 8288 has been terminated.
before 995328, after 995328, break 00000000

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 35

•

15 years ago

Interestingly, Mac hung in exactly the same place as Linux:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257114322.1257125858.8527.gz#err1

TEST-PASS | /builds/slave/sendchange-macosx-unittest/mozilla/objdir/_tests/xpcshell/test_extensionmanager/unit/test_LightweightThemeManager.js | test passed
TEST-PASS | /builds/slave/sendchange-macosx-unittest/mozilla/objdir/_tests/xpcshell/test_extensionmanager/unit/test_bug257155.js | test passed

command timed out: 300 seconds without output, killing pid 16779

The hang after running chrome tests also happened on Mac.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 36

•

15 years ago

We hang in test_bug299716.js because it does do_timeout(DELAY, "run_test_pt2()"). That timeout never fires, I think because we GC before it fires and the GC cleans up all references to the timer, which cancels the timer.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

15 years ago

Attached patch fix do_timeout [checked in comment #50] — Details — Splinter Review

Fix do_timeout so pending timers aren't GCed and canceled.

Attachment #409642 - Flags: review?

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

15 years ago

Attachment #409642 - Flags: review? → review?(jwalden+bmo)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 38

•

15 years ago

The remaining hang is that httpd.js fails to terminate after the browser has exited from the browser mochitests. We get to
>>> shutting down server on port 8888
and then we close the connection to /server/shutdown, but it looks like there are still other connections open. Maybe we missed a connection-closed notification because something got GCed that was supposed to notice when a connection has closed? It's 100% reproducible for me.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 39

•

15 years ago

Yes, the problem is that the connection for browser/components/privatebrowsing/test/browser/staller.sjs never closes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 40

•

15 years ago

It looks like this:

  function stall() {
    response.write("stalling...\n");
  }

  response.setHeader("Content-Type", "text/plain", false);
  stall();

  const nsITimer = Components.interfaces.nsITimer;
  var timer = Components.classes["@mozilla.org/timer;1"]
                        .createInstance(nsITimer);
  timer.initWithCallback(stall, 500, nsITimer.TYPE_REPEATING_SLACK);

so my guess is that this is supposed to write to a closed socket, which will trigger an error that lets httpd.js notice that the connection is closed. But I bet we're GCing away this timer callback so we never write and never notice the socket closed.

Jeff Walden [:Waldo]

Updated

•

15 years ago

Attachment #409642 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 41

•

15 years ago

Comment on attachment 409642 [details] [diff] [review]
fix do_timeout [checked in comment #50]

Seems sensible.

The naming convention for head.js-private variables is non-existent, and I was tempted to grumble here, but this isn't any different than what exists -- filed bug 525819 to hit that.  Also, I filed bug 525820 to make do_timeout take a nullary function (not a string) as an argument.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

15 years ago

Attached patch fix staller.sjs (obsolete) — Details — Splinter Review

Similar fix for staller.sjs. This is slightly more complicated, we have to keep firing the timer until the response has ended (the browser closes its side and we get an error writing to the socket). This requires adding an isEnded API to Response.

Attachment #409662 - Flags: review?(jwalden+bmo)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 43

•

15 years ago

With those patches, I got some new test failures on tryserver on Windows:

TEST-UNEXPECTED-FAIL | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\unit\test_cookie_header.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/unit/test_cookie_header.js | Expected to find header 'Cookie' but didn't find it - See following stack:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/unit/test_cookie_header.js | Unexpected exception: 2147500036 - See following stack:
TEST-UNEXPECTED-FAIL | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\test\test_host.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js |  == HTTP/1.1 400  - See following stack:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | error thrown by responseCheck: 2147500036 - See following stack:
TEST-UNEXPECTED-FAIL | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\test\test_request_line_split_in_two_packets.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_request_line_split_in_two_packets.js |  == HTTP/1.1 200 TEST PASSED - See following stack:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | error thrown by responseCheck: 2147500036 - See following stack:
TEST-UNEXPECTED-FAIL | e:\builds\slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\test\test_seizepower.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js |  == too-late passed - See following stack:
TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | error thrown by responseCheck: 2147500036 - 

Linux and Mac had the test_cookie_header.js failures but not the others.

Jeff Walden [:Waldo]

Updated

•

15 years ago

Attachment #409662 - Flags: review?(jwalden+bmo) → review-

Jeff Walden [:Waldo]

Comment 44

•

15 years ago

Comment on attachment 409662 [details] [diff] [review]
fix staller.sjs

>diff --git a/netwerk/test/httpserver/httpd.js b/netwerk/test/httpserver/httpd.js

>+  /**
>+   * Returns true when the response has ended (completely constructed if possible
>+   * and the connection closed); further actions on this will fail.
>+   */
>+  get isEnded()
>+  {
>+    return this._ended;
>+  },

I'd rather not add this.  write() is guaranteed to throw if the response is fully constructed (whether completely sent or not), and it is entirely safe to rely on that behavior.


>diff --git a/browser/components/privatebrowsing/test/browser/staller.sjs b/browser/components/privatebrowsing/test/browser/staller.sjs

There seems to be more complexity here than necessary, some to support a repeating timer using that explicit timer type and some to support multiple timers (except this only ever uses one).  I had trouble following the control flow and the timer-release behavior.  I'd rather see repeated reuse of a one-shot timer as below, which I think is clearer and more concise (this semi-relies on a one-shot timer clearing its callback if not reinitialized; feel free to add a try { } catch (e) { timer = null; throw e; } around the contents of stall if you prefer):

var timer;
function handleRequest(request, response)
{
  response.setStatusLine(request.httpVersion, 200, "OK");
  response.setHeader("Content-Type", "text/plain", false);

  const nsITimer = Components.interfaces.nsITimer;
  timer = Components.classes["@mozilla.org/timer;1"]
                    .createInstance(nsITimer);
  function stall(t)
  {
    response.write("stalling...\n");
    timer.initWithCallback(stall, 500, nsITimer.TYPE_ONE_SHOT);
  }

  stall();
  response.processAsync();
}

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 45

•

15 years ago

OK, I guess that'll work.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

15 years ago

Attached patch fix subtle QI bug in test_cookie_header.js [checked in comment 50] — Details — Splinter Review

This one's a doozy. If someone does a QI on JS-wrapped XPCOM object, the JS wrapper remembers that it has the methods of that interface, so if someone else gets that XPCOM object, they'll see those methods without QI'ing. But if you get a GC in between, the JS wrapper goes away, and the "someone else" won't see those methods.

In this case, the test was QIing to the wrong interface but finding the method anyway ... unless we happen to GC in between!

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

15 years ago

Attached patch fix staller.sjs (v2) [checked in comment 70] — Details — Splinter Review

I hope this is what you had in mind? Seems to work.

Attachment #409662 - Attachment is obsolete: true

Attachment #409896 - Flags: review?(jwalden+bmo)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 48

•

15 years ago

With these patches, things look good locally. My try-server build seems to be in limbo.

Jeff Walden [:Waldo]

Comment 49

•

15 years ago

Comment on attachment 409896 [details] [diff] [review]
fix staller.sjs (v2) [checked in comment 70]

Well, what I had in mind was exactly what I typed out.  :-)  Maybe that had some thinkos in it?  No matter, this is as good.

Attachment #409896 - Flags: review?(jwalden+bmo) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 50

•

15 years ago

Checked in patches from comments #37 and #46:
http://hg.mozilla.org/mozilla-central/rev/89bce43a99d1
http://hg.mozilla.org/mozilla-central/rev/4aa00e390ced

My try-server build with the full set of patches did not work. On Windows the errors in comment #43 recurred (other than test_cookie_header errors). They don't occur locally, as I mentioned. So more investigation is needed.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 51

•

15 years ago

Attached patch Fix RawTest QI issue [checked in comment 70] — Details — Splinter Review

I think I found the bug. RawTest's 'writer' gets a stream from XPCOM in onOutputStreamReady and then sometimes calls waitToWriteOutput, which expects that stream to have been previously QIed to nsIAsyncOutputStream. That QI was done on the stream object previously in performNextTest, but GC might have stripped it. There is a similar issue for 'reader'.

Pushed to try server...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 52

•

15 years ago

I think our test-running xpcshells need to GC vigorously if only so we can prevent more tests being added with these bugs!

Brendan Eich [:brendan]

Comment 53

•

15 years ago

(In reply to comment #52)
> I think our test-running xpcshells need to GC vigorously if only so we can
> prevent more tests being added with these bugs!

"These bugs" are all bug 307441, a bad bug from the First Age. Who will fix it?

/be

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 54

•

15 years ago

My try-server build passed on Linux (again), got what could be a new intermittent failure on Mac:
TEST-UNEXPECTED-FAIL | /builds/slave/sendchange-macosx-unittest/mozilla/objdir/_tests/xpcshell/test_necko/unit/test_traceable_channel.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | /builds/slave/sendchange-macosx-unittest/mozilla/objdir/_tests/xpcshell/test_necko/unit/test_traceable_channel.js | replaced http response body == original http response body - See following stack:

Windows never reported back :-(

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 55

•

15 years ago

Ah, Windows just finally reported back. A couple of the errors are still there :-(.

TEST-UNEXPECTED-FAIL | e:\builds\moz2_slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\test\test_host.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js |  == HTTP/1.1 400  - See following stack:
TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | error thrown by responseCheck: 2147500036 - See following stack:
TEST-UNEXPECTED-FAIL | e:\builds\moz2_slave\win32-unit\mozilla\objdir\_tests\xpcshell\test_necko\test\test_seizepower.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js | exceptions test passed ==  - See following stack:

Jeff Walden [:Waldo]

Comment 56

•

15 years ago

Are you manually eliding stacks in the above output, or is the stack-dumping code not working?  (Although, stacks might be fairly generic here given the asynchronicity of the tests and the RawTest gunk, so I'm not sure if they'd actually help or not.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 57

•

15 years ago

I'm copying from the Build Error Summary. The log has more stack data.
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257303442.1257311741.1432.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257304225.1257314321.30148.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1257306712.1257319468.21533.gz

That's from a push with some diagnostic output, but I might have broken something in the process. Checking now.

Jeff Walden [:Waldo]

Comment 58

•

15 years ago

The first one at least looks like it's what's expected, although there's some fail there because it's catching an exception and throwing a new one (losing the actually interesting stack in the process).  Still examining...

Jeff Walden [:Waldo]

Comment 59

•

15 years ago

      onOutputStreamReady: function(stream)
      {
        var data = testArray[testIndex].data[dataIndex];

        var written = 0;
        try
        {
          written = stream.write(data, data.length);
          if (written == data.length)
            dataIndex++;
          else
            testArray[testIndex].data = data.substring(written);
        }
        catch (e) { /* stream could have been closed, just ignore */ }

        // Keep reading data until there's no more data to read
        if (written != 0)
          waitToWriteOutput(stream);
        else
          stream.close();
      }

So, |testArray[testIndex].data| is an array of strings.  Initially |data| is the next one of those strings to write out.  We try to write it out.  If we wrote it all out, then we increase |dataIndex| to work on the next string to write out.  If we *didn't*, however, we grab what wasn't written and re-queue it:

            testArray[testIndex].data = data.substring(written);

|data.substring(written)| is a string.  |testArray[testIndex].data| was previously an array.  Type error!  The left-hand side should be |testArray[testIndex].data[dataIndex]|.  The TypeError you're seeing would happen if the remaining substring is small but |dataIndex| was large.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 60

•

15 years ago

OK, so I've fixed that, and fixed another bug where we should be checking dataIndex < testArray[testIndex].data.length, but we've still got failures.

On Linux:
http://tinderbox.mozilla.org/showlog.cgi?tree=MozillaTry&errorparser=unittest&logfile=1257390383.1257399263.19861.gz&buildtime=1257390383&buildname=Linux%20try%20hg%20unit%20test&fulltext=1#err1
TEST-UNEXPECTED-FAIL | /builds/slave/sendchange-linux-unittest/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js |  == async seizure passed - See following stack:
JS frame :: /builds/slave/sendchange-linux-unittest/mozilla/testing/xpcshell/head.js :: do_throw :: line 201
JS frame :: /builds/slave/sendchange-linux-unittest/mozilla/testing/xpcshell/head.js :: do_check_eq :: line 253
JS frame :: /builds/slave/sendchange-linux-unittest/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js :: checkAsyncSeizure :: line 252
JS frame :: /builds/slave/sendchange-linux-unittest/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 606

On Windows:
http://tinderbox.mozilla.org/showlog.cgi?tree=MozillaTry&errorparser=unittest&logfile=1257390383.1257402748.25078.gz&buildtime=1257390383&buildname=WINNT%205.2%20try%20hg%20unit%20test&fulltext=1#err0
TEST-INFO | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | Swallowed exception [Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIBinaryInputStream.available]"  nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)"  location: "JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 585"  data: no], see following stack:
undefined
NEXT ERROR TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js |  == HTTP/1.1 400  - See following stack:
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_throw :: line 201
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_check_eq :: line 253
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js :: check400 :: line 303
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 606

TEST-INFO | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | Swallowed exception [Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIBinaryInputStream.available]"  nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)"  location: "JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 585"  data: no], see following stack:
undefined
NEXT ERROR TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_request_line_split_in_two_packets.js |  == HTTP/1.1 200 TEST PASSED - See following stack:
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_throw :: line 201
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_check_eq :: line 253
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_request_line_split_in_two_packets.js :: checkLotsOfLeadingBlankLines :: line 133
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 606


TEST-INFO | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | Swallowed exception [Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIBinaryInputStream.available]"  nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)"  location: "JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 585"  data: no], see following stack:
undefined
TEST-UNEXPECTED-FAIL | e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js | exceptions test passed ==  - See following stack:
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_throw :: line 201
JS frame :: e:\builds\moz2_slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_check_eq :: line 253
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_seizepower.js :: checkExceptions :: line 243
JS frame :: e:/builds/moz2_slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 606

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 61

•

15 years ago

All of these errors are consistent with RawTest just dropping data on the floor before calling the data-check callback.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 62

•

15 years ago

I wonder if it's possible for reader.onInputStreamReady to be fired sometimes when there's no data in the stream yet? I pushed another patch to the try-server to see if I can narrow that down.

Jeff Walden [:Waldo]

Comment 63

•

15 years ago

Plausible.  :-\

It's depressing how complicated this code is, but on the other hand 1) at least it's not spread across a handful of different tests, and 2) if the tests didn't exist we'd be in an even worse position.

Regarding the written != 0 check you mentioned as looking wrong, the reason for it, after thinking a little more, is that if the stream closes before we've finished writing all data (entirely possible when the test is that the server fails fast with a 400 error), we have to stop writing out data.  Switching to just check whether more data remains to be written seems not enough to handle that case, but I'm not entirely sure what the better thing to do is.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 64

•

15 years ago

I guess we can check written != 0 too. onOutputStreamReady shouldn't fire unless we can write at least one byte.

The latest push has no errors on Linux or Mac (although there could just be randomness here, of course) but the same 3 errors on Windows. Here's a relevant snippet:
http://tinderbox.mozilla.org/showlog.cgi?tree=MozillaTry&errorparser=unittest&logfile=1257414317.1257428920.30309.gz&buildtime=1257414317&buildname=WINNT%205.2%20try%20hg%20unit%20test&fulltext=1#err0

*** copying undefined bytes to output: 66,97,100,32,114,101,113,117,101,115,116,10
*** _waitForData
*** onInputStreamReady
*** cancel(0)
*** onStopRequest async callback
*** onStopRequest [status=0x0]
*** closing connection 2 on port 2904
before 1060864, after 1052672, break 00000000
TEST-INFO | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js | Swallowed exception [Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIBinaryInputStream.available]"  nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)"  location: "JS frame :: e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 587"  data: no], see following stack:
undefined
*** reader.onInputStreamReady: available=0
NEXT ERROR TEST-UNEXPECTED-FAIL | e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js |  == HTTP/1.1 400  - See following stack:
JS frame :: e:\builds\slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_throw :: line 201
JS frame :: e:\builds\slave\win32-unit\mozilla\testing\xpcshell\head.js :: do_check_eq :: line 253
JS frame :: e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/test_host.js :: check400 :: line 303
JS frame :: e:/builds/slave/win32-unit/mozilla/objdir/_tests/xpcshell/test_necko/test/head_utils.js :: anonymous :: line 610

I think the key line here is "reader.onInputStreamReady: available=0". That's just wrong.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 65

•

15 years ago

(In reply to comment #63)
> It's depressing how complicated this code is, but on the other hand 1) at least
> it's not spread across a handful of different tests, and 2) if the tests didn't
> exist we'd be in an even worse position.

If we find out that lack of GC in httpd.js is causing a lot of the orange across the board, then I say if the tests didn't exist we'd be in a much *better* position.

In fact I think we should disable RawTests and land these fixes. The tests are clearly broken and I have had enough of trying to fix them.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 66

•

15 years ago

Attached patch Fix additional RawTest bugs [checked in comment 70] — Details — Splinter Review

Attachment #410631 - Flags: review?(jwalden+bmo)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 67

•

15 years ago

Attached patch Additional diagnostics to report more data about exceptions in tests (obsolete) — Details — Splinter Review

Attachment #410632 - Flags: review?(jwalden+bmo)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 68

•

15 years ago

Attached patch Additional logging and asyncWait change — Details — Splinter Review

I don't know if this patch helps at all, but it might helpful for future work on these tests.

Jeff Walden [:Waldo]

Updated

•

15 years ago

Attachment #410631 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 69

•

15 years ago

Comment on attachment 410632 [details] [diff] [review]
Additional diagnostics to report more data about exceptions in tests

>diff --git a/netwerk/test/httpserver/test/head_utils.js b/netwerk/test/httpserver/test/head_utils.js

Hm, we should have added this noting and reporting functionality ages ago, didn't occur to me that such functionality should be generalized.  I wonder: is this because people aren't testing the exceptions thrown by methods enough?


>@@ -571,45 +571,58 @@ function runRawTests(testArray, done)
>   /** Data received so far from the server. */
>   var received = "";
> 
>   /** Reads data from the socket. */
>   var reader =
>     {
>       onInputStreamReady: function(stream)
>       {
>-        var bis = new BinaryInputStream(stream);
>+        try {

try
{

catch (e)
{

finally
{

throughout.


>+          var av = 0;
>+          try
>+          {
>+            av = bis.available();
>+          }
>+          catch (e) {
>+            do_note_exception(e);
>+            /* default to 0 */

Place the comment before the noting; that seems the more logical ordering (same elsewhere too).  We default to 0, therefore we merely note the exception rather than doing something with it like checking its value.


>diff --git a/netwerk/test/unit/test_traceable_channel.js b/netwerk/test/unit/test_traceable_channel.js

>   onStopRequest: function(request, context, statusCode) {
>-    this.listener.onStopRequest(request, context, statusCode);
>-    httpserver.stop(do_test_finished);
>+    dump("In TracingListener.onStopRequest\n");
>+    try {
>+      this.listener.onStopRequest(request, context, statusCode);
>+      httpserver.stop(do_test_finished);
>+      dump("Did TracingListener.onStopRequest\n");
>+    } catch (e) {
>+      dump("TracingListener.onStopRequest swallowing exception: " + e + "\n");
>+    }
>   },

This formulation is a bad idea because if onStopRequest throws, the server will never be stopped and do_test_finished will never be called.  Instead, move the server-stop call into a finally clause.

Attachment #410632 - Flags: review?(jwalden+bmo) → review+

(not currently active) Ted Mielczarek

Updated

•

15 years ago

Assignee: gal → roc

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 70

•

15 years ago

Checked in attachment #410013 [details] [diff] [review], attachment #409896 [details] [diff] [review] and attachment #410631 [details] [diff] [review].

http://hg.mozilla.org/mozilla-central/rev/e24c9b4be5c6
http://hg.mozilla.org/mozilla-central/rev/e91e024ace59
http://hg.mozilla.org/mozilla-central/rev/23fa68bf3610

I'm not planning to work on this myself more in the immediate future, but I think this bug still really needs to be resolved quite urgently. I hope someone else can pick up the baton.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Assignee: roc → nobody

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Depends on: 601557

Josh Matthews [:jdm]

Assignee

Comment 71

•

14 years ago

I'm looking into this.  I'm seeing a very repeatable assertion failure in test_async_notifications_404.js which looks like a weakness in the scope of the request proxies, which I've filed as bug 601557.

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Assignee: nobody → josh

Josh Matthews [:jdm]

Assignee

Comment 72

•

14 years ago

Having applied attachment 410632 [details] [diff] [review] and made Waldo's requested changes, I see this notice before the test failure occurs:

TEST-INFO | /Users/moco/src/m-c/ff-dbg/_tests/xpcshell/netwerk/test/httpserver/test/head_utils.js | Swallowed exception [Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIBinaryInputStream.available]"  nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)"  location: "JS frame :: /Users/moco/src/m-c/ff-dbg/_tests/xpcshell/netwerk/test/httpserver/test/head_utils.js :: anonymous :: line 532"  data: no], see following stack:

Josh Matthews [:jdm]

Assignee

Comment 73

•

14 years ago

Apparently that's not the cause, however, because the same exception occurs in successful runs as well.

Josh Matthews [:jdm]

Assignee

Comment 74

•

14 years ago

As far as I can tell, we're suffering from a similar problem to bug 601557.  The input stream created in performNextTest goes out of scope, and apparently nobody else bothers to hold a strong reference to it.  When httpd.js dispatches a runnable that ends up triggering a GC, it races against the stream becoming readable.  If the GC wins, the stream is destroyed and in its dying throes triggers a runnable which calls OnInputStreamReady and keeps the stream alive for the callback.  The easiest solution is to retain a strong ref to the stream somewhere in head_utils.js, such as within the reader object.

Josh Matthews [:jdm]

Assignee

Comment 75

•

14 years ago

Attached patch Ensure input stream lifetime is not cut short by a GC. [checked in comment 80] — Details — Splinter Review

Josh Matthews [:jdm]

Assignee

Comment 76

•

14 years ago

Attached patch Additional diagnostics to report more data about exceptions in tests [checked in comment 80] — Details — Splinter Review

Here's the updated diagnostic patch for eventual landing.  Carrying forward the previous r+.

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Attachment #481089 - Flags: review?(jwalden+bmo)

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Attachment #410632 - Attachment is obsolete: true

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Attachment #481090 - Attachment description: "httpd.js running in xpcshell GC's too infrequently" [ → Additional diagnostics to report more data about exceptions in tests

Josh Matthews [:jdm]

Assignee

Comment 77

•

14 years ago

I've had several try server runs with the patch from bug 601557 and here applied, and the only failures I've received are known intermittent oranges that don't appear to be any worse with this GC behaviour change.  I'm feeling pretty good about this.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 78

•

14 years ago

You rock! Please land this!

Jeff Walden [:Waldo]

Comment 79

•

14 years ago

Comment on attachment 481089 [details] [diff] [review]
Ensure input stream lifetime is not cut short by a GC. [checked in comment 80]

>diff --git a/netwerk/test/httpserver/test/head_utils.js b/netwerk/test/httpserver/test/head_utils.js
>@@ -517,16 +518,17 @@ function runRawTests(testArray, done)
>   /** Data received so far from the server. */
>   var received = "";
> 
>   /** Reads data from the socket. */
>   var reader =
>     {
>       onInputStreamReady: function(stream)
>       {
>+        do_check_true(stream == this.stream);

=== please, == is a bit broken (not in this instance, but bad choices can snowball into bad habits).

Is it the case that this issue also needs to be fixed in runHttpTests?  Looks like it to me, but maybe I'm missing something.  If you agree, please add:

* a |listener._channel = ch;| before the |ch.asyncOpen(listener, null)| in
  |performNextTest|,
* a |/** Current channel being observed by this. */ _channel: null,| in the
  |listener| literal,
* a |do_check_true(request === this._channel);| in |onStartRequest|,
* and a |this._channel = null;| in |onStopRequest|

while you're making the changes you've already suggested.  Sorry for the delay in getting to the review...

Attachment #481089 - Flags: review?(jwalden+bmo) → review+

Josh Matthews [:jdm]

Assignee

Comment 80

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/031018350d0f
http://hg.mozilla.org/mozilla-central/rev/24c51e16703a
http://hg.mozilla.org/mozilla-central/rev/367117828795

Status: REOPENED → RESOLVED

Closed: 15 years ago → 14 years ago

Resolution: --- → FIXED

Josh Matthews [:jdm]

Assignee

Comment 81

•

14 years ago

And the actual gc() patch bounced because it lit up the xpcshell tests.  Investigating yet again.  Sigh.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 82

•

14 years ago

The failures in xpcshell tests all seem to be:

###!!! ASSERTION: nsExceptionManager not thread-safe

at the stack:

 5  libxul.so!nsExceptionManager::Release [nsExceptionService.cpp:031018350d0f :
    eip = 0x02860530   esp = 0xbfe7ac00   ebp = 0xbfe7ac38   ebx = 0x0357750c
    Found by: call frame info
 6  libxul.so!nsExceptionService::DoDropThread [nsExceptionService.cpp:031018350
    eip = 0x0285f744   esp = 0xbfe7ac40   ebp = 0xbfe7ac68   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
 7  libxul.so!nsExceptionService::DropAllThreads [nsExceptionService.cpp:0310183
    eip = 0x0285f782   esp = 0xbfe7ac70   ebp = 0xbfe7ac78   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
 8  libxul.so!nsExceptionService::Shutdown [nsExceptionService.cpp:031018350d0f 
    eip = 0x0285fa47   esp = 0xbfe7ac80   ebp = 0xbfe7ac98   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
 9  libxul.so!nsExceptionService::Observe [nsExceptionService.cpp:031018350d0f :
    eip = 0x0285fa93   esp = 0xbfe7aca0   ebp = 0xbfe7aca8   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
10  libxul.so!nsObserverList::NotifyObservers [nsObserverList.cpp:031018350d0f :
    eip = 0x027fb5e3   esp = 0xbfe7acb0   ebp = 0xbfe7acd8   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
11  libxul.so!nsObserverService::NotifyObservers [nsObserverService.cpp:03101835
    eip = 0x027fc6ff   esp = 0xbfe7ace0   ebp = 0xbfe7ad18   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info
12  libxul.so!mozilla::ShutdownXPCOM [nsXPComInit.cpp:031018350d0f : 624 + 0x3b]
    eip = 0x027df6d6   esp = 0xbfe7ad20   ebp = 0xbfe7ad98   ebx = 0x0357750c
    esi = 0x0840f998
    Found by: call frame info


Seems like a bug in nsExceptionService, perhaps?

Josh Matthews [:jdm]

Assignee

Comment 83

•

14 years ago

There's also this failure:

TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central_fedora64_test-xpcshell/build/xpcshell/tests/services/sync/tests/unit/test_records_crypto.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
Setting the identity for passphrase
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-INFO | (xpcshell/head.js) | test 2 pending
2010-10-22 13:21:27	Test                 INFO	Setting up server and authenticator
2010-10-22 13:21:27	Test                 INFO	Generating keypair + symmetric key
2010-10-22 13:21:27	PubKeyManager        DEBUG	Generating RSA keypair
2010-10-22 13:21:27	PubKeyManager        DEBUG	Generating RSA keypair... done
2010-10-22 13:21:27	Test                 INFO	Setting up keyring
2010-10-22 13:21:27	Test                 INFO	Creating a record
TEST-PASS | /home/cltbld/talos-slave/mozilla-central_fedora64_test-xpcshell/build/xpcshell/tests/services/sync/tests/unit/test_records_crypto.js | [run_test : 80] http://localhost:8080/crypto/steam == http://localhost:8080/crypto/steam
TEST-PASS | /home/cltbld/talos-slave/mozilla-central_fedora64_test-xpcshell/build/xpcshell/tests/services/sync/tests/unit/test_records_crypto.js | [run_test : 81] ../crypto/steam == ../crypto/steam
2010-10-22 13:21:27	Test                 INFO	Encrypting a record
2010-10-22 13:21:27	PubKeyManager        TRACE	Importing record: http://localhost:8080/keys/pubkey
2010-10-22 13:21:28	Net.Resource         DEBUG	GET success 200 http://localhost:8080/keys/pubkey
2010-10-22 13:21:28	PrivKeyManager       TRACE	Importing record: http://localhost:8080/keys/privkey
before 1576960, after 1417216, break 01db7000
2010-10-22 13:21:28	Net.Resource         DEBUG	GET success 200 http://localhost:8080/keys/privkey
before 1421312, after 1413120, break 01db7000
TEST-UNEXPECTED-FAIL | (xpcshell/head.js) | 2147500037

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 84

•

14 years ago

(In reply to comment #82)
> Seems like a bug in nsExceptionService, perhaps?

I talked to jst about this, and he sounded ok with just making the reference counting threadsafe in this case.

http://hg.mozilla.org/mozilla-central/file/3a5b84a290d7/xpcom/base/nsExceptionService.cpp#l95 already documents that it can be released on a different thread during shutdown.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

14 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Josh Matthews [:jdm]

Assignee

Updated

•

14 years ago

Depends on: 606592

Josh Matthews [:jdm]

Assignee

Comment 85

•

14 years ago

Turns out that the patch in bug 606592 to address comment 84 also resolves the quite mysterious test_records_crypto.js failure.  I can't explain where the failure comes from, but I've got multiple try runs demonstrating completely green trees with that patch applied.

Josh Matthews [:jdm]

Assignee

Comment 86

•

14 years ago

Oh joy. Some more try runs revealed that the test_records_crypto.js failure is alive and well, despite the patches in bug 606592 and bug 450812.  More investigation required.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 88

•

14 years ago

Great, if you can fix that maybe we can land this before anyone else accidentally adds new dependencies on not GCing :-)

(no longer active)

Comment 91

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/7ee0fe6aec58

Status: REOPENED → RESOLVED

Closed: 14 years ago → 13 years ago

Resolution: --- → FIXED

Whiteboard: fixed-in-cedar

Target Milestone: --- → mozilla7

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 92

•

13 years ago

Yeeehaw! Thank you Josh!

David Mandelin [:dmandelin]

Comment 93

•

13 years ago

Josh, jsreftests have started intermittently timing out on Android (see bug 662468), with a lot of GC spew from xpcshell. Do you think this bug might have introduced that?

Joel Maher ( :jmaher ) (UTC -8)

Comment 94

•

13 years ago

for the tegras we run xpcshell on a host computer and access it remotely from Android.  Maybe it is related to this patch, but it wouldn't be related to httpd.js and xpcshell binary.

Josh Matthews [:jdm]

Assignee

Comment 95

•

13 years ago

If this patch is responsible for a change in behaviour, it would be that httpd.js is now GCing more frequently. As Joel points out, that means a remote machine is doing the GCing, not the fennec process on Android. I'm having a hard time coming up with any reasons this patch could be causing the jsreftest problems.

cmtalbert

Comment 96

•

13 years ago

(In reply to comment #95)
> If this patch is responsible for a change in behaviour, it would be that
> httpd.js is now GCing more frequently. As Joel points out, that means a
> remote machine is doing the GCing, not the fennec process on Android. I'm
> having a hard time coming up with any reasons this patch could be causing
> the jsreftest problems.

See bug 648104, comment 9.  One of the issues with the reftest family of tests on android is that we are forced to run all reftests at a very high resolution (for a tegra), which significantly impacts available memory on the tegras.

Josh Matthews [:jdm]

Assignee

Comment 97

•

13 years ago

(In reply to comment #96)
> See bug 648104, comment 9.  One of the issues with the reftest family of
> tests on android is that we are forced to run all reftests at a very high
> resolution (for a tegra), which significantly impacts available memory on
> the tegras.

Yes, but that was the case before this patch as well, so I don't see how that's relevant.

David Mandelin [:dmandelin]

Comment 98

•

13 years ago

(In reply to comment #95)
> If this patch is responsible for a change in behaviour, it would be that
> httpd.js is now GCing more frequently. As Joel points out, that means a
> remote machine is doing the GCing, not the fennec process on Android. I'm
> having a hard time coming up with any reasons this patch could be causing
> the jsreftest problems.

Josh, thanks for the explanation: I agree, it is almost certainly not due to the GC change.

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

5 years ago

Blocks: 1597227

Patch httpd.js to gc() in _connectionClosed [Backout: Comment 28] 15 years ago Chris Pearce [:cpearce (Not reading bugmail)] 1.08 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Periodically gc in the server process 15 years ago Jeff Walden [:Waldo] 1.27 KB, patch		Details \| Diff \| Splinter Review
fix do_timeout [checked in comment #50] 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 3.01 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
fix staller.sjs 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 2.93 KB, patch	Waldo : review-	Details \| Diff \| Splinter Review
fix subtle QI bug in test_cookie_header.js [checked in comment 50] 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.09 KB, patch		Details \| Diff \| Splinter Review
fix staller.sjs (v2) [checked in comment 70] 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.57 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Fix RawTest QI issue [checked in comment 70] 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.61 KB, patch		Details \| Diff \| Splinter Review
Fix additional RawTest bugs [checked in comment 70] 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.37 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Additional diagnostics to report more data about exceptions in tests 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 10.12 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Additional logging and asyncWait change 15 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 2.70 KB, patch		Details \| Diff \| Splinter Review
Ensure input stream lifetime is not cut short by a GC. [checked in comment 80] 14 years ago Josh Matthews [:jdm] 1.42 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Additional diagnostics to report more data about exceptions in tests [checked in comment 80] 14 years ago Josh Matthews [:jdm] 10.42 KB, patch		Details \| Diff \| Splinter Review