Closed Bug 662555 Opened 13 years ago Closed 13 years ago

Race condition when measuring requestEnd

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

Attachments

(1 file, 2 obsolete files)

nsHttpChannel::OnStopRequest calls gHttpHandler->CancelTransaction.  That method posts an event to the socket thread that cancels the transaction asynchronously (by call to nsHttpTransaction::Close).  In that method we stamp the requestEnd time.   Sometimes this event fires later then we read the value of requestEnd time on the main thread, giving us a null value for it.

The time (also more precise) has be stamped at nsHttpTransaction::HandleContent when we detect eof.


In general we should check if the timings are (and can be/have to be) in sync with the activity distributor.
Blocks: 576006
Would it be good enough to set the requestEnd timestamp in nsHttpChannel::OnStopRequest?  AFAICT we really only care about when the data finished getting to the user, not when the transaction was cancelled asynchronously.
We could do that in OnStopRequest, right, but just to note that this is also an event posted from the socket thread to the main thread.  So, there is some delay.  Question is, do we want to include that delay or not?  All other measures are captured on the socket thread, so why not this one as well?  Also these should be pure-as-can-be network only timings.

Also it shows that we don't protect the timings with a mutex.  I wouldn't actually add a mutex (negative perf effect) but rather (if possible, wouldn't break the web timing API) make the values accessible only after OnStopRequest has been called, e.g. mPending == false.  But it is not for this bug.

Adding a hard block for bug 658894 as it shots leak test down (undetected by try before).
Blocks: 658894
> We could do that in OnStopRequest, right, but just to note that this is also an event posted from the socket thread to the main thread.  So, there is some delay.  Question is, do we want to include that delay or not?

Well, OnStop is when the channel "ends" from the client's perspective, so it seems like a good place to take the measurement.

>  All other measures are captured on the socket thread, so why not this one as well? 

Some events (DNS resolution, TCP connection, etc) do not get notifiction on the user thread, so we can't measure them from there.  But the timings are ultimately for a user to understand how much time in the network layer their code takes, and that include whatever delay we take in posting OnStop to the main thread.   I'm not too worried about which thread we measure which event on, as long as the metrics are sensible.  And OnStopReq seems the sensible place for requestEnd.
There are 3 points of view IMO:

1. To be precise.  To exactly measure the times we spend by some activities.  We are using this API for our C++ coded telemetry, that has to be precise as it can be, but we can use different ways to get the telemetry, so not a strong argument...

2. To be compatible with other ways we measure.  Mainly with the activity observer that Firebug uses.  NS_HTTP_ACTIVITY_SUBTYPE_RESPONSE_COMPLETE is what Firebug uses for stamping the "response end" time and it is exactly at the place I suggest, [1].  I checked we are compatible for the rest of the notification, which is good.

3. Deliver what Web Timing API wants:  "This attribute (responseEnd) must return the time immediately after the user agent receives the last byte of the current document or immediately before the transport connection is closed, whichever comes first. The document here can be received either from the server, relevant application caches or from local resources."  Also, there is a web version of Firebug that would like to use it.  It wouldn't be good to give different numbers using XPI Firebug and web Firebug.


So, it is clear this needs to be put to EOF detection, the same place where the activity observer notifies.


[1] http://hg.mozilla.org/mozilla-central/annotate/e319657f9c92/netwerk/protocol/http/nsHttpTransaction.cpp#l1125
Attached patch v1 (obsolete) — Splinter Review
This is a patch that puts the stamp to the right place.  We have to leave it as well in nsHttpTransaction::Close when we get an error state.  In case of an error that method seems to be called before nsHttpChannel::OnStopRequest, so we are safe.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #540246 - Flags: review?(jduell.mcbugs)
Comment on attachment 540246 [details] [diff] [review]
v1

Review of attachment 540246 [details] [diff] [review]:
-----------------------------------------------------------------

> This attribute (responseEnd) must return the time immediately 
> after the user agent receives the last byte of the current document 
> or immediately before the transport connection is closed, whichever comes first

You're right.
Attachment #540246 - Flags: review?(jduell.mcbugs) → review+
thanks for fixing this! but shouldn't this have a TimingEnabled() check? I guess I forgot that in the original codepath...
(In reply to comment #8)
> thanks for fixing this! but shouldn't this have a TimingEnabled() check? I
> guess I forgot that in the original codepath...

Probably should.  I can do that in a followup.
Backed out for apparently causing reftest assertion failures on Windows debug: http://hg.mozilla.org/integration/mozilla-inbound/rev/3890c438f22d
Whiteboard: [inbound]
I have no idea why this should cause reftest assertions.  Unfortunatelly logs are gone by now.

I believe this was backed out because the original version of the bug 658894 patch was causing assertion failures (changeset fcf08eff68eb) but a new version currently landed on m-c (eb5866601f88) doesn't suffer from it.

I vote for just relanding this.
Attached patch v1.1 (obsolete) — Splinter Review
Same as the v1 patch just added the TimingEnabled() check.
Attachment #540246 - Attachment is obsolete: true
Attachment #550146 - Flags: review+
Hmm... looks like the spec has changed in the mean time:

responseEnd: "This attribute must return the time immediately after the user agent finishes receiving the last byte of the resource from from relevant application caches or from local resources."

There is no more anything about closing the connection.

I will update the patch.
Attached patch v2Splinter Review
This will also fix bug 673226.

http://tbpl.mozilla.org/?tree=Try&rev=07e1908a3b94
Attachment #550146 - Attachment is obsolete: true
Attachment #550161 - Flags: review?(jduell.mcbugs)
Blocks: 673226
Attachment #550161 - Flags: review?(jduell.mcbugs) → review+
Keywords: checkin-needed
Sent to try:
http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=626e32292fdc

Assuming all green, will push to inbound after.

I'm happy to fix it this time, but for the future please can you make sure the format (context, commit message, author) is correct:
http://blog.bonardo.net/2010/06/22/so-youre-about-to-use-checkin-needed

Thanks :-)
Target Milestone: --- → mozilla9
http://hg.mozilla.org/mozilla-central/rev/95996d0763a6
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: