Closed Bug 1165314 Opened 9 years ago Closed 9 years ago

slow transfer times between EC2 and S3

Categories: Infrastructure & Operations :: RelOps: General (task)
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: dustin, Assigned: q)
Whiteboard: [windows]
Attachments: (9 files, 1 obsolete file)

      No description provided.
Attached image moz_usw2_vs_usw1.jpg
Q -> AWS
--------

We have several Windows nodes in use1 that are timing out jobs uploading to taskcluster-public-artifacts.s3-us-west-2.amazonaws.com. I can show fast results to other sites and speed tests, as well as S3 Browser connections to usw1. However, we show incredibly poor speeds of 2 Mb/s or less uploading to any bucket in usw2 (Oregon). See the attached screenshot of an S3 Browser session: the top file is transferring to usw2 from use1 and the second is transferring to usw1 from use1. I have found other users with complaints about slow uploads to usw2 but no identified fixes. This is stopping us from moving build production into AWS services and any help would be greatly appreciated.
Instance ID(s): i-5a2d43a6
AWS -> Q
--------
4/12/2015

G'day,

Thanks for logging this query with us.

I am going to set up a test environment and try to see if I can get the same results with cross region uploads to an S3 bucket from use1 to usw2. This is going to take me a little time and I would also like to chat to one of my S3 experts here to see if we can identify any possible causes, so expect to hear from me tomorrow. 

In the meantime, would it be possible for you to run a tcptraceroute so that we can get some idea if there is any latency at any connection points along the network path?

http://simulatedsimian.github.io/tracetcp.html

Any results you can give me would be much appreciated, else just wait to hear from me.

Best regards,

Karl G.
Amazon Web Services
===================================================
AWS -> Q
--------
4/12/2015

Hi,

Further to my last response I have now configured an EC2 Windows 2012 Server in US-East-1 and an S3 bucket in US-West-2.

Strangely enough I did not see any performance issues when copying over a 100MB file using s3 browser.

I then went back to the instance id that you supplied and I noticed that you have quite a few rules in terms of routing.

I then had a look at the various S3 endpoints to try to determine what underlying IP addresses may be getting used:

http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

When I pinged s3-us-west-2.amazonaws.com I received an endpoint IP address of 10.12.13.162

When I pinged s3-us-west-1.amazonaws.com I received an endpoint IP address of 54.231.232.192

Looking at your routing rules for your instance it seems that traffic going to 10.12.13.162 would be directed via your VPN gateway, whereas traffic to 54.231.232.192 would be directed to an Internet Gateway.

I am beginning to think that this may be where the problem originates. To determine if this theory holds any weight, it is now vitally important that I get the output of tcptraceroute to both destinations or at the very least the output of the tracert command to both destinations, as this will confirm whether or not a different network path is being used.

I would also suggest modifying the routing table to route traffic to S3 endpoints in US-West-2 via an Internet Gateway to see if this improves performance, as this will further help us determine the root cause.

I look forward to seeing your responses and working with you further to find a permanent solution to this issue.

Best regards,

Karl G.
Amazon Web Services
===================================================
Q -> AWS
--------
4/13/2015

Thanks for the reply. I will double-check the routing issues. I have never received a reply from that 10.x address range, but it may be intermittent based on your test results.
Q -> AWS
--------
4/13/2015 (attached file has been deleted)

Here is a trace during a slow upload, ignoring only port 3389 and host 169.254.169.254. I can't seem to find a VGW route or, after the first pass, any relation to the 10.12.13 range.
AWS -> Q
--------
4/14/2015

Thanks for the packet capture. 

It does provide some interesting information that would on the surface seem to invalidate my theory.

The packets I was interested in were 4, 58, 38 and 110. These packets seem to show IP addresses that fall within your routing range to an IGW.

What I would really like to do now is further confirm this via output of the tracert command.

-----

To gather this information, when the slowness is being experienced please run the following commands from a Windows command prompt (preferably on the same server where you are seeing the slow uploads):

tracert taskcluster-public-artifacts.s3-us-west-2.amazonaws.com > trace1.txt

tracert taskcluster-public-artifacts.s3.amazonaws.com > trace2.txt

tracert s3-us-west-2.amazonaws.com > trace3.txt

-----

Once these commands are completed, please attach the 3 files (trace1.txt, trace2.txt, and trace3.txt) to the case.

I will then review these files and determine the next steps.

Best regards,

Karl G.
Amazon Web Services
===================================================
Attached file trace1.txt
Traces captured during a slow upload
Attached file trace2.txt
Attached file trace3.txt
AWS -> Q
--------
4/18/2015

Hi,

Thanks for the traceroute output.

The data in these files is quite interesting in that it clearly shows the S3 endpoints you are connecting to in the 54.231.x.x address range.

Looking at your routing table associated with your instance I see a matching route via an IGW, so this would clearly show that my theory that traffic was being routed via your VPN gateway is incorrect.

I have now re-tested the copy using my own instance in US-East-1 to a bucket in US-West-2 and again I can see no performance degradation.

The next steps to try to narrow this down are going to be twofold.

On my side I am going to refer this case to our S3 operations team to ascertain if they know of any known reason to explain this behaviour.

In addition to this I would also like to continue testing with you to try to more clearly define the problem area. To this end could you please launch a new instance to test from with the following constraints:
- Use a standard Amazon provided AMI, such as a Windows 2008R2 Base image (a t2.medium should suffice)
- Do not install any additional software on this instance or join it to any domain
- Try to launch in a subnet that has minimal routes, i.e. one that does not contain more than one or two routes, with default traffic (0.0.0.0/0) being directed to an IGW.
- Test uploads to your bucket

My goal with this test is to try to narrow down the component areas where problems could be occurring. At this stage I am unsure whether the slowdown originates inside the instance or not; as the tracert output does not show anything untoward, I need to gain a better understanding of what may be influencing the speed issues you are seeing.

Ideally I would like to work with you closely on this one and would really like to organise a phone call and screen sharing session with you to undertake the testing I have just outlined. That way I can gauge results in real time and also possibly see the slow down issue you are experiencing first hand.

I am based in Sydney, Australia, so time overlap will need to be managed; however, if you are located on the US West Coast I am confident we can organise a testing session in the afternoon your time. Please let me know if you are agreeable.

Best regards,

Karl G.
Amazon Web Services
===================================================
AWS -> Q
--------
4/18/2015

Hi,

I wanted to provide you an update on Karl G.'s behalf with some additional information that can help us narrow down the issue.

Looking into your data, it seems you are hitting S3 via 54.231.161.145.
The route that matches this sends to an IGW -- Route Table: rtb-b3d03cd6:
54.230.0.0/15	igw-09b7cc67	active

In addition to reproducing the issue from a new subnet per Karl's suggestion, can we confirm the IPs you are hitting when you see slow performance (available via packet capture)? I'll include this in the information we request below.

Can you also work on capturing the info below for slow and fast requests to give us more context and ability to trend commonalities for the slow requests?
* What is the perceived difference between the slow and previously fast requests? Do you have throughput, latency and/or request rate metrics for comparison? I noticed that you mentioned 2 Mb/s throughput; what do you see for the fast us-east-1 -> us-east-1 uploads?
* Are the slow uploads intermittent or persistent?
* Are you seeing performance degradation from any other request type(s) like GETs, if so which ones?
* Can you provide a minimal packet capture of the slow request (will confirm IP address being used for slow request)?

For both a slow and non-slow request, can you please provide us with the following?
* Request details
  - Bucket and object (i.e. s3://bucket/path/to/object)
  - The DNS name you're using to resolve S3 (i.e. s3-us-west-2.amazonaws.com)
  - The specific S3 IP addresses serving the slow request (IP addresses can be obtained with tools like netstat or packet captures). A TCP traceroute (and MTRs if the issue is intermittent) to the specific IP address while you are seeing slow performance could also be helpful to help us track down where the issue may be coming from.
* What are the details of the responses from S3 for the requests? Please be sure to include:
  - x-amz-request-id (also known as a Request ID)
  - x-amz-id-2 (also known as HostID or Extended Request ID)

For example, they may look like this in the HTTP response:
< x-amz-request-id: 644A0AD8A4B48C5D
< x-amz-id-2: OeA6HPuuFKM+nGyaxajic9mAZyvJiGQHq4x+pUmYRc/YWlhLv1dnEGHA3m21R2o/

The above information will allow us to see the network state between you and S3, as well as provide us information to track individual slow requests. We thank you for your patience and look forward to hearing from you so we can further assist. Please let us know if you have any questions about any of this.

Best regards,

Joe C.
Amazon Web Services
===================================================
Q -> AWS
--------
4/27/2015

I was out of town for the last week; we are still seeing the issue and need the case reviewed further. I will attach a new trace (in addition to the first) today showing throughput speeds of 60 to 100 mb/s from use1 to use1 and from use1 to usw1 at the same time as slow speeds are observed with usw2. If you review the first case attachment you will see the live comparison:

attachment 8606276 [details]

Q
AWS -> Q
--------
4/28/2015

Hello,

To follow up on Joe C's behalf, we would like to once more request further details on the slow and non-slow requests.

	• Do you have throughput, latency, and/or request rate metrics for both cases?
	• Could you provide us with the speed differences between the slow and the non-slow request?
	• Are the slow upload speeds intermittent or persistent?
	• Are you still experiencing the slow upload speeds?
	• Are you seeing performance degradation from any other request type(s) like GETs, and if so which ones?
	• Please provide a minimal packet capture of the slow request (this will confirm the IP address being used for the slow request).
	• The specific S3 IP addresses serving up the slow request (this can be found using tools such as netstat or packet captures using winpcap).
	• A TCP traceroute to the specific S3 IP address while you are experiencing the slow performance.
	• MTR output to the specific S3 IP address while the slow performance is occurring.
	• S3 address of the bucket and object (e.g. s3://bucket/path/to/object).
	• What are the details of the responses from S3 for the requests? Please be sure to include:
		○ x-amz-request-id (also known as a Request ID)
		○ x-amz-id-2 (also known as HostID or Extended Request ID)
	• For example, they may look like this in the HTTP response:
	< x-amz-request-id: 644A0AD8A4B48C5D
	< x-amz-id-2: OeA6HPuuFKM+nGyaxajic9mAZyvJiGQHq4x+pUmYRc/YWlhLv1dnEGHA3m21R2o/
	
Again, we apologize for needing to ask these questions once more; however, these further details will help us better diagnose your current issue. If you have any questions regarding the topics covered above, please let us know by responding to this case.

Best regards,

Yu A.
Amazon Web Services
===================================================
Q -> AWS
--------
4/29/2015 (attachments no longer exist)

The below are S3 uploads from an EC2 host in USE1

transfer to USW2:

http://www.filedropper.com/traceduringusw2transfer

and the same file being transferred to USE1:

http://www.filedropper.com/traceduringuse1transfer
AWS -> Q
--------
4/30/2015

Hello, 

I understand that you are experiencing upload performance issues to S3. After reviewing the information on this case I looked into the instance further and found that the route table is overly complex, which may be a bottleneck. I would recommend simplifying this to alleviate issues. Please see the following information for more details on route tables: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Route_Tables.html
If you are doing this for security reasons I would suggest using network ACLs: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_ACLs.html
Please make sure never to hard-code AWS endpoints, as our endpoints change. To continue the investigation of this issue, please provide the following information:
- Have you been able to test this from a different subnet?
- What is the DNS name you are using to resolve S3?
- What is the object path in S3?
- The most important items to help this move forward are:
  - x-amz-request-id (also known as a Request ID)
  - x-amz-id-2 (also known as HostID or Extended Request ID)

For example, they may look like this in the HTTP response:
< x-amz-request-id: 644A0AD8A4B48C5D
< x-amz-id-2: OeA6HPuuFKM+nGyaxajic9mAZyvJiGQHq4x+pUmYRc/YWlhLv1dnEGHA3m21R2o/

Best regards,

Larry B.
Amazon Web Services
===================================================
Q -> AWS
--------
5/4/2015

It appears that information is not recorded in the default log for S3 Browser. I will craft a request to upload a simple file to get a request ID.

As a side note, this is happening to another of our users downloading from usw2 to usw1, in case ID 1391313231.
AWS -> Q
--------
5/4/2015

Hello,

We can see your other case, and an engineer is already in touch with your colleague investigating the issue.
Meanwhile, when you are ready with the requested info please send it to us, so our colleagues from the S3 team can further troubleshoot this issue.

Best regards,

Ionut D.
Amazon Web Services
===================================================
Q -> AWS
--------
5/5/2015

Hello folks,

Thank you for the follow-up; below is the data requested. We appreciate any more insight you can offer, as this issue is killing our performance.

-Have you been able to test this from a different subnet? 
1) Yes, we have been able to replicate it on multiple subnets, from usw1 ec2 to usw2 s3 (multiple buckets) and from use1 ec2 to usw2 s3 (multiple buckets)
2) We have been able to replicate it with a stock AMI outside our VPC, from use1 ec2 to usw2 s3 (multiple buckets)
 
-What is the DNS name you are using to resolve S3?
1) In the case of the test traces attached to this ticket (as the trace should show): https://mozilla-releng-test-usw2.s3.amazonaws.com/
2) As mentioned earlier in the ticket, production is using: taskcluster-public-artifacts.s3-us-west-2.amazonaws.com


 -What is the object path in S3?
This was the slow file upload test from the attached traces: 
https://mozilla-releng-test-usw2.s3.amazonaws.com/pack-1b55b787af1ebc771efa36deff74b4fdf1c4b719.pack

 -The most important item to help this move forward are - x-amz-request-id (also known as a Request ID)
Per the trace the request id for the slow test attached is: x-amz-request-id: B14E757E8D50E302

 - x-amz-id-2 (also known as HostID or Extended Request ID)
Per the trace the x-amz-id-2 for the slow test attached is: aVuBRp17TiOO44Fg9gjNqJ7RkYz14Jnfjtd8g2+ovBbHyrckid7qKdHZaSxGN9ow/K52KYicrjA=
AWS -> Q
--------
5/6/2015

Hi,

As we looked into your traceroutes, it seems the latency is not abnormally high.

We also tried to look into your request IDs, but the request ID provided was for a HEAD request, rather than a PUT request so we were unable to provide much insight from the S3 side about the slow PUT request.

Can you gather Request ID and Extended Request ID (a.k.a. x-amz-id-2 or HostID) for a slow upload request?
As well, can you help obtain a packet capture of this transfer?

> 2) We have been able to replicate it with a stock AMI outside our vpc from use1 ec2 to usw2 s3 (multiple buckets) 
Is this instance still available for us to look at, configured with its original network configuration? Can you provide us with the instance ID of this test so we can take a look at its network configuration? If not, can you reproduce the slow behavior outside of your VPC (on a subnet with the only route going to an IGW) and provide us the instance ID of that instance?

We are still having difficulties reproducing this with a quick test from a test instance (with the only route being 0.0.0.0/0 to an IGW) in EC2 (VPC) in us-east-1 to S3 in us-west-2. We were able to transfer 491MB in approx. 11s, i.e. about 44.6 MB/s, with latency values similar to those you are seeing in your traceroutes. I'll attach my testing data at the end of this correspondence. Please let us know if you have any questions about any of this.

[ec2-user@ip-10-0-0-107 ~]$ ls -hl
total 0
-rw-rw-r-- 1 ec2-user ec2-user 491M May  6 23:23 bigfile

[ec2-user@ip-10-0-0-107 ~]$ time aws s3 cp bigfile s3://superjoeuswest2/test100mb
upload: ./bigfile to s3://superjoeuswest2/test100mb

real	0m10.995s
user	0m2.800s
sys	0m0.772s

[ec2-user@ip-10-0-0-107 ~]$ sudo traceroute -P TCP -p 80 superjoeuswest2.s3-us-west-2.amazonaws.com
traceroute to superjoeuswest2.s3-us-west-2.amazonaws.com (54.231.161.153), 30 hops max, 60 byte packets

 1  * * *
 2  * *^C[ec2-user@ip-10-0-0-107 ~]$ sudo traceroute -T -p 80 superjoeuswest2.s3-us-west-2.amazonaws.com
traceroute to superjoeuswest2.s3-us-west-2.amazonaws.com (54.231.162.105), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  205.251.245.52 (205.251.245.52)  71.050 ms  72.238 ms  71.006 ms
 6  205.251.244.92 (205.251.244.92)  65.516 ms 205.251.244.100 (205.251.244.100)  77.218 ms  76.728 ms
 7  54.239.41.29 (54.239.41.29)  69.518 ms 205.251.244.112 (205.251.244.112)  75.130 ms 54.239.41.189 (54.239.41.189)  71.466 ms
 8  205.251.232.101 (205.251.232.101)  64.489 ms 205.251.232.213 (205.251.232.213)  65.099 ms 205.251.232.105 (205.251.232.105)  69.931 ms
 9  205.251.225.56 (205.251.225.56)  64.463 ms 205.251.232.189 (205.251.232.189)  71.342 ms 205.251.225.56 (205.251.225.56)  64.415 ms
10  205.251.232.207 (205.251.232.207)  76.899 ms  75.659 ms 205.251.232.103 (205.251.232.103)  69.474 ms
11  * * *
12  * * *
13  s3-us-west-2-r-w.amazonaws.com (54.231.162.105)  66.439 ms  66.659 ms  70.864 ms

Best regards,

Joe C.
Amazon Web Services
===================================================
Q -> AWS
--------
5/13/2015

After spinning up instances again and checking in on use1, we are still seeing the problem. I think our next steps here are to set up a new subnet in our VPC and attach a routing table we can change with impunity for testing. Since I am hesitant to add the layer of a Cygwin environment to the issue, I don't want to use the S3 API tools directly. Is there a better tool than S3 Browser to get you the data you requested on a Windows platform (the problem is more visible on our Windows builders)?
Q -> AWS
--------
5/14/2015 (uploaded trace did not capture the transfer)

New transfer from instance ID i-3cf937c3

This instance id is in a new Subnet with only 2 entries in its route table.

Below is the info for the transfer and the packet trace 

x-amz-id-2: 9Nl6L7FyRmcd3Pd3gMNku7P8y1RYVMXrT6vweOhYC+u4J+ggbN7UYVAO8hDtcMjbNqFgUe5ZFhw=
x-amz-request-id: BD917EF9AB19930C
Date: Thu, 14 May 2015 03:59:39 GMT, ETag: "97a8cec0cc5d5f2ca3d2bdbfb2533694"

Trace capture here (in the same bucket that the test was run on):

https://mozilla-releng-test-usw2.s3.amazonaws.com/usw2_transfer_slow%20from_use1.pcapng
From the etherpad in https://etherpad.mozilla.org/slowwin-test

Testing on a 2008 host in our VPC in use1, hitting dedicated S3 buckets in use1, usw1, and usw2.  Host is on a dedicated subnet in our VPC with a dedicated default-to-IGW route table.

Testing is with s3cmd.  For "new" buckets without DNS propagation, s3cmd's PUT gets a 307 redirect even before the entity-body is sent, followed by an RST.  s3cmd sees this as a connection failure and retries.  This resolves itself once DNS for the bucket propagates, so we've used only propagated buckets here.  I've disabled multipart uploads so that aborted uploads don't stick around.  We're using HTTP (no SSL) so that packet traces are intelligible.

  python s3cmd put --debug --disable-multipart file1m s3://mozilla-releng-slowwin-test-usw2/file1m

DNS resolution:
 mozilla-releng-slowwin-test-use1.s3.amazonaws.com is an alias for s3-1-w.amazonaws.com.
 mozilla-releng-slowwin-test-usw2.s3.amazonaws.com is an alias for s3-us-west-2-w.amazonaws.com.
 mozilla-releng-slowwin-test-usw1.s3.amazonaws.com is an alias for s3-us-west-1-w.amazonaws.com.

Note that the hosts in our VPC use recursive DNS servers in our datacenter in Santa Clara for DNS resolution; any geo-based resolution will interpret the request as coming from the west coast, even though the instance is in us-east-1.

Issues:
 - within use1, fantastically slow (10's of K/s) transfer times *sometimes* from windows hosts
   - rarely replicated; this is probably the key issue
 - from EC2/use1 -> S3/usw2, slow speeds (100K/s), possibly limited by window size
Attached image transfer-rates.png
We ran this python script overnight on the windows test host in use1, hoping to see a pattern in transfer rate variation.

----
import time
import subprocess
import re

# Upload the same 1MB file to the usw2 test bucket in a loop, logging the
# transfer rate that s3cmd reports (normalized to bytes/sec) every 10 seconds.
cmd = "python s3cmd put --disable-multipart file1m s3://mozilla-releng-slowwin-test-usw2/file1m"
while True:
        t = time.time()
        output = subprocess.check_output(cmd, shell=True, stderr=subprocess.PIPE)
        # s3cmd's progress line ends with e.g. ", 123.45 kB/s"
        rate, mult = re.search(r', ([0-9]+.[0-9]+) (.B)/s', output).group(1, 2)
        rate = float(rate)
        if mult == 'kB':
                rate *= 1024
        elif mult == 'mB':
                rate *= 1024 * 1024
        print t, rate
        time.sleep(10)
----

The results are in the attached graph.  The rate is in bytes/sec.  Minimum 100003.84, maximum 138885.12, average 118909.8596.  So consistently between about 100K/s and 135 K/s.  So this is the second issue identified in comment 22: consistently slow transfers from EC2/use1 to S3/usw2.  From a Linux system in the same subnet, with the same script and the same version of s3cmd, I can consistently get about 1300 K/s, or about 10x faster.  If I disable window scaling (echo 0 > /proc/sys/net/ipv4/tcp_window_scaling, which limits the window to 64K), I get about 700 K/s; only 5x faster than Windows.  That puts the blame pretty squarely on the Windows TCP stack; window scaling accounts for part of the problem, and I think it's reasonable to assume that the remainder of the problem is other un-accounted-for TCP tuning issues.

Long-fat-network issues start to crop up when TCP's window size (the maximum amount of data allowed "in flight" at any one time) is smaller than the bandwidth-delay product for the network (the amount of data the network itself is capable of holding in flight).  Flipping over our bar napkin, we have a 64k window, and the RTT (delay) from use1 to usw2 is about 75ms based on the packet traces we saw.  So 64k = BW * 75ms gives a bandwidth of about 860K/s.  The constrained Linux trials above are pretty close to this theoretical maximum.
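
For reference, here's the same napkin math as a small script (nothing here is measured; the 64 KB window and ~75 ms RTT are just the figures quoted above):

----
# Bandwidth-delay-product ceiling: with at most `window` bytes in flight
# per round trip, throughput can't exceed window / RTT.
def max_throughput(window_bytes, rtt_seconds):
    return window_bytes / rtt_seconds

window = 64 * 1024   # bytes -- the unscaled 64K window from the traces
rtt = 0.075          # seconds -- ~75 ms use1 -> usw2 from the packet captures

print("max ~%.0f KB/s" % (max_throughput(window, rtt) / 1024))
# -> max ~853 KB/s, in the same ballpark as the ~860 K/s estimate above
----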

Worth noting that window scaling doesn't seem to make a difference for transfers within use1 -- that's not a long fat network.

Our conversation with Amazon should, then, focus on the first issue in comment 22, which is about intermittent *very* slow transfer speeds.  I don't seem to have caught any of those overnight (they'd appear in this graph).  However, in my manual testing I did see it once:

----
1MB from win test host to use1, no TLS: successful transfer with some sequence errors
776 frames, wireshark says 0.092886 RTT
window size max is 64k
PUT request in frame at 3.047296; PUT completely acked at 20.702331; 58KB/s
  (s3cmd reported 57.8KB/s.. so believable)
max win size 64k (no scaling)
----

That's a 92ms RTT *within* use1.  Now, if we could just replicate that behavior we could start to dig into the "why".
Q got window scaling enabled, and the windows are indeed bigger now.  Some more runs of sx_time.py still show the same range of transfer rates, though!  Here's a sample of the dump (via tcpdump, which shows the unscaled window value)

13:56:14.160998 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [SEW], seq 1783242456, win 8192, options [mss 1460,nop,wscale 14,sackOK,TS val 3964277 ecr 0], length 0
> 13:56:14.229174 IP 54.231.161.34.80 > 10.134.160.179.56696: Flags [S.], seq 1895321803, ack 1783242457, win 8190, options [mss 1460,nop,wscale 6,nop,nop,sackOK], length 0
> 13:56:14.229192 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], ack 1, win 4, length 0
> 13:56:14.229233 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [P.], seq 1:456, ack 1, win 4, length 455
> 13:56:14.230966 IP 54.231.161.34.80 > 10.134.160.179.56695: Flags [.], ack 2, win 127, length 0
> 13:56:14.315370 IP 54.231.161.34.80 > 10.134.160.179.56696: Flags [P.], seq 1:403, ack 456, win 127, length 402
> 13:56:14.315899 IP 54.231.161.34.80 > 10.134.160.179.56696: Flags [P.], seq 403:408, ack 456, win 127, length 5
> 13:56:14.315911 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], ack 408, win 3, length 0
> 13:56:14.326430 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [P.], seq 456:996, ack 408, win 3, length 540
> 13:56:14.326507 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 996:2456, ack 408, win 3, length 1460
> 13:56:14.326513 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 2456:3916, ack 408, win 3, length 1460
> 13:56:14.326515 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [P.], seq 3916:5092, ack 408, win 3, length 1176
> 13:56:14.326579 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 5092:6552, ack 408, win 3, length 1460
> 13:56:14.326581 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 6552:8012, ack 408, win 3, length 1460
> 13:56:14.420885 IP 54.231.161.34.80 > 10.134.160.179.56696: Flags [.], ack 8012, win 32746, length 0
> 13:56:14.420901 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [P.], seq 8012:9188, ack 408, win 3, length 1176
> 13:56:14.420914 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 9188:10648, ack 408, win 3, length 1460
> 13:56:14.420917 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 10648:12108, ack 408, win 3, length 1460
> 13:56:14.420918 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [P.], seq 12108:13284, ack 408, win 3, length 1176
> 13:56:14.489249 IP 54.231.161.34.80 > 10.134.160.179.56696: Flags [.], ack 13284, win 32746, length 0
> 13:56:14.489277 IP 10.134.160.179.56696 > 54.231.161.34.80: Flags [.], seq 13284:14744, ack 408, win 3, length 1460

The amazon headers from that particular transaction (extracted from the packet dump) are
x-amz-id-2: yCKgn6ISt5doYLBl0Dv0RAt7rl0rtbrNsGZSvsZMRVfmcr0SX8VB+f0nsURvGXq6VHauSnJ37oI=
x-amz-request-id: 0AF88391ED2F4D3D

Looking at the "ack 8012, win 32746" at 13:56:14.420885, with the wscale 6 applied that means AWS has acked receipt up through sequence number 8012, with room for 32746*64 = 2095744 bytes in its buffer.  So the EC2 instance should feel welcome to send bytes 8013 through 2103757; yet it only sends through 13284, then stops for 60ms until it gets the next ACK.  It isn't even sending 32746 bytes (under the hypothesis it's ignoring the wscale from AWS and treating the window value literally).  What is it doing during those 60ms when it could be sending data?
Here's a similar transfer, this time a POST over SSL to people.mozilla.com.

> 14:51:07.318183 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [SEW], seq 596120331, win 8192, options [mss 1460,nop,wscale 14,sackOK,TS val 135339 ecr 0], length 0
> 14:51:07.399545 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [S.E], seq 2988224313, ack 596120332, win 14480, options [mss 1436,sackOK,TS val 624514879 ecr 135339,nop,wscale 7], length 0
> 14:51:07.399565 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], ack 1, win 4, options [nop,nop,TS val 135347 ecr 624514879], length 0
> 14:51:07.547451 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [P.], seq 1:61, ack 1, win 4, options [nop,nop,TS val 135362 ecr 624514879], length 60
> 14:51:07.627783 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 61, win 114, options [nop,nop,TS val 624515107 ecr 135362], length 0
> 14:51:07.635702 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], seq 1:1425, ack 61, win 114, options [nop,nop,TS val 624515115 ecr 135362], length 1424
> 14:51:07.635719 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], seq 1425:2849, ack 61, win 114, options [nop,nop,TS val 624515115 ecr 135362], length 1424
> 14:51:07.635720 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [P.], seq 2849:3640, ack 61, win 114, options [nop,nop,TS val 624515115 ecr 135362], length 791
> 14:51:07.635729 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], ack 3640, win 4, options [nop,nop,TS val 135371 ecr 624515115], length 0
> 14:51:07.676300 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [P.], seq 61:387, ack 3640, win 4, options [nop,nop,TS val 135375 ecr 624515115], length 326
> 14:51:07.762110 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [P.], seq 3640:3699, ack 387, win 122, options [nop,nop,TS val 624515241 ecr 135375], length 59
> 14:51:07.762348 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 387:1811, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762351 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 1811:3235, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762356 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 3235:4659, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762357 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 4659:6083, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762359 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 6083:7507, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762361 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 7507:8931, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762362 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 8931:10355, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762364 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 10355:11779, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762366 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 11779:13203, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.762370 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 13203:14627, ack 3699, win 4, options [nop,nop,TS val 135384 ecr 624515241], length 1424
> 14:51:07.842672 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 3235, win 167, options [nop,nop,TS val 624515322 ecr 135384], length 0
> 14:51:07.842689 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 14627:16051, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842692 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [P.], seq 16051:17475, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842694 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 17475:18899, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842695 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 18899:20323, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842697 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 20323:21747, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842698 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 21747:23171, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842699 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 23171:24595, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842722 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 6083, win 212, options [nop,nop,TS val 624515322 ecr 135384], length 0
> 14:51:07.842733 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 24595:26019, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842735 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 26019:27443, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842736 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 27443:28867, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842738 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 28867:30291, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842740 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 30291:31715, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842757 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 8931, win 258, options [nop,nop,TS val 624515322 ecr 135384], length 0
> 14:51:07.842766 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [.], seq 31715:33139, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 1424
> 14:51:07.842768 IP 10.134.160.179.49465 > 63.245.214.133.443: Flags [P.], seq 33139:33266, ack 3699, win 4, options [nop,nop,TS val 135392 ecr 624515322], length 127
> 14:51:07.842827 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 11779, win 303, options [nop,nop,TS val 624515322 ecr 135384], length 0
> 14:51:07.842828 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 14627, win 348, options [nop,nop,TS val 624515322 ecr 135384], length 0
> 14:51:07.923225 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 17475, win 393, options [nop,nop,TS val 624515402 ecr 135392], length 0
> 14:51:07.923226 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 20323, win 439, options [nop,nop,TS val 624515402 ecr 135392], length 0
> 14:51:07.923226 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 23171, win 459, options [nop,nop,TS val 624515402 ecr 135392], length 0
> 14:51:07.923284 IP 63.245.214.133.443 > 10.134.160.179.49465: Flags [.], ack 33266, win 427, options [nop,nop,TS val 624515402 ecr 135392], length 0

The "E" and "W" in the first packet correspond to the ECR and CWR bits.  ECR is requesting an echo of the timestamp, which the other side does, and I don't think is relevant here, although since AWS didn't reply with an ECR bit set in comment 24, they weren't used on either side in that connection.  The CWR might be relevant.  

At any rate, the "ack 3235, win 167" at "14:51:07.842672" with a wscale of 7 indicates that people's window extends to 3235+167*128 = 24611, so the EC2 instance can send up to that sequence number -- and indeed, it sends up to 24595, far more than 4096 bytes.  It then gets "ack 6083, win 212", acking up to 33219, which the EC2 instance proceeds to send.  It continues on like this -- with a great deal more than 4k in flight.
Sorry, the "E" is ECE - ECN Echo.  Both that and CWR are part of http://en.wikipedia.org/wiki/Explicit_Congestion_Notification They're set for the first two packets of the three-way handshake, which is (I think) part of ECN+.  But they aren't in effect afterward.  It certainly doesn't explain why when talking to AWS the host seems to have a 4096-byte window.
Also, the congestion window, cwnd, is tracked in segments, not bytes -- and the MSS is 1460, not a divisor of 4096.  So I doubt this is cwnd.
Attached file s3-window.analysis
I just can't stay away.  Here's an analysis of the s3 logs for window size.  The "* to" fields indicate the sequence number for which we've gotten an ack, to which we've sent, and the end of the receiver's window.  "unacked" is the amount of data we've sent that hasn't yet been ack'd.  The "window use" is the fraction of the window we've filled with data.

It pays to get a computer to do your math for you: in fact the largest unacked in this transaction is 8192 bytes, not 4096, but at any rate a tiny fraction of the available window size.  This was recorded from an 's3cmd put' operation.
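For anyone who wants to reproduce that bookkeeping, here's a rough sketch of the same analysis. This is not the script that produced the attachment; it assumes scapy is available, a hypothetical capture filename, the 10.134.160.179 sender address from the traces above, and the peer's wscale of 6.

----
# Track, per packet: highest sequence we've sent, highest ack we've received,
# and the far edge of the receiver's advertised window; report unacked data
# and the fraction of the window actually in use.
from scapy.all import rdpcap, TCP, IP

CAPTURE = "slow_upload.pcapng"   # hypothetical capture file
SENDER = "10.134.160.179"        # the EC2 instance in the traces above
PEER_WSCALE = 6                  # from the S3 SYN-ACK options

sent_to = acked_to = window_end = 0
for pkt in rdpcap(CAPTURE):
    if IP not in pkt or TCP not in pkt:
        continue
    tcp = pkt[TCP]
    if pkt[IP].src == SENDER:
        sent_to = max(sent_to, tcp.seq + len(tcp.payload))
    else:
        acked_to = max(acked_to, tcp.ack)
        window_end = max(window_end, tcp.ack + (tcp.window << PEER_WSCALE))
    unacked = sent_to - acked_to
    room = window_end - acked_to
    if room > 0 and unacked >= 0:
        print("unacked=%d window_use=%.3f" % (unacked, float(unacked) / room))
----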
Attached file people-window.analysis
I ran the same analysis on the connection from

---
import urllib2
data = 'abcd' * (1024*1024/4)
urllib2.urlopen('http://people.mozilla.com/~dustin', data)
---

Which is cheating on the HTTP level, but as far as the kernel is concerned 1MB of incoming data is just fine.

This uses a much larger "unacked", maxing out at 32842 bytes.
I ran the same script, but pointing at http://s3-us-west-2.amazonaws.com/.  This time, max_unacked is even larger - 391780.

The IPs are nearly the same in either case - 54.231.161.{16,34}.

Maybe s3cmd is the culprit?
Blocks: 1165567
I am working on pulling all the options together and putting them in a new ticket for Puppet implementation. However, the quick summary is that after aggressively optimizing the TCP/IP settings, no significant changes were evident in USW2 communications. I was thinking about differences between the Linux stack and Windows when it occurred to me that this may be a socket issue; after some research I found settings for Winsock that finally changed the transfer times. I added the registry entries:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\AFD\Parameters]
"DefaultSendWindow"=dword:00256000
"DefaultReceiveWindow"=dword:00256000

After that change I started to see 10 MB/s, up from 200 kbps. I then updated to:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\AFD\Parameters]
"DefaultSendWindow"=dword:00512000
"DefaultReceiveWindow"=dword:00512000

That change led to speeds of 60 MB/s to usw2. I ran Dustin's test script for a few hours and got the times in s3_times2.txt, which is attached to the bug.
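
For the record, the same change can be made programmatically; here's a minimal sketch using Python 3's winreg (the values mirror the registry entries above, which are hex dwords; run elevated, and a reboot is generally needed before AFD picks them up):

----
# Set the Winsock AFD default send/receive buffer sizes that fixed the
# slow usw2 uploads.  0x00512000 matches dword:00512000 above.
import winreg

AFD = r"SYSTEM\CurrentControlSet\services\AFD\Parameters"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, AFD, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "DefaultSendWindow", 0, winreg.REG_DWORD, 0x00512000)
    winreg.SetValueEx(key, "DefaultReceiveWindow", 0, winreg.REG_DWORD, 0x00512000)
----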
Attached file s3_times2.txt
Here is an export of all the working settings. I am working on putting these into reg files and netsh scripts.


[netsh]
TCP Window Auto-Tuning=4
Congestion Control Provider=0
TCP Chimney Offload=2
Receive-Side Scaling State=1
EnableDCA=2
EnableTCPA=1
ECN Capability=1
Windows Scaling heuristics=2
[SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
DefaultTTL=-1
Tcp1323Opts=3
DisableTaskOffload=1
DisableTaskOffload_SEL=1
MaxUserPort=-1
TcpTimedWaitDelay=-1
SynAttackProtect=-1
SynAttackProtect_SEL=0
TCPMaxDataRetransmissions=-1
[SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_MAXCONNECTIONSPER1_0SERVER]
explorer.exe=10
iexplore.exe=10
[SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_MAXCONNECTIONSPERSERVER]
explorer.exe=10
iexplore.exe=10
[SYSTEM\CurrentControlSet\Services\Tcpip\ServiceProvider]
LocalPriority=499
HostsPriority=500
DnsPriority=2000
NetbtPriority=2001
[SOFTWARE\Policies\Microsoft\Windows\Psched]
NonBestEffortLimit=-2
[SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile]
NetworkThrottlingIndex=10
NetworkThrottlingIndex_SEL=10
SystemResponsiveness=10
SystemResponsiveness_SEL=10
[SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
Size=3
Size_SEL=3
[SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
LargeSystemCache=1
LargeSystemCache_SEL=1
[SOFTWARE\Microsoft\MSMQ\Parameters]
TCPNoDelay=-1
TCPNoDelay_SEL=0
[SYSTEM\CurrentControlSet\Services\Dnscache\Parameters]
NegativeCacheTime=-1
NetFailureCacheTime=-1
NegativeSOACacheTime=-1
[Local Area Connection]
MTU=0
[SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{4BD23CF6-2015-463F-BCA8-5486BFB6ABF0}]
TcpAckFrequency=-1
TcpAckFrequency_SEL=0
TcpDelAckTicks=-1
TcpDelAckTicks_SEL=0
TCPNoDelay=-1
TCPNoDelay_SEL=0
[Local Area Connection 2]
MTU=1500
[SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{E2AEA1D2-699E-4C38-A83D-BC710FA215BA}]
TcpAckFrequency=-1
TcpAckFrequency_SEL=0
TcpDelAckTicks=-1
TcpDelAckTicks_SEL=0
TCPNoDelay=-1
TCPNoDelay_SEL=0
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\AFD\Parameters]
DefaultSendWindow=00512000
DefaultReceiveWindow=00512000
It should be noted that TCP Window Auto-Tuning=4 is questionable: it is labeled "experimental", it is intended for fat pipes with higher than 8 ms latency, and it allows TCP window sizes larger than 16 MB. It may be better to set it to "normal".

Q
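
If we do decide to back off from the experimental setting, this is the usual way to do it (standard netsh auto-tuning levels; run from an elevated prompt, wrapped in Python here only to match the other snippets in this bug):

----
# Set TCP receive-window auto-tuning back to the Windows default level.
import subprocess

subprocess.check_call(["netsh", "interface", "tcp", "set", "global",
                       "autotuninglevel=normal"])

# Inspect the current value with:  netsh interface tcp show global
----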
Blocks: 1124303
Awesome!  I'm curious what the values of DefaultSendWindow and DefaultReceiveWindow were before the change -- 8k?
Can I like a bug and comments therein? I'm liking this.
After doing some math, if we want to get gigabit speed out of S3 transfers the value should be 128000000 ((1,000,000 kbps / 8) * 1024). I am going to do some experimentation here and see if the larger initial size affects slower communication points. It should be noted that this value is often overridden in software during socket construction.

Dustin, you are correct: the default value was 8k.

Q
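
To illustrate that last point about software overriding the defaults: any application that sets SO_SNDBUF/SO_RCVBUF on its own sockets replaces the AFD DefaultSendWindow/DefaultReceiveWindow values for that socket, so the registry tuning only helps programs that leave the defaults alone. A minimal, hypothetical example:

----
# Per-socket buffer sizes set by the application win over the AFD defaults.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 512 * 1024)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 512 * 1024)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))  # what the OS granted
s.close()
----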
I cleaned up the linux instance I was testing with.  Still to clean up before closing this: mozilla-releng-slowwin-test-* buckets, extra route table, extra subnets.  Q, I can do that when you let me know you're ready, if you'd prefer.
I should be done testing optimal settings today. I would like to identify upper and lower limits for both the buffer and the window negotiation. Then I can kill the last instance I have up and the other components.
Attached patch BUG1165314.patch (obsolete) — Splinter Review
Attachment #8608498 - Flags: review?(dustin)
Attachment #8608498 - Flags: feedback?
Comment on attachment 8608498 [details] [diff] [review]
BUG1165314.patch

This is the initial patch to get the performance of datacenter machines up to par. I am planning on building tweaks for 2008 in AWS onto this patch, as well as future network tweaks for other Windows platforms.
Attachment #8608498 - Flags: feedback?(rthijssen)
Attachment #8608498 - Flags: feedback?(q)
Attachment #8608498 - Flags: feedback?
Wrong window, wrong bug for comments 40 and 41.
Attachment #8608498 - Attachment is obsolete: true
Attachment #8608498 - Flags: review?(dustin)
Attachment #8608498 - Flags: feedback?(rthijssen)
Attachment #8608498 - Flags: feedback?(q)
Whiteboard: [windows]
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED