Slow uploads with high latency #603

@erikdubbelboer

I have noticed that uploads to Disco get slower as the latency gets higher.

Since it's normal HTTP over TCP, I was wondering whether Disco does something special when reading from the TCP socket that could slow things down when the latency is higher. Or does Disco somehow make the TCP receive buffer very small, resulting in a lot of packet round trips?
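To make the receive-buffer question concrete: a small receive buffer limits how much unacknowledged data the sender can have in flight, so every buffer's worth of data costs roughly one round trip. The following is a minimal sketch in Python, not Disco's code (the response headers below show the server is MochiWeb), of how an application can shrink a socket's receive buffer and read back what the kernel actually applied. `effective_rcvbuf` is a hypothetical helper name, and the reported numbers depend on the OS (Linux, for example, reports double the requested value).

```python
import socket


def effective_rcvbuf(requested_bytes=None):
    """Return the SO_RCVBUF size the kernel reports for a fresh TCP socket.

    If requested_bytes is given, ask the kernel for that size first.
    The kernel may round or clamp the request, so the returned value
    is what actually applies, not necessarily what was asked for.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if requested_bytes is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("default SO_RCVBUF:      ", effective_rcvbuf())
    print("after requesting 4096 B:", effective_rcvbuf(4096))
```

Whether Disco or MochiWeb sets such an option is exactly the open question here; the sketch only shows the mechanism.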

The tests below were done using curl to make sure Python wasn't the bottleneck. Originally I found this problem while using the ddfs tool.

This is a 100MB /dev/urandom file being transferred from a server in Salt Lake City to a server in Singapore. First I upload the file to nginx to show what speed is reachable. Then I upload the file to Disco:

$ curl -v -X POST -d @random.bin 'http://singapore-dev-1:80' > out.log
* Connected to singapore-dev-1 (119.81.66.224) port 80 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: singapore-dev-1
> Accept: */*
> Content-Length: 51983412
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
} [data not shown]
 84 49.5M    0     0   84 41.8M      0  6354k  0:00:07  0:00:06  0:00:01 8341k
< HTTP/1.1 200 OK
* Server nginx is not blacklisted
< Server: nginx
< Date: Sun, 30 Nov 2014 07:18:03 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: close
< X-Powered-By: PHP/5.3.3
<
{ [data not shown]
100 49.5M    0     5  100 49.5M      0  6389k  0:00:07  0:00:07 --:--:-- 10.3M
* Closing connection 0

As you can see it's reaching 10.3MB/s.

But when I upload the same random file to disco I only get 0.2MB/s:

$ curl -v -X PUT -d @random.bin 'http://singapore-dev-1:8990/ddfs/test1$589-25437-55e0c' > out.log
* Connected to singapore-dev-1 (119.81.66.224) port 8990 (#0)
> PUT /ddfs/test1$589-25437-55e0c HTTP/1.1
> User-Agent: curl/7.35.0
> Host: singapore-dev-1:8990
> Accept: */*
> Content-Length: 51983412
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Server MochiWeb/1.0 (Any of you quaids got a smint?) is not blacklisted
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 28 Nov 2014 05:31:25 GMT
} [data not shown]
100 49.5M    0     0  100 49.5M      0  40932  0:21:09  0:21:09 --:--:-- 0.3M
< HTTP/1.1 201 Created
* Server MochiWeb/1.0 (Any of you quaids got a smint?) is not blacklisted
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 28 Nov 2014 05:52:35 GMT
< content-type: application/json
< Content-Length: 65
<
{ [data not shown]
100 49.5M    0    65  100 49.5M      0  40918  0:21:10  0:21:10 --:--:-- 0.2M
$
$ cat out.log
"disco://singapore-dev-1/ddfs/vol0/blob/43/test1$589-25437-55e0c"

The latency between the servers is as follows:

$ ping singapore-dev-1
PING singapore-dev-1 (119.81.66.224) 56(84) bytes of data.
64 bytes from singapore-dev-1 (119.81.66.224): icmp_seq=1 ttl=50 time=209 ms
...
64 bytes from singapore-dev-1 (119.81.66.224): icmp_seq=251 ttl=50 time=209 ms
--- singapore-dev-1 ping statistics ---
251 packets transmitted, 251 received, 0% packet loss, time 250119ms
rtt min/avg/max/mdev = 208.663/209.109/209.523/0.366 ms
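
For a transfer limited by round trips, throughput is roughly the amount of data in flight divided by the RTT. The sketch below turns the figures above around and estimates how many bytes per round trip each upload implies. It is a back-of-the-envelope estimate, not a measurement of any actual buffer: the function name is made up for illustration, and it assumes curl's `k` suffix means 1024 bytes.

```python
# Figures copied from the curl and ping output above.
DISCO_BYTES_PER_S = 40918         # curl's average upload speed to Disco
NGINX_BYTES_PER_S = 6389 * 1024   # curl reported 6389k; assuming k = 1024 bytes
RTT_S = 209.109 / 1000            # average RTT from ping, in seconds


def bytes_per_round_trip(throughput_bytes_per_s, rtt_s):
    """Estimate the data in flight per round trip: throughput * RTT."""
    return throughput_bytes_per_s * rtt_s


if __name__ == "__main__":
    disco = bytes_per_round_trip(DISCO_BYTES_PER_S, RTT_S)
    nginx = bytes_per_round_trip(NGINX_BYTES_PER_S, RTT_S)
    print(f"Disco upload: about {disco:,.0f} bytes per round trip")
    print(f"nginx upload: about {nginx:,.0f} bytes per round trip")
```

With these inputs the Disco upload works out to roughly 8.5 kB per round trip, against well over 1 MB for nginx on the same link.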

When I do exactly the same from the local host (or from a different server in the same datacenter), I get a very high speed again (meaning Disco is not always slow):

$ curl -v -X PUT -d @random.bin 'http://singapore-dev-1:8990/ddfs/test2$589-26115-21b9a' > out.log
* About to connect() to singapore-dev-1 port 8990 (#0)
*   Trying 119.81.66.224... connected
* Connected to singapore-dev-1 (119.81.66.224) port 8990 (#0)
> PUT /ddfs/test2$589-26115-21b9a HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: singapore-dev-1:8990
> Accept: */*
> Content-Length: 51983412
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
< HTTP/1.1 100 Continue
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 28 Nov 2014 06:25:41 GMT
} [data not shown]
< HTTP/1.1 201 Created
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 28 Nov 2014 06:25:42 GMT
< content-type: application/json
< Content-Length: 65
<
{ [data not shown]
100 49.5M    0    65  100 49.5M     95  72.9M --:--:-- --:--:-- --:--:-- 76.5M
$
$ cat out.log
"disco://singapore-dev-1/ddfs/vol0/blob/a3/test2$589-26115-21b9a"

Both logs show that the upload succeeded normally:

2014-11-27 23:31:25.912 [info] <11312.166.0> PUT BLOB: "/ddfs/test1$589-25437-55e0c" ("51983412" bytes) on 'disco_8989_slave@singapore-dev-1'
2014-11-27 23:52:35.854 [info] <11312.166.0> PUT BLOB done with "/ddfs/test1$589-25437-55e0c" (51983412) on 'disco_8989_slave@singapore-dev-1'
...
2014-11-28 00:25:41.448 [info] <11312.52.0> PUT BLOB: "/ddfs/test2$589-26115-21b9a" ("51983412" bytes) on 'disco_8989_slave@singapore-dev-1'
2014-11-28 00:25:41.655 [info] <11312.52.0> PUT BLOB done with "/ddfs/test2$589-26115-21b9a" (51983412) on 'disco_8989_slave@singapore-dev-1'

When I try the same with an upload from a server in Amsterdam to a server in London, I get a slower upload speed again. This time it is not as slow because the latency between the servers is lower:

$ curl -v -X PUT -d @random.bin 'http://london-2:8990/ddfs/test1$589-4f976-1c96' > out.log
* About to connect() to london-2 port 8990 (#0)
> PUT /ddfs/test1$589-4f976-1c96 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: london-2:8990
> Accept: */*
> Content-Length: 25957455
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
} [data not shown]
 98 24.7M    0     0   98 24.3M      0   896k  0:00:28  0:00:27  0:00:01  0.9K
< HTTP/1.1 201 Created
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Sun, 30 Nov 2014 05:41:25 GMT
< content-type: application/json
< Content-Length: 57
<
{ [data not shown]
100 24.7M  100    57  100 24.7M      1   888k  0:00:57  0:00:28  0:00:29  0.9M
* Connection #0 to host london-2 left intact
* Closing connection #0
$
$ ping london-2
PING london-2 (37.130.227.148) 56(84) bytes of data.
64 bytes from 2582e394.rdns.100tb.com (37.130.227.148): icmp_req=1 ttl=55 time=6.44 ms
...
64 bytes from 2582e394.rdns.100tb.com (37.130.227.148): icmp_req=35 ttl=55 time=6.31 ms
--- london-2 ping statistics ---
35 packets transmitted, 35 received, 0% packet loss, time 34003ms
rtt min/avg/max/mdev = 6.216/6.356/6.547/0.139 ms
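
One way to check whether latency alone explains the difference between the two remote Disco uploads is to multiply each throughput by its RTT. If the limit were a fixed amount of data per round trip, the two products should be of similar size. A rough sketch using only the numbers reported in this issue (names are illustrative, and curl's `888k` is assumed to mean 888 * 1024 bytes per second):

```python
# (average upload speed in bytes/s, average ping RTT in ms), from the output above.
MEASUREMENTS = {
    "Salt Lake City -> Singapore": (40918, 209.109),
    "Amsterdam -> London": (888 * 1024, 6.356),  # assuming k = 1024 bytes
}


def in_flight_bytes(throughput_bytes_per_s, rtt_ms):
    """Estimate bytes transferred per round trip: throughput * RTT."""
    return throughput_bytes_per_s * rtt_ms / 1000.0


if __name__ == "__main__":
    for link, (speed, rtt_ms) in MEASUREMENTS.items():
        estimate = in_flight_bytes(speed, rtt_ms)
        print(f"{link}: about {estimate:,.0f} bytes per round trip")
```

Both links come out at a few kilobytes per round trip, even though the throughputs differ by a factor of about 22. That is consistent with a small per-round-trip limit on the Disco side, but it does not prove one.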
