Mistakes
This page describes some of the poor decisions I have made.
Not using a CDN
I stopped using a CDN (AWS Cloudfront in front of an S3 bucket) recently, instead opting to use Blot's server and attached SSD harddrive to serve all the static files for each site. The purpose of this was to reduce the build latency for new posts and the number of moving parts in Blot's infrastructure. I'd also like to learn more about serving static files effectively at scale.
A few weeks after this change I noticed that images were loading slowly when served by Blot. Even small ones. Really slowly. To verify slow load times I used curl. Here are a few numbers from Blot:
curl -Lo /dev/null -skw "%{speed_download}\n" http://portfolio.blot.im/_image_cache/a4f12fbe-fceb-42be-8903-f742e3c3c10b.jpg
- 180278.000
- 230055.000
- 182060.000
- 218688.000
- 156734.000
And by comparison, here are some from flickr:
curl -Lo /dev/null -skw "%{speed_download}" https://farm9.staticflickr.com/8624/16031381354_11bae45c3e_b.jpg
- 794999.000
- 894288.000
- 1017110.000
- 987376.000
- 324241.000
- 1097168.000
Where to begin to diagnose this problem? It seems like there's a bottleneck. I needed to work out where the bottleneck is. First, I used fio to measure disk read/write:
aggrb=8639KB/s, minb=8639KB/s, maxb=8639KB/s
I used iperf to measure network throughput out of AWS instance to my machine on a fast commercial internet connection. Here are a few results:
35.9 Mbits/sec 28.4 Mbits/sec
I installed iftop to visualize bandwidth
I thought it might be something to do with proxy buffering but since NGINX was serving the image (and not node) the proxying was not the culprit
Content download for an image on Blot is about 424.657534247 KB/s. What gives? Imgur is giving me 2989.071038251KB/s for a js file.
Finally I discovered that AWS offers a service called enhanced networking. It was installed but not activated on my instance:
I read more about enhanced networking on reddit. Then AWS's guide to determining whether it is enabled.
I installed the aws cli and created a new user 'aws-cli' to manage the instances, I configured it for oregon (us-west-2). modinfo needed to be run as root.
Will enhanced networking help? What are the downsides? Why isn't it enabled by default?
I tried adjusting the MTU (MAXIMUM TRANSFER UNIT) size.
Initially, the instance had jumbo frames enabled (mtu=9001), which I reduced to 1500, then did not observe any improved throughput.
Finally I span up a linode in California to see if the issue might be latency related. From CA
- 4057573.000
- 4070404.000
- 4065736.000
- 4077597.000
- 4024071.000
I then used iperf to verify the differnce:
west-coast linode -> oregon-ec2 instance: 251 Mbits/sec east-coast linode -> oregon-ec2 instance: 59 Mbits/sec
Aparently latency affects bandwidth for TCP. Ah. Back to using a CDN... and I need to read the RFC for TCP. Oh well, at least I now have some concrete numbers for why people use CDNS. The Bandwidth-delay product seems to be the mathematics underpinning the problem I encountered: