To reduce bandwidth costs, I wrote a simple download script a while ago that caches the most recently accessed files from Amazon S3 on a less expensive hosting plan. The script uses Python, Nginx, and Flask, but the same effect could be accomplished in any language / framework that supports setting response headers.

At the time of writing, Amazon charges at least 12 cents per gigabyte of data transfer, versus the terabytes of transfer routinely offered by dedicated server companies for less than $100 per month. In my particular case, 32TB of traffic was clustered around the most popular files on any given day. Managing storage on each individual server would be impractical without using S3 as a backend, but it didn't make financial sense to pay disproportionate transfer costs for a small subset of files that changed predictably.

Quickstart

If you have Docker installed, you can quickly see my script in action by running:

docker run -p 8000:80 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e S3_BUCKET=$S3_BUCKET -t bluelaguna/s3cache

where $AWS_ACCESS_KEY_ID, $AWS_SECRET_ACCESS_KEY, and $S3_BUCKET are either stored in environment variables or replaced with their actual values.

You should then be able to access any file stored in $S3_BUCKET using http://localhost:8000/download/path/to/file. This will redirect to Amazon S3 the first time, but send the file directly the second time.
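One quick way to confirm the caching behavior is to request the same file twice with curl and compare the response headers: the first request should come back as a redirect to Amazon S3, while the second (once the file has been copied into the cache) should be served directly from disk.

curl -I http://localhost:8000/download/path/to/file   # first request: redirect to S3
curl -I http://localhost:8000/download/path/to/file   # second request: served from the local cache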

Setting up the download script

If you want to follow along with what I did, or create your own version of this script, you’ll need to install a few things first. This all assumes a fairly recent version of Ubuntu / Debian, but the same packages can also be found on other operating systems.

Prerequisites

sudo apt-get install build-essential python-pip python-dev
sudo pip install flask boto uwsgi

download.py

Next, create a new Python file in the directory of your choice. I decided to go with /home/s3cache/download.py. You should probably place this app in its own folder for now.
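A minimal sketch of such a script, assuming boto for S3 access, a background copy on cache misses, and Nginx's X-Accel-Redirect header for cache hits, might look like the following. The CACHE_ROOT path and the /cache/ internal location are placeholders that need to match the Nginx config in the next section.

import os
import threading

from flask import Flask, make_response, redirect
from boto.s3.connection import S3Connection
from boto.s3.key import Key

# Assumptions: adjust these to match your environment and the Nginx config below.
CACHE_ROOT = '/home/s3cache/cache'          # where cached copies are stored on disk
S3_BUCKET = os.environ.get('S3_BUCKET')     # bucket holding the original files

app = Flask(__name__)
conn = S3Connection()                       # picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
bucket = conn.get_bucket(S3_BUCKET, validate=False)


def cache_file(path, local_path):
    """Copy the S3 object into the local cache so the next request is a hit."""
    key = bucket.get_key(path)
    if key is None:
        return
    directory = os.path.dirname(local_path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    key.get_contents_to_filename(local_path)


@app.route('/download/<path:path>')
def download(path):
    local_path = os.path.join(CACHE_ROOT, path)
    if os.path.isfile(local_path):
        # Cache hit: hand the request back to Nginx, which serves the file from disk.
        response = make_response('')
        response.headers['X-Accel-Redirect'] = '/cache/' + path
        return response
    # Cache miss: send the client to a short-lived signed S3 URL and
    # copy the file into the cache in the background for next time.
    threading.Thread(target=cache_file, args=(path, local_path)).start()
    url = Key(bucket, path).generate_url(expires_in=300)
    return redirect(url)


if __name__ == '__main__':
    app.run()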

Setting up the Nginx web server

The Nginx web server now comes packaged with most Linux distributions. On Ubuntu / Debian, you'll want to run the following command to install the version that includes all the features we need.

sudo apt-get install nginx-full

Site Config

Create a new file in /etc/nginx/sites-enabled (for example, /etc/nginx/sites-enabled/s3cache.conf).
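A minimal sketch of the site config, assuming the socket and cache paths used in the download.py sketch above, could look like this:

server {
    listen 80;
    server_name downloads.example.com;

    # Hand all requests to the Flask app over the uwsgi socket.
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/home/s3cache/s3cache.sock;
    }

    # Internal location used by X-Accel-Redirect to serve cached files from disk.
    location /cache/ {
        internal;
        alias /home/s3cache/cache/;
    }
}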

Be sure to adjust the value for server_name and change “/home/s3cache” to match CACHE_ROOT in download.py. From there, restart nginx or reload the configuration for the changes to take effect.
sudo /etc/init.d/nginx restart

However, if you try to access the site now, you’ll get an error. If you look through the configuration, you’ll notice a reference to a socket connection (s3cache.sock). We’ll need to configure uWSGI to get it going by creating uwsgi.ini in the same folder as our app (/home/s3cache).

uwsgi.ini
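A minimal uwsgi.ini sketch, assuming the paths used above (the module name, socket location, and process count are all adjustable), could look like this:

[uwsgi]
# the app folder and the Flask module/callable inside it
chdir = /home/s3cache
module = download
callable = app
master = true
processes = 4
# socket that uwsgi_pass points at; chmod it so the Nginx worker can connect
socket = /home/s3cache/s3cache.sock
chmod-socket = 666
# remove the socket file on shutdown
vacuum = true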

To start the app, run uwsgi uwsgi.ini in the app folder. You should then be able to access any file stored in $S3_BUCKET using http://localhost/download/path/to/file. This will redirect to Amazon S3 the first time, but send the file directly the second time. In a production environment, you would want to adjust uwsgi.ini to run as another user and set up a uWSGI Emperor.