Hosting Django static images with Amazon Cloudfront (CDN) using django-static

July 9, 2010
4 comments Django

About a month ago I added a new feature to django-static that makes it possible to define a function that all files handled by django-static go through.

First of all, a quick recap. django-static is a Django plugin that you use from your templates to reference static media. django-static takes care of giving the file the optimum name for static serving and, if applicable, compresses the file by trimming all whitespace and whatnot. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx.

The new, popular kid on the block for CDN (Content Delivery Network) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service, which is a cloud file storage solution. What a CDN does is that it registers a domain for your resources such that, with some DNS tricks, users of this resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg from a server in Frankfurt and if you live in North Carolina, USA you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work, check out THE best practice guide for faster webpages by Steve Souders (it's number two).

A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So, I continue to serve my static CSS and Javascript files from my Nginx but put all the images on Amazon Cloudfront. How to do this with django-static? Easy: add this to your settings:


DJANGO_STATIC = True
# ...other DJANGO_STATIC_... settings...
# equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'

Then you need to write the function that gets a chance to do something with every static resource that django-static prepares. Here's a naive first version:


# in cloudfront.py

conversion_map = {} # global variable
def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg','gif','png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)

The files are only sent through the function _upload_to_cloudfront() the first time they're "massaged" by django-static. On consecutive calls nothing is done to the file since django-static remembers, and sticks to, the way it dealt with it the first time, if you see what I mean. Basically, when you have restarted your Django server the file is prepared and checked for a timestamp, but the second time the template is rendered it doesn't check the file again, to save time, and just passes through the resulting file name. If this is all confusing you can start with a much simpler proxy function that looks like this:


def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    print "Debugging and learning"
    print uri
    print "New", new,
    print "Filepath", filepath,
    print "Changed", changed,
    print "Other arguments:", kwargs
    return uri
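To see how the naive proxy behaves without touching AWS, here is a small sketch that swaps the real upload for a stand-in. The `_fake_upload` helper, the `upload_calls` list, the example.cloudfront.net URL and the file paths are all made up for illustration; only the body of `file_proxy` mirrors the version above.

```python
conversion_map = {}  # global variable, as in the naive version
upload_calls = []    # illustration only: records what would have been uploaded

def _fake_upload(filepath):
    # Hypothetical stand-in for _upload_to_cloudfront(); returns a made-up URL.
    upload_calls.append(filepath)
    return 'http://example.cloudfront.net/' + filepath.split('/')[-1]

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _fake_upload(filepath)
    return conversion_map.get(uri, uri)

# A call where the file is reported as new: the image gets "uploaded".
first = file_proxy('/img/logo.123.png', new=True,
                   filepath='/tmp/img/logo.123.png')
# A later call with neither new nor changed set: the remembered URL is reused.
second = file_proxy('/img/logo.123.png')
# Anything that isn't an image passes through untouched.
css = file_proxy('/css/site.123.css', new=True,
                 filepath='/tmp/css/site.123.css')
```

Here `first` and `second` are the same made-up CDN URL, `css` is still the original URI, and `upload_calls` holds exactly one path.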

The function to upload to Amazon Cloudfront is pretty straightforward thanks to the boto project. Here's my version:


import os
import re
from django.conf import settings
import boto

_cf_connection = None
_cf_distribution = None

def _upload_to_cloudfront(filepath):
   global _cf_connection
   global _cf_distribution

   if _cf_connection is None:
       _cf_connection = boto.connect_cloudfront(settings.AWS_ACCESS_KEY,
                                                settings.AWS_ACCESS_SECRET)

   if _cf_distribution is None:
       _cf_distribution = _cf_connection.create_distribution(
           origin='%s.s3.amazonaws.com' % settings.AWS_STORAGE_BUCKET_NAME,
           enabled=True,
           comment=settings.AWS_CLOUDFRONT_DISTRIBUTION_COMMENT)

   # now we can delete any old versions of the same file that have the
   # same name but a different timestamp
   basename = os.path.basename(filepath)
   object_regex = re.compile('%s\.(\d+)\.%s' % \
       (re.escape('.'.join(basename.split('.')[:-2])),
        re.escape(basename.split('.')[-1])))
   for obj in _cf_distribution.get_objects():
       match = object_regex.findall(obj.name)
       if match:
           old_timestamp = int(match[0])
           new_timestamp = int(object_regex.findall(basename)[0])
           if new_timestamp == old_timestamp:
               # an exact copy already exists
               return obj.url()
           elif new_timestamp > old_timestamp:
               # we've come across the same file but with an older timestamp
               #print "DELETE!", obj.name
               obj.delete()
               break

   # Still here? That means that the file wasn't already in the distribution

   # images are binary, so open in binary mode
   fp = open(filepath, 'rb')

   # Because the name will always contain a timestamp we set faaar future
   # caching headers. Doesn't matter exactly as long as it's really far future.
   headers = {'Cache-Control':'max-age=315360000, public',
              'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT',
              }

   #print "\t\t\tAWS upload(%s)" % basename
   obj = _cf_distribution.add_object(basename, fp, headers=headers)
   return obj.url()
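The part of that function that is easiest to get wrong is the regular expression that spots older copies of the same file. The sketch below isolates it so it can be tried on its own; `timestamp_regex` is a hypothetical helper name, and the second and third file names are invented for the example.

```python
import re

def timestamp_regex(basename):
    # Same pattern as in _upload_to_cloudfront(): the name before the
    # timestamp, a dot, the timestamp digits, a dot, then the extension.
    parts = basename.split('.')
    return re.compile(r'%s\.(\d+)\.%s' % (re.escape('.'.join(parts[:-2])),
                                           re.escape(parts[-1])))

new_name = 'ctw-screenshot.1242930552.png'
regex = timestamp_regex(new_name)

# The new file matches its own pattern, which yields its timestamp.
new_timestamp = int(regex.findall(new_name)[0])
# An older upload of the same image matches too, with a smaller timestamp.
old_timestamp = int(regex.findall('ctw-screenshot.1200000000.png')[0])
# A different image does not match at all, so it is left alone.
unrelated = regex.findall('other-image.1242930552.png')
```

Because `new_timestamp` is greater than `old_timestamp`, the upload function would delete the older object; `unrelated` is an empty list, so that object is skipped.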

Moving on, unfortunately this isn't good enough. You see, as soon as you have issued an upload to Amazon Cloudfront you get a full URL for the resource, but if it's a new distribution it will take a little while until the DNS propagates and the URL becomes globally available. Therefore, the URL that you get back will most likely yield a 404 Page not found if you try it immediately.

So to solve this problem I wrote a simple alternative to the Python dict() type that works roughly the same except that myinstance.get(key) only returns the value once a timeout has passed since it was set. 1 hour in this case. So it works something like this:


>>> slow_map = SlowMap(10)
>>> slow_map['key'] = "Value"
>>> print slow_map.get('key')
None
>>> from time import sleep
>>> sleep(10)
>>> print slow_map.get('key')
Value

And here's the code for that:


from time import time

class SlowMap(object):
   """
   >>> slow_map = SlowMap(60)
   >>> slow_map[key] = value
   >>> print slow_map.get(key)
   None

   Then 60 seconds goes past:
   >>> slow_map.get(key)
   value

   """
   def __init__(self, timeout_seconds):
       self.timeout = timeout_seconds

       self.guard = dict()
       self.data = dict()

   def get(self, key, default=None):
       value = self.data.get(key)
       if value is not None:
           return value

       if key not in self.guard:
           # never set, so there is nothing to release
           return default
       value, expires = self.guard[key]

       if expires < time():
           # good to release
           self.data[key] = value
           del self.guard[key]
           return value
       else:
           # held back
           return default

   def __setitem__(self, key, value):
       self.guard[key] = (value, time() + self.timeout)

With all of that ready, willing and able you should now be able to serve your images from Amazon Cloudfront simply by doing this in your Django templates:


{% staticfile "/img/mysprite.gif" %}

To test this I've deployed this technique on Crosstips, my money-making site and code guinea pig. Go ahead, visit that site and use Firebug or view the source and check out the URLs used for the images. They look something like this: http://dpv9al5z7o7rq.cloudfront.net/ctw-screenshot.1242930552.png

If you want to look at the code I use for Crosstips, download this file. It's pretty generic and should suit anybody who wants to achieve the same thing.

Have fun and happy CDN'ing!

Here's a screenshot of the wonderful Amazon AWS Console

People's reactions to Gates and Buffett's $600 billion challenge

June 17, 2010
0 comments Politics

Isn't it amazingly positive news that Warren Buffett and Melinda and Bill Gates have put up the $600 billion challenge, which is "asking the nation's billionaires to pledge to give at least half their net worth to charity"? And if you haven't already read about it, Warren Buffett pledges 99% of his company stock to charity. All good news, but what's really interesting is reading people's comments on the CNN page. A handful of picks:

"Interesting article. It is saddening, however, to ponder just how much of this crowd's wealth was made through unfair business practices, worker exploitation, price fixing, etc. I suppose philanthropy on the back end is a nice afterthought, though, and certainly earns more praise from the public than would lessening their profit margins at the get-go."

"Pay their taxes first, then contribute with after tax money."

"If I may be cynical. Perhaps these super rich people should have done more for the people that worked for them so that they made more money and the leaders made a little less. Buffet owns companies that make goods in second and third world countries at some of the lowest possible wages."

"$1000 in the hands of ONE could be investment money. $1000 distributed $1 to ONE THOUSAND could get each a Coke (no fries)."

But also, there are some more "positive" comments:

"There is sooo much negativity in this country! I don't care what anyone of you says...Bill and Melinda came from Blue Collar....and now they are giving back and I think it's awesome!"

"I think what they are doing is very admirable. The Gates Foundation is the reason I was able to pay for college. People need to not criticize what they do with their money, at least they are trying to make a difference."

In conclusion, from skimming the comments it's pretty obvious that people in the USA are angry and bitter. What is there to complain about? Really? Poor Obama, he's doing a great job but with all this resentment sizzling around it's going to be very hard if even "extreme philanthropy" gets butchered like this.

TfL Traffic cameras on a Google map

June 16, 2010
4 comments Web development

Yesterday I found out that Transport for London has lifted all restrictions on commercial use of the data it has made available to developers.

For lack of better imagination I decided to attack the Live Traffic Cameras data and whipped up this little app: tflcameras.peterbe.com

It basically shows a map of London with all the spots where traffic cameras are installed, so that you can click on them. The data is updated every 3 hours, I think, but I haven't checked that claim yet. Use this if you're a London commuter and want to check the traffic before you hit the road.

Oh, and this app uses the geolocation stuff so that I know where to zoom in first. But if you're not based in London it zooms in over Trafalgar Square by default.

Correction: running Django tests with MongoDB is NOT slow

May 30, 2010
1 comment Django, MongoDB

At Euro DjangoCon I met lots of people and talked a lot about MongoDB as the backend. I even did a presentation on the subject which led to a lot of people asking me more questions about MongoDB.

I did mention to some people that one of the drawbacks of using MongoDB, which doesn't have transactions, is that you have to create and destroy the collections (like SQL tables) for every single test run. I thought this was slow. It's not.

Today I've been doing some more profiling and testing and debugging and I can conclude that it's not a problem. Creating the database has a slight delay but it's something you only have to do once and actually it's very fast. Here's how I tear down the collections in between each test:


class BaseTest(TestCase):

   def tearDown(self):
       for name in self.database.collection_names():
           if name not in ('system.indexes',):
               self.database.drop_collection(name)

For example, running the tests of one of my apps looks like this:


$ ./manage.py test myapp
...........lots.............
----------------------------------------------------------------------
Ran 55 tests in 3.024s

So, don't fear writing lots of individual unit tests. MongoDB will not slow you down.
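The tear-down pattern above can be tried without a MongoDB server. This is only a sketch: `FakeDatabase` is a hypothetical in-memory stand-in that exposes just the two methods the tear-down calls, and `UserTest` is an invented test case. In a real project `self.database` would be the test database handle.

```python
import unittest

class FakeDatabase(object):
    """Hypothetical stand-in for the test database handle."""
    def __init__(self):
        self.collections = {'system.indexes': []}

    def collection_names(self):
        return list(self.collections)

    def drop_collection(self, name):
        self.collections.pop(name, None)


class BaseTest(unittest.TestCase):
    database = FakeDatabase()  # shared across tests, like a real test database

    def tearDown(self):
        # Drop everything a test created, but leave the system collection.
        for name in self.database.collection_names():
            if name not in ('system.indexes',):
                self.database.drop_collection(name)


class UserTest(BaseTest):
    def test_insert(self):
        users = self.database.collections.setdefault('users', [])
        users.append({'name': 'peter'})
        self.assertEqual(len(users), 1)


suite = unittest.TestLoader().loadTestsFromTestCase(UserTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
remaining = BaseTest.database.collection_names()
```

After the run, `remaining` contains only 'system.indexes', so every test starts from an empty set of collections.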

Muted conversations in Gmail

May 29, 2010
0 comments Misc. links

Having lived under a rock for a while I've managed to miss this great new feature in Gmail: Muting or ignoring conversations

From their help text:

"you've no doubt been subjected to the 'thread that just won't die!' If you're part of a long message conversation that isn't relevant, you can mute the conversation to keep all future additions out of your inbox."

That is such a smart feature. Interestingly, I didn't even think there was a solution to that problem. I'm sure I have needed something like this many times. Now, let's hope I can remember to actually use this feature.