UPPER vs. ILIKE

April 19, 2010
4 comments Web development

I have a Zope web app that uses hand-coded SQL (PostgreSQL) statements, much like old-school PHP. My only excuse for not using an ORM is that this project started in 2005 and at the time SQLAlchemy seemed like a nerdy mess with more undocumented quirks than it was worth losing hair over.

Anyway, several statements use ILIKE to get around the problem of making things case insensitive. The Django ORM, for example, uses UPPER to achieve the same thing. So I wondered how much ILIKE slows things down compared to UPPER and to the indexed equality operator. Obviously, neither ILIKE nor UPPER can use a plain index on the column.
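
For reference, this is the kind of lookup I mean. In the Django ORM a case-insensitive exact match is spelled with __iexact, which on PostgreSQL is turned into an UPPER(...) = UPPER(...) comparison, roughly like query TWO below (the model and field names here are made up):


# a sketch, not code from the actual app
from myapp.models import Brewery  # hypothetical model with a "name" varchar column

# generates SQL along the lines of:
#   SELECT ... FROM ... WHERE UPPER(name::text) = UPPER('Abbaye')
breweries = Brewery.objects.filter(name__iexact='Abbaye')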

Long story short, here are the numbers from selecting on about 10,000 indexed records:


# ONE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE name = 'Abbaye';
Average: 0.14 milliseconds

# TWO
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  UPPER(name::text) = UPPER('Abbaye');
Average: 18.89 milliseconds

# THREE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  name ILIKE 'Abbaye';
Average: 24.63 milliseconds


First of all, the conclusion is to use UPPER instead of ILIKE if you don't need regular expressions. Secondly, if at all possible, use the plain equality operator first and foremost and only resort to a case-insensitive comparison if you really need to.

Lastly, is there a quick, easy-to-use, drop-in way in PostgreSQL to get a case-insensitive equality operator for varchars?

UPDATE 2014

There is an answer to the question just above: citext
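
citext is a PostgreSQL extension type whose equality comparison is case-insensitive, so a plain = works without UPPER() or ILIKE. A minimal sketch of switching a column over (assuming psycopg2 and a hypothetical breweries table; connection details are made up):


import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
# install the extension and convert the varchar column to citext
cur.execute("CREATE EXTENSION IF NOT EXISTS citext")
cur.execute("ALTER TABLE breweries ALTER COLUMN name TYPE citext USING name::citext")
conn.commit()

# now a plain equality match is case-insensitive
cur.execute("SELECT id, name FROM breweries WHERE name = %s", ('abbaye',))
print(cur.fetchall())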

Who was logged in during a Django exception

April 15, 2010
5 comments Django

For lack of a fancier solution, here's how I solved the problem of knowing who was logged in when an error occurred. I'm building an Intranet-like system for a closed group of people, and if an error occurs I get an email that reminds me to add more tests. So I fix the bugs and upgrade the server. But I often want to know what poor sucker was logged in at the time the exception happened so that I can email them and say something like "Hi! I noticed you stumbled across a bug. My bad. Just wanted to let you know I've fixed that now"

So to do this I installed a silly little piece of middleware:


import logging

from django.conf import settings

class ExceptionExtraMiddleware(object):
    def process_exception(self, request, exception):
        if settings.DEBUG:
            return
        try:
            logged_in_info = ''
            if request.user and request.user.is_authenticated():
                logged_in_info = "%s" % request.user
                if request.user.email:
                    logged_in_info += ' %s' % request.user.email
                if request.user.first_name or request.user.last_name:
                    logged_in_info += ' (%s %s)' % \
                      (request.user.first_name, request.user.last_name)
            if logged_in_info:
                request.META['ERROR-X-LOGGED-IN'] = logged_in_info
        except:
            # don't make matters worse in these sensitive times
            logging.debug("Unable to debug who was logged in", exc_info=True)
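
To switch it on, the class just needs to be listed with the other middleware in settings.py (the module path here is only an example; use wherever you keep the class):


MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'myproject.middleware.ExceptionExtraMiddleware',  # hypothetical path to the class above
)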

This means that when I get an email with the traceback and a snapshot of the request object, it includes this:


...
'ERROR-X-LOGGED-IN': u'anita (Anita Test)',
...

UPDATE

The code above originally had a bug in it. Doing an if on request.user returns true even if nobody is logged in, because it's then an AnonymousUser instance. The safest thing is to change it to:


if request.user and request.user.is_authenticated():

fcgi vs. gunicorn vs. uWSGI

April 9, 2010
29 comments Python, Django, Linux

uWSGI is the latest and greatest WSGI server, promising to be the fastest possible way to run Nginx + Django. Proof here. But is it really that simple? Especially if you're involving Django herself.

So I set out to benchmark good old threaded fcgi and gunicorn, and then, with a source-compiled Nginx with the uwsgi module baked in, I also benchmarked uWSGI. The first mistake I made was testing a Django view that was using sessions and other crap. I profiled the view to make sure it wouldn't be the bottleneck, as it appeared to take only 0.02 seconds each time. However, with fcgi, gunicorn and uWSGI alike I kept getting stuck at about 50 requests per second. Why? 1/0.02 = 50!!! Clearly the slowness of the Django view itself was the bottleneck (for the curious, what took up the 0.02 seconds was creating new session keys and writing them to the database).

So I wrote a really dumb Django view with no sessions middleware enabled. Now we're getting some interesting numbers:


fcgi (threaded)              640 r/s
fcgi (prefork 4 processors)  240 r/s (*)
gunicorn (2 workers)         1100 r/s
gunicorn (5 workers)         1300 r/s
gunicorn (10 workers)        1200 r/s (?!?)
uwsgi (2 workers)            1800 r/s
uwsgi (5 workers)            2100 r/s
uwsgi (10 workers)           2300 r/s

(* this made my computer exceptionally sluggish as CPU usage went through the roof)

If you're wondering why the numbers appear to be rounded, it's because I ran the benchmark multiple times and guesstimated an average (and obviously excluded the first run).
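
For the record, the "really dumb" view was nothing fancier than something along these lines (a sketch; the exact view from the benchmark wasn't published):


# a minimal do-nothing view, with the sessions middleware removed from settings
from django.http import HttpResponse

def dumb(request):
    return HttpResponse("hello world")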

Misc notes

  • For gunicorn it didn't change the numbers if I used a TCP (e.g. 127.0.0.1:9000) or a UNIX socket (e.g. /tmp/wsgi.sock)
  • On the upstream directive in nginx it didn't impact the benchmark to set fail_timeout=0 or not.
  • fcgi on my laptop was unable to fork new processes automatically in this test, so it stayed as a single process! Why?!!
  • when you get more than 2,000 requests/second the benchmark itself and the computer you run it on becomes wobbly. I managed to get 3,400 requests/second out of uwsgi but then the benchmark started failing requests.
  • These tests were done on an old 32-bit dual core Thinkpad with 2GB RAM :(
  • uwsgi was a bitch to configure. Most importantly, who the hell compiles source code these days when packages are so much more convenient? (Fry-IT hosts around 100 web servers that need patching and love)
  • Why would anybody want to use UNIX sockets when they can cause permission problems? TCP is so much more straightforward.
  • raising the ulimit (max open file descriptors) to 2048 did not improve my results on this computer
  • gunicorn is not available as a Debian package :(
  • Adding too many workers can actually damage your performance. See example of 10 workers on gunicorn.
  • I did not bother with mod_wsgi since I don't want to go near Apache, and to be honest, last time I tried I got such mysterious errors from mod_wsgi that I ran away screaming.

Conclusion

gunicorn is the winner in my eyes. It's easy to configure and get up and running, certainly fast enough, and I don't have to worry about stray threads being created willy-nilly like with threaded fcgi. uwsgi is definitely worth coming back to the day I need to squeeze out a few more requests per second, but right now it just feels too inconvenient as I can't convince my sysadmins to maintain compiled versions of Nginx for the little extra benefit.

Having said that, the day uwsgi becomes available as a Debian package I'm all over it like a dog on an ass-flavored cookie.

And the "killer benefit" with gunicorn is that I can predict the memory usage. I found, on my laptop: 1 worker = 23Mb, 5 workers = 82Mb, 10 workers = 155Mb and these numbers stayed like that very predictably which means I can decide quite accurately how much RAM I should let Django (ab)use.

UPDATE:

Since this was published, we at my company have switched all our Django sites to run over uWSGI. It's proven faster than any of the alternatives and extremely stable. We actually started using it before the uwsgi module was merged into core Nginx, but considering how important this is and how many sites we have, it's not been a problem to run our own Nginx package.

Hail uWSGI!

Voila! Now feel free to flame away about the inaccuracies and about the multitude of wheels and knobs I could/should have twisted to get even more juice out.

Cycling across England on Orange Snapshot

April 4, 2010
0 comments Photos

I don't know who these guys are, but I found their pictures on Orange Snapshot (Orange customers' Twitter thing for posting pictures) and it appears that they're cycling across the country and documenting it by taking pictures and sticking them up on Twitter. I actually often check out Snapshot just to get a feeling for what's going on in the country at the moment. It was great when the snow hit us.

Why is this important? Well, in these modern times it means that it's very easy to make a travelling photo documentary without a film crew or some obscure travel blog that forces you to write 1,000 words for every city you visit. Good work, lads! Keep it up!

The awesomest way possible to serve your static stuff in Django with Nginx

March 24, 2010
19 comments Django

I'm the proud creator of django-static, a Django app that takes care of serving your static media in the best way possible. Although some of these things are subjective, this is generally the ideal checklist for serving your static media:

  1. Cache headers must be set to infinity
  2. URLs must be unique so that browsers never have to rely on refreshing
  3. The developer (who decides which media to include) should not have to worry about deployment
  4. The developer/artist (who makes the media) should not have to worry about deployment
  5. All Javascript and CSS must be whitespace-optimized in a safe way and served with Gzip
  6. All images referenced inside CSS should be taken care of too
  7. It must be possible to combine multiple Javascript or CSS resources into one
  8. It must be possible to test the production deployment in the development environment without too much effort
  9. A sysadmin shouldn't have to understand a developer's Django application
  10. A development environment must be unhindered by this optimization
  11. Processing overhead must be kept to a minimum
  12. It must be possible to easily say which resources can be whitespace-optimized and which can not

So let's get started setting all of this up in your Django + Nginx environment. Let's start with the Django side of things.

Download and install django-static by first running something like easy_install django-static, then add django_static to INSTALLED_APPS in your settings.py, and add this setting to enable django-static:


DJANGO_STATIC = True
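
For completeness, the INSTALLED_APPS part looks something like this (the other apps listed are just whatever you already have):


# settings.py
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django_static',
)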

Then edit your base.html template from this:


<html>
<link rel="stylesheet" href="/css/screen.css">
<link rel="stylesheet" href="/css/typo.css">
<link rel="stylesheet" href="/css/print.css" media="print">
<body>
<img src="/img/logo.png" alt="Logo">
{% block body %}{% endblock %}
<script type="text/javascript" src="/js/site.js"></script>
<script type="text/javascript">
window.onload = function() {
   dostuff();
};
</script>
</body>
</html>

To this new optimized version:


{% load django_static %}
<html>
{% slimall %}
<link rel="stylesheet" href="/css/screen.css">
<link rel="stylesheet" href="/css/typo.css">
<link rel="stylesheet" href="/css/print.css" media="print">
{% endslimall %}
<body>
<img src="{% staticfile "/img/logo.png" %}" alt="Logo">
{% block body %}{% endblock %}
<script type="text/javascript" src="{% slimfile "/js/site.js" %}"></script>
<script type="text/javascript">
{% slimcontent %}
window.onload = function() {
   dostuff();
};
{% endslimcontent %}
</script>
</body>
</html>

django_static when loaded offers you the following tags:

  1. staticfile <filename>
  2. slimfile <filename>
  3. slimcontent ... endslimcontent
  4. staticall ... endstaticall
  5. slimall ... endslimall

All the tags with the word slim in them do the same as their equivalents without it, but on the way to publication they also attempt to whitespace-optimize the content. Now, rendering this, what do you get? If you view the rendered source it will look something like this:


<html>
<link rel="stylesheet" href="/css/screen_typo.1269174558.css">
<link rel="stylesheet" href="/css/print.1269178381.css" media="print">
<body>
<img src="/img/logo.1269170122.png" alt="Logo">
[[[ MAIN CONTENT SNIPPED ]]]
<script type="text/javascript" src="/js/site.1269198161.js"></script>
<script type="text/javascript">
window.onload=function(){dostuff()};
</script>
</body>

As you can see, timestamps are put into the URLs. These timestamps are the modification times of the files, which means you never run the risk of serving an old file under an already-used name.
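
The idea is roughly this (a sketch of the concept, not django-static's actual code):


import os

def timestamped_name(path):
    # put the file's modification time into the filename so that a new
    # version of the file automatically gets a new, never-seen-before URL
    base, ext = os.path.splitext(path)
    mtime = int(os.stat(path).st_mtime)
    return '%s.%d%s' % (base, mtime, ext)

# e.g. timestamped_name('/var/mydjangosite/media/css/print.css')
#   -> '/var/mydjangosite/media/css/print.1269178381.css'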

The next step is to wire this up in your Nginx. Here are the relevant location blocks:


location ^~ /css/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}
location ^~ /js/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}
location ^~ /img/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}

That wasn't particularly pretty. Besides, since we haven't done any extra configuration yet, files like print.1269178381.css have been created inside your media files directory. Since these files will sooner or later become obsolete and they're never going to be included in your source control, we probably want to put them somewhere else. Add this setting to your settings.py:


DJANGO_STATIC_SAVE_PREFIX = '/tmp/cache-forever'

That means that all the whitespace-optimized files are put in this place instead, and the files that aren't whitespace optimized get symlinked into this directory.

The next problem with the Nginx config lines is that we're repeating ourselves for each prefix. Let's instead set a general prefix with this config:


DJANGO_STATIC_NAME_PREFIX = '/cache-forever'

And with that in place you can change your Nginx config to this:


location ^~ /cache-forever/  {
    root /tmp;
    expires max;
    access_log off;
}

django-static is wired up to use slimmer if it's available, but you can use different optimizers, namely Yahoo! YUI Compressor and Google Closure Compiler. So, let's use YUI Compressor for whitespace-optimizing the CSS and Google Closure Compiler for whitespace-optimizing the Javascript. Add this to your settings.py:


DJANGO_STATIC_CLOSURE_COMPILER = '/var/lib/stuff/compiler.jar'
DJANGO_STATIC_YUI_COMPRESSOR = '/var/lib/stuff/yuicompressor-2.4.2.jar'

Now we get the best possible whitespace optimization and a really neat Nginx configuration. Lastly (and this is optional) we might want to serve the static media from a different domain name, since browsers won't download more than a couple of resources at a time from the same domain. Or, even better, you might have a dedicated domain name that never accepts or sends any cookie headers (for example, Yahoo! uses yimg.com). This is accomplished with this setting:


DJANGO_STATIC_MEDIA_URL = 'http://static.peterbe.com' # no trailing slash

Now you're ready to go! The rendered URLs will now be something like http://static.peterbe.com/cache-forever/img/logo.1269170122.png, and every single item on the big list above is checked. With a few easy steps and some modifications to your templates you get the simplest yet best-performing setup for your static media. As an example, study the static media URLs and headers of crosstips.org.

Some people might prefer to use a remote CDN to host the static media. This is something django-static is currently not able to do, but I'm more than happy to accept patches and ideas from people who want to use it in production and who are eager to help. Everything else would still apply; we would just need a callback function that can handle the network copy.

UPDATE

At the time of writing, version 1.3.7 has 93% test coverage. The number of lines of tests is double that of the actual code: 661 lines of code, 1,358 lines of tests.