
Difference between $.data('foo') and $.attr('data-foo') in jQuery

June 10, 2012
9 comments JavaScript

I learned something today thanks to my colleague Axel Hecht: the difference between $element.data('foo') and $element.attr('data-foo').

Basically, the .data() getter/setter is more powerful since it can do more things. For example:


<img id="image" data-number="42">

// .data() parses the value into a number
assert($('#image').data('number') + 1 == 43);
// .attr() returns the raw string, so + 1 concatenates
assert($('#image').attr('data-number') + 1 == '421');

Integers are just one of the things the .data() getter is able to parse. It can handle other cool things too, like booleans and JSON. Check out its docs.

So, why would you NOT use .data()?

One reason is that with .data(name, value) the original DOM element is not actually modified. This can cause trouble if other pieces of Javascript depend on the value of a data- attribute further along in the page rendering process.

To see it in action check out: test.html

In conclusion: feel free to use the .data() getter/setter because it's more powerful, but be aware of the potential risks.

How I deal with deferred image loading in Javascript

June 8, 2012
3 comments Web development, JavaScript

First of all, this technique is only really applicable to apps where there's one big HTML template which is then shuffled around, partly hidden and partly visible, thanks to lots of Javascript. Those familiar with jQuery Mobile will have seen this.

On Around The World there are a lot of images. The majority of them you don't need to see immediately because only one screen is shown at a time. The page structure looks like this:


<div class="section" id="page1">
  <h2>Page 1</h2>
  <img src="section-icon1.png">
</div>
<div class="section" id="page2" style="display:none">
  <h2>Page 2</h2>
  <img src="section-icon2.png">
</div>
<div class="section" id="page3" style="display:none">
  <h2>Page 3</h2>
  <img src="section-icon3.png">
</div>

So, if you load that you'll notice that your browser will download "section-icon1.png", "section-icon2.png" and "section-icon3.png" even though two of the images are not going to be displayed. Good for pre-loading the images when they're later needed but bad for the user experience since the browser will be busy downloading images rather than displaying the first visible section.

This is how I solve it; first I change the HTML to this:


<div class="section" id="page1">
  <h2>Page 1</h2>
  <img src="." data-src="section-icon1.png" class="deferred">
</div>
<div class="section" id="page2" style="display:none">
  <h2>Page 2</h2>
  <img src="." data-src="section-icon2.png" class="deferred">
</div>
<div class="section" id="page3" style="display:none">
  <h2>Page 3</h2>
  <img src="." data-src="section-icon3.png" class="deferred">
</div>

And now for the magic that turns these placeholder img tags back into normal img tags. The truth is that the Javascript for loading individual sections is a bit more complicated, but at its core it looks something like this:


// variable 'hash' is something like '#page2' (e.g. from location.hash)
if ($(hash + '.section').size()) {
  $('.section:visible').hide();
  $(hash + '.section').show();
  $('img.deferred', hash).each(function() {
    var el = $(this);
    el.attr('src', el.data('src'));
    el.removeClass('deferred');
  });
  ...

It makes the HTML slightly more complicated but the end result is great. It's not just useful for the first-time load but also applicable every time someone reloads the page.

Secs sell! How I cache my entire pages (server-side)

May 10, 2012
1 comment Python, Django

I've blogged before about how this site can easily push out over 2,000 requests/second using only 6 WSGI workers (not counting network latency). The reason that's possible is that the whole page(s) can be cached server-side. What actually happens is that the whole rendered HTML blob is stored in the cache server (Redis in my case) so that no database queries are needed at all.

I wanted my site to still "feel" dynamic in the sense that once you post a comment (and it's published), the page automatically invalidates the cache and thus the user doesn't have to force-refresh his browser when he knows it should have changed. To accomplish this I use a hacked variant of the cache_page decorator whose cache key changes whenever the content the page depends on changes. Here's the code I actually use today for the home page:


import logging
import urllib

from django.core.cache import cache

# BlogItem and redis_increment are defined elsewhere in the project


def _home_key_prefixer(request):
    if request.method != 'GET':
        return None
    prefix = urllib.urlencode(request.GET)
    cache_key = 'latest_comment_add_date'
    latest_date = cache.get(cache_key)
    if latest_date is None:
        # when a blog comment is posted, the blog modify_date is incremented
        latest, = (BlogItem.objects
                   .order_by('-modify_date')
                   .values('modify_date')[:1])
        latest_date = latest['modify_date'].strftime('%f')
        cache.set(cache_key, latest_date, 60 * 60)
    prefix += str(latest_date)

    try:
        redis_increment('homepage:hits', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)

    return prefix


@cache_page_with_prefix(60 * 60, _home_key_prefixer)
def home(request, oc=None):
    ...
    try:
        redis_increment('homepage:misses', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)
    ...
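
The cache_page_with_prefix decorator itself isn't shown in this post. A rough sketch of how such a decorator could be written, assuming Django's low-level cache API (this is my reconstruction, not the actual code on the site):


# Rough sketch of what a cache_page_with_prefix decorator could look like;
# the real one on this site may differ.
from functools import wraps

from django.core.cache import cache


def cache_page_with_prefix(timeout, key_prefixer):
    def decorator(view_func):
        @wraps(view_func)
        def wrapper(request, *args, **kwargs):
            prefix = key_prefixer(request)
            if prefix is None:
                # e.g. POST requests are never cached
                return view_func(request, *args, **kwargs)
            cache_key = 'cached_page:%s:%s' % (request.path, prefix)
            response = cache.get(cache_key)
            if response is None:
                # cache miss: render the page and store the whole response
                response = view_func(request, *args, **kwargs)
                cache.set(cache_key, response, timeout)
            return response
        return wrapper
    return decorator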

And in the models I then have this:


from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=BlogComment)
@receiver(post_save, sender=BlogItem)
def invalidate_latest_comment_add_dates(sender, instance, **kwargs):
    cache_key = 'latest_comment_add_date'
    cache.delete(cache_key)

So this means:

  • whole pages are cached for a long time for fast access
  • updates immediately invalidate the cache for the best user experience
  • no need to mess with ANY SQL caching

So, the next question is: if posting a comment means that the cache is invalidated and needs to be re-populated, what's the ratio of cache hits versus hits where the cache had to be cleared? Glad you asked. That's why I made this page:

www.peterbe.com/stats/

It allows me to monitor how often a new blog comment or a general time-out means poor Django needs to re-create the HTML using SQL.

At the time of writing, one in every 25 hits to the homepage requires the server to re-generate the page. And still the content is always fresh and relevant.

The next level of optimization would be to figure out whether a particular page update (e.g. a blog comment posted on a page that isn't featured on the home page) should or should not invalidate the home page.

Are WebSockets faster than AJAX? ...with latency in mind?

April 22, 2012
25 comments Web development, JavaScript

The advantage of WebSockets (over AJAX) is basically that there's less HTTP overhead. Once the connection has been established, all future message passing is over a socket rather than new HTTP request/response calls. So, you'd assume that WebSockets can send and receive many more messages per unit of time. Turns out that that's true. But there's a very bitter reality once you add latency into the mix.

So, I created a simple app that uses SockJS and an app that uses jQuery AJAX to see how they would perform under stress. Code is here. All it does, basically, is send a simple data structure to the server, which echoes it back. As soon as the response comes back, it starts over. Over and over till it's done X number of iterations.
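
For flavor, here's roughly what the server side of the /socktest echo endpoint can look like. This is a hypothetical minimal sketch using sockjs-tornado, not necessarily the exact code linked above:


# Hypothetical sketch of the SockJS echo endpoint (see the linked code for the real thing)
import tornado.ioloop
import tornado.web
from sockjs.tornado import SockJSConnection, SockJSRouter


class EchoConnection(SockJSConnection):
    def on_message(self, message):
        # send the exact same payload straight back to the client
        self.send(message)


router = SockJSRouter(EchoConnection, '/socktest')

if __name__ == '__main__':
    application = tornado.web.Application(router.urls)
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()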

Here's the output when I ran this on localhost here on my laptop:

# /ajaxtest (localhost)
start!
Finished
10 iterations in 0.128 seconds meaning 78.125 messages/second
start!
Finished
100 iterations in 0.335 seconds meaning 298.507 messages/second
start!
Finished
1000 iterations in 2.934 seconds meaning 340.832 messages/second

# /socktest (localhost)
Finished
10 iterations in 0.071 seconds meaning 140.845 messages/second
start!
Finished
100 iterations in 0.071 seconds meaning 1408.451 messages/second
start!
Finished
1000 iterations in 0.466 seconds meaning 2145.923 messages/second

Wow! It's so fast that the rate doesn't even settle down. A back-of-the-envelope calculation tells me the WebSocket version is roughly 5 times faster. Again: wow!

Now reality kicks in! It's obviously unrealistic to test against localhost because it doesn't take latency into account. I.e. it doesn't take into account the long distance the data has to travel from the client to the server.

So, I deployed this test application on my server in London, England and hit it from my Firefox here in California, USA. Same number of iterations, and I ran it a number of times to make sure I didn't get hit by sporadic hiccups on the line. Here are the results:

# /ajaxtest (sockshootout.peterbe.com)
start!
Finished
10 iterations in 2.241 seconds meaning 4.462 messages/second
start!
Finished
100 iterations in 28.006 seconds meaning 3.571 messages/second
start!
Finished
1000 iterations in 263.785 seconds meaning 3.791 messages/second

# /socktest (sockshootout.peterbe.com) 
start!
Finished
10 iterations in 5.705 seconds meaning 1.752 messages/second
start!
Finished
100 iterations in 23.283 seconds meaning 4.295 messages/second
start!
Finished
1000 iterations in 227.728 seconds meaning 4.391 messages/second

Hmm... Not so cool. WebSockets are still slightly faster but the difference is negligible. WebSockets are roughly 10-20% faster than AJAX. With that small a difference I'm sure the benchmark is going to be vastly affected by other factors that make it unfair to one or the other, such as quirks in my particular browser or the slightest hiccup on the line.

What can we learn from this? Well, latency kills all the fun. It also means that you don't necessarily need to re-write your already working AJAX-heavy app just to gain speed, because even though it's ever so slightly faster, the switch from AJAX to WebSockets comes with other risks and challenges such as authentication cookies, having to deal with channel concurrency, load balancing on the server, etc.

Before you say it, yes, I'm aware that WebSocket web apps come with other advantages, such as being able to hold on to sockets and push data at will from the server. Those are juicy benefits but massive performance boosts ain't one.

Also, I bet that writing this means that peeps will come along and punch holes in my code and my argument. Something I welcome with open arms!

String length truncation optimization difference in Python

March 19, 2012
8 comments Python

We have a piece of code that is going to be run A LOT on a server infrastructure that needs to be fast. I know that I/O matters much more, but since I had the time I wanted to figure out which of these is faster:


def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

...or...


def b(s, m):
    return s[:m]
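
The actual findings are behind the link below, but a quick way to measure something like this yourself is with the standard library's timeit module. This is just a sketch of how one might run it; the string length, slice size and iteration count are my own assumptions, not the post's benchmark:


# Quick-and-dirty timing sketch using timeit; not the benchmark from the post.
import timeit

setup = """
s = 'x' * 1000

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]
"""

print timeit.timeit('a(s, 100)', setup=setup, number=1000000)
print timeit.timeit('b(s, 100)', setup=setup, number=1000000)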

Truncated! Read the rest by clicking the link below.

When to __deepcopy__ classes in Python

March 14, 2012
9 comments Python

When using mutables in Python you have to be careful:


>>> a = {'value': 1}
>>> b = a
>>> a['value'] = 2
>>> b
{'value': 2}

So, you use the copy module from the standard library:


>>> import copy
>>> a = {'value': 1}
>>> b = copy.copy(a)
>>> a['value'] = 2
>>> b
{'value': 1}

That's nice but it's limited. It doesn't deal with nested mutables, as you can see here:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.copy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'else'}}

That's when you need the copy.deepcopy function:


>>> a = {'value': {'name': 'Something'}}
>>> b = copy.deepcopy(a)
>>> a['value']['name'] = 'else'
>>> b
{'value': {'name': 'Something'}}

Now, suppose we have a custom class that subclasses the dict type. That's a very common thing to do. Let's demonstrate:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(name='Value')
>>> b = copy.copy(a)
>>> a['name'] = 'Other'
>>> b
{'name': 'Value'}

And again, if you have a nested mutable object you need copy.deepcopy:


>>> class ORM(dict):
...     pass
... 
>>> a = ORM(data={'name': 'Something'})
>>> b = copy.deepcopy(a)
>>> a['data']['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

But oftentimes you'll want to make your dict subclass behave like a regular class so you can access data with dot notation. Like this:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'

Now here's a problem. If you do that, you lose the ability to use copy.deepcopy since the class has now been slightly "abused".


>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/copy.py", line 172, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
  File "<stdin>", line 3, in __getattr__
KeyError: '__deepcopy__'

Hmm... now you're in trouble. What happens is that copy.deepcopy starts by calling getattr(x, "__deepcopy__", None); our __getattr__ intercepts that lookup and raises a KeyError instead of the AttributeError that getattr() knows how to swallow. To get yourself out of it you have to define a __deepcopy__ method as well. Let's just do it:


>>> class ORM(dict):
...     def __getattr__(self, key):
...         return self[key]
...     def __deepcopy__(self, memo):
...         return ORM(copy.deepcopy(dict(self)))
... 
>>> a = ORM(data={'name': 'Something'})
>>> a.data['name']
'Something'
>>> b = copy.deepcopy(a)
>>> a.data['name'] = 'else'
>>> b
{'data': {'name': 'Something'}}

Yeah!!! Now we get what we want. Messing around with the __getattr__ like this is, as far as I know, the only time you have to go in and write your own __deepcopy__ method.

I'm sure hardcore Python language experts can point out lots of intricacies about __deepcopy__ but since I only learned about this today, having it here might help someone else too.

Persistent caching with fire-and-forget updates

December 14, 2011
4 comments Python, Tornado

I just recently landed some patches on toocool that implement an interesting pattern that is seen more and more these days. I call it: Persistent caching with fire-and-forget updates.

Basically, the implementation is this: you issue a request that requires information about a Twitter user, e.g. http://toocoolfor.me/following/chucknorris/vs/peterbe. The app looks in its MongoDB for information about the tweeter and, if it can't find this user, it goes to the Twitter REST API, looks it up and saves the result in MongoDB. The next time the same information is requested and the data is available in MongoDB, it instead checks whether the modify_date is more than an hour old and, if so, sends a job to the message queue (Celery with Redis in my case) to perform an update on this tweeter.

You can basically see the code here but just to reiterate and abbreviate, it looks like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter:
    result = yield tornado.gen.Task(...)
    if result:
        tweeter = self.save_tweeter_user(result)
    else:
        pass  # deal with the error!
elif age(tweeter['modify_date']) > 3600:
    tasks.refresh_user_info.delay(username, ...)
# render the template!
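
The fire-and-forget part is the refresh_user_info Celery task. Its body isn't shown here; a hypothetical sketch (the helper names and exact signature are assumptions, not the code in toocool) could look like this:


# Hypothetical sketch of the fire-and-forget Celery task; the real one in
# toocool may look different. fetch_twitter_user and db are placeholders
# for the app's own Twitter helper and MongoDB handle.
import datetime

from celery.task import task


@task(ignore_result=True)
def refresh_user_info(username):
    # hit the Twitter REST API again and overwrite what's stored in MongoDB
    result = fetch_twitter_user(username)
    if result:
        result['modify_date'] = datetime.datetime.utcnow()
        db.Tweeter.update({'username': username}, {'$set': result}, upsert=True)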

What the client, i.e. the user using the site, gets is instant results on every request apart from the very first one for that URL, while the data is still being maintained and refreshed behind the scenes.

This pattern works great for data that doesn't have to be up-to-date to the second but still needs to be invalidated and re-fetched eventually. My limit of 1 hour is quite arbitrary. An alternative implementation would be something like this:


tweeter = self.db.Tweeter.find_one({'username': username})
if not tweeter or (tweeter and age(tweeter) > 3600 * 24 * 7):
    # re-fetch from Twitter REST API
elif age(tweeter) > 3600:
    # fire-and-forget update

That way you don't suffer from persistently cached data that is too old.

Python file with closing automatically

December 3, 2011
2 comments Python

Perhaps someone who knows more about the internals of Python and the recent changes in 2.6 and 2.7 can answer this question that came up today in a code review.

I suggest using with instead of try: ... finally: to close a file that was written to. Instead of this:


dest = file('foo', 'w')
try:
   dest.write('stuff')
finally:
   dest.close()
print open('foo').read()  # will print 'stuff'

We can use this:


with file('foo', 'w') as dest: 
    dest.write('stuff')
print open('foo').read()  # will print 'stuff'

Why does that work? I'm guessing it's because the file() instance object has a built-in __exit__ method. Is that right?

That means I don't need to use contextlib.closing(thing) right?

For example, suppose you have this class:


class Farm(object):
   def __enter__(self):
       print "Entering"
       return self
   def __exit__(self, err_type, err_val, err_tb):
       print "Exiting", err_type
       self.close()
   def close(self):
       print "Closing"

with Farm() as farm:
   pass
# this will print:
#   Entering
#   Exiting None
#   Closing

Another way to achieve the same specific result would be to use the contextlib.closing() helper:


class Farm(object):
   def close(self):
       print "Closing"

from contextlib import closing
with closing(Farm()) as farm:
   pass
# this will print:
#   Closing

So the closing() helper supplies the __enter__ and __exit__ for you. This last one can be handy if you do this:


from contextlib import closing
with closing(Farm()) as farm:
   raise ValueError

# this will print
#  Closing
#  Traceback (most recent call last):
#   File "dummy.py", line 16, in <module>
#     raise ValueError
#  ValueError

This is turning into me thinking out loud, but I think I get it now. contextlib.closing() basically makes it possible to do what I did there with __enter__ and __exit__, and it seems the file() built-in already has an __exit__ handler that takes care of the closing, so you don't need any extra wrappers.
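
One quick way to convince yourself (in a Python 2.6/2.7 shell) is to check for the context manager hooks directly:


>>> f = open('foo', 'w')
>>> hasattr(f, '__enter__'), hasattr(f, '__exit__')
(True, True)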

Trivial but powerful tips for nosetests

November 19, 2011
0 comments Python

I'm clearly still a nosetests beginner because it was only today that I figured out how to set certain plugins to always be on.

First of all you might like these plugins too:


$ pip install rudolf
$ pip install disabledoc

Docs: rudolf and disabledoc

To get these gorgeous little tricks into every run of nosetests edit the file ~/.noserc and add the following:


[nosetests]
with-disable-docstring=1
with-color=1

That should make your life a little easier.

UPDATE:

I've since managed to shoot myself in both legs by messing around with nosetests plugins because I rely heavily on django-nose in Django. Long story short: be careful if you get strange import-related errors!

Going real simple on HTML5 audio

October 14, 2011
0 comments Web development, JavaScript

DoneCal's users are 80+% on Chrome and Firefox. Both Firefox and Chrome support the HTML <audio> element without any weird plugins and they both support the Ogg Vorbis (.ogg) file format (change log here).

So, I used to use the rather enterprisey plugin called SoundManager2, which attempts to abstract away all the hacks into one single API. It uses a mix of browser sniffing, HTML5 and Flash. Although very promising, it is quite cumbersome. It doesn't work flawlessly despite their hard efforts. Unfortunately, using it also means a 30kb (optimized) Javascript file and a 3kb .swf file (if needed). So, instead of worrying about my very few Internet Explorer users I decided to go really dumb and simple on this.

The solution basically looks like this:


// somewhere.js
var SOUND_URLS = {
  foo: 'path/to/foo.ogg',
  egg: 'path/to/egg.ogg'
};

// play-sounds.js

/* Call this to create and partially download the audio element.
 * You can call this as much as you like. */
function preload_sound(key) {
  var id = 'sound-' + key;
  if (!document.getElementById(id)) {
    if (!SOUND_URLS[key]) {
      throw "Sound for '" + key + "' not defined";
    } else if (SOUND_URLS[key].search(/\.ogg/i) == -1) {
      throw "Sound for '" + key + "' must be .ogg URL";
    }
    var a = document.createElement('audio');
    a.setAttribute('id', id);
    a.setAttribute('src', SOUND_URLS[key]);
    document.body.appendChild(a);
  }
  return id;
}

function play_sound(key) {
  document.getElementById(preload_sound(key)).play();
}

// elsewhere.js
$.lightbox.open({
   onComplete: function() {
      preload_sound('foo');
   }
});
$('#lightbox button').click(function() {
   play_sound('foo');
});

Basically, only Firefox, Chrome and Opera support .ogg but it's a good and open source encoding so I don't mind being a bit of an asshole about it. This little script could be slightly extended with some browser sniffing to work with Safari people but right now it doesn't feel like it's worth the effort.

This makes me happy and I feel lean and light. A good feeling!