CommandLineApp by Doug Hellmann

February 22, 2008
0 comments Python

I just read the feature article "Command line programs are classes, too!" by Doug Hellmann in the January 2008 issue of Python Magazine about his program CommandLineApp, and I've tried it out on one of my old Python programs where I did the option parsing manually with getopt. The results are beautiful and quick. It's sprinkled with Doug-specific magic but I quickly got over that when I saw how easy it was to work with. There are still a few things I didn't manage to work out but that will unfortunately have to wait.

If anything, the worst thing about this library is that it's not part of the standard library so either you have to tell people to sudo easy_install CommandLineApp in the instructions or include it yourself in your packages if you prefer to ship things with a kitchen sink included.

If you want to check it out in action, either subscribe to the magazine (and support the effort) or just download csvcat.

String comparison function in Python (alpha)

December 22, 2007
7 comments Python

I was working on a unittest which, when it failed, would say "this string != that string", and because some of these strings were very long (output of an HTML lib I wrote which spits out snippets of HTML code) it became hard to spot how they were different. So I decided to override the usual self.assertEqual(str1, str2) in Python's unittest class instance with this little baby:


def assertEqualLongString(a, b):
   # Marker characters: NOT under matching positions, POINT under differing ones
   NOT, POINT = '-', '*'
   if a != b:
       print a
       o = ''
       for i, e in enumerate(a):
           try:
               if e != b[i]:
                   o += POINT
               else:
                   o += NOT
           except IndexError:
               # b is shorter than a: mark the overhang as different
               o += POINT

       # b is longer than a: mark the overhang as different
       if len(b) > len(a):
           o += POINT * (len(b)-len(a))

       print o
       print b

       raise AssertionError('(see string comparison above)')

It's far from perfect and doesn't really work when you've got Unicode characters that the terminal you use can't print properly. It might not look great on strings that are really really long but I'm sure that's something that can be solved too. After all, this is just a quick hack that helped me spot that the difference between one snippet and another was that one produced <br/> and the other produced <br />. Below are some examples of this utility function in action.
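As a rough illustration, here is the same idea as a Python 3 sketch. It is not the original helper: the names diff_marker and assert_equal_long_string are made up for this example, and the marker line is returned rather than printed so that it can be checked.

```python
def diff_marker(a, b, same="-", diff="*"):
    """Return a marker line with `diff` at every position where a and b differ."""
    marks = []
    for i in range(max(len(a), len(b))):
        # Positions past the end of the shorter string count as differences.
        if i < len(a) and i < len(b) and a[i] == b[i]:
            marks.append(same)
        else:
            marks.append(diff)
    return "".join(marks)


def assert_equal_long_string(a, b):
    """Raise AssertionError with a three-line comparison when a != b."""
    if a != b:
        raise AssertionError("\n".join(["strings differ:", a, diff_marker(a, b), b]))


# The <br/> versus <br /> case mentioned above.
print("<br/>")
print(diff_marker("<br/>", "<br />"))
print("<br />")
```

The middle line printed for that pair is ---***, which points straight at the extra space.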

Calculator in Python for dummies

December 17, 2007
17 comments Python

I need a mini calculator in my web app so that people can enter basic mathematical expressions instead of having to work them out themselves and then enter the result in the input box. I want them to be able to enter "3*2" or "110/3" without having to do the math first. I want this to work like a pocket calculator, such that 110/3 returns 36.6666666667 and not 36 like pure Python arithmetic would. Here's a solution that works, but it divides the way Python does:


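import math

def safe_eval(expr, symbols={}):
   # Evaluate with no builtins; only the names in symbols are visible
   return eval(expr, dict(__builtins__=None), symbols)

def calc(expr):
   # Expose the math module's functions and constants (sqrt, pi, ...)
   return safe_eval(expr, vars(math))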

assert calc('3*2')==6
assert calc('12.12 + 3.75 - 10*0.5')==10.87
assert calc('110/3')==36
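For comparison, here is one way the pocket-calculator behaviour could be sketched in Python 3, where / is already true division. This is an assumption-laden sketch and not the solution from the full post: the name pocket_calc is made up, it walks the parsed expression with the ast module instead of calling eval(), and it drops the math functions that the version above exposes.

```python
import ast
import operator

# Operators the calculator accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,  # true division, like a pocket calculator
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def pocket_calc(expr):
    """Evaluate +, -, * and / on numbers; raise ValueError for anything else."""
    return _evaluate(ast.parse(expr, mode="eval").body)


print(pocket_calc("3*2"))
print(pocket_calc("110/3"))
```

Because only numbers and four operators are accepted, input such as a function call is refused with a ValueError instead of being executed.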

WSSE Authentication and Apache

December 13, 2007
1 comment Python

I recently wrote a Grok application that implements a REST API for Atom Publishing so that I can connect to a website I have from my new Nokia phone, which has LifeBlog, which uses the Atom API to talk to the server.

Anyway, the authentication on Atom is WSSE (good introduction article) which basically works like this:


PasswordDigest = Base64(SHA1(Nonce + CreationTimestamp + Password))

This digest is one of the pieces of the X-WSSE request header, which is sent alongside the Authorization header and can look something like this:


Authorization: WSSE profile="UsernameToken"
X-WSSE: UsernameToken Username="bob", PasswordDigest="quR/EWLAV4xLf9Zqyw4pDmfV9OY=", 
Nonce="d36e316282959a9ed4c89851497a717f", Created="2003-12-15T14:43:07Z"
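The digest formula can be sketched in Python 3 roughly like this. It is an illustration only, not the script described in this post: the function names are made up, hashlib stands in for the old sha module, and it assumes the nonce is concatenated exactly as the text that appears in the header (implementations differ on that point).

```python
import base64
import hashlib


def password_digest(nonce, created, password):
    """Base64(SHA1(nonce + created + password)), returned as text."""
    raw = (nonce + created + password).encode("utf-8")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def wsse_header(username, password, nonce, created):
    """Build the value of the X-WSSE header for one request."""
    digest = password_digest(nonce, created, password)
    return (
        'UsernameToken Username="%s", PasswordDigest="%s", Nonce="%s", Created="%s"'
        % (username, digest, nonce, created)
    )


# "secret" is a placeholder password, so the digest will not match the example above.
print(wsse_header("bob", "secret",
                  "d36e316282959a9ed4c89851497a717f", "2003-12-15T14:43:07Z"))
```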

What I did was write a simple Python script that mimics what the Nokia does. The script creates a password digest using these Python modules: sha, binascii and base64 and then fires off a POST request. Here's the thing: if you generate this header with base64.encodestring(ascii_string) you get something like this:


quR/EWLAV4xLf9Zqyw4pDmfV9OY=\n

Notice the extra newline character at the end of the base64 encoded string. This is perfectly valid and is decoded easily with base64.decodestring(base64_string) by the Grok app. Everything was working fine when I tried posting to http://localhost:8080/++rest++atompub/snapatom and my application successfully authenticated the dummy user. I was happy.

Then I set this up properly on atom.someotherdomain.com, which was managed by Apache, which internally rewrote the URL to a Grok on localhost:8080. The problem now was that the header value was broken into two lines because of the newline character, and then the whole request was rejected by Apache because some header lines came without a : (colon).

The solution was to not use base64.encodestring() and base64.decodestring() but to instead use base64.urlsafe_b64encode() and base64.urlsafe_b64decode(). Let me show you:


>>> import base64
>>> x = 'Peter'
>>> base64.encodestring(x)
'UGV0ZXI=\n'
>>> base64.urlsafe_b64encode(x)
'UGV0ZXI='
>>> base64.decodestring(base64.urlsafe_b64encode(x))
'Peter'

If you're still reading, then hopefully you won't make the same mistake I did and waste time trying to debug Apache. The lesson learned from this is to use the URL-safe base64 functions for header values and not the usual ones.
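For readers on Python 3, the comparison looks roughly like the sketch below. Note the assumptions: encodestring() no longer exists there and encodebytes() is its replacement, and all of these functions take and return bytes. One thing worth knowing is that the URL-safe variant also swaps the alphabet, using - and _ in place of + and /, so its output only matches the standard encoding when neither character occurs. base64.b64encode() keeps the standard alphabet and also adds no newline.

```python
import base64

data = b"Peter"
with_newline = base64.encodebytes(data)    # appends a trailing newline
standard = base64.b64encode(data)          # standard alphabet, no newline
url_safe = base64.urlsafe_b64encode(data)  # URL-safe alphabet, no newline
print(with_newline, standard, url_safe)

# These two bytes encode to characters that differ between the alphabets.
tricky = b"\xfb\xff"
print(base64.b64encode(tricky))          # b'+/8='
print(base64.urlsafe_b64encode(tricky))  # b'-_8='
```

A digest such as the one in the example header above contains a /, so the choice of alphabet matters if the server decodes with the standard functions.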

geopy distance calculation pitfall

December 10, 2007
1 comment Python

Geopy is a great little Python library for working with geocoding and distances using various online services such as Google's geocoder API.

Today I spent nearly half an hour trying to debug what was going on with my web application since I was getting this strange error:


AttributeError: 'VincentyDistance' object has no attribute '_kilometers'

Spellcorrector 0.2

September 24, 2007
3 comments Python

Unlike previous incarnations of Spellcorrector, it now does not load the two huge language files for English and Swedish by default. Alternatively or additionally, you can load your own language file. The difference between loading a language file and training on your own words is that trained words are always assumed to be correct.

Another major change with this release is that a pickle file is created once the language file or own training file has been parsed once. This works like a cache: if the original text file changes, the pickle file is recreated. The outcome of this is that the first time you create a Spellcorrector instance it takes a few seconds if the language file is large, but the second time it takes virtually no time at all.
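The caching idea can be sketched like this in Python 3. This is not Spellcorrector's actual code: the function name, the ".pickle" suffix and the use of file modification times to detect a changed text file are all assumptions made for the example.

```python
import os
import pickle
from collections import Counter


def load_word_counts(text_path):
    """Return word counts for text_path, reusing a pickle cache when it is fresh."""
    cache_path = text_path + ".pickle"  # hypothetical naming scheme
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(text_path)):
        # Cache is at least as new as the text file: skip the slow parse.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache missing or older than the text file: parse and rebuild it.
    with open(text_path, encoding="utf-8") as f:
        counts = Counter(f.read().lower().split())
    with open(cache_path, "wb") as f:
        pickle.dump(counts, f, pickle.HIGHEST_PROTOCOL)
    return counts
```

Comparing modification times is cheap but coarse; an edit made within the same timestamp tick as the cache write would go unnoticed.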

html2plaintext Python script to convert HTML emails to plain text

August 10, 2007
12 comments Python

From the doc string:


A very spartan attempt of a script that converts HTML to
plaintext.

The original use for this little script was when I send HTML emails out I also
wanted to send a plaintext version of the HTML email as multipart. Instead of 
having two methods for generating the text I decided to focus on the HTML part
first and foremost (considering that a large majority of people don't have a 
problem with HTML emails) and make the fallback (plaintext) created on the fly.

This little script takes a chunk of HTML and strips out everything except the
<body> (or an element ID) and inside that chunk it makes certain conversions 
such as replacing all hyperlinks with footnotes where the URL is shown at the
bottom of the text instead. <strong>words</strong> are converted to *words* 
and it does a fair attempt of getting the linebreaks right.

As a last resort, it strips away all other tags left that couldn't be gracefully
replaced with a plaintext equivalent.
Thanks to Fredrik Lundh's unescape() function things like:
   'Terms &amp;amp; Conditions' is converted to
   'Terms &amp; Conditions'

It's far from perfect but a good start. It works for me for now.

Version at the time of writing this: 0.1.

I wouldn't be surprised if I've reinvented the wheel here but I did plenty of searches and couldn't really find anything like this.

Let's run this for a while until I stumble across some bugs or other inconsistencies, which I haven't quite done yet. The one thing I'm really unhappy about is the way I extract the body from the BeautifulSoup parse object. I really couldn't find a better way in the few minutes I had to spare on this.

Feel free to comment on things you think are pressing bugs.

You can download the script here: html2plaintext.py version 0.1
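To give a feel for the conversions the doc string describes, here is a much smaller Python 3 sketch. It is not html2plaintext.py: it uses the standard library's html.parser instead of BeautifulSoup, the class and function names are made up, and it only handles bold text, links, paragraphs and line breaks.

```python
from html.parser import HTMLParser


class PlainTextSketch(HTMLParser):
    """Collect text, turn <strong> into *...* and links into numbered footnotes."""

    def __init__(self):
        super().__init__()  # entities such as &amp; are decoded by default
        self.parts = []
        self.footnotes = []
        self._hrefs = []  # hrefs of the <a> tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("strong", "b"):
            self.parts.append("*")
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.parts.append("*")
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                self.footnotes.append(href)
                self.parts.append(" [%d]" % len(self.footnotes))
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html2plaintext_sketch(html):
    """Return the text of html with the URLs listed as footnotes at the bottom."""
    parser = PlainTextSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    notes = ["[%d] %s" % (i, url) for i, url in enumerate(parser.footnotes, 1)]
    return text + ("\n\n" + "\n".join(notes) if notes else "")


print(html2plaintext_sketch(
    '<p>Read the <strong>Terms &amp; Conditions</strong> on '
    '<a href="http://example.com/terms">our site</a>.</p>'
))
```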

UPDATE

I should take a second look at Aaron Swartz's html2text.py script the next time I work on this. His script seems a lot more mature and Aaron is a brilliant Python developer.

Spellcorrector

April 18, 2007
3 comments Python

Spellcorrector being used on my not-yet-released web app

I think a lot of Python people have seen Peter Norvig's beautiful article about How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I added were:

  • Python 2.4 compatible
  • Uses a pickleable dict instead of a collection
  • Compiled a huge list of Swedish words
  • Skipped edit distance 2 for words longer than 10 characters
  • Added a function suggestions()
  • All Unicode instead
  • A class instead of a function
  • Ability to train on your own words and to save that training persistently
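A compressed Python 3 sketch of how a few of these changes could fit together is shown below. It follows the general shape of Norvig's approach but is not the Spellcorrector source: apart from suggestions(), every name here is made up, the alphabet is an assumption, and persistence is left out.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # assumed: English plus Swedish letters


class SpellcorrectorSketch:
    def __init__(self, max_len_for_edits2=10):
        self.counts = Counter()  # word frequencies from loaded language text
        self.trained = set()     # the user's own words, always assumed correct
        self.max_len_for_edits2 = max_len_for_edits2

    def load_text(self, text):
        self.counts.update(re.findall(r"\w+", text.lower()))

    def train(self, word):
        self.trained.add(word.lower())

    def _known(self, words):
        return {w for w in words if w in self.trained or w in self.counts}

    def _edits1(self, word):
        """All strings one delete, transpose, replace or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
        inserts = [a + c + b for a, b in splits for c in ALPHABET]
        return set(deletes + transposes + replaces + inserts)

    def suggestions(self, word):
        word = word.lower()
        if word in self.trained or word in self.counts:
            return [word]
        candidates = self._known(self._edits1(word))
        # Edit distance 2 is expensive, so it is skipped for long words.
        if not candidates and len(word) <= self.max_len_for_edits2:
            candidates = self._known(
                e2 for e1 in self._edits1(word) for e2 in self._edits1(e1)
            )
        # Trained words first, then the most frequent, then alphabetical.
        return sorted(candidates,
                      key=lambda w: (w not in self.trained, -self.counts[w], w))

    def correct(self, word):
        found = self.suggestions(word)
        return found[0] if found else word
```

Usage would look like calling load_text() with some text, train() with a word of your own, and then correct("quik") to get the closest known word back.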

is is not the same as equal in Python

December 1, 2006
8 comments Python

Don't make the silly mistake that I made today. I improved my code to better support Unicode by replacing all plain strings with Unicode strings. In there I had code that looked like this:


if type_ is 'textarea':
   do something

This was changed to:


if type_ is u'textarea':
   do something

And it no longer matched since type_ was a normal ASCII string. The correct way to do these things is like this:


if type_ == u'textarea':
    do something
elif type_ is None:
    do something else

Remember:


>>> "peter" is u"peter"
False
>>> "peter" == u"peter"
True
>>> None is None
True
>>> None == None
True
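The same point can be shown in Python 3, where every str is already Unicode. This is a small sketch with made-up names: == compares values, while is asks whether two names refer to the very same object, which for strings depends on how the interpreter happened to create them.

```python
def describe(type_):
    if type_ == "textarea":  # value comparison: safe for strings
        return "multi-line"
    elif type_ is None:      # identity check: fine for the None singleton
        return "unset"
    return "other"


literal = "textarea"
built = "".join(["text", "area"])  # equal in value, but created at runtime

print(built == literal)  # True
print(built is literal)  # usually False in CPython; never rely on this
print(describe(built))   # multi-line
print(describe(None))    # unset
```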