hashin - a replacement for peepin

January 26, 2016
0 comments Python

tl;dr Stop using peepin. Start using hashin

Today I proudly release hashin (on PyPI). It's a replacement for peepin (on PyPI). Yes, I know that's confusing.

A couple of days ago my friend Erik Rose gloriously took his peep project and got it embedded in pip 8.0 proper so, as of that, the right thing to do is to upgrade to pip 8 and delete your peep.py.

With that change, it no longer makes sense to use peepin. It had a good run. Bye bye.

But much of the code lives on in hashin. It's basically a fork but with different logic for A) how it gets the hash and B) how it renders the automatic changes to your requirements file.

First, if you haven't already done so:

$ pip install -U peep pip
$ pip --version  # version 8 right?
$ peep port requirements.txt
$ pip uninstall peep
$ pip install --require-hashes -r requirements.txt

Check out Erik's guide.

Now, you can deal with the companion.

$ pip uninstall peepin
$ pip install hashin
$ touch /tmp/test.txt
$ hashin --verbose html2text simplejson /tmp/test.txt
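For a rough idea of what ends up in the requirements file: pip 8's hash-checking mode expects each pinned requirement to carry a --hash=sha256:<hex digest> option. The sketch below is not hashin's actual code. The helper name requirement_line and the package bytes are made up for illustration; it only shows how such a line can be derived from the bytes of a downloaded file.

```python
import hashlib


def requirement_line(name, version, payload):
    """Return a pinned requirement with a pip 8 style sha256 hash.

    `payload` is the raw bytes of the downloaded wheel or tarball.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return "{}=={} --hash=sha256:{}".format(name, version, digest)


if __name__ == "__main__":
    # Made-up bytes standing in for a real package file.
    fake_package = b"not a real tarball"
    print(requirement_line("example-package", "1.0", fake_package))
```

The point of the hex digest is that pip can recompute it at install time and refuse any file that doesn't match.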

What's Next?

If Erik managed to get peep into pip, surely I can get hashin into pip. Hoping for some encouragement from @dstufft and @jezdez :)

Headsupper.io

December 5, 2015
0 comments Python, Web development, Django, JavaScript, React

tl;dr

Headsupper.io is a free GitHub webhook service that emails people when commits have the configurable keyword "headsup" in it.

Introduction

Headsupper.io is great for when you have a GitHub project with multiple people working on it and when you make a commit you want to notify other people by email.

Basically, you set up a GitHub Webhook, on pushes, to push to https://headsupper.io and then it'll parse the incoming push and its commits and look for certain things in the commit message. By default, it'll look for the word "headsup". For example, a git commit message might look like this:

fixes #123 - more juice in the Saab headsup! will require updating

Or you can use the multi-line approach where the first line is short and sweet and, after the break, a bit more elaborate:

bug 1234567 - tea kettle upgrade 2.1

Headsup: Next time you git pull from master, remember to run 
peep install on the requirements.txt file since this commit 
introduces a bunch of crazy dependency changes.

Git commits that come through that don't have any match on this word will simply be ignored by Headsupper.

How you use it

Maybe paradoxically, you need to authenticate with your GitHub account but that's in read-only mode and does NOT set up the Webhook for you. The reason you have to authenticate to prepare a configuration on headsupper.io is to tie the configuration to a real user.

Once you've authenticated you get the option to create your first configuration. Then you have to enter at least these three pieces of information:

  1. The GitHub "full name". This is the org name, slash, repo name. E.g. peterbe/django-peterbecom or mozilla/socorro.
  2. Pick a secret. Remember what you typed, because you'll need to type in this same secret when you set up the Webhook on your GitHub project's Webhooks page. (This is used to checksum and verify the source of the Webhook push)
  3. Who to send to. A list of email addresses separated with a newline or a semi-colon.

Once you've set that up, you'll need to go to your GitHub project's Settings page and enter a new Webhook. The URL you need to type in is https://headsupper.io and for the "Secret", type in the secret you used earlier. That's it!
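To make the "checksum and verify" part concrete: GitHub signs each Webhook payload with an HMAC of the request body, keyed with the secret, and sends it in a header of the form sha1=<hex digest> (X-Hub-Signature). Here is a minimal sketch of checking such a header. It's not Headsupper's actual code, and the function name signature_is_valid is just something I picked for the example.

```python
import hashlib
import hmac


def signature_is_valid(secret, body, header_value):
    """Check a GitHub-style webhook signature header.

    `secret` and `body` are bytes; `header_value` looks like
    "sha1=<hex digest>".
    """
    algorithm, _, received = header_value.partition("=")
    if algorithm not in ("sha1", "sha256") or not received:
        return False
    expected = hmac.new(secret, body, getattr(hashlib, algorithm)).hexdigest()
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, received)
```

If the secret typed into GitHub differs from the one in the configuration, the digests won't match and the push can be rejected.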

Rules and options

The word that triggers is configurable by you. The default is headsup. And by default, it's case insensitive. You can change that so it's case sensitive. Also, the word has to be word delimited on the left (e.g. a space or a newline character) and on the right it needs to be a space, a : or a !. So these won't match: theheadsup: or headsupper.
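Those matching rules can be approximated with a regular expression. This is a sketch of the rules as described here, not the code Headsupper runs; make_matcher and has_headsup are hypothetical names, and the default keyword is taken to be headsup as in the commit message examples earlier.

```python
import re


def make_matcher(keyword="headsup", case_sensitive=False):
    flags = 0 if case_sensitive else re.IGNORECASE
    # Left: start of message or whitespace. Right: a space, ":" or "!".
    pattern = r"(?<!\S)" + re.escape(keyword) + r"(?=[ :!])"
    return re.compile(pattern, flags)


def has_headsup(message, matcher=make_matcher()):
    return matcher.search(message) is not None


if __name__ == "__main__":
    print(has_headsup("fixes #123 - more juice in the Saab headsup! will require updating"))
    print(has_headsup("theheadsup: this one is ignored"))
```

The lookbehind and lookahead only check the neighbouring characters, so the delimiters themselves are not part of the match.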

Other optional things you can configure are:

  • Which git branch to trigger on (by default it's master)
  • Which emails to CC when it sends
  • Which emails to BCC when it sends
  • Only send when you make a tag

That last option, Only send when a new tag is created, is interesting. I added that option because at work, we make production server releases by pushing a git tag. When a tag is pushed, all those commits are sent to the continuous deployment service which makes a server upgrade. This means you get a chance to enter a heads up message to be emailed to the people who care about new deployments going out.

How it was built

It's a mix between Django and ReactJS. The whole client-side app is built statically with Webpack in ES6. It's served as static files through Nginx. But Nginx makes an exception for all URLs that start with /api or /accounts. The /api/* URLs are used for loading and saving JSON. The /accounts/* URLs are used for the GitHub OAuth endpoints.

What's interesting about the architecture is that it's using HTTP cookies. Not API tokens. Cookies are quite good in that they're established and the browser does all the automated work of keeping them secure and making each request potentially authenticated.

Here's the relevant React code and here's the relevant Django code that processes the Webhook.

The whole project is available on: https://github.com/peterbe/headsupper.

Also, I made a demo at the November Mozilla Beer and Tell.

Django forms and making datetime inputs localized

December 4, 2015
2 comments Python, Django

tl;dr

To change from one timezone aware datetime to another, turn it into a naive datetime and then use pytz's localize() method to convert it back to the timezone you want it to be.

Introduction

Suppose you have a Django form where you allow people to enter a date, e.g. 2015-06-04 13:00. You have to save it timezone aware, because you have settings.USE_TZ on and it's just many times better to store things as timezone aware datetimes.

By default, if you have settings.USE_TZ on and no timezone information is in the string that the django.forms.fields.DateTimeField parses, it will use settings.TIME_ZONE, and that timezone might be different from what it really should be. For example, in my case, I have an app where you can upload a CSV file full of information about events. These events belong to a venue which I have in the database. Every venue has a timezone, e.g. Europe/Berlin or US/Pacific. So if someone uploads a CSV file for the Berlin location, 2015-06-04 13:00 means 13:00 in Berlin. I don't care where the server is hosted and what its settings.TIME_ZONE is. I need to make that input timezone aware specifically for Europe/Berlin.

Examples

Suppose you have settings.TIME_ZONE == 'US/Pacific' and you let the django.forms.fields.DateTimeField do its magic. You get something you don't want:


>>> from django.conf import settings
>>> settings.TIME_ZONE
'US/Pacific'
>>> assert settings.USE_TZ
>>> from django.forms.fields import DateTimeField
>>> DateTimeField().clean('2015-06-04 13:00')
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)

See! That's wrong. Sort of. Not Django's fault. What I need to do is to convert that datetime object into one that is timezone aware on the Europe/Berlin timezone.

In old versions of pytz, specifically <=2014.2 you could do this:


>>> import pytz
>>> pytz.VERSION
'2014.2'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)

But in modern versions of pytz you can't do that, because if you don't use the pytz.timezone instance's localize() method, the datetime gets the timezone's default offset, which might be one of those crazy "Local Mean Time" offsets that were used 100 years ago. E.g.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>)

See, it's that crazy LMT+0:53:00 that's oft talked of on Stackoverflow!

Here's the trick

The trick is to use pytz.timezone(MY TIME ZONE NAME).localize(MY NAIVE DATETIME OBJECT). When you use the .localize() method pytz can use the date to make sure it uses the right conversion for that named timezone.

And in the case of our overly smart django.forms.fields.DateTimeField it means we need to convert it back into a naive datetime object and then localize it.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date = date.replace(tzinfo=None)
>>> date
datetime.datetime(2015, 6, 4, 13, 0)
>>> tz = pytz.timezone('Europe/Berlin')
>>> tz.localize(date)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)

That was much harder than it needed to be. Timezones are hard. Especially when you have the human element of people typing in things and just, rightfully, expect the system to figure it out and get it right.

I hope this helps the next schmuck who has/had to set aside an hour to figure this out.

Whatsdeployed

November 11, 2015
4 comments Python, Web development, Mozilla

Whatsdeployed was a tool I developed for my work at Mozilla. I think many other organizations can benefit from using it too.

So, on many sites, what we do when deploying a site, is that we note which git sha was used and write that to a file which is then exposed via the web server. Like this for example. If you know that sha and what's at the tip of the master branch on the project's GitHub page, you can build up an interesting dashboard that allows you to see what's available and what's been deployed.

Sample Whatsdeployed screen for the Mozilla Socorro project
The other really useful case is when you have more than just one environment. For example, you might have a dev, stage and prod environment and, always lastly, the master branch on GitHub. Now you can see what code has been shipped on prod versus your staging environment for example.
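The core comparison is simple enough to sketch. Assuming you already have the list of commit shas on master (newest first) and the sha each environment reports, you can work out what hasn't shipped yet. The function name and the shas below are made up for the example; this is not the actual Whatsdeployed code.

```python
def undeployed_commits(master_commits, deployed_sha):
    """Return the commits on master that are newer than `deployed_sha`.

    `master_commits` is a list of shas, newest first.
    """
    pending = []
    for sha in master_commits:
        if sha == deployed_sha:
            return pending
        pending.append(sha)
    # The deployed sha was not found in the commits we looked at.
    return None


if __name__ == "__main__":
    master = ["d4d4d4", "c3c3c3", "b2b2b2", "a1a1a1"]  # made-up shas
    environments = {"dev": "d4d4d4", "stage": "c3c3c3", "prod": "a1a1a1"}
    for name, sha in environments.items():
        print(name, undeployed_commits(master, sha))
```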

This is one of those far too few projects that you build quickly one Friday afternoon and it turns out to be surprisingly useful to a lot of people. I for one, check various projects like this several times per day.

The code is on GitHub and it's basically a tiny bit of Flask with some jQuery doing a couple of AJAX requests. If you enjoy it and use it, please share.

UPDATE

Blogged about a facelift, Jan 2018

ElasticSearch, snowball analyzer and stop words

September 25, 2015
1 comment Python

Disclaimer: I'm an ElasticSearch noob. Go easy on me

I have an application that uses ElasticSearch's more_like_this query to find related content. It basically works like this:

>>> index(index, doc_type, {'id': 1, 'title': 'Your cool title is here'})
>>> index(index, doc_type, {'id': 2, 'title': 'About is a cool headline'})
>>> index(index, doc_type, {'id': 3, 'title': 'Titles are your big thing'})

Then you can pick one ID (1, 2 or 3) and find related ones.

We can tell by looking at these three silly examples that 1 and 2 have the words "is" and "cool" in common. 1 and 3 have "title" (stemming taken into account) and "your" in common. However, is there much value in connecting these documents on the words "is" and "your"? I think not. Those are stop words. E.g. words like "the", "this", "from", "she" etc. Basically words that are commonly used as "glue" between more unique and specific words.

Anyway, if you index something in ElasticSearch as a text field you get, by default, the "standard" analyzer to analyze the incoming stuff to be indexed. The standard analyzer just splits the words on whitespace. A more compelling analyzer is the Snowball analyzer (original here) which supports intelligent stemming (turning "wife" ~= "wives") and stop words.

The problem is that the snowball analyzer has a very different set of stop words. We did some digging and thought this was the list it bases its English stop words on. But this was wrong. Note that that list has words like "your" and "about" listed there.

The way to find out how your analyzer treats a string and turns it into tokens is to use the _analyze tool. For example:

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?analyzer=snowball' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 5,
      "token": "about",
      "type": "<ALPHANUM>",
      "start_offset": 0,
      "position": 1
    },
    {
      "end_offset": 10,
      "token": "your",
      "type": "<ALPHANUM>",
      "start_offset": 6,
      "position": 2
    },
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

So what you can see is that it finds the tokens "about", "your", "special" and "word". But it ignored "is", "a" and "the" as stop words. Hmm... I'm not happy with that. I don't think "about" and "your" are particularly helpful words.

So, how do you define your own stop words and override the one in the Snowball analyzer? Well, let me show you.

In code, I use pyelasticsearch so the index creation is done in Python.


STOPWORDS = (
    "a able about across after all almost also am among an and "
    "any are as at be because been but by can cannot could dear "
    "did do does either else ever every for from get got had has "
    "have he her hers him his how however i if in into is it its "
    "just least let like likely may me might most must my "
    "neither no nor not of off often on only or other our own "
    "rather said say says she should since so some than that the "
    "their them then there these they this tis to too twas us "
    "wants was we were what when where which while who whom why "
    "will with would yet you your".split()
)

def create():
    es = get_connection()
    index = get_index()
    es.create_index(index, settings={
        'settings': {
            'analysis': {
                'analyzer': {
                    'extended_snowball_analyzer': {
                        'type': 'snowball',
                        'stopwords': STOPWORDS,
                    },
                },
            },
        },
        'mappings': {
            doc_type: {
                'properties': {
                    'title': {
                        'type': 'string',
                        'analyzer': 'extended_snowball_analyzer',
                    },
                }
            }
        }
    })

With that in place, now delete your index and re-create it. Now you can use the _analyze tool again to see how it analyzes text on this particular field. But note, to do this we need to know the name of the index we used. (so replace {myindexname} in the URL):

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?field=title' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

Cool! Now we see that it considers "about" and "your" as stop words. Much better. This is handy too because you might have certain words that are globally not very common but within your application they're repeated a lot and not very useful.
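If the "position" numbers in those outputs look odd, here's a toy version of what the stop word filter does. This is only an illustration in plain Python, not how ElasticSearch implements it, and it does no stemming. The two stop word sets are small stand-ins, not the analyzer's real lists.

```python
def analyze(text, stopwords):
    """Lower-case, split on whitespace and drop stop words.

    Returns (position, token) pairs; positions are 1-based and count
    the dropped words too, like the "position" values above.
    """
    tokens = []
    for position, word in enumerate(text.lower().split(), start=1):
        if word not in stopwords:
            tokens.append((position, word))
    return tokens


if __name__ == "__main__":
    default_like = {"is", "a", "the"}
    extended = default_like | {"about", "your"}
    text = "about your special is a the word"
    print(analyze(text, default_like))
    print(analyze(text, extended))
```

The dropped words still take up a position, which is why "word" stays at position 7 in both cases.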

Thank you willkg and Erik Rose for your support in tracking this down!

django-semanticui-form

September 14, 2015
2 comments Python, Django

I'm working on a (side)project in Django that uses the awesome Semantic UI CSS framework. This project has some Django forms that are rendered on the server and so I can't let Django render the form HTML or else the CSS framework can't do its magic.

The project is called django-semanticui-form and it's a fork from django-bootstrap-form.

It doesn't come with the Semantic UI CSS files at all. That's up to you. Semantic UI is available as a big fat bundle (i.e. one big .css file) but generally you just pick the components you want/need. To use it in your Django templates, simply create a django.forms.Form instance and render it like this:


{% load semanticui %}

<form>
  {{ myform | semanticui }}
</form>

The project is very quickly put together. The elements I intend to render seem to work but you might find that certain input elements don't work as nicely. However, if you want to help on the project, it's really easy to write tests and run tests. And Travis and automatic PyPI deployment is all set up so pull requests should be easy.

peepin - a great companion to peep

September 10, 2015
0 comments Python

I actually wrote peepin several months ago but forgot to blog about it.

It's a great library that accompanies peep, which is a wrapper on top of pip. Actually, it's for pip install. When you normally do pip install -r requirements.txt the only check it does is on the version number, assuming your requirements.txt has lines in it like Django==1.8.4. With peep it does a checksum comparison of the wheel, tarball or zip file. It basically means that the installer will get EXACTLY the same package files as were used by the developer who decided to add it to requirements.txt.

If you're using pip and want strong reliability and much higher security, I strongly recommend you consider switching to peep.

Anyway, what peepin is, is an executable used to modify your requirements.txt automatically for you. It can do two things. At least one.

1) Automatically figure out what the right checksums should be.
2) It can figure out what is the latest version on PyPI.

For example:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form $)$ peepin --verbose django-bootstrap-form
* Latest version for 3.2
https://pypi.python.org/pypi/django-bootstrap-form/3.2
* Found URL https://pypi.python.org/packages/source/d/django-bootstrap-form/django-bootstrap-form-3.2.tar.gz#md5=1e95b05a12362fe17e91b962c41d139e
*   Re-using /var/folders/1x/2hf5hbs902q54g3bgby5bzt40000gn/T/django-bootstrap-form-3.2.tar.gz
*   Hash AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
* Editing requirements.txt

And once that's done...:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form *$)$ git diff
diff --git a/requirements.txt b/requirements.txt
index a6600f1..5f1374c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -83,8 +83,8 @@ BeautifulSoup==3.2.1
 django_compressor==1.4
 # sha256: F3KVsUQkAMks22fo4Y-f9ZRvtEL4WBO50IN4I3IuoI0
 django-cronjobs==0.2.3
-# sha256: 2G3HpwzvCTy3dc1YE7H4XQH6ZN8M3gWpkVFR28OOsNE
-django-bootstrap-form==3.1
+# sha256: AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
+django-bootstrap-form==3.2
 # sha256: jiOPwzhIDdvXgwiOhFgqN6dfB8mSdTNzMsmjmbIBkfI
 regex==2014.12.24
 # sha256: ZY2auoUzi-jB0VMsn7WAezgdxxZuRp_w9i_KpCQNnrg
 

If you want to, you can open up and inspect the downloaded package and check that no hacker has meddled with the package. Or, if you don't have time to do that, at least use the package locally and run your tests etc. If you now feel comfortable with the installed package you can be 100% certain that exactly that package will be installed on your server once the code goes into production.
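About those "# sha256:" comments: assuming the 43-character hashes above are the URL-safe base64 encoding of the SHA-256 digest with the trailing "=" padding stripped, this is roughly how one could be computed and converted to the hex form. The function names are hypothetical and this is a sketch, not code lifted from peep or peepin.

```python
import base64
import hashlib


def peep_style_hash(payload):
    """URL-safe base64 of the sha256 digest, without "=" padding."""
    digest = hashlib.sha256(payload).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def peep_to_hex(peep_hash):
    """Convert a peep-style hash to a plain hex digest."""
    # Restore the padding that was stripped before decoding.
    padded = peep_hash + "=" * (-len(peep_hash) % 4)
    return base64.urlsafe_b64decode(padded).hex()


if __name__ == "__main__":
    # Made-up bytes standing in for a downloaded tarball.
    print(peep_style_hash(b"not a real tarball"))
```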

Be careful with using dict() to create a copy

September 9, 2015
9 comments Python

Everyone who's done Python for a while soon learns that dicts are mutable. I.e. that they can change.

One way of "forking" a dictionary into two different ones is to create a new dictionary object with dict(). E.g:


>>> first = {'key': 'value'}
>>> second = dict(first)
>>> second['key'] = 'other'
>>> first
{'key': 'value'}
>>> second
{'key': 'other'}

See, you can change the value of a key without affecting the dictionary it came from.

But, if one of the values is also mutable, beware!


>>> first = {'key': ['value']}
>>> second = dict(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value', 'second value']}
>>> second
{'key': ['value', 'second value']}

This is where you need to use copy.deepcopy from the standard library.


>>> import copy
>>> first = {'key': ['value']}
>>> second = copy.deepcopy(first)
>>> second['key'].append('second value')
>>> first
{'key': ['value']}
>>> second
{'key': ['value', 'second value']}

Yay! Hope it helps someone avoid some possibly confusing bugs some day.

UPDATE

As ëRiC reminded me, there are actually three ways to make a "shallow copy" of a dictionary:

1) some_copy = dict(some_dict)

2) some_copy = some_dict.copy()

3) some_copy = copy.copy(some_dict) # after importing 'copy'
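A quick way to convince yourself that all three really are shallow is to compare object identity with "is". A small self-checking sketch (the variable names are just for this example):

```python
import copy

original = {"key": ["value"]}

shallow_copies = [
    dict(original),
    original.copy(),
    copy.copy(original),
]
deep = copy.deepcopy(original)

for shallow in shallow_copies:
    # New outer dict, but the very same inner list object.
    assert shallow is not original
    assert shallow["key"] is original["key"]

# deepcopy also copies the nested list.
assert deep["key"] is not original["key"]
assert deep == original
```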

Introducing optisorl

August 18, 2015
0 comments Python

optisorl is a Python package for sorl-thumbnail which is a kick-ass Python package for Django. sorl-thumbnail is pretty popular and used by a lot of people who have images they want to display as thumbnails.

A problem you find is that oftentimes the PNG thumbnails aren't as optimized as they can be. A great tool for doing a second optimization pass on a PNG file is pngquant. You basically run it like this:

$ ls -l bugzilla.png
-rw-r--r--@ 1 peterbe  staff  12188 Dec 12  2014 bugzilla.png
$ pngquant bugzilla.png
$ ls -l bugzilla-fs8.png
-rw-r--r--@ 1 peterbe  staff  6630 Aug 18 13:15 bugzilla-fs8.png

That's a 140x140 pixel PNG that became 5,558 bytes smaller (46% saving).

Anyway, this is where optisorl comes in. It's an extension to sorl-thumbnail that is able to execute pngquant on the PNG right after the thumbnail file has been created. It does so by calling out a sub-process command to pngquant. See the code here which is all the magic there is to it really.
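The sub-process call boils down to something like the sketch below. This is not the actual optisorl code; the function names and the location parameter are made up, and it assumes a pngquant that supports the --force and --output options.

```python
import subprocess


def pngquant_command(source, destination, location="pngquant"):
    """Build the argument list for one pngquant run."""
    return [location, "--force", "--output", destination, "--", source]


def optimize_png(source, destination, location="pngquant"):
    """Run pngquant; raises CalledProcessError if it exits non-zero."""
    subprocess.run(pngquant_command(source, destination, location), check=True)


def saving_percent(before, after):
    """Rounded percentage saved, e.g. for the file sizes shown above."""
    return round(100 * (before - after) / before)


if __name__ == "__main__":
    print(pngquant_command("bugzilla.png", "bugzilla-optimized.png"))
    print(saving_percent(12188, 6630))
```

Passing the arguments as a list (rather than one shell string) avoids any quoting trouble with odd file names.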

The reason I built this was to reduce the images on Air Mozilla. At the time I did the measurement, the PNGs total weight on the home page was 129KB and after running them all through optisorl the total weight was only 65KB.

To install it, just pip install it like so:

$ pip install optisorl

And you need to install pngquant like brew install pngquant or apt-get install pngquant.

Then, to activate it you need to set this Django setting:


THUMBNAIL_BACKEND = 'optisorl.backend.OptimizingThumbnailBackend'

If you decide to put the pngquant executable somewhere not on the PATH you can add to your settings.py file something like this:


PNGQUANT_LOCATION = '/path/to/bin/pngquant'

There's a bunch of features it doesn't have but we can work together on that. For example, there are certain PNG images that you might want to display as thumbnails but due to something about the image, e.g. its use of Alpha channels, you might want to explicitly disable optimizations.

Premailer.io

July 8, 2015
4 comments Python, Web development, AngularJS, JavaScript

Premailer is a Python library for turning HTML + CSS into HTML with all the CSS embedded as inline style attributes. This is sadly very necessary to ensure that your fancy HTML emails look spiffy across all email clients and email webapps.

So, last week I put together a little site to test the library via a browser: Premailer.io

It's just a simple webapp with a form where you can enter HTML in three different ways; textarea, by URL and by file upload.

You can also override all the possible advanced options that premailer supports.

What's kinda cool is that you can get a preview of what the HTML document will look like in an iframe that is dynamically loaded with the result from the conversion.

The webapp is of course open source and available on github.com/peterbe/premailer.io. The front-end is an AngularJS app and the build system is Lineman.js. The server is a Falcon server running on uWSGI via Nginx.

There's very little fancy here. There's no limitations or protections. I just hope it becomes handy for people to test premailer out.

The inspiration came from MailChimp's CSS Inliner Tool which is cute but very basic and doesn't allow you the same kinds of input.

If anybody with some AngularJS or highlight.js chops has time, I'd love help figuring out why the HTML is not syntax highlighted.