MD5 is rather slow for this purpose. It also seems to me that, simply to get a checksum over a file, deploying a hash algorithm worthy of being a component of sophisticated encryption is overkill.
You might be interested in zlib.adler32 and zlib.crc32 (crc32 is a bit slower than adler32, but has slightly fewer collisions).
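For files too large to read in one go, both zlib functions accept a running value as a second argument, so the checksum can be built up chunk by chunk. A minimal sketch, assuming Python 3; the helper name file_checksums and the 64 KB chunk size are illustrative choices, not anything from the recipe:

```python
import zlib

def file_checksums(path, chunk_size=64 * 1024):
    """Return (adler32, crc32) for a file, reading it in chunks.

    Hypothetical helper for illustration only.
    """
    adler = 1  # zlib's starting value for Adler-32
    crc = 0    # zlib's starting value for CRC-32
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the previous result back in as the running value.
            adler = zlib.adler32(chunk, adler)
            crc = zlib.crc32(chunk, crc)
    # Mask so the result is an unsigned 32-bit value on any Python version.
    return adler & 0xFFFFFFFF, crc & 0xFFFFFFFF
```

The result should match what zlib.adler32 and zlib.crc32 return for the whole file contents in one call.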
If you use CRC32, you can also cover the contents of zip files by using the CRC value stored in each entry returned by infolist(), instead of having to read each file out of the zip and compute the CRC yourself.
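A minimal sketch of the zip idea, assuming Python 3 and the standard zipfile module. The helper name zip_member_crcs is made up for illustration; ZipInfo.CRC is the CRC-32 of the uncompressed data as recorded in the archive, so nothing is decompressed here:

```python
import zipfile

def zip_member_crcs(zip_path):
    """Map each member name to the CRC-32 stored in the zip archive.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() returns one ZipInfo per member; .CRC is read from
        # the archive's directory, not recomputed from the data.
        return {info.filename: info.CRC for info in zf.infolist()}
```

Note that this trusts the value stored in the archive. If the entries need verifying, ZipFile.testzip() reads every member and checks its CRC.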
Replies
Slow? On this PC it takes about 0.0027 seconds to get the MD5 checksum of a 350 KB file.
But, on that note, it takes 0.0009 seconds on average with zlib.adler32().
I wrote a little benchmark script and got these results (in seconds):
A 0.00166934132576
B 0.00266071277506
C 0.000866203977351
D 0.00112253580338
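where:

import hashlib
import zlib

def A(payload):
    return hash(payload)  # Python's built-in hash()

def B(payload):
    # MD5 digest; on old Python 2 this was md5.new(payload).digest()
    return hashlib.md5(payload).digest()

def C(payload):
    return zlib.adler32(payload)  # Adler-32 checksum

def D(payload):
    return zlib.crc32(payload)  # CRC-32 checksum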
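The benchmark script itself was not posted. A comparable harness might look like the sketch below, assuming Python 3 and the standard timeit module; the function name benchmark, the repeat count, and the random 350 KB payload are assumptions for illustration, not the original setup:

```python
import hashlib
import os
import timeit
import zlib

def benchmark(payload, repeat=100):
    """Return the average seconds per call for each checksum function.

    Hypothetical harness, not the script used for the numbers above.
    """
    candidates = {
        # Note: bytes objects cache their hash, so after the first call
        # this entry mostly measures call overhead.
        "hash": lambda: hash(payload),
        "md5": lambda: hashlib.md5(payload).digest(),
        "adler32": lambda: zlib.adler32(payload),
        "crc32": lambda: zlib.crc32(payload),
    }
    return {
        name: timeit.timeit(fn, number=repeat) / repeat
        for name, fn in candidates.items()
    }

if __name__ == "__main__":
    payload = os.urandom(350 * 1024)  # stand-in for a 350 KB file
    for name, seconds in benchmark(payload).items():
        print(f"{name:8s} {seconds:.6f}")
```

Absolute numbers will vary with the machine and the Python and zlib builds, so only the relative ordering is worth comparing.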
Thanks for the pointers, Florian.