Fast perceptual image hashing with Perl – Dimitrios Kechagias




Talk slides, images & code used: https://github.com/dkechag/ImagePHash-Talk

This talk will take you through a novel Perl implementation of perceptual hashes for “similar image” searches. I’ll start with the basics: if you don’t know what a Discrete Cosine Transform is, you’ll find out with a visual demo, no math required (although math is always encouraged!). After DCTs and p-hashing basics, I will go through the implementation’s new features that make it efficient and easy to integrate with our large MySQL-based image database.

At SpareRoom, the world’s largest roommate-finding service, we handle tens of millions of images, mostly of customer rooms/properties. When tasked with adding an internal “similar image” search, I first looked at existing solutions and found that some did not work as expected, while others were either slow or required changes to our infrastructure (mainly Perl, with the data, including the image index, on a large MySQL database).
In the end, I went with a solution initially based on Image::Hash, but with p-hashes reimplemented to fix collision issues and improve speed (over 10x faster than pHash.org’s equivalent DCT-based hash). Then, I added features such as support for mirroring, as well as “index/reduced” hashes, in order to make effective use of MySQL indices (and thus fit easily into our existing setup).
After an intro to the theory behind the DCT and perceptual hashes in general, I will discuss the capabilities of this implementation and my experiences developing and using it. The module will be released to CPAN before my talk, although the XS part responsible for the fast DCT calculation has been available for a while as Math::DCT.
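
For readers who want a feel for the idea before the talk, here is a minimal sketch of the commonly described DCT-based perceptual hash, written in Python for brevity. It is not the Perl/XS code from the talk, and it makes several assumptions: the image has already been converted to grayscale and resized to a small square (32x32 in the usage below), only the 8x8 block of lowest frequencies is kept, and each bit records whether a coefficient exceeds the median of the non-DC coefficients. The function names `dct_2d_low`, `phash` and `hamming` are illustrative only.

```python
import math
from statistics import median


def dct_2d_low(pixels, keep=8):
    """Return the top-left keep x keep block of an unnormalized 2D DCT-II.

    `pixels` is a square list of lists of grayscale values (assumed to be
    already resized, e.g. to 32x32).
    """
    n = len(pixels)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)), for the lowest `keep` frequencies
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # 1D DCT along each row, keeping only the first `keep` frequencies
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    # 1D DCT along each column of the intermediate result
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]


def phash(pixels, keep=8):
    """Build a keep*keep-bit integer hash from the low-frequency coefficients."""
    coeffs = [c for row in dct_2d_low(pixels, keep) for c in row]
    # The DC term (coeffs[0]) reflects overall brightness, so it is left out
    # of the median used as the threshold.
    threshold = median(coeffs[1:])
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > threshold)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (lower means more similar)."""
    return bin(a ^ b).count("1")
```

A hypothetical usage, with a synthetic image standing in for real data: build `img = [[(x * 7 + y * 13 + x * y) % 97 for x in range(32)] for y in range(32)]`, hash it with `phash(img)`, and compare hashes with `hamming`. Because every bit is a comparison against the median, scaling all pixel values by a constant factor (a contrast change) leaves the hash unchanged, which is the kind of robustness that makes these hashes useful for “similar image” searches.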

https://tprc2022.sched.com/event/10Fn5