This repository has been archived by the owner on Mar 3, 2022. It is now read-only.

reverse image search via perceptual hashing #74

Open
jeremybmerrill opened this issue Jun 19, 2018 · 8 comments

Comments

@jeremybmerrill
Contributor

It would be really neat to be able to search for ads that use a given image (e.g. a news photo or a meme), or to group ads that use the same image.

The way to do this would likely be to record each image's perceptual hash at some point, then at search time compare the given image's hash to those of all images in the DB, returning the ones that are similar. dhash and phash are both perceptual hashing algorithms, but there may be other good choices.

For instance, hopefully, we could determine which ads use the Distracted Boyfriend meme like this one.
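To illustrate the dhash idea, here's a minimal pure-Python sketch (assuming the image has already been downscaled to an 8×9 grayscale grid, which in practice a library like Pillow or imagehash would handle):

```python
# Minimal dhash sketch: operates on a 2D grid of grayscale values that has
# already been downscaled to 8 rows x 9 columns (e.g. via Pillow's resize()).
# Each of the 64 bits records whether a pixel is darker than its right neighbor.

def dhash_bits(pixels):
    """pixels: list of 8 rows, each a list of 9 grayscale values (0-255)."""
    bits = []
    for row in pixels:
        for x in range(len(row) - 1):
            bits.append(1 if row[x] < row[x + 1] else 0)
    return bits

def hamming(bits_a, bits_b):
    """Number of differing bits; a small distance means perceptually similar."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Two nearly identical images differ in only a few bits:
grid_a = [[col * 10 for col in range(9)] for _ in range(8)]
grid_b = [row[:] for row in grid_a]
grid_b[0][1] = 100  # small perturbation in one pixel
print(hamming(dhash_bits(grid_a), dhash_bits(grid_b)))  # → 1
```

Because the hash captures gradient direction rather than exact pixel values, small edits (recompression, mild resizing) leave most bits unchanged, which is what makes near-duplicate grouping feasible.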

@jeremybmerrill
Contributor Author

any interest, @imalsogreg ?

@jeremybmerrill
Contributor Author

Since my understanding is that these perceptual hash algorithms are mostly implemented in Python, this could happen either as part of the classifier's classify step in Python in this repo, or in a similar but new, separate script that's also run on a frequent cron.
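A hypothetical sketch of the search side, assuming the cron job has already stored each ad's hash as a 64-bit integer (the `ad_hashes` mapping and the ad ids below are illustrative, not from this repo):

```python
# Hypothetical search-time lookup: hashes were precomputed (e.g. by a cron
# job alongside the classify step) and stored as 64-bit integers keyed by ad id.

def hamming_distance(h1, h2):
    """Number of differing bits between two 64-bit integer hashes."""
    return bin(h1 ^ h2).count("1")

def find_similar(query_hash, ad_hashes, max_distance=10):
    """Linear scan; returns ad ids whose hash is within max_distance bits."""
    return [ad_id for ad_id, h in ad_hashes.items()
            if hamming_distance(query_hash, h) <= max_distance]

ad_hashes = {"ad-1": 0xF0F0F0F0F0F0F0F0,
             "ad-2": 0xF0F0F0F0F0F0F0F1,   # 1 bit away from ad-1
             "ad-3": 0x0F0F0F0F0F0F0F0F}   # 64 bits away from ad-1
print(find_similar(0xF0F0F0F0F0F0F0F0, ad_hashes))  # → ['ad-1', 'ad-2']
```

A linear scan is fine at this dataset's scale; if it ever became a bottleneck, a BK-tree or similar index over Hamming space would be the usual next step.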

@imalsogreg

That sounds like a lot of fun :) Let me see if I can get a proof of concept up.

@jeremybmerrill
Contributor Author

Awesome, holler if you have Qs or if there's anything I can help with.

@yinleon

yinleon commented Aug 1, 2018

Hi, I just stumbled on this.

I have experience making reverse image search engines using features extracted from pre-trained neural networks, and then calculating distance using KNN. There's some more info (code, presentation, video) in this repo:
https://github.com/yinleon/pydata2017

Let me know if this method sounds interesting for this project
(sorry for the shameless self-promotion).
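To sketch that approach (names here are hypothetical; in practice the feature vectors would come from a pre-trained network, as in the linked repo), nearest-neighbor search over extracted features might look like:

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def knn(query_vec, index, k=3):
    """index: list of (item_id, feature_vector); returns the k nearest ids."""
    ranked = sorted(index, key=lambda item: cosine_distance(query_vec, item[1]))
    return [item_id for item_id, _ in ranked[:k]]

# Toy 3-d "features"; real ones would be e.g. a 2048-d penultimate-layer vector.
index = [("meme-1", [1.0, 0.9, 0.0]),
         ("meme-2", [0.9, 1.0, 0.1]),
         ("photo-1", [0.0, 0.1, 1.0])]
print(knn([1.0, 1.0, 0.0], index, k=2))  # → ['meme-1', 'meme-2']
```

The brute-force sort is just for clarity; a library like scikit-learn's NearestNeighbors (or an approximate index like Annoy) would replace it for a large ad corpus.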

@jeremybmerrill
Contributor Author

Hi @yinleon, that sounds amazing. In my response to your other comment, I posted a link to download the entire dataset of US political ads.

This sounds potentially really promising. Are the extracted features interpretable (e.g. "this one has a picture of Nancy Pelosi")? Can you describe how well it works at identifying nearly-identical images versus more heavily modified or cropped ones? What about images with text overlays, like memes?

I'm not super familiar with Keras. Would it be possible for me or a colleague to test your implementation on an ordinary MacBook Pro? Or is it easier to set it up on a GPU-enabled AWS instance?

I really appreciate your taking the time to share about your research. You're of course welcome to download and check out our data and I'd love if we could find a way to work together. Happy to talk more about my qualitative observations of the ad images, if that'd be helpful.

@yinleon

yinleon commented Aug 1, 2018

@jeremybmerrill I sent you an email to talk in depth about this.

@imalsogreg

This sounds awesome!

Pre-trained ResNet probably has lots of features that are relevant for general image search (common objects, body parts), but lacks some that would specifically help categorize political ads ("Nancy Pelosi" neurons, e.g.). I wonder how hard it would be to retrain ResNet with lots of extra labeled examples, with those labels drawn from politically relevant topics. Your Google Image scraper could be useful there?

I'd always assumed image classifiers would be terrible for image search, but from your video, apparently they can work great when you don't throw away all the features except the strongest one :P Cool result with the KNN search.
