
Image retrieval practices

Johnson Liang edited this page Aug 4, 2022 · 3 revisions

Media manager must support storing media files, as well as searching for matching stored files when a similar or duplicate media file (the query file) is provided.

In this article, we focus on techniques for searching images efficiently: how to convert an image into a format that is easy to search (usually a vector), how to perform the search using that format, how to derive the similarity between the query image and the matching images, and so on. In computer science, this is called image retrieval (IR).

The big picture

Matsui et al. (2020) provide a clear overview of modern image retrieval practices: (image) Source: https://www.youtube.com/watch?v=SKrHs03i08Q&list=PLKQB14e0EJUWaTnwgQogJ3nSLzEFNn9d8&t=13m00s

k-nearest neighbor

Given a dataset (in this case, the set of images to search) and a query image, k-nearest neighbor (k-NN) search finds the k images in the dataset that are most similar to the query.
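To make the idea concrete, here is a minimal brute-force sketch. It assumes each image has already been converted into a fixed-length feature vector and that similarity is measured by Euclidean distance; neither choice is prescribed by this page. The function name `knn_search` and the sample vectors are hypothetical.

```python
import heapq
import math


def knn_search(query, dataset, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first.

    Brute force: the query is compared against every vector in the dataset.
    """
    scored = (
        (math.dist(query, vector), index)  # Euclidean distance (assumed metric)
        for index, vector in enumerate(dataset)
    )
    return heapq.nsmallest(k, scored)


# Hypothetical 3-dimensional feature vectors standing in for stored images.
dataset = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (0.1, 0.0, 0.1),
    (5.0, 5.0, 5.0),
]
query = (0.0, 0.1, 0.0)

nearest = knn_search(query, dataset, k=2)
for distance, index in nearest:
    print(f"image #{index} at distance {distance:.3f}")
```

Every query scans the whole dataset, so the cost grows linearly with the number of stored images.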

NN and ANN

First, consider the scale of the dataset, that is, the number of images in the database to search. In a billion-scale database,

Hamming based

See the notes on image hash performance and search: https://g0v.hackmd.io/xsDcMPySQM69vA0xHO8_dA#%E5%9C%96%E7%89%87-Hash-%E6%95%88%E6%9E%9C%E8%88%87%E6%90%9C%E5%B0%8B
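As an illustration of what a Hamming-based search could look like, the sketch below assumes each stored image has already been reduced to a fixed-length binary hash, held here as a Python integer. The hashing step itself is out of scope, and the file names, 8-bit hash values, and `max_distance` threshold are all hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def search_by_hash(query_hash, stored_hashes, max_distance):
    """Return (distance, name) pairs within `max_distance` bits, nearest first."""
    matches = [
        (hamming_distance(query_hash, image_hash), name)
        for name, image_hash in stored_hashes.items()
    ]
    return sorted(pair for pair in matches if pair[0] <= max_distance)


# Hypothetical 8-bit hashes; real image hashes are typically longer.
stored_hashes = {
    "cat.jpg": 0b10110010,
    "cat_resized.jpg": 0b10110011,
    "dog.jpg": 0b01001101,
}

results = search_by_hash(0b10110010, stored_hashes, max_distance=2)
for distance, name in results:
    print(f"{name}: {distance} bit(s) differ")
```

A small distance suggests a duplicate or near-duplicate image. Which threshold separates "similar" from "different" depends on the hash function and should be tuned on real data.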
