While trying to implement reverse image search for my site, I came across the huge world of image search. Below are brief descriptions and use cases for some of the reverse / similar image search approaches.

Water, stones, sky
Dataset used

Perceptual hash

[Colab]

Detailed description of how phash works

Each image is reduced to a hash of a fixed length. The smaller the Hamming distance between two hashes, the more similar the images.
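As a rough illustration of the comparison step, the sketch below builds a 64-bit hash from an 8x8 grayscale grid and counts the differing bits. It uses an average hash (each bit records whether a pixel is above the mean), which is a simpler stand-in for pHash rather than the DCT-based algorithm itself. The helpers `average_hash` and `hamming_distance` are hypothetical names, and the grids are synthetic.

```python
def average_hash(gray):
    """Hash a small grayscale grid (list of rows): one bit per pixel, set if pixel > mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")


# Synthetic 8x8 "images": a horizontal gradient, a slightly brighter copy, and its mirror.
gradient = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

h_gradient = average_hash(gradient)
h_brighter = average_hash(brighter)
h_mirrored = average_hash(mirrored)

print(hamming_distance(h_gradient, h_brighter))  # near-duplicate: small distance
print(hamming_distance(h_gradient, h_mirrored))  # different layout: large distance
```

In practice a threshold on the distance (chosen by experiment on your own data) decides whether two images count as duplicates.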

Our dataset has 2 duplicates of the first image

RGB Histogram


RGB histogram

Linear search: compare histograms with cv2.compareHist using the cv2.HISTCMP_INTERSECT method (53 ms).
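Histogram intersection sums, bin by bin, the smaller of the two counts, so histograms that overlap more score higher. The sketch below reproduces that metric in plain Python rather than calling OpenCV. The helpers `rgb_histogram` and `intersection` are hypothetical, the choice of 4 bins per channel is arbitrary, and the pixel lists are synthetic.

```python
def rgb_histogram(pixels, bins=4):
    """Concatenate per-channel histograms of (r, g, b) pixels; each channel sums to 1."""
    hist = [0.0] * (3 * bins)
    width = 256 // bins  # number of intensity values per bin
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value // width] += 1.0
    return [count / len(pixels) for count in hist]


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima (higher means more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Synthetic "images" as flat pixel lists.
sky = [(40, 120, 230)] * 90 + [(250, 250, 250)] * 10         # mostly blue
sky_cloudy = [(40, 120, 230)] * 70 + [(250, 250, 250)] * 30  # blue with more white
forest = [(20, 140, 30)] * 100                               # mostly green

query = rgb_histogram(sky)
scores = {
    "sky_cloudy": intersection(query, rgb_histogram(sky_cloudy)),
    "forest": intersection(query, rgb_histogram(forest)),
}
# Linear search: score every image and sort by similarity, best first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Every query touches every stored histogram, so the cost grows linearly with the size of the dataset.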

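Brute-force k-NN (73 ms) and HNSW (0.4 ms) return the same images.
HNSW (Hierarchical Navigable Small World), implemented in hnswlib, is an approximate nearest neighbor search method. Here it answers a query in 0.4 ms, compared with 50-70 ms for the exact searches.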
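For reference, brute-force k-NN measures the distance from the query to every stored vector and keeps the k smallest, which is the exact answer that an HNSW index approximates in far less time. The following is a minimal plain-Python sketch with synthetic 3-dimensional vectors. `squared_l2` and `knn_bruteforce` are hypothetical helpers, and hnswlib itself is not used here.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_bruteforce(query, vectors, k):
    """Return indices of the k vectors closest to the query, nearest first."""
    distances = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, distances)]


# Synthetic feature vectors (for example, flattened histograms).
features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.8, 0.1],
]
query = [0.88, 0.12, 0.0]

print(knn_bruteforce(query, features, k=2))
```

An approximate index trades a small chance of missing a true neighbor for much lower query time, which is why it is worth checking, as above, that both methods return the same images on your data.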


SIFT

[SIFT Colab]




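# RootSIFT: L1-normalize each descriptor, then take the element-wise square root
descs /= (descs.sum(axis=1, keepdims=True) + eps)  # eps is a small constant that avoids division by zero
descs = np.sqrt(descs)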
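As a worked example of this normalization (commonly called RootSIFT), the dependency-free sketch below applies it to a single toy descriptor and checks that the result has unit L2 norm. The helper `root_sift`, the value of `eps`, and the 4-dimensional descriptor are assumptions for illustration; real SIFT descriptors have 128 dimensions.

```python
import math


def root_sift(descriptor, eps=1e-7):
    """L1-normalize a non-negative descriptor, then take element-wise square roots."""
    total = sum(descriptor) + eps
    return [math.sqrt(value / total) for value in descriptor]


# A toy 4-dimensional "descriptor".
desc = [4.0, 0.0, 12.0, 0.0]
rooted = root_sift(desc)
l2_norm = math.sqrt(sum(v * v for v in rooted))

print([round(v, 3) for v in rooted])
print(round(l2_norm, 3))
```

Because the squared entries of the result sum to (almost exactly) 1, the transformed descriptors can be compared with ordinary Euclidean distance.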

Crop search (30s)


NN features

[Colab ResNet50] [Colab CLIP]

# Assumes: from tensorflow.keras.applications.resnet50 import ResNet50
model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3), pooling='max')
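With include_top=False and pooling='max', Keras applies global max pooling to the last convolutional feature map, so each image becomes a single fixed-length vector (2048 values for ResNet50). The toy sketch below shows only that pooling step, on a 2x2 map with 3 channels in plain Python. `global_max_pool` is a hypothetical helper, not a Keras function.

```python
def global_max_pool(feature_map):
    """Collapse an H x W x C feature map (nested lists) to C values: the max of each channel."""
    channels = len(feature_map[0][0])
    return [
        max(cell[c] for row in feature_map for cell in row)
        for c in range(channels)
    ]


# Toy 2 x 2 feature map with 3 channels.
feature_map = [
    [[0.1, 2.0, 0.0], [0.4, 0.5, 0.3]],
    [[0.9, 0.2, 0.0], [0.3, 1.1, 0.7]],
]

print(global_max_pool(feature_map))  # [0.9, 2.0, 0.7]
```

The pooled vectors are what get stored and compared with a nearest neighbor search.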

Query image
Waterfalls (ResNet50)
Search for a highly pixelated image (ResNet50)
Crop search (ResNet50)

Waterfalls (CLIP)
Search for a highly pixelated image (CLIP)
Crop search (CLIP)

t-SNE ResNet50 (10100x10100 7.91MB)
t-SNE CLIP (10100x10100 7.04MB)

CLIP text search

[Colab CLIP]




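# Tokenize the text query and encode it with the CLIP text encoder
text_tokenized = clip.tokenize(["a picture of a windows xp wallpaper"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(text_tokenized)
    # L2-normalize so that a dot product with normalized image features is a cosine similarity
    text_features /= text_features.norm(dim=-1, keepdim=True)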
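Because the text features are normalized to unit length, ranking images reduces to a dot product with image features normalized the same way (cosine similarity). The sketch below shows that ranking step with made-up 3-dimensional vectors in plain Python. Real CLIP embeddings are much longer, and `normalize`, `rank_by_cosine`, and the file names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def rank_by_cosine(text_vec, image_vecs):
    """Return image names sorted by cosine similarity to the text vector, best first."""
    text_unit = normalize(text_vec)
    scores = {
        name: sum(t * i for t, i in zip(text_unit, normalize(vec)))
        for name, vec in image_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up embeddings standing in for CLIP image features.
image_features = {
    "sunset.jpg": [0.9, 0.1, 0.2],
    "fog.jpg": [0.1, 0.9, 0.3],
    "meadow.jpg": [0.1, 0.2, 0.9],
}
text_features = [0.8, 0.2, 0.1]  # stand-in for an encoded text query

print(rank_by_cosine(text_features, image_features))
```

The image embeddings can be computed once and indexed, so only the text query has to be encoded at search time.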

"a picture of a sunset near the sea"
"a picture of a fog near the mountains"
"a picture of a windows xp wallpaper"

