Image search

While trying to implement reverse image search for my site, I came across the huge world of image search. Below are brief descriptions and use cases for some of the reverse / similar image search approaches.





Water, stones, sky

Dataset used





Perceptual hash

[Colab]





Detailed description of how phash works





Each image is reduced to a hash of a fixed length. The smaller the Hamming distance between two hashes, the more similar the images.





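The naive approach is a linear search: compare the query hash with all N stored hashes and return those whose distance is within the threshold.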
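As a minimal sketch of comparing hashes by Hamming distance with a threshold, assuming the hashes are stored as 64-bit Python integers. The hash values, the threshold of 10, and the function names are illustrative and not taken from the notebook.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def find_similar(query: int, hashes: dict[str, int], threshold: int = 10) -> list[tuple[str, int]]:
    """Linear scan: return (name, distance) pairs within the threshold, closest first."""
    hits = [(name, hamming_distance(query, h)) for name, h in hashes.items()]
    return sorted((hit for hit in hits if hit[1] <= threshold), key=lambda hit: hit[1])


# Hypothetical 64-bit hashes; real ones would come from a perceptual-hash library.
hashes = {
    "original.jpg": 0xF0E1D2C3B4A59687,
    "resized.jpg": 0xF0E1D2C3B4A59686,    # differs from the original in 1 bit
    "unrelated.jpg": 0x0F1E2D3C4B5A6978,  # differs from the original in all 64 bits
}
print(find_similar(0xF0E1D2C3B4A59687, hashes))
```

Every query touches every stored hash, so the cost grows linearly with the size of the collection.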





Can it be done faster? Yes! With a vantage-point tree: O(n log n) to build, O(log n) to search.
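To make the idea concrete, here is a small pure-Python sketch of a vantage-point tree over integer hashes with the Hamming metric. It is not the implementation from the notebook or from any PyPI package: the class name, the median split, and the `within` range query are assumptions chosen for illustration.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class VPTree:
    """Vantage-point tree over integer hashes with the Hamming metric."""

    def __init__(self, points: list[int]):
        points = list(points)
        self.vantage = points.pop()  # any point can serve as the vantage point
        self.radius = 0
        self.inside = self.outside = None
        if not points:
            return
        dists = sorted(hamming(self.vantage, p) for p in points)
        self.radius = dists[len(dists) // 2]  # median distance splits the rest in two
        near = [p for p in points if hamming(self.vantage, p) <= self.radius]
        far = [p for p in points if hamming(self.vantage, p) > self.radius]
        self.inside = VPTree(near) if near else None
        self.outside = VPTree(far) if far else None

    def within(self, query: int, threshold: int) -> list[int]:
        """Return every stored hash whose distance to `query` is <= threshold."""
        d = hamming(self.vantage, query)
        found = [self.vantage] if d <= threshold else []
        # Triangle inequality: skip a subtree when none of its points
        # can lie within `threshold` of the query.
        if self.inside is not None and d - threshold <= self.radius:
            found += self.inside.within(query, threshold)
        if self.outside is not None and d + threshold > self.radius:
            found += self.outside.within(query, threshold)
        return found


random.seed(0)
hashes = [random.getrandbits(64) for _ in range(1000)]
tree = VPTree(hashes)
query = hashes[0] ^ 0b101  # flip two bits of a stored hash
matches = tree.within(query, threshold=4)
print(len(matches), hashes[0] in matches)
```

The result is the same set a linear scan would return. Whether the tree is actually faster depends on the implementation and on the threshold, so it is worth benchmarking against the plain loop.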





, , , vantage-point tree , for.   , 100 .   , ... , vptree . ? , vantage point tree PyPI, 1 - vptree. , - . vp-tree javascript . for-loop ,  vptree 10 . - , top N , . , vp-tree , . gist





- vp-tree, . , . / vp-tree c / , /.





Our dataset has 2 duplicates of the first image

  • {transformation_name}





- . - , "" .





https://habr.com/ru/post/205398/

https://habr.com/ru/post/211773/





: phash , preview/thumbnail. .





RGB Histogram

[Colab]





RGB histogram

RGB , , , .





: . , .





Linear search: comparing histograms with cv2.HISTCMP_INTERSECT (53ms)

Flattening the histogram gives a vector of 4096 values (16 bins per RGB channel).
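A rough pure-Python sketch of where the 4096 comes from and of what histogram intersection computes. It assumes 16 bins per channel (16 * 16 * 16 = 4096) and normalized histograms. The function names and the synthetic pixel lists are made up for illustration; the notebook itself uses OpenCV.

```python
BINS = 16  # bins per channel; 16 * 16 * 16 = 4096 cells in total


def rgb_histogram(pixels):
    """Flattened, normalized 3D color histogram of (r, g, b) tuples with values 0-255."""
    hist = [0.0] * (BINS ** 3)
    step = 256 // BINS
    count = 0
    for r, g, b in pixels:
        hist[(r // step) * BINS * BINS + (g // step) * BINS + (b // step)] += 1
        count += 1
    return [v / count for v in hist] if count else hist


def intersection(h1, h2):
    """Histogram intersection: sum of element-wise minima (higher means more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Tiny synthetic "images" as pixel lists; real code would read pixels from files.
sky = [(30, 120, 220)] * 90 + [(250, 250, 250)] * 10
sky_copy = [(31, 121, 221)] * 90 + [(251, 251, 251)] * 10  # slightly shifted colors
forest = [(20, 110, 30)] * 100

print(len(rgb_histogram(sky)))
print(intersection(rgb_histogram(sky), rgb_histogram(sky_copy)))  # same bins, full overlap
print(intersection(rgb_histogram(sky), rgb_histogram(forest)))    # no shared bins
```

Small color shifts stay inside the same bins, which is why the slightly shifted copy still overlaps fully with the original here.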





k-nearest neighbor search, .
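For reference, brute-force k-nearest-neighbor search over flattened feature vectors fits in a few lines. This is only a sketch: the Euclidean distance, the `knn` helper, and the toy 2-dimensional vectors (standing in for 4096-dimensional histograms) are assumptions.

```python
import heapq
import math


def knn(query, vectors, k=3):
    """Brute-force kNN: indices of the k vectors closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


# Toy 2-dimensional vectors standing in for flattened histograms.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
print(knn([0.9, 0.1], vectors, k=2))
```

Each query computes the distance to every stored vector, which is exactly the cost that approximate methods try to avoid.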





Bruteforce knn (73ms) and hnsw (0.4ms) produce the same images

To speed this up, we can use approximate nearest neighbor search. With hnswlib, an implementation of Hierarchical Navigable Small World graphs, a query takes 0.4ms instead of 50-70ms.





More on approximate nearest neighbor search: https://habr.com/ru/company/mailru/blog/338360/









  • ,





  • , phash









  • ( 16 RGB 4096)





  • ,





SIFT/SURF/ORB

[SIFT Colab]





SIFT.





, SURF ORB. SIFT - Root SIFT.





:





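# RootSIFT: L1-normalize each descriptor (eps guards against division by zero), then take the element-wise square root
descs /= (descs.sum(axis=1, keepdims=True) + eps)
descs = np.sqrt(descs)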
      
      



SIFT ~5 .

: SIFT features, Brute-Force Matcher(cv2.BFMatcher), matches.





Crop search (30s)

:





  • SIFT , ,





:













  • ( , python)





NN features

[Colab ResNet50] [Colab CLIP]





. . ResNet50 - 2048. "" ResNet50, knn . .





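from tensorflow.keras.applications.resnet50 import ResNet50  # assumes TensorFlow's bundled Keras
model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3), pooling='max')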
      
      



Query image
Waterfalls (ResNet50)
Search for a highly pixelated image (ResNet50)
Crop search (ResNet50)

- CLIP. , encode_image 512.





Waterfalls (CLIP)
Search for a highly pixelated image (CLIP)
Crop search (CLIP)

CLIP c , - , 224 aspect ratio, Center Crop, . .





. t-SNE.





t-SNE ResNet50 (10100x10100 7.91MB)
t-SNE CLIP (10100x10100 7.04MB)

Features CLIP . , CLIP , , .





:









  • approximate nearest neighbor search





:





  • features GPU





CLIP text search

[Colab CLIP]





CLIP:





  • https://habr.com/ru/post/539312/





  • https://habr.com/ru/post/540312/





CLIP , , . knn search features , features . .
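As a rough sketch of text-to-image search over CLIP features: if the text vector and the image vectors are both scaled to unit length, ranking images for a query reduces to sorting by dot product, which is the cosine similarity. Plain Python lists stand in for feature tensors here. The 3-dimensional vectors, the file names, and both helper functions are invented for illustration; real CLIP features come from encode_image and encode_text.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_images(text_vec, image_vecs):
    """Return image names sorted by cosine similarity to the text vector, best first."""
    t = normalize(text_vec)
    scores = {
        name: sum(a * b for a, b in zip(t, normalize(v)))
        for name, v in image_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up 3-dimensional vectors standing in for CLIP features.
image_vecs = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "fog.jpg": [0.1, 0.9, 0.1],
    "bliss.jpg": [0.0, 0.2, 0.9],
}
print(rank_images([0.1, 0.1, 1.0], image_vecs))
```

With a large collection the same ranking can be served by a knn index built over the image features, with the text vector used as the query.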





# Encode the text query and L2-normalize it so that a dot product gives cosine similarity
text_tokenized = clip.tokenize(["a picture of a windows xp wallpaper"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(text_tokenized)
    text_features /= text_features.norm(dim=-1, keepdim=True)



"a picture of a sunset near the sea"
"a picture of a fog near the mountains"
"a picture of a windows xp wallpaper"

GitHub repository with all notebooks







