How image search works in Dropbox

When you need to find a photo from a picnic a few years ago, you are unlikely to remember the name your camera automatically gave the file when it was taken, such as 2017-07-04 12.37.54.jpg. Instead, you look through the photos or their thumbnails, trying to spot objects or details that match what you are looking for, whether you are trying to recover a lost photo or to find a good shot in the archives for a new project presentation.





Wouldn't it be great if Dropbox could look through all those images for you and pick out the ones that best match a few descriptive words? That is exactly the task we set ourselves when building the image search feature.











Image search results for the keyword "picnic"

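In this post we describe the core idea behind our image content search method, which is based on machine learning techniques, and then discuss how we built an efficient implementation on top of Dropbox's existing search infrastructure.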





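Here is a simple way to state the image search problem: find a relevance function that takes a (text) query q and an image j and returns a relevance score s indicating how well the image matches the query: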





s = f(q, j).





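Given such a function, when a user runs a search we apply it to all of their images and return those whose score exceeds a threshold, sorted by score. We build this function from two key developments in machine learning: accurate image classification and word vectors.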
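To make the contract concrete, here is a minimal sketch of how a relevance function would be used at query time, assuming an image can be represented by an identifier string and that f is supplied by the caller. The names `rank_images` and `toy_scores`, the file names, and the scores are hypothetical, for illustration only.

```python
from typing import Callable, List, Tuple

# f(q, j) -> relevance score; here an image is just an identifier string.
RelevanceFn = Callable[[str, str], float]


def rank_images(query: str, images: List[str], f: RelevanceFn,
                threshold: float) -> List[Tuple[str, float]]:
    """Score every image, keep those above the threshold, best match first."""
    scored = [(image, f(query, image)) for image in images]
    hits = [(image, score) for image, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Usage with a made-up relevance function that just looks scores up in a table.
toy_scores = {"2017-07-04 12.37.54.jpg": 0.9, "receipt.jpg": 0.1, "park.jpg": 0.5}
print(rank_images("picnic", list(toy_scores), lambda q, j: toy_scores[j], 0.3))
# [('2017-07-04 12.37.54.jpg', 0.9), ('park.jpg', 0.5)]
```

Everything interesting lives inside f; the rest of the post is about how to build it and how to avoid calling it on every image.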





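An image classifier reads an image and outputs a scored list of categories that describe its contents. A higher score means a higher probability that the image belongs to that category.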





Categories can be:





  • specific objects in the image, such as a tree or a person;





  • overall scene descriptors, such as outdoors or a wedding;





  • image characteristics, such as black-and-white or close-up.





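The past decade has seen tremendous progress in image classification with convolutional neural networks, starting with the breakthrough result of Krizhevsky et al. on the ImageNet Challenge in 2012. Since then, thanks to better model architectures and training methods, large datasets such as Open Images and ImageNet, and easy-to-use libraries such as TensorFlow and PyTorch, researchers have built image classifiers that recognize thousands of categories. Here is how well image classification works today: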





Results of applying an image classifier to a typical unstaged photo

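Image classification lets us automatically understand what is in an image, but on its own it is not enough to enable search. Sure, if a user searches for beach, we could return the images with the highest scores for that category, but what if they search for shore instead? What if, instead of apple, they search for fruit or granny smith?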





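We could compile a large dictionary of synonyms, near-synonyms, and hierarchical relationships between words, but this quickly becomes unwieldy, especially if we want to support multiple languages.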





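So let's reframe the problem. We can interpret the output of the image classifier as a vector j^c of per-category scores. This vector represents the content of the image as a point in a C-dimensional category space, where C is the number of categories (several thousand). If we can extract a meaningful representation of the query in the same space, then the closeness of the image vector to the query vector measures how well the image matches the query.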





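Fortunately, extracting vector representations of text is the focus of a great deal of research in natural language processing. One of the best-known methods is word2vec, described by Mikolov et al. in 2013. Word2vec assigns a vector to each word in the dictionary so that words with similar meanings get vectors that are close to each other. These vectors live in a d-dimensional word vector space, where d is typically a few hundred.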
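Closeness between word vectors is usually measured with cosine similarity. The sketch below computes it in pure Python; the three-dimensional `toy_vectors` are made up for illustration and do not come from word2vec or any trained model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: 1 for the same direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors for illustration only; real word2vec
# vectors have a few hundred dimensions and come from a trained model.
toy_vectors = {
    "beach": [0.9, 0.1, 0.0],
    "shore": [0.8, 0.2, 0.1],
    "crepe": [0.0, 0.3, 0.9],
}

print(cosine_similarity(toy_vectors["beach"], toy_vectors["shore"]))  # close to 1
print(cosine_similarity(toy_vectors["beach"], toy_vectors["crepe"]))  # close to 0
```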





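We can get a vector representation of a search query simply by looking up its word2vec vector. This vector is not in the category space of the image classifier, but we can transform it into category space using the names of the image categories, as follows: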





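  1. For a query word q, get its d-dimensional word vector q^w, normalized to unit length. We use the superscript w for vectors in word space and c for vectors in category space.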





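  2. For each category i, get the normalized word vector c_i^w of the category name. Then define m̂_i = q^w · c_i^w, the cosine similarity between the query vector and the vector of the i-th category. This score lies between -1 and 1 and indicates how closely the query word matches the category name. Clipping negative values to zero (m_i = max(0, m̂_i)) gives a score in the same range as the image classifier outputs.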





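  3. This lets us compute q^c = [m_1 m_2 ... m_C], a vector in the C-dimensional category space that represents how closely the query matches each category, just as each image's classifier vector represents how closely the image matches each category.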
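Steps 1 to 3 can be sketched in pure Python as follows, assuming the query and category word vectors are already available as lists of floats. The function names are hypothetical and the category vectors used in the example are toy values.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_query(q_w: Sequence[float],
                  category_word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Map a query word vector q^w to the category-space vector q^c."""
    q_unit = normalize(q_w)                                   # step 1
    q_c = []
    for c_w in category_word_vectors:
        c_unit = normalize(c_w)
        m_hat = sum(a * b for a, b in zip(q_unit, c_unit))    # step 2: cosine similarity
        q_c.append(max(0.0, m_hat))                           # clip negatives to zero
    return q_c                                                # step 3: [m_1 ... m_C]


# Toy example: three categories whose (hypothetical) word vectors are the axes.
print(project_query([1.0, 1.0, -1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```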





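Computing all the m̂_i at once is just a vector-matrix multiplication, so q^c = max(0, q^w C), where C is the matrix whose columns are the category word vectors c_i^w and the max is taken elementwise.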
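The same projection in matrix form, as a small pure-Python sketch: `C` is stored row-major with one column per category, and its columns are assumed to be already normalized. The clipping of step 2 is applied elementwise after the product. The matrix values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def vec_mat(v: List[float], m: Matrix) -> List[float]:
    """Row vector v (length = number of rows of m) times matrix m."""
    return [sum(v[r] * m[r][k] for r in range(len(m))) for k in range(len(m[0]))]


def project_query_matrix(q_w_unit: List[float], C: Matrix) -> List[float]:
    """q^c = max(0, q^w C), elementwise; columns of C are unit category word vectors."""
    return [max(0.0, x) for x in vec_mat(q_w_unit, C)]


# Hypothetical d = 3 with three categories: the columns of C are the unit
# word vectors [1, 0, 0], [0.6, 0.8, 0] and [0, 0, 1].
C = [
    [1.0, 0.6, 0.0],
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
print(project_query_matrix([0.0, 1.0, 0.0], C))  # [0.0, 0.8, 0.0]
```

In practice this would be a single call into a linear algebra library; the point is only that step 2 for all categories collapses into one product.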





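Once the query is mapped to the category space vector q^c, we take its dot product with the category space vector of each image (a cosine similarity when both vectors are normalized) to get the image's final relevance score, s = q^c · j^c.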





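This is our relevance function, and we rank images by this score to show the query results. Applying it to a set of images can also be written as a vector-matrix multiplication, s = q^c J, where each column of J is the classifier output vector j^c of one image and s is the vector of relevance scores for all the images.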
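Scoring every image at once is then a single vector-matrix product. This sketch assumes J is stored with one row per category and one column per image; the data are hypothetical, and `score_all` and `rank` are illustrative names rather than Dropbox code.

```python
from typing import List

Matrix = List[List[float]]  # J: one row per category, one column per image


def score_all(q_c: List[float], J: Matrix) -> List[float]:
    """s = q^c J: the relevance score of every image in one pass."""
    num_images = len(J[0])
    return [sum(q_c[k] * J[k][n] for k in range(len(J))) for n in range(num_images)]


def rank(scores: List[float]) -> List[int]:
    """Image indices sorted from most to least relevant."""
    return sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)


# Hypothetical data: 3 categories, 2 images. Column 0 is j^c of image 0, etc.
q_c = [0.5, 0.0, 1.0]
J = [
    [0.9, 0.0],
    [0.1, 0.2],
    [0.0, 0.8],
]
s = score_all(q_c, J)
print(s, rank(s))
```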





Let's walk through a small example in which the word vectors and the category space have only a few dimensions.





Projecting the query word vector onto the category vectors gives the category space vector [0.35 0.62 0.70] for the query.





, .





Projection of the query word vector onto the category space

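Our image classifier is an EfficientNet network trained on the Open Images dataset, and it produces scores for about 8,500 categories. We found that this architecture and dataset give good accuracy at a reasonable cost, which matters when serving the whole Dropbox user base.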





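We use TensorFlow to train and run the model, together with pre-trained ConceptNet Numberbatch word vectors. These give good results and, importantly for us, support multiple languages: words with similar meanings in different languages get similar vectors. This makes multilingual image search easy. The word vectors for dog in English and chien in French are similar, so we can support search in both languages without translating the query.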





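For multi-word queries, we parse the query as an AND of the individual words. We also keep a list of multi-word terms, such as beach ball, that can be treated as single words. When a query contains one of these terms, we parse it both ways and run the OR of the two parsed queries, so the query beach ball becomes (beach AND ball) OR (beach ball). This matches both large, colorful inflatable beach balls and tennis balls lying on the sand.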
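One way to implement this parsing rule is sketched below. It is a simplification under stated assumptions: `MULTI_WORD_TERMS` is a hypothetical list, only two-word terms are recognized, and each match yields its own alternative parse rather than every combination of merges.

```python
from typing import List, Set, Tuple, Union

# A parsed query is either a single term or (operator, [subqueries]).
Query = Union[str, Tuple[str, List["Query"]]]

# Hypothetical list of two-word terms that are treated as single words.
MULTI_WORD_TERMS: Set[str] = {"beach ball"}


def and_of(terms: List[str]) -> Query:
    """AND of the terms, collapsing a single term to itself."""
    return terms[0] if len(terms) == 1 else ("AND", list(terms))


def parse_query(text: str, multi_word_terms: Set[str] = MULTI_WORD_TERMS) -> Query:
    words = text.lower().split()
    if not words:
        raise ValueError("empty query")
    parses = [words]  # default parse: AND of the individual words
    # Alternate parse for each adjacent word pair that forms a known term.
    for i in range(len(words) - 1):
        pair = f"{words[i]} {words[i + 1]}"
        if pair in multi_word_terms:
            parses.append(words[:i] + [pair] + words[i + 2:])
    alternatives = [and_of(p) for p in parses]
    return alternatives[0] if len(alternatives) == 1 else ("OR", alternatives)


print(parse_query("beach ball"))
# ('OR', [('AND', ['beach', 'ball']), 'beach ball'])
```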





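It is not practical to compute a full, up-to-date J matrix every time a user runs a search: a user may have access to hundreds of thousands or even millions of images, and running the classifier over all of them at query time would be far too slow.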





Instead, we precompute J and store it in Dropbox's search engine, Nautilus.





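Nautilus consists of a forward index, which maps each file to its metadata (for example, the file name) and its full text, and an inverted index, which maps each word to a posting list of all the files that contain it. For text search, the index contents for a few files might look like this: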





Content of the search index for text search

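If a user searches for a two-word query, we look up both words in the inverted index and find that doc_1 and doc_2 contain both of them, so they belong in the search results. doc_3 contains only one of the words, so we should either leave it out or put it at the end of the results list.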





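Once we have found all the documents we might return, we look them up in the forward index and use the information stored there to rank and filter them. In this case, we might rank doc_1 higher than doc_2 because in doc_1 the two words appear right next to each other.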
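The text-search flow above can be illustrated with a toy in-memory index. This is a sketch, not Nautilus: the document texts and the query are hypothetical, posting lists are Python sets, and the ranking rule (exact phrase matches first) is one simple stand-in for the adjacency signal described above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(forward: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set (posting list) of documents that contain it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in forward.items():
        for word in text.lower().split():
            inverted[word].add(doc_id)
    return inverted


def search_and(query: str, forward: Dict[str, str],
               inverted: Dict[str, Set[str]]) -> List[str]:
    words = query.lower().split()
    # Documents containing every query word: intersect the posting lists.
    postings = [inverted.get(w, set()) for w in words]
    matches = set.intersection(*postings) if postings else set()

    def is_phrase_match(doc_id: str) -> bool:
        tokens = forward[doc_id].lower().split()
        n = len(words)
        return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

    # Rank using the forward index: exact phrase matches first, then by id.
    return sorted(matches, key=lambda d: (not is_phrase_match(d), d))


# Hypothetical forward index (document id -> full text).
forward_index = {
    "doc_1": "deglaze the pan with white wine",
    "doc_2": "white onions and a splash of wine",
    "doc_3": "serve with red wine",
}
inverted_index = build_inverted_index(forward_index)
print(search_and("white wine", forward_index, inverted_index))  # ['doc_1', 'doc_2']
```

Intersecting posting lists gives the AND semantics; an OR query would take their union instead.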





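We can use the same indexing system for image content search. In the forward index, we store the category space vector j^c of each image. In the inverted index, for each category we store a posting list of the images that have a positive score for that category.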





Search index contents for image content search

So when a user searches for picnic:





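  1. Look up the vector q^w for picnic and multiply it by the category space projection matrix C to get q^c, as described above. C is a fixed matrix that is the same for all users, so we can keep it in memory.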





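  2. For each category with a nonzero entry in q^c, fetch the posting list from the inverted index. The union of these lists is the set of matching images, but the results still need to be ranked.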





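  3. For each search result, fetch its category space vector j^c from the forward index and multiply it by q^c to get the relevance score s. Return the results whose score is above a threshold, ranked by score.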
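The three steps can be sketched with in-memory dictionaries standing in for the forward and inverted indexes. This is an illustration under assumptions, not Nautilus: step 1 is assumed to have already produced q^c, and the classifier outputs and the q^c values for picnic are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SparseVec = Dict[str, float]  # category name -> score (zeros omitted)


def build_image_index(classifier_outputs: Dict[str, SparseVec]):
    """Forward index: image -> j^c. Inverted index: category -> images scoring > 0."""
    forward = dict(classifier_outputs)
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for image_id, j_c in classifier_outputs.items():
        for category, score in j_c.items():
            if score > 0:
                inverted[category].add(image_id)
    return forward, inverted


def image_search(q_c: SparseVec, forward: Dict[str, SparseVec],
                 inverted: Dict[str, Set[str]],
                 threshold: float) -> List[Tuple[str, float]]:
    # Step 2: union of the posting lists of all categories with q^c > 0.
    candidates: Set[str] = set()
    for category, weight in q_c.items():
        if weight > 0:
            candidates |= inverted.get(category, set())
    # Step 3: s = q^c . j^c for each candidate, then threshold and rank.
    results = []
    for image_id in candidates:
        j_c = forward[image_id]
        s = sum(w * j_c.get(category, 0.0) for category, w in q_c.items())
        if s > threshold:
            results.append((image_id, s))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical classifier outputs and a hypothetical q^c for "picnic".
outputs = {
    "img_1": {"outdoors": 0.8, "food": 0.6},
    "img_2": {"food": 0.9},
    "img_3": {"car": 0.7},
}
forward_index, inverted_index = build_image_index(outputs)
print(image_search({"outdoors": 0.5, "food": 0.5}, forward_index, inverted_index, 0.3))
```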





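As described, this approach would still be expensive both in storage and in query-time processing. With 10,000 categories, we would store 10,000 classifier scores per image in the forward index, which is 40 kilobytes with four-byte floating point values. And because classifier scores are rarely exactly zero, a typical image would end up in most of those 10,000 posting lists. With four-byte integers for image IDs, that is another 40 kilobytes, for a total of 80 kilobytes of index per image. For many images, the index entry would be larger than the image file itself!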
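The storage estimate above is simple arithmetic; the sketch below reproduces it, assuming four-byte scores and four-byte image IDs as in the text, and decimal kilobytes (1 KB = 1,000 bytes). The function name is hypothetical.

```python
def dense_index_bytes(num_categories: int = 10_000, score_bytes: int = 4,
                      image_id_bytes: int = 4) -> dict:
    """Per-image index cost when every category score is kept."""
    forward = num_categories * score_bytes       # one float score per category
    inverted = num_categories * image_id_bytes   # image id in (almost) every posting list
    return {"forward": forward, "inverted": inverted, "total": forward + inverted}


print(dense_index_bytes())  # {'forward': 40000, 'inverted': 40000, 'total': 80000}

# Query time: roughly half of the match scores m_hat_i are positive,
# so about this many posting lists would be read per query.
posting_lists_read = 10_000 // 2
print(posting_lists_read)  # 5000
```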





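As for query-time processing, which the user experiences as latency, we can expect about half of the query-category match scores m̂_i to be positive, so we would read about 5,000 posting lists from the inverted index. That compares very poorly with text queries, which typically read about 10 posting lists.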





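Fortunately, there are many near-zero values that we can drop to get a much more efficient approximation. The relevance score of each image is s = q^c · j^c, where q^c holds the match scores between the query and each of the roughly 10,000 categories, and j^c holds the roughly 10,000 category scores from the classifier. Both vectors consist mostly of near-zero values that contribute almost nothing to s.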





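In our approximation, we set all but the few largest entries of q^c and j^c to zero. Experimentally, we found that keeping the top 10 entries of q^c and the top 50 entries of j^c is enough to avoid any loss in quality. The storage and processing savings are substantial: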





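  • In the forward index, instead of 10,000-dimensional dense vectors, we store sparse vectors with 50 nonzero entries, the top 50 category scores of each image. A sparse representation stores the position and value of each nonzero entry; 50 positions (two-byte integers) and 50 values (four-byte floats) take about 300 bytes.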





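  • In the inverted index, each image is added to 50 posting lists instead of 10,000, at a cost of about 200 bytes. So the total index storage per image is about 500 bytes instead of 80 kilobytes.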





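  • At query time, q^c has 10 nonzero entries, so we only need to scan 10 posting lists, roughly the same amount of work as for a text query. This also gives a smaller result set, which is faster to score.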
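The approximation and the storage figures in the list can be sketched as follows. The top-k truncation keeps the largest entries as a {position: value} map; two-byte positions suffice because 10,000 categories fit below 65,536. Function names are hypothetical, and the sketch does not reproduce Dropbox's actual index format.

```python
import heapq
from typing import Dict, Sequence, Tuple

TOP_K_QUERY = 10   # entries kept in q^c
TOP_K_IMAGE = 50   # entries kept in j^c


def top_k_sparse(dense: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k largest entries as {position: value}; all others become zero."""
    best = heapq.nlargest(k, range(len(dense)), key=lambda i: dense[i])
    return {i: dense[i] for i in best if dense[i] != 0.0}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Approximate s = q^c . j^c using only the stored nonzero entries."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b.get(i, 0.0) for i, value in a.items())


def sparse_index_bytes(nonzeros: int = TOP_K_IMAGE, position_bytes: int = 2,
                       value_bytes: int = 4,
                       image_id_bytes: int = 4) -> Tuple[int, int, int]:
    """Per-image storage: (forward index, inverted index, total) in bytes."""
    forward = nonzeros * (position_bytes + value_bytes)  # position + score per entry
    inverted = nonzeros * image_id_bytes                 # one entry in each of k posting lists
    return forward, inverted, forward + inverted


print(sparse_index_bytes())  # (300, 200, 500)
```

In this sketch a query would use `top_k_sparse(q_c, TOP_K_QUERY)` and each image `top_k_sparse(j_c, TOP_K_IMAGE)`, so a score touches at most 10 entries.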





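With these optimizations, indexing and storage costs are reasonable, and query latency is on par with text search. So when a user starts a search, we can run text and image searches in parallel and show all the results together, without making the user wait any longer than for a text-only search.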





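Image content search is now available in Dropbox. Together with OCR-based search for images of documents and full-text search for text files, it makes most of our users' files searchable by content.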





?

, , Dropbox. . , , . , , , , , " , " .




