Why searching for very short documents with ordinary full-text search is difficult, and what to do if you need it anyway.

Introduction
We all constantly encounter so-called full-text search: finding documents that match a search phrase. The best-known example is Google search.
. , , Elasticsearch. .
DD Planet B2B- Elasticsearch. ( ), .
, Elasticsearch, — , , . .
:
T0=" »",
T1=" ",
T2=" ",
:
"": {0, 1}
"": {0}
"": {1, 2}
"": {2}
— , . , . , , « ». «» {2}, «» — {0}. , . , {0, 2} c ½. , , TF-IDF, .
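The posting sets listed above ({0, 1}, {0}, {1, 2}, {2}) have the shape of an inverted index, and the {0, 2} result with weight ½ is what a two-word query produces when each document contains only one of the two words. A minimal Python sketch of that idea follows. The English document texts are made-up stand-ins chosen to reproduce the same sets, and `build_index`, `search`, and the simple overlap score are assumptions for illustration, not Elasticsearch's actual implementation or scoring.

```python
from collections import defaultdict

# Hypothetical stand-in documents; they yield the posting sets
# {0, 1}, {0}, {1, 2}, {2} for "wooden", "table", "chair", "plastic".
docs = {0: "wooden table", 1: "wooden chair", 2: "plastic chair"}


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Score each document by the fraction of query terms it contains."""
    terms = query.lower().split()
    scores = defaultdict(float)
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1 / len(terms)
    return dict(scores)


index = build_index(docs)
result = search(index, "plastic table")
print(result)
```

With documents this short, every matching document ends up with the same score, so a ranking based on term overlap alone has nothing to distinguish them by.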
, , , -, :
- .
: « » « » « » , , « » « », « ». , .
. : , . , , TF-IDF, . - .
— , , « 4», «4», « », « 4» . .
— Elasticsearch . , , .
- .
, . , « » « Windows» «» .
NLP
NLP . NLP (Natural Language Processing) — , .
NLP - , - . , .
«»
NLP — Paraphrase Identification — (, ) , ( ). : « 17:00» « ». ? , .
. . DeepPavlov.ai [1], , . , .
. ( ), . .. -.
, DeepPavlov, — , .
,
, . ? , , Elasticsearch . .
: , . .
, : — , , :
- , .
- — , , .
? (Nearest neighbor search) — . vantage-point tree, [2]. , , , , Kd-. , .
Vantage-point tree
, vantage-point tree [3]. ball-tree, . . , . (vantage-point) ( ).

, ( vantage-point), . — . . , S , . , .
, K ( ). , (, ). — . , .

, «» . . «» ? , ( X ), . , «» .

K , , «» . .
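The construction and K-nearest-neighbor search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of the nlp-text-search package: the names `VPNode`, `build`, and `knn` are hypothetical, the vantage point is picked at random, and the split radius is the median distance to the vantage point. Any distance function can be passed in as long as it is a true metric.

```python
import heapq
import math
import random
import statistics


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # points with distance <= radius
        self.outside = outside  # points with distance > radius


def build(points, dist, rng=None):
    """Recursively split the points around a randomly chosen vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = statistics.median(dists)
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def knn(root, query, k, dist):
    """Return the k nearest points as a sorted list of (distance, point)."""
    if k <= 0:
        return []
    heap = []    # max-heap emulated with negated distances
    counter = 0  # tie-breaker so points themselves are never compared

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.point))
            counter += 1
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, node.point))
            counter += 1
        # Search the side that contains the query first.
        if d <= node.radius:
            near, far = node.inside, node.outside
        else:
            near, far = node.outside, node.inside
        visit(near)
        # tau is the current k-th best distance (infinite until k are found).
        tau = -heap[0][0] if len(heap) == k else math.inf
        # By the triangle inequality, the far side can only hold a closer
        # point if the query lies within tau of the splitting sphere.
        if abs(d - node.radius) <= tau:
            visit(far)

    visit(root)
    return sorted((-neg_d, p) for neg_d, _, p in heap)


# Usage with 2-D points and Euclidean distance (illustrative data only).
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(points, math.dist)
nearest = knn(tree, (0.5, 0.5), 5, math.dist)
```

The pruning test is the only place where the metric property is used, which is why the choice of distance function matters for correctness and not only for speed.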
vantage-point tree :
— , . , , . cosine Doc2Vec — .
ε — .
. ? , , , float32. - . , , .
. . ,
x=" ", y=" ", z=" "
, - . , Doc2Vec — .
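A vantage-point tree prunes branches using the triangle inequality, so the distance function has to be a true metric. As a hedged illustration of why this matters for embedding models such as Doc2Vec, the Python sketch below assumes that "cosine" refers to the cosine distance 1 − cos(x, y). It uses three hand-picked 2-D vectors (not real Doc2Vec output) to show that this distance can violate the triangle inequality, whereas the angle between the vectors does not. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a, b):
    # Common "distance" derived from cosine similarity; not a true metric.
    return 1.0 - cosine_similarity(a, b)


def angular_distance(a, b):
    # Angle between the vectors in radians; clamp to guard against rounding.
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos)


# Hand-picked vectors: y lies exactly halfway between x and z.
x, y, z = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)


def satisfies_triangle(d, tol=1e-9):
    """Check d(x, z) <= d(x, y) + d(y, z), allowing for float rounding."""
    return d(x, z) <= d(x, y) + d(y, z) + tol


print("cosine distance: ", satisfies_triangle(cosine_distance))
print("angular distance:", satisfies_triangle(angular_distance))
```

The small tolerance in `satisfies_triangle` reflects the fact that floating-point rounding can break the inequality by a tiny amount even for a genuine metric.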
, , — , , . , : [2]. — , .
. ( ). , , ( ). , . «» .

( ), . .

. ? , . : , ? vantage-point tree , — vantage-point.

, [2], . , . .
« ». , . , .
. , . GitHub pip install nlp-text-search
References
[1] DeepPavlov documentation. http://docs.deeppavlov.ai/en/master/
[2] Yianilos, P. N. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 311–321. http://web.cs.iastate.edu/~honavar/nndatastructures.pdf
[3] http://stevehanov.ca/blog/?id=130