A universal platform for building marketplaces, one that suits very different businesses from grocery stores to recruiting services, requires constant improvement: the demands placed on online services today already call for artificial intelligence. Customer satisfaction with search results, and consequently the success of the business, depends on how correctly these AI systems are configured.
Contextual search in search engines is a good example. Every search engine competes to improve the quality of its results, but in a rapidly changing world the meaning of words can shift significantly. For instance, a query for the word "lenta" ("ribbon") used to return results about satin ribbon, whereas now the top results are a retail chain and a news portal of the same name. Artificial intelligence comes to the rescue here, helping to correct the results in record time.
Suppose a piece of text uses the word "ribbon" in the sense of part of a pretty gift wrapping. For simplicity, we classify all meanings of this word, without manual clarification, using convolutional neural networks (CNNs). The architecture uses an ensemble of convolutional and recurrent networks and outputs the relevant meaning of the word "ribbon" based on the semantic content of the text. The input is a matrix of fixed height n, where each row is the vector representation of a token, that is, of a word, in a feature space of dimension k. To build this feature space it is convenient to use distributional-semantics tools such as FastText, GloVe, or Word2Vec. The matrix is processed by filters of fixed width equal to the dimension of the feature space; the filter height h determines how many adjacent rows (words) each filter covers, so the size of the output feature map depends on the filter height and on the height of the original matrix. The feature map is then passed through a subsampling (pooling) layer, which reduces its dimensionality and extracts the dominant information from each convolution. Finally, the feature maps are concatenated into a single feature vector that is used to compute the final class labels.
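To make this pipeline more concrete, here is a minimal sketch of such a convolutional classifier in PyTorch. Everything in it is an illustrative assumption rather than the actual implementation: the class name TextCNN, the dimensions (n = 50 words, k = 300 features), the filter heights, and the five output classes standing in for word senses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Illustrative CNN over a matrix of word embeddings (n words x k features)."""
    def __init__(self, k=300, heights=(2, 3, 4), n_filters=100, n_classes=5):
        super().__init__()
        # One convolution per filter height h; the filter width equals the
        # embedding dimension k, so each filter spans h adjacent words.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, kernel_size=(h, k)) for h in heights]
        )
        self.fc = nn.Linear(n_filters * len(heights), n_classes)

    def forward(self, x):
        # x: (batch, n, k) -- rows are word vectors from FastText/GloVe/Word2Vec
        x = x.unsqueeze(1)                                           # (batch, 1, n, k)
        maps = [F.relu(conv(x)).squeeze(3) for conv in self.convs]   # (batch, filters, n-h+1)
        # Subsampling (max-pooling) keeps the dominant feature of each convolution
        pooled = [F.max_pool1d(m, m.size(2)).squeeze(2) for m in maps]
        features = torch.cat(pooled, dim=1)                          # concatenated feature vector
        return self.fc(features)                                     # scores over word senses

# Example: a batch of 8 texts, each padded or truncated to n=50 words of dimension k=300
logits = TextCNN()(torch.randn(8, 50, 300))
print(logits.shape)  # torch.Size([8, 5])
```

The key design choice is that the filter width always equals k, so a convolution never splits a word vector; the only thing a filter decides is how many neighbouring words (h) it looks at simultaneously.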
Word2Vec itself offers two training architectures. The first, CBOW (Continuous Bag of Words), predicts a word from its surrounding context: the output layer is a softmax over the vocabulary, and for each target word the k words to its left and the k words to its right are taken, giving a window of (2k+1) words. The second architecture, skip-gram, works the other way around: it predicts the surrounding context from the central word. In practice a fairly small window is used, on the order of 5 words on each side.
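Both modes are available, for example, in the gensim library. The toy corpus and parameter values below are placeholders for illustration, not the data or settings discussed above.

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens (invented for illustration only)
sentences = [
    ["ribbon", "gift", "wrapping", "satin"],
    ["ribbon", "store", "grocery", "chain"],
    ["ribbon", "news", "portal", "article"],
]

# CBOW (sg=0): predict the central word from its context window
# (here up to 5 words on each side of the target)
cbow = Word2Vec(sentences, vector_size=100, window=5, sg=0, min_count=1)

# Skip-gram (sg=1): predict the surrounding context from the central word
skipgram = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)

print(cbow.wv.most_similar("ribbon", topn=3))
print(skipgram.wv["ribbon"][:5])  # first components of the learned word vector
```

The sg flag switches between the two architectures, and window controls how far the context extends on each side of the target word.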
The weak point of both architectures is the output layer: a full softmax over the entire dictionary has to be computed at every training step, and a realistic dictionary contains tens of thousands of words, so training becomes very slow. The Negative Sampling technique addresses this. Instead of recalculating the output weights for every word in the dictionary, the model updates the weights only for the current "positive" word, the one actually observed in the context, and for a small number of randomly sampled "negative" words. Each training step thus turns into a handful of binary classification problems, which speeds up training dramatically with little loss in the quality of the resulting vectors.
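As a rough illustration of the mechanism, here is a from-scratch sketch of one skip-gram-with-negative-sampling update in NumPy. The vocabulary size, dimensionality, learning rate, and number of negatives are arbitrary assumptions, not values from the system described above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, k_neg = 1000, 100, 5
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # input (word) vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, lr=0.025):
    """One update: the true context word is the 'positive' example,
    k_neg randomly drawn words are the 'negative' examples
    (collisions with the positive word are ignored here for simplicity)."""
    negatives = rng.integers(0, vocab_size, size=k_neg)
    targets = np.concatenate(([context], negatives))
    labels = np.concatenate(([1.0], np.zeros(k_neg)))   # 1 for positive, 0 for negatives
    v = W_in[center]                                    # (dim,)
    u = W_out[targets]                                  # (1 + k_neg, dim)
    scores = sigmoid(u @ v)                             # binary predictions
    grad = (scores - labels)[:, None]                   # (1 + k_neg, 1)
    W_out[targets] -= lr * grad * v                     # update only the sampled rows
    W_in[center] -= lr * (grad * u).sum(axis=0)         # update the center word vector

sgns_step(center=3, context=17)
```

Only the rows of W_out for the sampled words and the single row of W_in for the center word are touched at each step, which is exactly why the trick scales to large vocabularies.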
A balanced combination of ranking algorithms will improve the quality of the entire system. But exceptions must not be forgotten: Google has noted that its search engine is not yet ready to fully entrust ranking to machine-learning algorithms, because automatically generated models, unlike models created by human experts, can behave unpredictably on new classes of queries that do not resemble those in the training set.