Improving the markup of multimodal data: fewer assessors, more layers

Hello! We - scientists at ITMO's Machine Learning laboratory and the Core ML team on VKontakte - are doing joint research. One of the important tasks of VK is the automatic classification of posts: it is necessary not only to generate thematic feeds, but also to identify unwanted content. Assessors are involved for such processing of records. At the same time, the cost of their work can be significantly reduced using such a machine learning paradigm as active learning.



It is about its application for the classification of multimodal data that will be discussed in this article. We will tell you about the general principles and methods of active learning, the peculiarities of their application to the task, as well as the insights obtained during the research.



image



Introduction



โ€” machine learning, . , , , .



, (, Amazon Mechanical Turk, .) . โ€” reCAPTCHA, , , , โ€” Google Street View. โ€” .



. , Voyage โ€” , . , , . , .



Amazon DALC (Deep Active Learning from targeted Crowds). , . Monte Carlo Dropout ( ). โ€” noisy annotation. , ยซ , ยป, .



Amazon . : / . , , . , : , . .



โ€” ! , . pool-based sampling.



Figure:  1. General scheme of a pool-based scenario of active learning

. 1. pool-based



. , , ( ). : , .



, โ€” . (. โ€” query). , . ( , ) .



, , .





, โ€” . ( ). โ‰ˆ250 . . () 50 โ€” โ€” :



  1. , (. embedding), ;
  2. .


, (. . 2).



. 2 โ€”

. 2 โ€”





ML โ€” . , .



. , . , , , . , , early stopping. , .



. residual , highway , (. encoder). , (. fusion): , .

โ€” , . -.



, โ€” , . , .



. , (. 3):



. 3.

. 3.



. , . , , . , ( + ) โ€” .



, . 3, :



. 4.

. 4.



, , . , รณ , , .



, : ? :



  1. ;
  2. ;
  3. .


. : maximum likelihood , - . :



L=1ฯƒ12L1+1ฯƒ22L2+1ฯƒ32L3+logโกฯƒ1+logโกฯƒ2+logโกฯƒ3



L1,L2,L3 โ€” ( -), ฯƒ1,ฯƒ2,ฯƒ3 โ€” , .



Pool-based sampling



โ€” , . pool-based sampling :



  1. - .
  2. .
  3. , , .
  4. .
  5. ( ).
  6. 3โ€“5 (, ).


, 3โ€“6 โ€” .



, , :



  1. , . , : . , , , . . , 2โ€‰000.



  2. . , . ( ). , , . , . 20 .

    . , . โ€” , . 100 200.





, , , .



โ„–1: batch size



baseline , ( ) (. 5).



. 5.   baseline-  .

. 5. baseline- .



random state. .



. ยซยป , , .



, (. batch size). 512 โ€” - (50). , batch size . . :



  1. upsample, ;
  2. , .


batch size: (1).



current_batch_size=b+โŒŠnmodbโŒŠnbโŒ‹โŒ‹[1]



b โ€” batch size, n โ€” .

โ€œโ€ (. 6).



. 6.     batch size (passive  )   (passive + flexible  )

. 6. batch size (passive ) (passive + flexible )



: c . , , batch size . .



.



Uncertainty



โ€” uncertainty sampling. , , .



:



1. (. Least confident sampling)



, :



xLCโˆ—=argโกmaxx 1โˆ’Pฮธ(y^|x)[2]



y^=argโกmaxy Pฮธ(y|x) โ€” , y โ€” , x โ€” , xLCโˆ— โ€” , .



. , 1โˆ’y^. , . .



. , : {0,5; 0,49; 0,01}, โ€” {0,49; 0,255; 0,255}. , (0,49) , (0,5). , รณ : . , .



2. (. Margin sampling)



, , , :



xMโˆ—=argโกminx Pฮธ(y^1|x)โˆ’Pฮธ(y^2|x)[3]



y^1 โ€” x, y^2 โ€” .



, . , . , , MNIST ( ) โ€” , . .



3. (. Entropy sampling)



:



xHโˆ—=argโกmaxxโˆ’โˆ‘ Pฮธ(yi|x)logโกPฮธ(yi|x)[4]



yi โ€” i- x .



, , . :



  • , , ;
  • , .


, , . , entropy sampling .



(. 7).



. 7.      uncertainty sampling    ( โ€”    ,   โ€”    ,  โ€”    )

. 7. uncertainty sampling ( โ€” , โ€” , โ€” )



, least confident entropy sampling , . margin sampling .



, , : MNIST. , , entropy sampling , . , .



. O(plogโกq), p โ€” , q โ€” . , .



BALD



, , โ€” BALD sampling (Bayesian Active Learning by Disagreement). .



, query-by-committee (QBC). โ€” . uncertainty sampling. , . QBC Monte Carlo Dropout, .



, , โ€” . dropout . dropout , ( ). , dropout- (. 8). Monte Carlo Dropout (MC Dropout) . , . ( dropout) Mutual Information (MI). MI , , โ€” , . .



. 8.  MC Dropout   BALD

. 8. MC Dropout BALD



, QBC MC Dropout uncertainty sampling. , (. 9).



. 9.      uncertainty sampling   QBC       ( -    ,   -    ,  -    )

. 9. uncertainty sampling ( QBC ) ( โ€” , โ€” , โ€” )



BALD. , Mutual Information :



aBALD=H(y1,...,yn)โˆ’E[H(y1,...,yn|ฯ‰)][5]



E[H(y1,...,yn|w)]=1kโˆ‘i=1nโˆ‘j=1kH(yi|wj)[6]



n โ€” , k โ€” .



(5) , โ€” . , , . BALD . 10.



. 10.    BALD

. 10. BALD



, , .

query-by-committee BALD , . , uncertainty sampling. , โ€” O(kplogโก(q)), p โ€” , q โ€” , k โ€” , .



BALD tf.keras, . PyTorch, dropout , batch normalization , .



โ„–2: batch normalization



batch normalization. batch normalization โ€” , . , , , , batch normalization. , . , . BALD. (. 11).



. 11.   batch normalization   BALD

. 11. batch normalization BALD



, , .



batch normalization, . , .



Learning loss



. , . , .



, . โ€” . , . learning loss, . , (. 12).



. 12.   Learning loss

. 12. Learning loss



learning loss . .

. , . ยซยป learning loss: , , . ideal learning loss (. 13).



. 13.   ideal learning loss

. 13. ideal learning loss



, learning loss.

, . , , - , . :



  1. (2000 ), ;
  2. 10000 ( );
  3. ;
  4. ;
  5. 100 ;
  6. , , 1;
  7. .


, , . , ( margin sampling).



1.



p-value
loss -0,2518 0,0115
margin 0,2461 0,0136


, margin sampling โ€” , , , . c .



: ?

, , (. 14).



. 14.    ideal learning loss        ideal learning loss

. 14. ideal learning loss ideal learning loss



, MNIST :



2. MNIST



p-value
loss 0,2140 0,0326
0,2040 0,0418


ideal learning loss , (. 15).



Figure:  15. Actively training the character classifier from the MNIST dataset with the ideal learning loss strategy.  Blue graph - ideal learning loss, orange - passive learning

. 15. MNIST ideal learning loss. โ€” ideal learning loss, โ€”



, , , , . .



learning loss , uncertainty sampling: O(plogโกq), p โ€” , q โ€” . , , . , .





, . . , margin sampling โ€” . 16.

Figure:  16. Comparison of training on randomly selected data (passive training) and on data selected by the margin sampling strategy

. 16. ( ) , margin sampling



: ( โ€” margin sampling), โ€” , , . โ‰ˆ25 . . 25% โ€” .



, . , , .



, , . , :



  • batch size;
  • , , โ€” , batch normalization.



All Articles