Your face is not a king! Loss functions for the face recognition problem



Still from the film "Ivan Vasilievich Changes His Profession"





















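Terminology:

  • Backbone: the network that extracts features from a face image. InsightFace is an open-source face recognition library.
  • Embedding: the feature vector that the network produces for a face. Its dimension is usually between 128 and 512; here it is 512. Embeddings are compared by the cosine of the angle between them: close to 1 for the same person, close to -1 (or 0) for different people.
  • Embedding layer: the layer whose output is the embedding.
  • Classifier: the final layer, which computes $W^T X + b$, where $X$ is the embedding (the layer input), $W$ is the weight matrix and $b$ is the bias. It is followed by softmax.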











: (metric learning) .









A classic metric-learning loss is Triplet Loss:











$$Loss = \sum_{i=0}^{N} \left[ \|f_i^a - f_i^p\|_2^2 - \|f_i^a - f_i^n\|_2^2 + \alpha \right]_+,$$







where:







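  • $f^a$ is the anchor: the embedding of the reference image;
  • $f^p$ is the positive: the embedding of another image of the same person;
  • $f^n$ is the negative: the embedding of an image of a different person;
  • $\alpha$ is the margin that the positive and negative distances must differ by.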
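The Triplet Loss formula can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the usual hinge is applied (only triplets with a positive value contribute), the embeddings are made-up 2D vectors, and `alpha=0.2` is an arbitrary illustrative margin rather than a value taken from this article.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Sum over triplets of max(0, d(a, p) - d(a, n) + alpha)."""
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        total += max(0.0, squared_l2(a, p) - squared_l2(a, n) + alpha)
    return total


# Toy 2D embeddings (made-up numbers, not real face embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
# The first negative is far from its anchor (easy triplet),
# the second is close to its anchor (hard triplet).
negatives = [[0.0, 1.0], [0.2, 0.8]]

loss = triplet_loss(anchors, positives, negatives)
print(loss)
```

Only the second triplet contributes here: its negative sits closer to the anchor than the margin allows, while the first triplet already satisfies the constraint and adds zero.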


, , α - .









, ( — ), , , embedding- .







Triplet Loss , “ ”, , , SotA , . , Triplet Loss (fine-tuning) . , .









, , , , , . :

Classification

— , ( 512), — ( , ). — () , . , “ ” . : , , . . , :









: , , , . , . , , Triplet Loss — , , (hard sampling), .







The standard choice for classification is Softmax followed by Cross Entropy, often called simply Softmax Loss.







Softmax Loss



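Start with Softmax. Notation:

  • $W$: the weight matrix of the classifier layer;
  • $X$: the embedding vectors, the input of that layer;
  • $b$: the bias.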


The output of the last fully connected layer (the one that follows the embedding layer) is $z = W^T X + b$. Softmax (denoted $\sigma$) is:











$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{i=1}^{C} e^{z_i}}$$







Combining it with Cross Entropy gives Softmax Loss:











$$L_{Softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^T x_i + b_j}},$$







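where $y_i$ is the index of the correct class for sample $i$. For example, if the image shows person number 42, then $W_{y_i}$ is the 42nd column of $W$.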
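As a rough illustration of the Softmax Loss formula, here is a plain-Python sketch. The weights, embeddings and labels are invented toy values, the function names are hypothetical, and the labels are 0-based, unlike the 1-based indices in the formula.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(W, X, b, y):
    """Mean cross entropy over a batch.

    W: F x C weights (list of F rows), X: list of B embeddings of length F,
    b: list of C biases, y: list of B integer class labels (0-based).
    """
    num_classes = len(b)
    loss = 0.0
    for x, label in zip(X, y):
        # z_j = W_j^T x + b_j, where W_j is column j of W
        z = [sum(W[f][j] * x[f] for f in range(len(x))) + b[j]
             for j in range(num_classes)]
        loss -= math.log(softmax(z)[label])
    return loss / len(X)


W = [[1.0, 0.0, -1.0],
     [0.0, 1.0, -1.0]]          # F = 2 features, C = 3 classes
b = [0.0, 0.0, 0.0]
X = [[2.0, 0.0], [0.0, 2.0]]    # B = 2 embeddings
y = [0, 1]                      # correct class of each embedding

probs = softmax([2.0, 0.0, -2.0])
loss = softmax_loss(W, X, b, y)
print(probs, loss)
```

Both toy samples point along the column of their own class, so the correct class gets the largest logit and the loss is small but not zero.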
















Look more closely at $W^T X + b$. For simplicity set $b = 0$, which leaves $W^T X$. The shapes are:







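  • $X$ has shape F×B, where B is the batch size (the number of samples) and F is the embedding dimension (512 here).
  • $W$ has shape F×C, where C is the number of classes.


$W^T X$ then has shape C×B: one value per class for every sample in the batch.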







Recall the formula for the cosine of the angle between two vectors:











$$\cos(\theta) = \frac{dot(u, v)}{\|u\|\,\|v\|}$$







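Each element of $W^T X$ is a dot product of a class vector and an embedding. If every column of $W$ is normalized ($\|W_i\| = 1$) and every embedding in $X$ is normalized and then rescaled to length $s$ (scale), then: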











$$W^T X = s\cos(\theta)$$
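The identity above is easy to check numerically. The sketch below uses hypothetical helper names and toy 2D vectors; `s = 30.0` is just an illustrative scale.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between u and v."""
    return dot(u, v) / (norm(u) * norm(v))


def normalize(v, length=1.0):
    """Rescale v to the given Euclidean length."""
    n = norm(v)
    return [length * a / n for a in v]


s = 30.0                      # scale, illustrative value
w = [3.0, 4.0]                # one class column of W (toy numbers)
x = [1.0, 2.0]                # one embedding (toy numbers)

w_unit = normalize(w)         # ||W_i|| = 1
x_scaled = normalize(x, s)    # ||x|| = s

logit = dot(w_unit, x_scaled)
expected = s * cosine(w, x)
print(logit, expected)
```

After normalization the logit depends only on the angle between the class vector and the embedding; the original lengths of `w` and `x` no longer matter.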







Scale is the first of our two hyperparameters. Fixing it for all vectors places them on a hypersphere of radius s. In 2D it looks like this (one color per class):



Image from the ArcFace article, a toy example for demonstration: each class has its own color, each point on the circle is a single image, and the mean vector of each class is connected to the center for clarity. Note that neighboring classes touch each other with no gaps between them.






Let's rewrite Softmax Loss (now called Normalized Softmax Loss, N-Softmax) with these observations in mind:





$$L_{NSoftmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i})}}{e^{s\cos(\theta_{y_i})} + \sum_{j=1, j\neq y_i}^{C} e^{s\cos(\theta_j)}}$$







. N-Softmax .







Margin-Based Loss



, , — . softmax loss, . , (decision boundary), ( ). . ? () , . — (decision margin). — margin — , : scale — . 2D ( ArcFace):









— margin, — margin








The margin (m) can be introduced in several ways:







  1. Multiplicative angular margin: replace $\cos(\theta)$ with $\cos(m\theta)$ for the target class. Examples: Large-Margin Softmax Loss and SphereFace. The loss:











    $$L_{SphereFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(m\theta_{y_i})}}{e^{s\cos(m\theta_{y_i})} + \sum_{j=1, j\neq y_i}^{C} e^{s\cos(\theta_j)}}$$







  2. Additive cosine margin: replace $\cos(\theta)$ with $\cos(\theta) - m$ for the target class. Examples: AM-Softmax and CosFace. The loss:











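    $$L_{CosFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos(\theta_{y_i}) - m)}}{e^{s(\cos(\theta_{y_i}) - m)} + \sum_{j=1, j\neq y_i}^{C} e^{s\cos(\theta_j)}}$$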







  3. Additive angular margin: replace $\cos(\theta)$ with $\cos(\theta + m)$ for the target class. This is ArcFace. The ArcFace loss:











    $$L_{ArcFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i} + m)}}{e^{s\cos(\theta_{y_i} + m)} + \sum_{j=1, j\neq y_i}^{C} e^{s\cos(\theta_j)}}$$







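  4. AirFace, a variation on ArcFace. The margin is added to the angle, as in ArcFace, but the logit is a linear function of the angle itself (using $\arccos(\cos(\theta)) = \theta$): instead of the cosine of $\theta$ it uses $(\pi - 2\theta)/\pi$. The loss: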











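    $$L_{AirFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\pi - 2(\theta_{y_i} + m))/\pi}}{e^{s(\pi - 2(\theta_{y_i} + m))/\pi} + \sum_{j=1, j\neq y_i}^{C} e^{s(\pi - 2\theta_j)/\pi}}$$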
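The four margin variants listed above differ only in how the target-class logit is computed from the angle, which a small plain-Python sketch can make concrete. The function names, the `kind` strings and the toy angles are hypothetical; the SphereFace branch is the bare $\cos(m\theta)$, which is only meaningful for angles in $[0, \pi/m]$.

```python
import math


def target_logit(theta, kind, m):
    """Target-class logit (before scaling by s) for an angle theta in radians."""
    if kind == "nsoftmax":
        return math.cos(theta)
    if kind == "sphereface":
        return math.cos(m * theta)
    if kind == "cosface":
        return math.cos(theta) - m
    if kind == "arcface":
        return math.cos(theta + m)
    if kind == "airface":
        return (math.pi - 2.0 * (theta + m)) / math.pi
    raise ValueError(f"unknown loss kind: {kind}")


def other_logit(theta, kind):
    """Logit of a non-target class: no margin is applied."""
    if kind == "airface":
        return (math.pi - 2.0 * theta) / math.pi
    return math.cos(theta)


def margin_loss(thetas, label, kind, m, s):
    """Loss for one sample given its angles to all C class vectors."""
    logits = [s * (target_logit(t, kind, m) if j == label else other_logit(t, kind))
              for j, t in enumerate(thetas)]
    # log-sum-exp with a shift for numerical stability
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum - logits[label]


thetas = [0.6, 1.4, 1.7]   # toy angles to 3 class vectors; class 0 is correct
base = margin_loss(thetas, 0, "nsoftmax", 0.0, 30.0)
arc = margin_loss(thetas, 0, "arcface", 0.5, 30.0)
cos_face = margin_loss(thetas, 0, "cosface", 0.35, 30.0)
print(base, arc, cos_face)
```

For the same sample, the margin losses come out larger than plain N-Softmax: the margin makes the correct class look worse than it is, so the network keeps being pushed to shrink the angle even when the sample is already classified correctly.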









margin — , — ( ArcFace).







Margin & Scale



All of these losses have two hyperparameters, scale (s) and margin (m). The AM-Softmax paper uses s = 30 and m = 0.35, ArcFace uses s = 64 and m = 0.5, and CosFace (essentially the same loss as AM-Softmax) uses s = 64 and m = 0.35.







AdaCos is a paper devoted to the choice of scale and margin. Its main points:







  • Margin scale — , .
  • scale margin, margin scale.
  • scale .
  • — scale

    2 20 , , Y — , , X — . , , , :



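    One plot varies scale (with margin fixed at 0), the other varies margin at scale = 30. AdaCos proposes a fixed scale that depends only on the number of classes: $s = \sqrt{2}\,\ln(C - 1)$, where $C$ is the number of classes. For typical dataset sizes this puts $s$ roughly in the range [10, 25].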
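Reading the formula above as $s = \sqrt{2}\,\ln(C - 1)$, the fixed scale can be computed directly. The function name and the class counts below are hypothetical, chosen only to show how slowly the scale grows with the number of classes.

```python
import math


def adacos_fixed_scale(num_classes):
    """Fixed scale s = sqrt(2) * ln(C - 1) for C classes."""
    if num_classes < 3:
        raise ValueError("the formula needs at least three classes")
    return math.sqrt(2.0) * math.log(num_classes - 1)


# Illustrative class counts, not taken from any particular dataset.
for c in (10_000, 100_000, 1_000_000):
    print(c, round(adacos_fixed_scale(c), 2))
```

Because the dependence on the number of classes is logarithmic, going from ten thousand to a million identities changes the scale by only a few units.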



    AdaCos — scale. , scale , . , , .









    margin , , , ? . X , Y — ( N-softmax cos(θ) ):







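    Compared with N-Softmax, CosFace shifts the curve down by m, while ArcFace shifts it along the angle axis. SphereFace's $\cos(m\theta)$ decreases monotonically only up to $\theta = \pi/m$, which is $\pi/3$ for m = 3. ArcFace has a similar issue with its target logit: $\cos(\theta + m)$ decreases monotonically only while $\theta + m \le \pi$, so with m = 0.5 it starts growing again for angles above roughly $5\pi/6$. Implementations therefore patch the function, for example like this: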





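    import math
    import torch

    # Assumed to be defined earlier: cosine (tensor of cos(theta)),
    # m (margin in radians), easy_margin (bool).
    sine = torch.sqrt((1.0 - cosine ** 2).clamp(min=0.0))
    phi = cosine * math.cos(m) - sine * math.sin(m)  # cos(theta + m)
    th = math.cos(math.pi - m)       # cos(theta) at which theta + m reaches pi
    mm = math.sin(math.pi - m) * m   # offset used past that point
    if easy_margin:
        # apply the margin only where theta < pi/2
        phi = torch.where(cosine > 0, phi, cosine)
    else:
        # past theta = pi - m, fall back to cos(theta) - mm
        phi = torch.where(cosine > th, phi, cosine - mm)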







    easy_margin , :









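    With easy margin, the margin is applied only for angles below $\pi/2$; beyond that the target logit falls back to the N-Softmax value $\cos(\theta)$. Without easy margin, $\cos(\theta + m)$ is used up to $\theta = \pi - m$ and $\cos(\theta) - m\sin(\pi - m)$ after that.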
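To see why the patch matters, here is a scalar plain-Python restatement of the same branching logic, compared against the unpatched $\cos(\theta + m)$. The function name is hypothetical and `m = 0.5` is used only because it is the ArcFace value quoted earlier.

```python
import math


def arcface_target_logit(cosine, m=0.5, easy_margin=False):
    """Scalar version of the torch.where logic for a single cos(theta)."""
    sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
    phi = cosine * math.cos(m) - sine * math.sin(m)   # cos(theta + m)
    if easy_margin:
        return phi if cosine > 0 else cosine
    th = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m
    return phi if cosine > th else cosine - mm


# Sweep theta from 0 to 180 degrees in one-degree steps.
angles = [i * math.pi / 180 for i in range(0, 181)]
patched = [arcface_target_logit(math.cos(t)) for t in angles]
naive = [math.cos(t + 0.5) for t in angles]

patched_monotonic = all(b <= a + 1e-12 for a, b in zip(patched, patched[1:]))
naive_monotonic = all(b <= a + 1e-12 for a, b in zip(naive, naive[1:]))
print(patched_monotonic, naive_monotonic)
```

The patched function never increases as the angle grows, so a larger angle to the correct class is never rewarded. The naive one turns upward once $\theta + m$ passes $\pi$.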







    , . — " " ArcFace. , ( 4 8 ):







    Loss               | LFW   | MegaFace Rank-1 @ 10^6 | MegaFace TAR @ FAR 10^-6
    AM-Softmax/CosFace | 99.33 | 0.9833                 | 0.9841
    ArcFace            | 99.83 | 0.9836                 | 0.9848
    SphereFace         | 99.42 | 0.9743                 | 0.9766


    ( 3 ), LFW, — - :







    Loss               | Resnet50-MSC | MobileNet-MSC | Resnet50-Casia | MobileNet-Casia
    AM-Softmax/CosFace | 99.3         | 97.65         | 99.34          | 98.46
    ArcFace            | 99.15        | 98.43         | 99.35          | 99.01
    SphereFace         | 99.02        | 96.86         | 99.1           | 97.83


    , ArcFace SotA .









    . (margin) , . AM Softmax ( ) ArcFace ( ). , , , AirFace.









    Papers:

    SphereFace: https://arxiv.org/abs/1704.08063

    AM Softmax: https://arxiv.org/abs/1801.05599

    CosFace: https://arxiv.org/abs/1801.09414

    ArcFace: https://arxiv.org/abs/1801.07698

    AirFace: https://arxiv.org/abs/1907.12256







    Surveys and comparisons:

    Deep Face Recognition: A Survey: https://arxiv.org/abs/1804.06655

    A Performance Evaluation of Loss Functions for Deep Face Recognition: https://arxiv.org/abs/1901.05903







