FaceNet for dummies

The FaceNet algorithm (described in this article) uses a convolutional neural network to embed an image in a 128-dimensional Euclidean space.
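To make the idea concrete, here is a minimal sketch of how such embeddings are used for face verification. The `embed` function below is a stand-in for the trained CNN (its output is random here, only the shape and normalization match FaceNet's), and the threshold value is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(image):
    """Stand-in for the CNN: returns an L2-normalized 128-d embedding.
    A real model would map similar faces to nearby points."""
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

emb_a = embed("face_a.jpg")
emb_b = embed("face_b.jpg")

# Same-identity decision: threshold the squared Euclidean distance
# between the two embeddings (threshold tuned on a validation set).
distance = np.sum((emb_a - emb_b) ** 2)
same_person = distance < 1.1
```

Because the embeddings live on the unit sphere, the squared distance between any two of them lies in [0, 4], which is what makes a single fixed threshold workable.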

While reading the article, I did not understand:

  • How does the loss function affect the convolutional network? In conventional networks, the weights are slightly adjusted via backpropagation to minimize the loss — so what happens in this case?


  • How are triplets selected?

    2.1. How do I know whether a negative image is "hard"?

    2.2. Why is the loss function used to pick the negative image?

    2.3. When I check my images for hardness relative to the anchor — I believe this happens before the triplet is sent through the network. Is that correct?




1 answer


Here are some answers that may clear up your doubts:

  • Here too the weights are adjusted via backpropagation to minimize the loss; it is just that the loss itself is a little tricky. The loss has two parts (separated by the + sign in the equation): the first part, `||f(x_a) - f(x_p)||^2`, compares a picture of a person with another picture of the same person; the second part, `||f(x_a) - f(x_n)||^2`, compares it with a picture of a different person. We want the first part to be smaller than the second part, and the loss equation essentially expresses this. So you are basically tuning the weights to make the same-person error smaller and the different-person error larger.

  • The loss term involves three images: the anchor `x_a`, its positive pair `x_p` (same person), and its negative pair `x_n` (different person). The hardest positive of `x_a` is the positive image with the largest distance to the anchor among all positives. The hardest negative of `x_a` is the closest image of a different person. Thus, you want to pull the most distant positives close together and push away the closest negatives. This is what the loss equation captures.

  • FaceNet selects its triplets during training (online). In each minibatch (a set of 40 images) it selects the hardest negative for each anchor, and instead of selecting the single hardest positive image, it uses all anchor-positive pairs in the batch.
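The loss and the hardest-negative selection described above can be sketched in NumPy. This is an illustrative reimplementation of the equations, not the paper's actual code; `alpha` is the margin from the paper (0.2 there):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss: max(||a - p||^2 - ||a - n||^2 + alpha, 0).
    The first term pulls same-person pairs together; the second
    pushes different-person pairs at least `alpha` further apart."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + alpha, 0.0)

def hardest_negative(anchor_emb, negative_embs):
    """Online mining within a batch: pick the negative embedding
    closest to the anchor (the 'hardest' one)."""
    dists = np.sum((negative_embs - anchor_emb) ** 2, axis=1)
    return negative_embs[np.argmin(dists)]
```

Note that the loss is zero once the negative is already `alpha` further from the anchor than the positive — such "easy" triplets contribute no gradient, which is exactly why mining hard triplets matters for training speed.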



If you want to implement face recognition, it is better to consider this paper, which introduces center loss — it is much easier to train and has been shown to perform better.
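For intuition, here is a minimal sketch of the center-loss idea: each class keeps a running center in embedding space, and the loss penalizes each embedding's distance to its class center. The class names, update rule, and learning rate below are simplified illustrations, not the paper's exact formulation:

```python
import numpy as np

class CenterLoss:
    """Sketch of center loss: intra-class compactness via per-class
    centers that are nudged toward their class's embeddings."""

    def __init__(self, num_classes, dim, lr=0.5):
        self.centers = np.zeros((num_classes, dim))
        self.lr = lr

    def __call__(self, embeddings, labels):
        # Distance of each embedding to its class center.
        diffs = embeddings - self.centers[labels]
        loss = 0.5 * np.sum(diffs ** 2) / len(labels)
        # Move each center that appears in the batch toward the
        # mean of its class's embeddings.
        for c in np.unique(labels):
            mask = labels == c
            self.centers[c] += self.lr * diffs[mask].mean(axis=0)
        return loss
```

Unlike the triplet loss, this needs no triplet mining at all: it is combined with an ordinary softmax classification loss during training, which is the main reason it is easier to train.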
