TensorFlow: what is the exact formula applied in `tf.nn.sparse_softmax_cross_entropy_with_logits`?
I tried to manually recalculate the outputs of this function, so I created a minimal example:

    import numpy as np
    import tensorflow as tf

    logits = tf.pack(np.array([[[[0,1,2]]]], dtype=np.float32))  # image of shape (1, 1, 1, 3)
    labels = tf.pack(np.array([[[1]]], dtype=np.int32))          # ground truth of shape (1, 1, 1)
    softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
    softmaxCrossEntropie.eval()  # --> output is [1.41]
Now, according to my own calculation I only get [1.23]. For the manual calculation I just apply softmax followed by cross-entropy:

    H(p, q) = -sum_j p(x_j) * log(q(x_j))

where q(x_j) = sigma(x_j) or (1 - sigma(x_j)), depending on whether j is the ground-truth class or not, and p(x) = labels, which are one-hot encoded.
I'm not sure where this difference comes from; I can't imagine an epsilon term making such a big difference. Does anyone know where I can look up the exact formula used by TensorFlow? Is the source code for this exact part available? I could only find nn_ops.py, but it just calls another function, gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which I could not find on GitHub...
In the cross-entropy equation, p(x) is the true distribution and q(x) is the distribution obtained from the softmax. When p(x) is one-hot (and here it is, otherwise sparse cross-entropy could not be applied at all), the cross-entropy reduces to the negative log-probability of the true class. Note that the softmax is applied over all classes jointly, so the per-class sigmoid terms from your manual calculation do not appear.

In your example, softmax(logits) is the vector [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059, which is exactly the value TensorFlow returned.
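This reduction is easy to verify with NumPy alone. Below is a minimal sketch (the function name is my own, not TensorFlow's); it uses the shifted log-sum-exp form for numerical stability, which is the standard way such ops are implemented:

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy with a one-hot target: -log(softmax(logits)[label])."""
    # Subtracting the max before exponentiating avoids overflow and
    # does not change the softmax result.
    shifted = logits - np.max(logits)
    log_softmax = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_softmax[label]

logits = np.array([0.0, 1.0, 2.0], dtype=np.float32)
loss = sparse_softmax_cross_entropy(logits, label=1)
print(loss)  # ~1.4076059, matching tf.nn.sparse_softmax_cross_entropy_with_logits
```

Running this reproduces the [1.41] you observed, confirming that no epsilon or extra term is involved.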