Semantic segmentation in TensorFlow gives zero loss
I am training a model to segment machine-printed text from images. The images can also contain barcodes and handwritten text. The ground-truth images are processed so that 0 represents machine print and 1 represents everything else. I am using a 5-layer CNN with an upsampling extension, which ends up outputting 2 maps.
My loss is calculated like this:

    def loss(logits, labels):
        logits = tf.reshape(logits, [-1, 2])
        labels = tf.reshape(labels, [-1])
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=labels)
        cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
        return cross_entropy_mean
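For reference, the per-pixel computation that `sparse_softmax_cross_entropy_with_logits` performs can be sketched in plain NumPy (my own stand-alone re-implementation for illustration, not the actual TF op):

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """logits: (N, 2) float array; labels: (N,) int array of class ids."""
    # numerically stable log-softmax: shift by the row max first
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # negative log-probability of the true class, averaged over pixels
    per_pixel = -log_probs[np.arange(len(labels)), labels]
    return per_pixel.mean()

# toy example: 3 "pixels", model mildly prefers class 1 everywhere,
# and the ground truth is all class 1 (a blank page)
logits = np.array([[0.0, 2.0], [0.0, 2.0], [0.0, 2.0]])
labels = np.array([1, 1, 1])
print(sparse_softmax_cross_entropy(logits, labels))  # ~0.1269, small but nonzero
```

So even a correct prediction only gives a loss near zero, not exactly zero, as long as the logits are of moderate size.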
I also have some images that contain only handwritten text; their respective ground truths are blank pages, i.e. all 1s.
When I train the model, these images give a loss of 0 and a training accuracy of 100%. Is that right? How can the loss be exactly zero? For other images containing barcodes or machine print, I get a nonzero loss and they converge as expected.
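One mechanism that can produce an exact 0 (I can't confirm it is what happens in your run, but it fits the symptoms): if training drives the weights very large, the logits saturate, the softmax probability of the winning class rounds to exactly 1.0 in floating point, and on an all-background image the loss is then exactly 0. A NumPy sketch of the effect:

```python
import numpy as np

def mean_ce(logits, labels):
    # stable mean sparse softmax cross-entropy, as in the loss above
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.ones(4, dtype=int)           # blank page: every pixel is class 1

mild = np.tile([0.0, 3.0], (4, 1))       # moderately confident logits
huge = np.tile([0.0, 100.0], (4, 1))     # saturated logits

print(mean_ce(mild, labels))   # ~0.0486: small but nonzero
print(mean_ce(huge, labels))   # exactly 0.0: exp(-100) vanishes in the sum
```

With a logit gap of 100, `1 + exp(-100)` is indistinguishable from 1.0 in floating point, so the log-probability of class 1 is exactly 0 and so is the loss.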
When I test the model, the barcodes are correctly ignored, but it outputs both machine print and handwriting, whereas I only want the machine print.
Can anyone guide me to where I am going wrong, please!
UPDATE 1:
I was using a learning rate of 0.01 before; after changing it to 0.0001, I get some loss and it seems to converge, though not very well. But how can a high learning rate drive the loss to 0?
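On the learning-rate point: a step size that is too large for the loss surface overshoots the minimum and can blow the weights up instead of shrinking them; once the weights are huge, the logits saturate and the easy all-background images read as exactly 0 loss. A minimal toy on a plain quadratic, just to show the overshoot mechanism (not your network):

```python
def descend(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

print(abs(descend(0.01)))  # shrinks toward 0: the step size is safe
print(abs(descend(1.1)))   # grows every step: the step size diverges
```

With lr = 1.1 each update multiplies the weight by -1.2, so its magnitude grows geometrically; the same overshoot on a real network inflates the logits.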
When I use the same model in Caffe with a learning rate of 0.01, it gives some loss and converges well compared to TensorFlow.