Keras + TensorFlow: Debugging NaNs

Here's a great question on how to find the first occurrence of NaN in a TensorFlow graph:

Debugging NaNs in the backward pass

The answer is quite helpful. Here is the code from it:

train_op = ...
check_op = tf.add_check_numerics_ops()

sess = tf.Session()
sess.run([train_op, check_op])  # Runs training and checks for NaNs

      

Apparently, running the training op and the numerics check together raises an error as soon as a NaN is encountered for the first time.
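To make that behaviour concrete, here is a minimal, framework-free sketch of the idea behind a numerics check. It is not TensorFlow's implementation: the names NumericsError, check_numerics, run_checked and the toy steps list are all hypothetical and exist only for illustration. Each step's output is checked as soon as it is produced, so the error names the first step that went non-finite rather than a later symptom.

```python
import math


class NumericsError(ValueError):
    """Raised when a step produces NaN or Inf (hypothetical name)."""


def check_numerics(value, name):
    """Return value unchanged, or raise if it is NaN or +/-Inf."""
    if not math.isfinite(value):
        raise NumericsError(f"{name} produced a non-finite value: {value!r}")
    return value


def run_checked(x, steps):
    """Apply the named steps in order, checking every intermediate result."""
    for name, fn in steps:
        x = check_numerics(fn(x), name)
    return x


# A toy stand-in for a graph (assumption: scalar floats instead of tensors).
steps = [
    ("scale", lambda v: v * 1e308),   # overflows to inf for large inputs
    ("subtract", lambda v: v - v),    # would turn inf into nan
    ("square", lambda v: v * v),
]

try:
    run_checked(10.0, steps)
    first_bad = None
except NumericsError as err:
    first_bad = str(err)  # names "scale", the first offending step
```

Without the per-step check, the same input would only surface as a NaN at the end, with no hint that the overflow in the first step was the cause.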

How do I integrate this into Keras? In the documentation, I cannot find anything similar to this.

I checked the code too. The update step is done here: https://github.com/fchollet/keras/blob/master/keras/engine/training.py

There is a function named _make_train_function, where the op that computes the loss and applies the updates is created. This function is later called to train the network.

I could change the code like this (always assuming we are running on the TensorFlow backend):

check_op = tf.add_check_numerics_ops()

self.train_function = K.function(inputs, 
    [self.total_loss] + self.metrics_tensors + [check_op],
    updates=updates, name='train_function', **self._function_kwargs)

      

I am currently trying to set this up correctly and am not sure whether the code actually works. Maybe there is an easier way?
