Keras CTC Loss Input
I'm trying to use CTC for speech recognition in Keras and tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of a softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow documentation says that ctc_loss applies softmax internally, so you shouldn't apply softmax beforehand. I think the correct usage is to pass the pre-softmax activations (the logits) to the Lambda layer, so that softmax is applied only once, inside ctc_loss. I tried it that way and it works. Should I follow the example or the TensorFlow documentation?
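For context, here is a minimal sketch of the setup I mean, following the pattern of the Keras OCR example; the layer names, feature dimensions, and class count are my own placeholder assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K

# Hypothetical shapes: 100 time steps, 26 input features, 28 output
# classes (e.g. 27 characters + the CTC blank).
inputs = layers.Input(shape=(100, 26), name="features")
x = layers.Dense(64, activation="relu")(inputs)
# The example applies softmax here before the CTC Lambda layer:
y_pred = layers.Dense(28, activation="softmax", name="y_pred")(x)

labels = layers.Input(shape=(None,), dtype="int32", name="labels")
input_len = layers.Input(shape=(1,), dtype="int32", name="input_len")
label_len = layers.Input(shape=(1,), dtype="int32", name="label_len")

def ctc_lambda(args):
    y_pred, labels, input_len, label_len = args
    # ctc_batch_cost returns one loss value per sample: shape (batch, 1)
    return K.ctc_batch_cost(labels, y_pred, input_len, label_len)

loss = layers.Lambda(ctc_lambda, name="ctc")(
    [y_pred, labels, input_len, label_len]
)
model = keras.Model([inputs, labels, input_len, label_len], loss)
```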
The loss used in the code you posted is different from the one you linked. The loss used in the code is here.
The Keras code does some preprocessing before calling ctc_loss to put the input into the format it requires. Besides expecting the input not to be softmax-ed, TensorFlow's ctc_loss also expects the dimensions to be NUM_TIME x BATCHSIZE x FEATURES. ctc_batch_cost does both of these things on this line.
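In NumPy terms, those two transformations amount to something like the following; the shapes and the epsilon value are illustrative assumptions, not the exact Keras internals:

```python
import numpy as np

# Fake softmax-ed predictions with shape (batch, time, classes).
rng = np.random.default_rng(0)
y_pred = rng.random((4, 10, 28))
y_pred /= y_pred.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax output

eps = 1e-7
logits = np.log(y_pred + eps)             # undo the softmax scaling
logits = np.transpose(logits, (1, 0, 2))  # (batch, time, classes) -> (time, batch, classes)

print(logits.shape)  # (10, 4, 28)
```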
It takes log(), which undoes the softmax scaling, and it also transposes the dimensions into the correct shape. To be clear, taking the log obviously does not recover the original pre-softmax tensor; the point is that softmax(log(softmax(x))) = softmax(x). See below:
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1, 2, 3])
y = softmax(x)
z = np.log(y)    # z != x (obviously), BUT
yp = softmax(z)  # yp == y