Keras CTC Loss Input

I'm trying to use CTC for speech recognition with Keras and tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of the softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow documentation for ctc_loss says the function applies softmax internally, so you should not apply softmax beforehand. I therefore think the correct usage is to pass the pre-softmax activations (inner) to the Lambda layer, so that softmax is applied only once, inside ctc_loss. I tried that and it works. Should I follow the example or the TensorFlow documentation?
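
For reference, the wiring in question looks roughly like this (a minimal sketch following the pattern of the Keras OCR example; the sizes and layer names here are illustrative, not the example's exact code):

from keras import backend as K
from keras.layers import Input, Dense, Activation, Lambda
from keras.models import Model

# Illustrative sizes, not taken from the example
num_timesteps, num_features, num_classes, max_label_len = 100, 26, 28, 16

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

inputs = Input(name='the_input', shape=(num_timesteps, num_features))
inner = Dense(num_classes)(inputs)        # pre-softmax activations ("inner")
y_pred = Activation('softmax')(inner)     # the example feeds this to the loss

labels = Input(name='the_labels', shape=(max_label_len,), dtype='float32')
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])
model = Model(inputs=[inputs, labels, input_length, label_length],
              outputs=loss_out)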

1 answer


The loss used in the code you posted is different from the one you linked. The loss actually used in the code is here.

The Keras code does some preprocessing before calling ctc_loss that puts the input into the required format. Besides requiring the input not to be softmax-ed, TensorFlow's ctc_loss also expects the dims to be NUM_TIME x BATCHSIZE x FEATURES. ctc_batch_cost does both of these things on this line.
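
That line amounts to something like the following (a paraphrase of the Keras backend at the time, in TF1-style code; the real function also converts the dense labels to a sparse tensor first):

import tensorflow as tf

def ctc_batch_cost_sketch(y_pred, sparse_labels, input_length):
    # y_pred is the softmaxed network output, shape (BATCH, TIME, FEATURES).
    # Transpose to (TIME, BATCH, FEATURES) as tf.nn.ctc_loss expects, and take
    # log() so the internal softmax reproduces the original probabilities.
    y_pred = tf.log(tf.transpose(y_pred, perm=[1, 0, 2]) + 1e-8)
    return tf.nn.ctc_loss(inputs=y_pred, labels=sparse_labels,
                          sequence_length=input_length)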



It takes the log(), which undoes the softmax scaling, and also transposes the dims to get the correct shape. When I say it undoes the softmax scaling: obviously it does not restore the original tensor; rather, it relies on the fact that softmax(log(softmax(x))) = softmax(x). See below:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))  # shift by max for numerical stability
    return e_x / e_x.sum()


x = np.array([1.0, 2.0, 3.0])
y = softmax(x)
z = np.log(y)          # z != x (obviously), BUT
yp = softmax(z)        # yp == y
assert np.allclose(y, yp)

      
