How does the basic Keras optimizer work?

Here is part of the get_updates code from SGD in Keras (source):

moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))

      

Comment:

The variable moments is a list of zero tensors, so each m in the for loop is a zero tensor with the same shape as the corresponding p. Then self.momentum * m, in the first line of the loop, is just a scalar multiplied by the zero tensor, which gives the zero tensor again.

Question

What am I missing here? Thanks!



1 answer


Yes, during the first training iteration m is 0. But it is then updated with the current value of v by this line:

self.updates.append(K.update(m, v))

      

So in the next iteration, you compute:



v = self.momentum * old_velocity - lr * g  # velocity

      

where old_velocity is the previous value of v.
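To make this concrete, here is a minimal plain-Python sketch of the same recurrence on a single scalar parameter. It is not the Keras implementation: the function name sgd_momentum_steps, the starting value p = 1.0, and the constant gradients are hypothetical, and the p = p + v step is assumed to match the non-Nesterov update that follows the quoted snippet.

```python
def sgd_momentum_steps(grads, lr=0.1, momentum=0.9):
    """Apply v = momentum * m - lr * g once per gradient in `grads`."""
    m = 0.0  # moment starts at zero, like K.zeros(shape)
    p = 1.0  # hypothetical starting parameter value
    history = []
    for g in grads:
        v = momentum * m - lr * g  # velocity, same formula as in the question
        m = v                      # plays the role of K.update(m, v)
        p = p + v                  # assumed parameter update (no Nesterov)
        history.append((v, p))
    return history


# Hypothetical constant gradient of 1.0 on three consecutive steps.
history = sgd_momentum_steps([1.0, 1.0, 1.0])
for step, (v, p) in enumerate(history, start=1):
    print(step, round(v, 4), round(p, 4))
```

On the first step the momentum term contributes nothing because m is still zero, so v is just -lr * g. From the second step on, m holds the previous velocity, so v grows in magnitude even though the gradient has not changed.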
