How does the basic Keras optimizer work?
Here is part of the code of get_updates from SGD in Keras (source):
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
Comment:
The variable moments is a list of zero tensors, so each m in the for loop is a zero tensor with the same shape as the corresponding p. Then self.momentum * m, in the first line of the loop, is just a scalar multiplied by the zero tensor, which gives the zero tensor.
Question:
What am I missing here? Thanks!
Yes, the first time the updates run, m is 0. But it is then updated with the current value of v in this line:
self.updates.append(K.update(m, v))
So in the next iteration, you will have:
v = self.momentum * old_velocity - lr * g # velocity
where old_velocity is the previous value of v.
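To see the accumulation concretely, here is a minimal plain-Python sketch of the same recurrence. It is not the Keras implementation: the function name sgd_momentum_velocities, the scalar gradients, and the values lr=0.1 and momentum=0.9 are assumptions made up for illustration, and the assignment m = v only stands in for what K.update(m, v) does between training steps.

```python
def sgd_momentum_velocities(grads, lr=0.1, momentum=0.9):
    """Return the velocity computed at each step, starting from a zero moment.

    Hypothetical helper for illustration; it mirrors the formula in the
    excerpt but uses plain floats instead of backend tensors.
    """
    m = 0.0  # plays the role of K.zeros(shape)
    velocities = []
    for g in grads:
        v = momentum * m - lr * g  # same formula as in the excerpt
        m = v                      # stands in for K.update(m, v)
        velocities.append(v)
    return velocities


# Three steps with the same (made-up) gradient of 1.0.
velocities = sgd_momentum_velocities([1.0, 1.0, 1.0])
for step, v in enumerate(velocities, start=1):
    print(f"step {step}: v = {v:.3f}")
```

Only on the first step does the momentum term vanish, giving v = -lr * g. From the second step on, m holds the previous velocity, so the magnitude of v grows while the gradient keeps pointing the same way.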