Why does the learning rate affect which use_global_stats setting works best for BatchNorm?
I am using a "BatchNorm" layer in Caffe. I understand the meaning of the use_global_stats setting, which is typically set to false for training and to true for testing/deployment. This is my setup for the testing phase:
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bnorm1"
  top: "bnorm1"
  scale_param {
    bias_term: true
    filler {
      value: 1
    }
    bias_filler {
      value: 0.0
    }
  }
}
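For contrast, during training the same layer uses mini-batch statistics; here is a sketch mirroring the layer above (in practice the flag can also be omitted, since Caffe infers the default from the current phase):

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: false  # use mini-batch statistics while training
  }
}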
In solver.prototxt I use the Adam solver.
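Roughly, my solver file looks like this (the net path and the schedule fields below are illustrative placeholders, not my exact values):

# solver.prototxt (illustrative sketch)
net: "train_val.prototxt"   # placeholder path
type: "Adam"
base_lr: 1e-3               # 1e-4 in the second experiment
momentum: 0.9
momentum2: 0.999
delta: 1e-8
lr_policy: "fixed"
max_iter: 100000
snapshot: 10000
snapshot_prefix: "snapshots/net"  # placeholder prefix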
I ran into an interesting problem. If I choose base_lr: 1e-3, I get good performance when I set use_global_stats: false during the testing phase. However, if I choose base_lr: 1e-4, I get good performance when I set use_global_stats: true during the testing phase. Does base_lr influence how BatchNorm should be configured (even though I use Adam)? Could you suggest any reasons for this? Thanks, everyone.
AFAIK, the learning rate does not directly affect the learned parameters of a "BatchNorm" layer. Indeed, Caffe forces the lr_mult of all of this layer's internal parameters to zero, regardless of base_lr or the solver type.
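For reference, many published prototxts make this pinning explicit by adding three param entries, one per internal blob (running mean, running variance, and the moving-average factor); a sketch of that convention:

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average (bias correction) factor
}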
However, you may run into a situation where the adjacent layers converge to different points depending on the base_lr used, and this indirectly causes "BatchNorm" to behave differently.