Why does the learning rate affect which use_global_stats setting works best for BatchNorm?
I am using a "BatchNorm" layer in Caffe. I understand the meaning of the use_global_stats setting, which is typically set to false for training and to true for testing/deployment. This is my setup for the testing phase:
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bnorm1"
  top: "bnorm1"
  scale_param {
    bias_term: true
    filler {
      value: 1
    }
    bias_filler {
      value: 0.0
    }
  }
}
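For contrast, during training the same layer uses mini-batch statistics; here is a sketch mirroring the layer above (in practice the flag can also be omitted, since Caffe infers the default from the current phase):

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: false  # use mini-batch statistics while training
  }
}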
In solver.prototxt I use the Adam solver.
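Roughly, my solver file looks like this (the net path and the schedule fields below are illustrative placeholders, not my exact values):

# solver.prototxt (illustrative sketch)
net: "train_val.prototxt"   # placeholder path
type: "Adam"
base_lr: 1e-3               # 1e-4 in the second experiment
momentum: 0.9
momentum2: 0.999
delta: 1e-8
lr_policy: "fixed"
max_iter: 100000
snapshot: 10000
snapshot_prefix: "snapshots/net"  # placeholder prefix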
I ran into an interesting problem. If I choose base_lr: 1e-3, I get good performance when I set use_global_stats: false during the testing phase. However, if I choose base_lr: 1e-4, I get good performance when I set use_global_stats: true during the testing phase. Does base_lr influence how BatchNorm should be configured (even though I use Adam)? Could you suggest any reasons for this? Thanks, everyone.
AFAIK, the learning rate does not directly affect the learned parameters of a "BatchNorm" layer. Indeed, Caffe forces the lr_mult of all of this layer's internal parameters to zero, regardless of base_lr or the solver type.
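For reference, many published prototxts make this pinning explicit by adding three param entries, one per internal blob (running mean, running variance, and the moving-average factor); a sketch of that convention:

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average (bias correction) factor
}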
However, you may run into a situation where the adjacent layers converge to different points depending on the base_lr used, and this indirectly causes "BatchNorm" to behave differently.