Can I use layer normalization with a CNN?
I see that layer normalization is a more modern normalization technique than standard batch normalization, and it is very easy to code in TensorFlow. But I think layer normalization is meant for RNNs and batch normalization for CNNs. Can I use layer normalization with a CNN that handles an image classification task? What are the criteria for choosing between batch and layer normalization?
You can use Layer Norm in a CNN, but I don't think it's more "modern" than Batch Norm. They simply normalize in different ways. Layer Norm normalizes all the activations of a layer for each example separately, collecting statistics across the units within that layer. Batch Norm normalizes each individual activation across the whole batch, collecting statistics for that one unit over all examples in the batch.
Batch Norm is generally preferable to Layer Norm, because it tries to normalize each activation to a unit Gaussian distribution, whereas Layer Norm only tries to normalize the activations of a layer, taken together, to a unit Gaussian distribution. But if the batch size is too small to collect reasonable statistics, Layer Norm is preferable.
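
To make the difference in normalization axes concrete, here is a minimal NumPy sketch. It assumes an NHWC image tensor and, for brevity, leaves out the learnable scale and shift parameters and the running averages that real implementations keep; the function names `batch_norm` and `layer_norm` are just illustrative helpers, not a library API.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x has shape (N, H, W, C).
    # One mean/variance per channel, computed over the batch and spatial axes.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # One mean/variance per example, computed over all of its activations
    # (spatial positions and channels); the batch axis is never reduced.
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))  # made-up feature maps

bn = batch_norm(x)
ln = layer_norm(x)

# Per-channel means after batch norm, per-example means after layer norm.
print(bn.mean(axis=(0, 1, 2)))
print(ln.mean(axis=(1, 2, 3)))

# Layer norm gives the same result for an example whether or not the rest
# of the batch is present; batch norm does not.
print(np.allclose(layer_norm(x[:1]), ln[:1]))
print(np.allclose(batch_norm(x[:1]), bn[:1]))
```

The last two lines illustrate why the batch size matters: the batch norm output for one example changes with whatever else is in the batch, while the layer norm output does not.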
I would also like to add that, as stated in the original Layer Norm paper (page 10, section 6.7), Layer Norm is not recommended for CNNs; the authors say that "more research is needed" there.
Also, a heads-up: for RNNs, Layer Norm seems to be a better choice than Batch Norm, because training cases can have different lengths within the same mini-batch.
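
A rough sketch of the variable-length issue, again in NumPy. The sequence lengths, the hidden size, and the random "hidden states" are all made up for illustration; this is not an actual RNN. It only shows how many examples would be available for per-time-step batch statistics, and that normalizing over the hidden units of each step needs no other examples.

```python
import numpy as np

lengths = np.array([5, 3, 2, 1])   # hypothetical sequence lengths in one mini-batch
max_len, hidden = int(lengths.max()), 6

rng = np.random.default_rng(0)
h = rng.normal(size=(len(lengths), max_len, hidden))  # stand-in hidden states (N, T, H)

# mask[n, t] is True where sequence n has a real (non-padding) step at time t.
mask = np.arange(max_len)[None, :] < lengths[:, None]

# Number of examples that batch statistics could use at each time step.
valid_per_step = mask.sum(axis=0)
print(valid_per_step)

def layer_norm(x, eps=1e-5):
    # Normalize over the hidden units of each (example, time step) on its own.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

h_ln = layer_norm(h)
```

With these lengths, the later time steps are covered by only one or two sequences, so batch statistics there would be very noisy, while the layer norm computation is identical at every step.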