Why does the first convolutional layer in VGG16 have 64 channels? And how was that determined?

Can someone explain to me why the width of the first convolutional layer in VGG16 is 64? The layer sizes are listed all over the web, but I can't find how the 64 was chosen in the first place.

1 answer


The input to the first convolutional layer in VGG16 is a 224x224x3 image. The output volume of that layer is 224x224x64. The 64 is the depth (or number of channels; the paper calls it the width, which in my opinion is confusing) of the new volume, produced by sliding each of the 64 filters over the input volume. Each filter spans all 3 input channels and stacks one new slice onto the output volume. The choice of 64 filters in conv1_1 was a design decision, which the authors do not explain, but it has to do with keeping the number of trainable parameters manageable.
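As a minimal sketch (PyTorch is my choice here, not something the paper specifies), the 64 is just the number of output channels of the layer, independent of the 3 input channels:

```python
import torch
import torch.nn as nn

# conv1_1 of VGG16: 3 input channels (RGB), 64 filters,
# 3x3 kernels with padding 1 so width/height are preserved
conv1_1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
y = conv1_1(x)
print(y.shape)                   # torch.Size([1, 64, 224, 224])

# each of the 64 filters spans all 3 input channels,
# so the weight tensor has shape (64, 3, 3, 3)
print(conv1_1.weight.shape)
```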

Doubling the number of filters (64, 128, 256, ...) is also a design decision. A common rule of thumb is to scale the number of filters by the inverse of the pooling layer's downsampling factor. In the VGG16 architecture the pooling layers use stride 2, so they roughly halve the width and height of the input volume, according to this equation (width and height are equal):

Width_out = (Width_in - FilterSize + 2 * Padding) / Stride + 1



For pool1 in VGG16 (a 2x2 window with stride 2 and no padding):

Width_out = (224 - 2 + 2 * 0) / 2 + 1 = 112

Pooling downsamples the width by 50% (224 → 112), so the number of filters is doubled in the next convolutional layer (64 * 2 = 128).
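A small helper (my own illustration, not from the paper) makes the arithmetic concrete, using the same output-size formula:

```python
def output_width(w_in: int, filter_size: int, padding: int, stride: int) -> int:
    """Output width of a conv/pool layer: (W - F + 2P) / S + 1."""
    return (w_in - filter_size + 2 * padding) // stride + 1

# pool1 in VGG16: 2x2 window, stride 2, no padding
print(output_width(224, filter_size=2, padding=0, stride=2))  # 112

# chaining all five pools halves the width each time:
widths = [224]
for _ in range(5):
    widths.append(output_width(widths[-1], 2, 0, 2))
print(widths)  # [224, 112, 56, 28, 14, 7]
```

Each halving of width/height is matched by a doubling of filters (64, 128, 256, 512), until the filter count is capped at 512.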
