Convolutional neural networks: aren't central neurons over-represented at the output?

[This question has also been asked on Cross Validated]

Brief question

I am studying convolutional neural networks, and I find that these networks do not treat every input neuron (pixel/parameter) equivalently. Imagine we have a deep network (many layers) that applies convolution to some input image. The neurons in the "middle" of the image have many unique pathways to many of the deeper-layer neurons, which means that a small change in the middle neurons has a strong effect on the output. However, the neurons at the edge of the image have only one pathway (or, depending on the exact implementation, on the order of one pathway) along which their information travels through the graph. They appear to be "underrepresented".

This worries me because this discrimination against edge neurons scales exponentially with the depth (number of layers) of the network. Even adding a max-pooling layer will not stop the exponential increase; only a fully connected layer puts all neurons on an equal footing. I'm not sure my reasoning is correct, so my questions are:

  • Is it correct that this effect takes place in deep convolutional networks?
  • Is there any theory in the literature that mentions this effect?
  • Are there ways to overcome this effect?

Since I'm not sure if this provides enough information, I'll go into a little more detail about the problem, and why I believe it is a concern.

More detailed explanation

Imagine we have a deep neural network that takes an image as input. Suppose we apply a 64x64-pixel convolutional filter to the image, shifting the convolution window by 4 pixels each time. This means that every neuron in the input sends its activation to 16x16 = 256 neurons in layer 2. Each of these neurons can send its activation to another 256, so that our central neuron is represented in 256^2 output neurons, and so on. This is, however, not true for neurons at the edges: these can only be represented in a small number of convolution windows, causing them to activate (on the order of) only 1 neuron in the next layer. Using tricks such as mirroring the edges won't help: the second-layer neurons that are projected to are still "at the edge", which means that the edge neurons will be underrepresented (thus limiting the importance of our edge neurons). As can be seen, this discrepancy grows exponentially with the number of layers.

I created an image to visualize the problem, which can be found here (I am not allowed to include images in the post). This network has a convolution window of size 3. The numbers next to the neurons indicate the number of paths down to the deepest neuron. The image resembles Pascal's triangle.

https://www.dropbox.com/s/7rbwv7z14j4h0jr/deep_conv_problem_stackxchange.png?dl=0
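
For anyone who wants to check the numbers, here is a minimal sketch that counts the paths from each input neuron to the deepest layer of a stack of 1D valid convolutions. It is plain NumPy; the function `path_counts`, its arguments, and the no-padding assumption are mine, not from the figure. With a window of 3 and a stride of 1 it reproduces the Pascal's-triangle pattern; the same bookkeeping with a window of 64 and a stride of 4 shows each interior input feeding 16 windows per axis, i.e. 16x16 = 256 neurons in 2D.

```python
# Hypothetical sketch: count, for every input neuron, the number of distinct
# paths that reach the deepest layer of a stack of 1D valid convolutions.
import numpy as np

def path_counts(n_inputs, window, stride, n_layers):
    # Compute the width of every layer first.
    widths = [n_inputs]
    for _ in range(n_layers):
        widths.append((widths[-1] - window) // stride + 1)
    # Credit one path to every neuron in the deepest layer, then push the
    # counts back toward the input: each output neuron distributes its count
    # to the `window` input positions it reads from.
    counts = np.ones(widths[-1], dtype=np.int64)
    for w_in in reversed(widths[:-1]):
        prev = np.zeros(w_in, dtype=np.int64)
        for j, c in enumerate(counts):
            prev[j * stride : j * stride + window] += c
        counts = prev
    return counts

# Window 3, stride 1, converging to a single deepest neuron (the figure's setting):
print(path_counts(9, 3, 1, 4))   # [ 1  4 10 16 19 16 10  4  1]

# One 64-wide, stride-4 layer on a 200-pixel axis: interior pixels fall in
# 16 windows per axis (16 * 16 = 256 in 2D), corner pixels in only 1.
print(path_counts(200, 64, 4, 1).max(), path_counts(200, 64, 4, 1)[0])   # 16 1
```

The center-to-edge ratio is already 19:1 after a few layers in the toy setting; with the 64x64/stride-4 setting each extra layer multiplies the interior counts by roughly 256 while leaving the corners near 1, which is exactly the exponential gap described above.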

Why is this a problem?

This effect does not seem to be a problem at first glance: in principle, the weights should adjust automatically so that the network does its job. Moreover, the edges of an image are not that important in image recognition anyway. This effect may not be noticeable in everyday image-recognition tests, but it still concerns me for two reasons: (1) generalization to other applications, and (2) problems arising in the case of very deep networks. (1) There may be other applications, such as speech or sound recognition, where it is not true that the central neurons are the most important. Convolution is often applied in this field, but I couldn't find a single paper that mentions the effect I'm describing. (2) Very deep networks will notice an exponentially bad effect from the discrimination against edge neurons, which means that central neurons can be overrepresented by several orders of magnitude (imagine we have 10 layers, so the example above would give 256^10 ways in which the central neurons can project their information). As the number of layers increases, one is bound to hit a limit where the weights cannot feasibly compensate for this effect. Now imagine we perturb all neurons by a small amount. The central neurons would cause the output to change by several orders of magnitude more than the edge neurons would. I believe that for general applications, and for very deep networks, ways around my problem need to be found.
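
To make the perturbation argument concrete, here is a small numerical experiment one could run. It is a sketch under assumptions I'm adding myself: a purely linear stack of random 3x3 valid convolutions on a 17x17 input, with sizes chosen for speed rather than realism. It measures how much the output moves when a single pixel is nudged, at the center versus at a corner.

```python
# Hypothetical experiment: perturb one input pixel of a random linear conv
# stack and measure the change in the output (a finite-difference sensitivity).
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Plain valid 2D cross-correlation with stride 1, written out explicitly."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def forward(x, kernels):
    for k in kernels:            # no nonlinearity: we only probe the path structure
        x = conv2d_valid(x, k)
    return x

n, depth = 17, 5
kernels = [rng.normal(size=(3, 3)) for _ in range(depth)]
base = rng.normal(size=(n, n))
out0 = forward(base, kernels)

def sensitivity(i, j, eps=1e-3):
    """Norm of the output change per unit change of pixel (i, j)."""
    pert = base.copy()
    pert[i, j] += eps
    return np.linalg.norm(forward(pert, kernels) - out0) / eps

print("center pixel:", sensitivity(n // 2, n // 2))
print("corner pixel:", sensitivity(0, 0))   # typically far smaller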

1 answer


I will quote your questions and write my answers below each of them.



  • Is it correct that this effect takes place in deep convolutional networks?

    • I think you are wrong in general, but right with respect to your example of 64x64 convolution filters. As long as you structure your convolution-layer filter sizes sensibly, they will never be bigger than the objects you are looking for in your images. In other words: if your images are 200x200 and you convolve with 64x64 patches, you are saying that these 64x64 patches will learn some parts, or exactly the image patches, that identify your category. The idea of the first layer is to learn edge-like, partially important patches, not the entire cat or car itself.
  • Is there any theory in the literature that mentions this effect? And are there ways to overcome this effect?

    • I have never seen it mentioned in any paper I have read so far, and I don't think this would be a problem even for very deep networks.

    • There is no such effect. Suppose your first layer, which has learned 64x64 patches, is in action. If a patch in the top-left corner fires (becomes active), it will show up as a 1 in the top-left corner of the following layers, so its information will propagate through the rest of the network.

  • (Not quoted from the question.) Do not think of it as "a pixel is useful in more neurons the closer it gets to the center". Think of a 64x64 filter with a stride of 4:

    • If the pattern your 64x64 filter is looking for is in the top-left corner of the image, then it will propagate to the top-left corner of the next layer's feature map; otherwise, there will be nothing in the next layer.

    • The idea is to keep the meaningful parts of the image alive while suppressing the meaningless, dull parts, and to combine these meaningful parts in the following layers. For the case of learning the capital letter "A", see the images in Fukushima's classic 1980 paper ( http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf ), Figures 5 and 7. Hence a single pixel has no significance by itself; what matters is the image patch, which is the size of your convolution filter.

  • The central neurons would cause the output to change by several orders of magnitude more than the edge neurons would. I believe that for general applications, and for very deep networks, ways around my problem need to be found?

    • Suppose you are looking for a car in the image,

    • And suppose that in your first example the car is definitely in the 64x64 top-left corner of your 200x200 image, while in the second example the car is definitely in the 64x64 bottom-right corner of your 200x200 image.

    • In the second layer, all of your pixel values will be almost 0, except for the one in the top-left corner for the first image and the one in the bottom-right corner for the second image.

    • Now the center part of the image will not mean anything to my forward and backward propagation, because those values will already be 0. But the corner values will never be dropped and will affect my training weights. A scaled-down numerical sketch of this behavior follows below.
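
As promised above, here is a scaled-down sketch of the answer's point. The sizes (a 16x16 image, a 4x4 matched filter, stride 2) are my choices standing in for the 200x200 image and 64x64 filter of the answer; the "learned" pattern is just random numbers.

```python
# Scaled-down illustration: a filter responds only where its pattern occurs,
# so a top-left pattern yields one strong activation in the top-left of the
# next layer and (near-)zero activations everywhere else.
import numpy as np

rng = np.random.default_rng(1)
patch = rng.normal(size=(4, 4))      # the pattern this filter has "learned"

img = np.zeros((16, 16))
img[0:4, 0:4] = patch                # the pattern is present only in the top-left

stride = 2
size = (16 - 4) // stride + 1        # 7x7 feature map
fmap = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        window = img[i * stride:i * stride + 4, j * stride:j * stride + 4]
        fmap[i, j] = np.sum(window * patch)   # matched-filter response

print(np.round(fmap, 1))   # large value at (0, 0), near-zero elsewhere
```

The feature map is essentially zero everywhere the pattern is absent, which is the answer's argument: what propagates is "this patch matched here", not a per-pixel importance that favors the center.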
