Dropout training

How many networks are averaged as a result of dropout? And which scaling should be used during the testing phase? I am really confused about this, because each thinned network learns a different set of weights. Is backpropagation done separately for each of the thinned networks? And how exactly are the weights shared among these thinned nets? During testing only one neural network and one set of weights are used, so which set of weights is used?

It is said that for each training case a different thinned network is trained. What exactly does a training case mean? Do you mean that each forward and backward pass trains a different thinned network, and the next forward and backward pass trains yet another thinned network? How are the weights shared between them?

1 answer


During training:

With dropout, you simply force some fraction (the dropout probability) of a layer's activations/outputs to zero. Usually a boolean mask is created to drop those activations, and the same mask is reused during backpropagation. As a result, gradients are only applied to the weights that were actually used in the forward pass.
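Here is a minimal NumPy sketch of that idea (the function names and toy values are mine, not from the answer or the paper): the boolean mask that zeroes activations in the forward pass is reused in the backward pass, so dropped units receive zero gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, drop_prob):
    # Boolean keep-mask: True for units that survive this pass.
    mask = rng.random(x.shape) >= drop_prob
    return x * mask, mask

def dropout_backward(grad_out, mask):
    # Gradients flow only through units that were active in the forward pass.
    return grad_out * mask

h = np.array([0.5, -1.2, 0.3, 2.0])               # toy layer activations
h_dropped, mask = dropout_forward(h, 0.5)         # forward pass with dropout
grad_h = dropout_backward(np.ones_like(h), mask)  # dropped units get zero gradient
```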

During testing:

All weights are used and all neurons are kept (no dropout), but the activations/outputs of that layer are scaled by p (in the paper, p is the probability of retaining a unit) so that the expected output of the layer matches what it was during training.
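A sketch of the test-time scaling, assuming the paper's convention where p is the retention probability (the keep_prob name is mine):

```python
import numpy as np

def dropout_test_time(x, keep_prob):
    # No units are dropped at test time; activations are scaled by the
    # retention probability so their expected value matches training.
    return x * keep_prob

h = np.array([0.5, -1.2, 0.3, 2.0])
h_test = dropout_test_time(h, keep_prob=0.5)  # keep_prob = 1 - dropout probability
```

Note that many implementations instead scale by 1/keep_prob during training ("inverted dropout"), in which case no scaling is needed at test time; the expected activations are the same either way.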



[Figure: dropout applied at training time vs. the full network used at test time, from the paper linked below]

It is only one network, as shown in the figure above (taken from the dropout paper: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).

P.S. I don't understand what you mean by thinned networks.

Hope this helps.
