Difficulty with the initial training phase of neural networks

I am studying artificial neural networks (ANNs). I am training many different ANNs; my main research focus is the relationship between changes in network structure and prediction speed.

I have noticed that the learning algorithms quite often stall during the first 100 or so iterations, staying close to the initial state because the training step is too small. I have no clear idea why this happens. Has anyone faced the same problem? What could be causing it? Is there a better way to overcome it than simply forcing the optimizer to grind through the initial phase, where the problem seems to lie?

I am training my networks in Octave using fmincg and fminunc. I use backpropagation to compute the gradient, and the cost function is the same as for logistic regression. The problem occurred with a network structure of 10 neurons in the first hidden layer and 10 neurons in the second. I use the MNIST database for both the training and test cases.
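
To make the setup concrete, here is a simplified sketch of what my training script looks like (not my exact code; nnCostFunction, X, y and lambda are placeholders for my backprop routine, the MNIST matrices, and the regularization strength):

    % Roughly my setup (sketch); nnCostFunction, X, y are placeholders.
    input_layer_size = 784;          % 28x28 MNIST images
    hidden1_size     = 10;           % first hidden layer
    hidden2_size     = 10;           % second hidden layer
    num_labels       = 10;           % digits 0-9

    % Unrolled random initial weights in [-0.12, 0.12]
    % (see the initialization question further down)
    n_params = hidden1_size * (input_layer_size + 1) ...
             + hidden2_size * (hidden1_size + 1) ...
             + num_labels   * (hidden2_size + 1);
    initial_nn_params = rand(n_params, 1) * 0.24 - 0.12;

    lambda = 1;                      % regularization strength (assumed)

    % nnCostFunction returns [J, grad]: cross-entropy cost (as in logistic
    % regression) plus the gradient computed by backpropagation.
    costFunction = @(p) nnCostFunction(p, input_layer_size, hidden1_size, ...
                                       hidden2_size, num_labels, X, y, lambda);

    options = optimset('MaxIter', 400, 'GradObj', 'on');
    [nn_params, cost] = fminunc(costFunction, initial_nn_params, options);
    % fmincg (from the Coursera ML exercises) is called the same way:
    % [nn_params, cost] = fmincg(costFunction, initial_nn_params, options);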

Addendum: fminunc does not seem to perform well on a 3-layer ANN, but with some random initializations the 2-layer ANN seems to converge without issue. The conjugate gradient method (fmincg) appears to work if it is forced through the initial phase.
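
By "forced through the initial phase" I mean something like the following warm-up (sketch, reusing the costFunction and options placeholders from the snippet above): a handful of plain gradient-descent steps with a deliberately large step size, and only then handing the parameters over to the optimizer.

    % Warm-up: crude gradient-descent steps to move away from the flat start,
    % then hand off to fminunc / fmincg.
    theta = initial_nn_params;
    alpha = 1.0;                          % deliberately large warm-up step (guess)
    for i = 1:20
      [J, grad] = costFunction(theta);    % cost and backprop gradient
      theta = theta - alpha * grad;
    end
    [nn_params, cost] = fminunc(costFunction, theta, options);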

Could the problem be the random initialization of the weights? Could the spread be too small ([-0.12, 0.12]), causing the problem?
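
For reference, my weight initialization looks roughly like this (sketch; the fixed epsilon of 0.12 is what I currently use, and the size-dependent formula in the comment is a common heuristic it is derived from):

    % randInitializeWeights.m -- break symmetry with small uniform weights
    function W = randInitializeWeights(L_in, L_out)
      epsilon_init = 0.12;   % fixed spread I currently use
      % Size-aware alternative: epsilon_init = sqrt(6) / sqrt(L_in + L_out);
      W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;   % uniform in [-eps, eps]
    end

    Theta1 = randInitializeWeights(784, 10);   % input -> hidden 1
    Theta2 = randInitializeWeights(10, 10);    % hidden 1 -> hidden 2
    Theta3 = randInitializeWeights(10, 10);    % hidden 2 -> output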

Edit: described the network structure a little more clearly.
