Deep Learning: Small Dataset with Keras: Local Minima
For my thesis I am using a 4-layer network for sequence translation: 150 x Conv(64, 5) x GRU(100) x softmax activation in the last layer, with loss='categorical_crossentropy'.
The training loss and accuracy converge quite quickly, whereas the validation loss and accuracy seem stuck in the val_acc range of 0.97 to 0.982, unable to go further.
Is my model overfitting?
I tried dropout of 0.2 between layers.
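For reference, the architecture described above with dropout between layers might be sketched roughly as follows. This is only a guess at the setup: the vocabulary size, sequence length, and number of output classes are placeholders, not values from the question.

```python
# Hypothetical reconstruction of the described model -- sizes marked
# "assumption" are not from the original question.
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, Dropout, GRU, Dense

VOCAB_SIZE = 1000   # assumption: input vocabulary size
SEQ_LEN = 50        # assumption: input sequence length
NUM_CLASSES = 10    # assumption: number of output categories

model = Sequential([
    Input(shape=(SEQ_LEN,)),
    Embedding(VOCAB_SIZE, 150),       # 150-dimensional embedding
    Conv1D(64, 5, activation='relu'), # Conv(64, 5): 64 filters, width 5
    Dropout(0.2),                     # dropout between layers
    GRU(100),                         # GRU(100)
    Dropout(0.2),
    Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```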
Output after adding dropout:
Epoch 85/250
[==============================] - 3s - loss: 0.0057 - acc: 0.9996 - val_loss: 0.2249 - val_acc: 0.9774
Epoch 86/250
[==============================] - 3s - loss: 0.0043 - acc: 0.9987 - val_loss: 0.2063 - val_acc: 0.9774
Epoch 87/250
[==============================] - 3s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.2180 - val_acc: 0.9809
Epoch 88/250
[==============================] - 3s - loss: 0.0075 - acc: 0.9978 - val_loss: 0.2272 - val_acc: 0.9774
Epoch 89/250
[==============================] - 3s - loss: 0.0078 - acc: 0.9974 - val_loss: 0.2265 - val_acc: 0.9774
Epoch 90/250
[==============================] - 3s - loss: 0.0027 - acc: 0.9996 - val_loss: 0.2212 - val_acc: 0.9809
Epoch 91/250
[==============================] - 3s - loss: 3.2185e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
Epoch 92/250
[==============================] - 3s - loss: 0.0020 - acc: 0.9991 - val_loss: 0.2239 - val_acc: 0.9792
Epoch 93/250
[==============================] - 3s - loss: 0.0047 - acc: 0.9987 - val_loss: 0.2163 - val_acc: 0.9809
Epoch 94/250
[==============================] - 3s - loss: 2.1863e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
Epoch 95/250
[==============================] - 3s - loss: 0.0011 - acc: 0.9996 - val_loss: 0.2190 - val_acc: 0.9809
Epoch 96/250
[==============================] - 3s - loss: 0.0040 - acc: 0.9987 - val_loss: 0.2289 - val_acc: 0.9792
Epoch 97/250
[==============================] - 3s - loss: 2.9621e-04 - acc: 1.0000 - val_loss: 0.2360 - val_acc: 0.9792
Epoch 98/250
[==============================] - 3s - loss: 4.3776e-04 - acc: 1.0000 - val_loss: 0.2437 - val_acc: 0.9774
The case that you presented is really difficult. To decide whether overfitting is actually happening in your case, you need to answer two questions:
- Are the results obtained on the validation set satisfactory? The main purpose of a validation set is to give you information about what will happen when new data arrives. If you are comfortable with the accuracy on the validation set, then you should not consider your model to be overfitting too badly.
- Should I be worried about the extremely high accuracy of the model on the training set? You can easily notice that your model is almost perfect on the training set. This may mean that it has memorized some patterns. There is usually some noise in your data, so a model that fits the training data perfectly is probably using some of its capacity to learn that noise. To check this, I usually inspect the positive examples with the lowest scores and the negative examples with the highest scores, since outliers usually fall into these two groups (the model struggles to push them above/below the 0.5 threshold).
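The check suggested above can be sketched in a few lines of NumPy. The `scores` and `labels` arrays here are random stand-ins; in practice they would come from something like `model.predict(x_val)` and the validation labels.

```python
# Sketch: find the positives the model scores lowest and the negatives
# it scores highest -- likely outliers worth inspecting by hand.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(100)           # stand-in for predicted probabilities
labels = rng.integers(0, 2, 100)   # stand-in for binary ground truth

# Positive examples the model is least confident about:
pos_idx = np.where(labels == 1)[0]
worst_pos = pos_idx[np.argsort(scores[pos_idx])[:5]]

# Negative examples the model pushes highest:
neg_idx = np.where(labels == 0)[0]
worst_neg = neg_idx[np.argsort(scores[neg_idx])[-5:]]

print("lowest-scored positives:", worst_pos, scores[worst_pos])
print("highest-scored negatives:", worst_neg, scores[worst_neg])
```

Looking at these examples directly often reveals label noise or genuinely ambiguous inputs rather than a model problem.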
So, after checking these two points, you should have an answer to whether your model is overfitting. The behavior you presented is actually quite normal, and the real reason may be that the validation set contains patterns that are not properly covered by the training set. That is something you should always keep in mind when designing a machine learning solution.