Deep Learning: Small Dataset with Keras: Local Minima

For my thesis I am using a 4-layer network for sequence translation: 150 × Conv(64, 5) × GRU(100) × a softmax activation in the final layer, with loss='categorical_crossentropy'.

The training loss and accuracy converge quite quickly, whereas the validation loss and accuracy seem stuck: val_acc stays in the range 0.97 to 0.982, unable to go further.

Is my model overfitting?

I tried adding Dropout(0.2) between the layers.
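For reference, here is a minimal sketch of what such a stack with Dropout(0.2) between the layers might look like in Keras. The vocabulary size, sequence length, and number of classes are placeholder assumptions, not values from the question:

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 100      # assumed input sequence length
VOCAB = 5000       # assumed vocabulary size
NUM_CLASSES = 10   # assumed number of output classes

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB, 150),            # 150-dim embedding, as in the question
    layers.Dropout(0.2),
    layers.Conv1D(64, 5, activation="relu"), # Conv(64, 5)
    layers.Dropout(0.2),
    layers.GRU(100),                         # GRU(100)
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```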

Output after adding dropout:
    Epoch 85/250
    [==============================] - 3s - loss: 0.0057 - acc: 0.9996 - val_loss: 0.2249 - val_acc: 0.9774
    Epoch 86/250
    [==============================] - 3s - loss: 0.0043 - acc: 0.9987 - val_loss: 0.2063 - val_acc: 0.9774
    Epoch 87/250
    [==============================] - 3s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.2180 - val_acc: 0.9809
    Epoch 88/250
    [==============================] - 3s - loss: 0.0075 - acc: 0.9978 - val_loss: 0.2272 - val_acc: 0.9774
    Epoch 89/250
    [==============================] - 3s - loss: 0.0078 - acc: 0.9974 - val_loss: 0.2265 - val_acc: 0.9774
    Epoch 90/250
    [==============================] - 3s - loss: 0.0027 - acc: 0.9996 - val_loss: 0.2212 - val_acc: 0.9809
    Epoch 91/250
    [==============================] - 3s - loss: 3.2185e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 92/250
    [==============================] - 3s - loss: 0.0020 - acc: 0.9991 - val_loss: 0.2239 - val_acc: 0.9792
    Epoch 93/250
    [==============================] - 3s - loss: 0.0047 - acc: 0.9987 - val_loss: 0.2163 - val_acc: 0.9809
    Epoch 94/250
    [==============================] - 3s - loss: 2.1863e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 95/250
    [==============================] - 3s - loss: 0.0011 - acc: 0.9996 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 96/250
    [==============================] - 3s - loss: 0.0040 - acc: 0.9987 - val_loss: 0.2289 - val_acc: 0.9792
    Epoch 97/250
    [==============================] - 3s - loss: 2.9621e-04 - acc: 1.0000 - val_loss: 0.2360 - val_acc: 0.9792
    Epoch 98/250
    [==============================] - 3s - loss: 4.3776e-04 - acc: 1.0000 - val_loss: 0.2437 - val_acc: 0.9774

      

+3
2 answers


The case you presented is genuinely difficult. To decide whether overfitting is actually happening in your case, you need to answer two questions:

  • Are the results on the validation set satisfactory? The main purpose of a validation set is to tell you what will happen when new data arrives. If you are comfortable with the accuracy on the validation set, then you should consider that your model is not overfitting too much.
  • Should I be worried about the extremely high accuracy of the model on the training set? You can easily notice that your model is almost perfect on the training set. This may mean that it has memorized some patterns. There is usually some noise in your data, and a model that fits the data perfectly is probably using some of its capacity to learn that noise. To check this, I usually inspect the positive examples with the lowest scores and the negative examples with the highest scores, since outliers usually end up in these two groups (the model struggles to push them above / below the 0.5 threshold).
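The inspection suggested above can be sketched with NumPy; the probability matrix and labels below are toy assumptions, not the asker's data:

```python
import numpy as np

def hardest_examples(probs, labels, k=5):
    """Return indices of the k examples whose predicted probability
    for their true class is lowest (likely outliers or label noise)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return np.argsort(true_class_probs)[:k]

# Toy example: 4 validation examples, 3 classes.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.2, 0.7,  0.1],
                  [0.1, 0.1,  0.8],
                  [0.4, 0.3,  0.3]])
labels = np.array([0, 1, 0, 2])  # example 2's true class 0 got only 0.1

print(hardest_examples(probs, labels, k=2))  # → [2 3]
```

In practice you would pass `model.predict(x_val)` as `probs` and the integer class labels of the validation set as `labels`, then look at the returned examples by hand.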


So, after checking these two points, you may have your answer as to whether your model is overfitting. The behavior you presented is actually quite good, and the real cause of the gap could be patterns in the validation set that are not properly covered by the training set. But this is something you should always keep in mind when designing a machine learning solution.

+2




No, this is not overfitting. Overfitting occurs only when the training loss is low and the validation loss is high. It can also be seen as a large gap between training accuracy and validation accuracy (in the case of classification).
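Under that rule, one quick way to quantify the gap is to compare the final training and validation losses from the fit history; the dict below is a toy stand-in for `model.fit(...).history`:

```python
def overfit_gap(history):
    """Gap between final validation loss and final training loss.
    A large positive gap suggests overfitting."""
    return history["val_loss"][-1] - history["loss"][-1]

# Toy stand-in for the dict returned by model.fit(...).history
history = {"loss": [0.5, 0.05, 0.004],
           "val_loss": [0.5, 0.2, 0.22]}

print(round(overfit_gap(history), 3))  # → 0.216
```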



+1








