When to stop training in Caffe?

I am using bvlc_reference_caffenet for training. I run both training and testing. Below is an excerpt from my training log:

I0430 11:49:08.408740 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:21.221074 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:34.038710 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:46.816813 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:56.630870 23334 solver.cpp:397]     Test net output #0: accuracy = 0.932502
I0430 11:49:56.630940 23334 solver.cpp:397]     Test net output #1: loss = 0.388662 (* 1 = 0.388662 loss)
I0430 11:49:57.218236 23334 solver.cpp:218] Iteration 71000 (0.319361 iter/s, 62.625s/20 iters), loss = 0.00146191
I0430 11:49:57.218300 23334 solver.cpp:237]     Train net output #0: loss = 0.00146191 (* 1 = 0.00146191 loss)
I0430 11:49:57.218308 23334 sgd_solver.cpp:105] Iteration 71000, lr = 0.001
I0430 11:50:09.168726 23334 solver.cpp:218] Iteration 71020 (1.67357 iter/s, 11.9505s/20 iters), loss = 0.000806865
I0430 11:50:09.168778 23334 solver.cpp:237]     Train net output #0: loss = 0.000806868 (* 1 = 0.000806868 loss)
I0430 11:50:09.168787 23334 sgd_solver.cpp:105] Iteration 71020, lr = 0.001
I0430 11:50:21.127496 23334 solver.cpp:218] Iteration 71040 (1.67241 iter/s, 11.9588s/20 iters), loss = 0.000182312
I0430 11:50:21.127539 23334 solver.cpp:237]     Train net output #0: loss = 0.000182314 (* 1 = 0.000182314 loss)
I0430 11:50:21.127562 23334 sgd_solver.cpp:105] Iteration 71040, lr = 0.001
I0430 11:50:33.248086 23334 solver.cpp:218] Iteration 71060 (1.65009 iter/s, 12.1206s/20 iters), loss = 0.000428604
I0430 11:50:33.248260 23334 solver.cpp:237]     Train net output #0: loss = 0.000428607 (* 1 = 0.000428607 loss)
I0430 11:50:33.248272 23334 sgd_solver.cpp:105] Iteration 71060, lr = 0.001
I0430 11:50:45.518955 23334 solver.cpp:218] Iteration 71080 (1.62989 iter/s, 12.2707s/20 iters), loss = 0.00108446
I0430 11:50:45.519006 23334 solver.cpp:237]     Train net output #0: loss = 0.00108447 (* 1 = 0.00108447 loss)
I0430 11:50:45.519011 23334 sgd_solver.cpp:105] Iteration 71080, lr = 0.001
I0430 11:50:51.287315 23341 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:50:57.851781 23334 solver.cpp:218] Iteration 71100 (1.62169 iter/s, 12.3328s/20 iters), loss = 0.00150949
I0430 11:50:57.851828 23334 solver.cpp:237]     Train net output #0: loss = 0.0015095 (* 1 = 0.0015095 loss)
I0430 11:50:57.851837 23334 sgd_solver.cpp:105] Iteration 71100, lr = 0.001
I0430 11:51:09.912184 23334 solver.cpp:218] Iteration 71120 (1.65832 iter/s, 12.0604s/20 iters), loss = 0.00239335
I0430 11:51:09.912330 23334 solver.cpp:237]     Train net output #0: loss = 0.00239335 (* 1 = 0.00239335 loss)
I0430 11:51:09.912340 23334 sgd_solver.cpp:105] Iteration 71120, lr = 0.001
I0430 11:51:21.968586 23334 solver.cpp:218] Iteration 71140 (1.65888 iter/s, 12.0563s/20 iters), loss = 0.00161807
I0430 11:51:21.968646 23334 solver.cpp:237]     Train net output #0: loss = 0.00161808 (* 1 = 0.00161808 loss)
I0430 11:51:21.968654 23334 sgd_solver.cpp:105] Iteration 71140, lr = 0.001

      

The loss confuses me. I was going to stop training my network once the loss fell below 0.0001, but there are two losses: the training loss and the test loss. The training loss is close to that threshold (between roughly 0.0002 and 0.0024 in the log above), but the test loss is 0.388, which is far above it. Which one should I use to decide when to stop training?
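To look at the two losses side by side over a whole run, they can be pulled out of a log like the one above. This is a minimal Python sketch that assumes the exact line format shown (solver.cpp:218 lines for training iterations, solver.cpp:397 lines for test outputs). parse_caffe_log is a hypothetical helper written for this answer, not part of Caffe.

```python
import re

# Training lines look like:
#   ... solver.cpp:218] Iteration 71000 (0.319 iter/s, ...), loss = 0.00146191
# Test lines look like:
#   ... solver.cpp:397]     Test net output #1: loss = 0.388662 (* 1 = ...)
# Assumption: output names are plain words (\w+), as in the log above.
TRAIN_RE = re.compile(r"Iteration (\d+) \(.*\), loss = ([-+.\deE]+)")
TEST_RE = re.compile(r"Test net output #\d+: (\w+) = ([-+.\deE]+)")


def parse_caffe_log(lines):
    """Return (train, test).

    train: list of (iteration, training loss) tuples.
    test:  list of dicts, one per test pass, e.g. {"accuracy": ..., "loss": ...}.
    """
    train, test = [], []
    current = {}
    for line in lines:
        m = TRAIN_RE.search(line)
        if m:
            train.append((int(m.group(1)), float(m.group(2))))
            continue
        m = TEST_RE.search(line)
        if m:
            name, value = m.group(1), float(m.group(2))
            if name in current:  # same output seen again: a new test pass began
                test.append(current)
                current = {}
            current[name] = value
    if current:
        test.append(current)
    return train, test


if __name__ == "__main__":
    sample = [
        "I0430 11:49:56.630870 23334 solver.cpp:397]     Test net output #0: accuracy = 0.932502",
        "I0430 11:49:56.630940 23334 solver.cpp:397]     Test net output #1: loss = 0.388662 (* 1 = 0.388662 loss)",
        "I0430 11:49:57.218236 23334 solver.cpp:218] Iteration 71000 (0.319361 iter/s, 62.625s/20 iters), loss = 0.00146191",
        "I0430 11:49:57.218308 23334 sgd_solver.cpp:105] Iteration 71000, lr = 0.001",
    ]
    train, test = parse_caffe_log(sample)
    print("last training loss:", train[-1][1])
    print("last test loss:", test[-1]["loss"])
```

The "lr = ..." lines from sgd_solver.cpp are skipped because they lack the "(... iter/s ...), loss =" part the training regex requires.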



2 answers


Having such a large gap between the test and training results could mean that you are overfitting your data.
The purpose of the validation set is to make sure you haven't overfitted. You should use the performance on the validation set to decide whether to stop training or proceed.
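One way to act on this is to keep the snapshot whose validation metric is best, rather than the one with the lowest training loss. A minimal sketch, assuming you have already paired each test pass with the iteration it ran at; best_iteration and the numbers in the example are hypothetical.

```python
def best_iteration(test_points, metric="accuracy", higher_is_better=True):
    """Pick the test pass with the best validation metric.

    test_points: list of (iteration, {"accuracy": ..., "loss": ...}) pairs.
    Returns (iteration, metric value) for the best pass.
    """
    pick = max if higher_is_better else min
    iteration, scores = pick(test_points, key=lambda p: p[1][metric])
    return iteration, scores[metric]


# Made-up values for illustration only.
points = [
    (1000, {"accuracy": 0.90, "loss": 0.45}),
    (2000, {"accuracy": 0.93, "loss": 0.38}),
    (3000, {"accuracy": 0.92, "loss": 0.41}),
]
print(best_iteration(points))
print(best_iteration(points, metric="loss", higher_is_better=False))
```

Here the training loss would keep falling after iteration 2000, but the validation numbers say the 2000-iteration snapshot generalizes best.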





In general, you want to stop training when your validation accuracy plateaus. The numbers above show that you have actually overfitted your model.
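A plateau can be detected with a simple "patience" rule, a common early-stopping heuristic. As far as I know the stock Caffe solver just runs until max_iter, so a check like this would have to be applied to the test results yourself. should_stop and its parameters are hypothetical names for this sketch.

```python
def should_stop(val_scores, patience=5, min_delta=0.0, higher_is_better=True):
    """Patience-based early stopping on validation scores.

    val_scores: one value per test pass, oldest first (e.g. test accuracy).
    Returns True when none of the last `patience` scores improves on the
    best earlier score by more than `min_delta`.
    """
    if len(val_scores) <= patience:
        return False  # not enough history yet
    # Flip the sign for metrics like loss, where lower is better.
    sign = 1.0 if higher_is_better else -1.0
    scores = [sign * s for s in val_scores]
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before + min_delta


# Made-up validation accuracies, one per test pass.
accuracies = [0.90, 0.92, 0.93, 0.931, 0.930, 0.932]
print(should_stop(accuracies, patience=3))
print(should_stop(accuracies, patience=3, min_delta=0.005))
```

min_delta decides how much improvement still counts as progress: with the default of 0, the small rise from 0.930 to 0.932 keeps training going, while with 0.005 it is treated as a plateau.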

Ideally, the training, test, and validation errors should be approximately the same. In practice, this rarely happens.



Note that loss is not a good metric for this comparison unless the loss functions and their weights are the same in every evaluation phase. For example, GoogLeNet combines three weighted losses during training (two of them from auxiliary classifiers), but validation only cares about the final accuracy.
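The "(* 1 = ... loss)" part of each loss line in the log above is the layer's loss_weight times its output; the training loss the solver reports is the weighted sum over all loss layers. A sketch of why that sum is not comparable to a single test loss, using hypothetical layer names and values; the 0.3 weight on the two auxiliary losses follows the GoogLeNet setup.

```python
def weighted_total_loss(losses, loss_weights):
    """Sum of each loss layer's output times its loss_weight, i.e. the
    combined objective the solver minimizes (before any averaging over
    iterations)."""
    return sum(loss_weights[name] * value for name, value in losses.items())


# Hypothetical names and values for a GoogLeNet-style net: two auxiliary
# classifiers weighted 0.3 and the main classifier weighted 1.
loss_weights = {"aux1": 0.3, "aux2": 0.3, "main": 1.0}
train_losses = {"aux1": 0.9, "aux2": 0.7, "main": 0.5}

total = weighted_total_loss(train_losses, loss_weights)
# The total mixes three terms, so comparing it with a test loss computed
# from the main classifier alone (0.5 here) would be misleading.
print(f"total training loss: {total:.2f}")
```

For bvlc_reference_caffenet there is a single loss with weight 1 in both phases, so the train and test losses in the question are at least measuring the same quantity.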
