What exactly does "tf.contrib.rnn.DropoutWrapper" do in tensorflow? (three questions)
As far as I know, DropoutWrapper is used like this:
__init__(
    cell,
    input_keep_prob=1.0,
    output_keep_prob=1.0,
    state_keep_prob=1.0,
    variational_recurrent=False,
    input_size=None,
    dtype=None,
    seed=None
)
...
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
The only thing I know is that it is used to apply dropout during training. Here are my three questions.
- What are input_keep_prob, output_keep_prob, and state_keep_prob respectively? (I assume they set the probability of applying dropout to each part of the RNN, but where exactly?)
- Is dropout in this context applied to the RNN only during training, or also at prediction time? If it is also applied at prediction time, is there a way to choose whether or not dropout is used when predicting?
- According to the API docs on the TensorFlow website, if variational_recurrent=True, dropout works according to the method of Y. Gal and Z. Ghahramani in the paper "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks" (https://arxiv.org/abs/1512.05287). I only understood this paper roughly. When I train an RNN, I use a batch rather than a single time series. In this case, will TensorFlow automatically assign a different dropout mask to each time series in the batch?
2 answers
- input_keep_prob is the keep probability (one minus the dropout rate) applied to the inputs fed into the cell. output_keep_prob is the keep probability applied to each output of the RNN cell. state_keep_prob is the keep probability applied to the hidden state that is passed on to the next step.
- You can set each of the above parameters from a single placeholder like this:

import tensorflow as tf

# shape=() is required by placeholder_with_default; the default of 1.0
# means "no dropout" unless a different value is fed.
dropout_placeholder = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=())

cell = tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
    input_keep_prob=dropout_placeholder,
    output_keep_prob=dropout_placeholder,
    state_keep_prob=dropout_placeholder)
The keep probability will default to 1.0 during prediction, and you can feed any other value (for example 0.5) during training.
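To make the train/predict distinction concrete, here is a minimal NumPy sketch (an illustration of the idea, not TensorFlow's implementation) of the "inverted dropout" that a keep probability below 1.0 implies:

```python
import numpy as np

def dropout(x, keep_prob, training, rng):
    """Inverted dropout: at train time, zero each element with
    probability (1 - keep_prob) and rescale the survivors so the
    expected value is unchanged; at predict time, pass through."""
    if not training or keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((2, 4))

# Training: some entries become 0.0, survivors are scaled to 2.0.
train_out = dropout(x, keep_prob=0.5, training=True, rng=rng)

# Prediction: the input passes through unchanged.
predict_out = dropout(x, keep_prob=0.5, training=False, rng=rng)
```

Feeding a placeholder-backed keep probability, as in the snippet above, is just a convenient way to switch between these two behaviors in the same graph.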
- With variational_recurrent=True, the mask is tied to the weights rather than resampled at every time step of a sequence. As far as I know, one sampling is done for the whole batch.