TensorFlow: storing the final state of the LSTM from dynamic_rnn for prediction

I want to save the final state of my LSTM in such a way that it is included when I restore the model and can be used for prediction. As explained below, the Saver only seems to be aware of the final state when I use tf.assign. However, this throws an error (also explained below).

During training I always feed the final state of the LSTM back into the network, as described in this post. Here are the important parts of the code:

When building the graph:

            self.init_state = tf.placeholder(tf.float32, [
                self.n_layers, 2, self.batch_size, self.n_hidden
            ])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])

                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)

      

And during training:

        _current_state = np.zeros((self.n_layers, 2, self.batch_size,
                                   self.n_hidden))

            _train_step, _current_state, summary = self.sess.run(
                [
                    self.train_step, self.final_state,
                    self.merged
                ],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels,
                           self.init_state: _current_state})

When I later restore my model from a checkpoint, the final state is not restored either. As stated in this post, the problem is that the Saver does not know about the new state. That post also suggests a solution based on tf.assign. Unfortunately, I cannot simply use the suggested

            assign_op = tf.assign(self.init_state, _current_state)
            self.sess.run(assign_op)

      

because self.init_state is not a variable but a placeholder. I get the error

AttributeError: 'Tensor' object has no attribute 'assign'

I've been trying to solve this problem for a few hours now, but I can't seem to get it to work.

Any help is appreciated!

EDIT:

I changed self.init_state to

            self.init_state = tf.get_variable('saved_state', shape=
            [self.n_layers, 2, self.batch_size, self.n_hidden])

            state_per_layer_list = tf.unstack(self.init_state, axis=0)

            rnn_tuple_state = tuple([
                tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                              state_per_layer_list[idx][1])

                for idx in range(self.n_layers)
            ])

            outputs, self.final_state = tf.nn.dynamic_rnn(
                cell, inputs=self.inputs, initial_state=rnn_tuple_state)

      

And during training, I am not feeding a value for self.init_state:

            _train_step, _current_state, summary = self.sess.run(
                [
                    self.train_step, self.final_state,
                    self.merged
                ],
                feed_dict={self.inputs: _inputs,
                           self.labels: _labels})

However, I still cannot run the assign op. This is what I get:

TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ...,  0.05307654,
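I suspect this is because self.final_state evaluates to a tuple of LSTMStateTuple objects rather than a plain float32 array of shape [n_layers, 2, batch_size, n_hidden]. If so, something like the following conversion (an untested sketch) would be needed before the assign:

            import numpy as np

            # _current_state is a tuple with one LSTMStateTuple(c, h) per layer;
            # c and h are numpy arrays of shape [batch_size, n_hidden].
            _state_array = np.stack(
                [np.stack([layer_state.c, layer_state.h], axis=0)
                 for layer_state in _current_state],
                axis=0)  # shape: [n_layers, 2, batch_size, n_hidden]

            assign_op = tf.assign(self.init_state, _state_array)
            self.sess.run(assign_op)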


1 answer


To save the final state, you can create a separate TF variable, then, before saving the graph, run an assign op to copy your latest state into that variable, and then save the graph. The only thing to keep in mind is to declare that variable before you declare the Saver; otherwise the Saver will not know about it and it will not be included in the checkpoint.
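As a minimal sketch of that ordering (the variable and value names here are only illustrative, not taken from your code):

    # Sketch: a dedicated, non-trainable variable that exists only to hold the final state.
    saved_state = tf.get_variable('saved_state',
                                  shape=[n_layers, 2, batch_size, n_hidden],
                                  trainable=False)

    # Declare the Saver AFTER the variable so the Saver tracks it.
    saver = tf.train.Saver(max_to_keep=1)

    # ... training loop runs here ...

    # Just before saving, copy the latest state (a numpy array) into the variable.
    sess.run(tf.assign(saved_state, last_state_value))
    saver.save(sess, checkpoint_path)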

This is discussed in detail here, including working code: TF LSTM: Save State from Training Session for Prediction Sessions Later

*** UPDATE: Answers to the follow-up questions:

It looks like you are using BasicLSTMCell with state_is_tuple=True. In the earlier discussion I referred to, I used GRUCell with state_is_tuple=False. The details of the two differ slightly, but the overall approach is similar, so hopefully this will work for you:

During training, you first feed zeros as the initial_state of dynamic_rnn and then keep re-feeding your own output back as the initial_state. So the LAST output state of your dynamic_rnn call is what you want to save for later. Since it is the result of a sess.run() call, it is essentially a numpy array (not a tensor and not a placeholder). So the question boils down to "how do I store a numpy array as a TensorFlow variable along with the rest of the variables in the graph?" That is why you assign the final state to a variable whose sole purpose is exactly that.



So the code looks something like this:

    # GRAPH DEFINITIONS:
    state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
    l = tf.unstack(state_in, axis=0)
    state_tup = tuple(
        [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
        for idx in range(LAYERS)])
    #multicell = your BasicLSTMCell / MultiRNN definitions
    output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)

    savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
    saver = tf.train.Saver(max_to_keep=1)

    in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))

    # TRAINING LOOP:
    feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in:in_state}
    _, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
    in_state = out_state

    # ONCE TRAINING IS OVER:
    assignOp = tf.assign(savedState, out_state)
    sess.run(assignOp)
    saver.save(sess, pathModel + '/my_model.ckpt')

    # RECOVERING IN A DIFFERENT PROGRAM:

    gInit = tf.global_variables_initializer().run()
    lInit = tf.local_variables_initializer().run()
    new_saver = tf.train.import_meta_graph(pathModel + '/my_model.ckpt.meta')
    new_saver.restore(sess, pathModel + '/my_model.ckpt')
    # retrieve the saved state and keep only its LAST batch (latest observations)
    savedState = sess.run('savedState:0') # this is FULL state from training
    state = savedState[:,:,-1,:]  # -1 gets only the LAST batch of the state (latest seen observations)
    state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]])  # [LAYERS, 2, 1 (BATCH), CELL_SIZE]
    #x = .... (YOUR INPUTS)
    feed_dict = {'X:0': x, 'state_in:0':state}
    #PREDICTION LOOP:
    preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict = feed_dict)
    # so now state will be re-fed into feed_dict with the next loop iteration

      

As mentioned, this is a modification of the approach that works well for me with GRUCell, where state_is_tuple=False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not as accurately as the original approach. I don't know yet whether that is just because GRU works better for me than LSTM, or for some other reason. See if this works for you...

Also keep in mind that, as you can see from the restore-and-predict code, your predictions will likely be based on a different batch size than your training loop (a batch of 1, I assume?). So you have to think about how to handle the restored state: just grab the last batch? Something else? This code takes only the last batch of the saved state (i.e. the most recent observations from training), because that is what mattered for me...
