Pybrain LSTM for predicting sequential data

I wrote some simple code using pybrain to predict simple sequential data. For example, the sequence 0,1,2,3,4 should produce the output 5 from the network. The dataset defines the remaining sequences. My code is below:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SequentialDataSet
from pybrain.structure import SigmoidLayer, LinearLayer
from pybrain.structure import LSTMLayer

import itertools
import numpy as np

INPUTS = 5
OUTPUTS = 1
HIDDEN = 40

net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=LinearLayer, recurrent=True, bias=True) 

ds = SequentialDataSet(INPUTS, OUTPUTS)
ds.addSample([0,1,2,3,4],[5])
ds.addSample([5,6,7,8,9],[10])
ds.addSample([10,11,12,13,14],[15])
ds.addSample([16,17,18,19,20],[21])

net.randomize()

trainer = BackpropTrainer(net, ds)

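# Train for 1000 epochs, printing the error returned by each epoch
for _ in range(1000):
    print(trainer.train())

# Ask the trained network for the value that follows 0,1,2,3,4
x = net.activate([0, 1, 2, 3, 4])
print(x)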

      

The output on my screen is always [0.99999999 0.99999999 0.9999999 0.99999999]. What am I missing? Is the training insufficient? I ask because trainer.train() shows a result of 86.625.



1 answer


The pybrain sigmoid layer implements the sigmoid squashing function, which you can see here:

sigmoid squashing function code

The relevant part is this:

def sigmoid(x):
    """ Logistic sigmoid function. """
    return 1. / (1. + safeExp(-x))

      



So, no matter the value of x, it will only ever return values between 0 and 1. For this reason, among others, it is a good idea to scale your input and output values to lie between 0 and 1. For example, divide all your inputs by the maximum value (assuming the minimum is at least 0), and do the same for your outputs. Then do the opposite with the result (for example, multiply by 25 if you divided by 25 at the beginning).
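The scaling described above can be sketched roughly as follows. This is only an illustration in plain Python, not pybrain code: the names `samples`, `scale`, `scale_down` and `scale_up` are made up for this example, and `sigmoid` is rewritten with `math.exp` in place of pybrain's `safeExp`. It assumes all values are non-negative, as in the question's data.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, using math.exp instead of pybrain's safeExp."""
    return 1.0 / (1.0 + math.exp(-x))

# Raw samples from the question: (inputs, target).
samples = [
    ([0, 1, 2, 3, 4], [5]),
    ([5, 6, 7, 8, 9], [10]),
    ([10, 11, 12, 13, 14], [15]),
    ([16, 17, 18, 19, 20], [21]),
]

# Largest value anywhere in the data (assumes the minimum is at least 0).
scale = float(max(max(inp + out) for inp, out in samples))

def scale_down(values):
    """Map raw values into [0, 1] by dividing by the maximum."""
    return [v / scale for v in values]

def scale_up(values):
    """Undo scale_down: map network outputs back to the raw range."""
    return [v * scale for v in values]

# These scaled pairs are what would be passed to ds.addSample(...).
scaled_samples = [(scale_down(inp), scale_down(out)) for inp, out in samples]

# An unscaled target such as 21 sits far out on the flat part of the curve,
# which is why a sigmoid output can never get close to it.
print(sigmoid(21))
print(scaled_samples[0])
```

After training on the scaled pairs, the prediction would be recovered with something like `scale_up(net.activate(scale_down([0, 1, 2, 3, 4])))`.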

Also, I'm not a pybrain expert, but do you need OUTPUTS = 4? It looks like your data has only one output per sample, so you could probably just use OUTPUTS = 1.

You can also try scaling the inputs and outputs to a specific portion of the sigmoid curve (for example, between 0.1 and 0.9) to make things easier for pybrain, but this makes the scaling before and after a little more complicated.
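A minimal sketch of that narrower scaling, assuming a known minimum and maximum for the data. The helper names `to_band` and `from_band` and the constants `LOW` and `HIGH` are hypothetical and not part of pybrain.

```python
LOW, HIGH = 0.1, 0.9  # target band on the sigmoid curve

def to_band(value, vmin, vmax):
    """Linearly map value from [vmin, vmax] into [LOW, HIGH]."""
    return LOW + (HIGH - LOW) * (value - vmin) / float(vmax - vmin)

def from_band(scaled, vmin, vmax):
    """Inverse of to_band: map a value in [LOW, HIGH] back to [vmin, vmax]."""
    return vmin + (scaled - LOW) * (vmax - vmin) / (HIGH - LOW)

# Example with the question's data range of 0 to 21.
scaled_inputs = [to_band(v, 0, 21) for v in [0, 1, 2, 3, 4]]
print(scaled_inputs)
print(from_band(to_band(5, 0, 21), 0, 21))
```

The extra work compared with dividing by the maximum is that both the offset and the width of the band have to be undone when converting the network output back.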
