How to run a prediction on new text in Keras when using a built-in dataset

I am going through the Keras examples and came across one that uses an LSTM to classify sentiment on the built-in imdb dataset ( https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py ).

Looking at the data, each review is represented as an array of numbers, which I assume are indices into a dictionary built from that dataset.

My question, however, is: how can I feed a new piece of text (something I write myself) into this model to get a prediction? How do I access this word dictionary?

With that, I could preprocess the input text into an array of numbers and feed it to the model. Thanks!

2 answers


The dataset module also provides the word index used to encode the sequences. The call below is for reuters; imdb exposes the same get_word_index function:

word_index = reuters.get_word_index(path="reuters_word_index.pkl")



It returns a dictionary whose keys are words (str) and whose values are indices (int). For example, word_index["giraffe"] may return 1234.
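As a minimal sketch of how such a dictionary is used, the snippet below encodes a list of words and decodes it back. The tiny word_index here is a made-up stand-in for what get_word_index() returns, and encode and decode are hypothetical helper names, not part of Keras.

```python
# Toy stand-in for the dictionary returned by get_word_index();
# the real one maps tens of thousands of words to integer indices.
word_index = {"the": 1, "movie": 17, "was": 13, "great": 84}


def encode(words, word_index):
    """Map known words to their indices, skipping unknown words."""
    return [word_index[w] for w in words if w in word_index]


def decode(indices, word_index):
    """Invert the mapping to turn indices back into words."""
    index_word = {i: w for w, i in word_index.items()}
    return [index_word.get(i, "?") for i in indices]


encoded = encode("the movie was really great".split(), word_index)
print(encoded)                      # [1, 17, 13, 84]
print(decode(encoded, word_index))  # ['the', 'movie', 'was', 'great']
```

The word "really" is not in the toy dictionary, so it is silently dropped; whether to drop unknown words or map them to a dedicated out-of-vocabulary index depends on how the training data was encoded.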

When predicting on new text, you must apply the same steps you used for training:

  • Preprocess the new sentence.
  • Convert the text to a vector of indices using word_index.
  • Pad the vector to the same length used during training.
  • Flatten the array and pass it as input to your model.


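import numpy as np
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

# clean_text is your own preprocessing function; it should return a list of lowercase words
words = clean_text(text)
word_index = imdb.get_word_index()
# imdb.load_data() shifts every index by 3 by default (index_from=3), so apply the same
# offset and drop words that fall outside the vocabulary size used for training
x_test = [[word_index[w] + 3 for w in words
           if w in word_index and word_index[w] + 3 < max_features]]
x_test = pad_sequences(x_test, maxlen=maxlen)  # maxlen must match the value used for training
vector = np.array([x_test.flatten()])  # shape (1, maxlen)
model.predict_classes(vector)  # removed in newer Keras versions; use model.predict there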
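The encoding and padding steps can also be sketched in plain Python, without Keras, to make explicit what happens to each word. This is only an illustration under stated assumptions: the constants mirror the defaults of imdb.load_data() (start_char=1, oov_char=2, index_from=3), clean_text is a hypothetical cleaner, encode_review and pad_pre are made-up helper names, and toy_index stands in for the real word index.

```python
import re

# Assumed values, mirroring the defaults of imdb.load_data()
INDEX_FROM = 3   # word indices are shifted by this offset
START_CHAR = 1   # marks the start of every sequence
OOV_CHAR = 2     # replaces words outside the vocabulary


def clean_text(text):
    """Hypothetical cleaner: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def encode_review(text, word_index, num_words):
    """Encode text the same way the training sequences were encoded."""
    ids = [START_CHAR]
    for word in clean_text(text):
        rank = word_index.get(word)
        if rank is None or rank + INDEX_FROM >= num_words:
            ids.append(OOV_CHAR)           # unknown or too-rare word
        else:
            ids.append(rank + INDEX_FROM)  # apply the same offset as training
    return ids


def pad_pre(ids, maxlen, value=0):
    """Left-truncate, then left-pad to maxlen (pad_sequences' default behaviour)."""
    ids = ids[-maxlen:]
    return [value] * (maxlen - len(ids)) + ids


toy_index = {"the": 1, "movie": 17, "was": 13, "great": 84}
ids = encode_review("The movie was REALLY great!", toy_index, num_words=50)
print(ids)                     # [1, 4, 20, 16, 2, 2]
print(pad_pre(ids, maxlen=8))  # [0, 0, 1, 4, 20, 16, 2, 2]
```

Here unknown words are mapped to the out-of-vocabulary index instead of being dropped, which keeps the sequence length closer to the original text; either choice can work as long as it matches how the model was trained.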
