Data types, data forms, and pad_sequences

Question

I cannot understand the error message I am getting from this code. The part with x_train is taken from a working example showing how to use LSTM in Keras. The part with mytrain is just an example I've played with to understand the various functions.

As you can see from the output, both x_train and mytrain are of the same type and shape.

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
import numpy as np

max_features = 80
maxlen = 5

# from the example
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print('x_train type: ', type(x_train))
print('x_train shape:', x_train.shape)
sequence.pad_sequences(x_train, maxlen=maxlen)

# my test code
mytrain = np.ones_like(x_train)
print('mytrain type:', type(mytrain))
print('mytrain shape:', mytrain.shape)
mytrain2 = sequence.pad_sequences(mytrain, maxlen=maxlen)

Output:

D:\python\python.exe D:/workspace/YYYY/test/test_sequences.py
Using TensorFlow backend.
x_train type:  <class 'numpy.ndarray'>
x_train shape: (25000,)
Traceback (most recent call last):
  File "D:/workspace/YYYY/test/test_sequences.py", line 22, in <module>
    mytrain2 = sequence.pad_sequences(mytrain, maxlen=10)
  File "D:\python\lib\site-packages\keras\preprocessing\sequence.py", line 42, in pad_sequences
    'Found non-iterable: ' + str(x))
mytrain type: <class 'numpy.ndarray'>
ValueError: `sequences` must be a list of iterables. Found non-iterable: 1
mytrain shape: (25000,)

It works if I use, for example, mytrain = np.asarray([[1, 2, 3]]) (a list of iterables), but I can't figure out what is different between x_train and mytrain in the previous code.

python keras

Antonio Sesto May 29 '17 at 8:44

1 answer

michetonu · Accepted Answer · 2017-05-29T10:01:34+0000

Problem:

When you print x_train, you get:

[ [1, 14, 22, 16, 43, 2, 2, 2, 2, 65, 2, 2, 66, 2, 4, 2, 36, 2, 5, 25, 2, 43, 2, 2, 50, 2, 2, 9, 35, 2, 2, 5, 2, 4, 2, 2, 2, 2, 2, 2, 39, 4, 2, 2, 2, 17, 2, 38, 13, 2, 4, 2, 50, 16, 6, 2, 2, 19, 14, 22, 4, 2, 2, 2, 4, 22, 71, 2, 12, 16, 43, 2, 38, 76, 15, 13, 2, 4, 22, 17, 2, 17, 12, 16, 2, 18, 2, 5, 62, 2, 12, 8, 2, 8, 2, 5, 4, 2, 2, 16, 2, 66, 2, 33, 4, 2, 12, 16, 38, 2, 5, 25, 2, 51, 36, 2, 48, 25, 2, 33, 6, 22, 12, 2, 28, 77, 52, 5, 14, 2, 16, 2, 2, 8, 4, 2, 2, 2, 15, 2, 4, 2, 7, 2, 5, 2, 36, 71, 43, 2, 2, 26, 2, 2, 46, 7, 4, 2, 2, 13, 2, 2, 4, 2, 15, 2, 2, 32, 2, 56, 26, 2, 6, 2, 2, 18, 4, 2, 22, 21, 2, 2, 26, 2, 5, 2, 30, 2, 18, 51, 36, 28, 2, 2, 25, 2, 4, 2, 65, 16, 38, 2, 2, 12, 16, 2, 5, 16, 2, 2, 2, 32, 15, 16, 2, 19, 2, 32]
 ...,
 [1, 17, 6, 2, 2, 7, 4, 2, 22, 45, 2, 8, 2, 14, 2, 4, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 39, 14, 2, 4, 2, 9, 2, 50, 2, 12, 47, 4, 2, 5, 2, 7, 38, 2, 2, 2, 7, 4, 2, 2, 9, 24, 6, 78, 2, 17, 2, 2, 21, 27, 2, 2, 5, 2, 2, 2, 2, 4, 2, 7, 4, 2, 42, 2, 2, 35, 2, 2, 29, 2, 27, 2, 8, 2, 12, 2, 21, 2, 2, 9, 6, 66, 78, 2, 4, 2, 2, 5, 2, 2, 2, 2, 6, 2, 8, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 2, 21, 60, 27, 2, 9, 43, 2, 2, 2, 10, 10, 12, 2, 40, 4, 2, 20, 12, 16, 5, 2, 2, 72, 7, 51, 6, 2, 22, 4, 2, 2, 9]]

where each item is a list. mytrain, on the other hand, is:

[1 1 1 ..., 1 1 1]

This is just a flat array of integers, so none of its items is iterable, which is exactly what pad_sequences complains about.
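The difference can be reproduced without loading the IMDB data. The following is a minimal sketch, assuming a small hand-made object array (the name ragged and its contents are made up for illustration) in place of x_train. It shows that type and shape can match while the elements themselves differ:

```python
import numpy as np

# Hypothetical stand-in for x_train: a 1-D object array whose items
# are Python lists of different lengths.
ragged = np.empty(3, dtype=object)
ragged[0] = [1, 14, 22]
ragged[1] = [1, 17]
ragged[2] = [1, 6, 2, 7]

# Same construction as mytrain in the question: every item is replaced by 1.
flat = np.ones_like(ragged)

# The outer container looks identical in both cases...
print(type(ragged), ragged.shape)
print(type(flat), flat.shape)

# ...but the items do not: one holds lists, the other holds plain integers.
print(type(ragged[0]), type(flat[0]))
```

Checking type(x[0]) in addition to type(x) and x.shape is usually enough to spot this kind of mismatch.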

Solution:

This should give you what you need:

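mytrain = []
for i in range(x_train.shape[0]):
    # one sequence of ones per review, with the same length as the original
    mytrain.append(np.ones(len(x_train[i])))
# dtype=object keeps the rows as separate sequences of different lengths;
# recent NumPy versions refuse to build a ragged array without it
mytrain = np.asarray(mytrain, dtype=object)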

Indeed, the type and shape are still the same:

('x_train type: ', <type 'numpy.ndarray'>)
('x_train shape:', (25000,))
('mytrain type:', <type 'numpy.ndarray'>)
('mytrain shape:', (25000,))
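For intuition about why an array of integers is rejected while an array of sequences is accepted, here is a rough pure-Python sketch of the padding step. It is not the Keras implementation: pad_sketch is a hypothetical helper that only mimics the default behaviour of pad_sequences (padding and truncating at the front, padding value 0), assumes a positive maxlen, and reuses the error message seen in the traceback above.

```python
def pad_sketch(sequences, maxlen, value=0):
    """Simplified, hypothetical stand-in for pad_sequences (maxlen > 0)."""
    padded = []
    for seq in sequences:
        try:
            len(seq)  # an integer has no length, so this raises TypeError
        except TypeError:
            raise ValueError('`sequences` must be a list of iterables. '
                             'Found non-iterable: ' + str(seq))
        kept = list(seq)[-maxlen:]  # keep at most the last maxlen items
        padded.append([value] * (maxlen - len(kept)) + kept)  # pad at the front
    return padded


# A list of sequences is padded or truncated to a common length.
print(pad_sketch([[1, 2, 3], [4, 5, 6, 7, 8, 9, 10]], maxlen=5))

# A flat list of integers fails on the first item.
try:
    pad_sketch([1, 1, 1], maxlen=5)
except ValueError as err:
    print(err)
```

The point of the sketch is that the function needs to measure every item, so each item has to be a sequence rather than a single number.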
