Neural network regression predicts the same value for all test samples

Question


My neural network regression model predicts the same value for all test samples. Tuning hyperparameters such as epochs, batch_size, number of layers, hidden units and learning rate only shifts the predictions to a different constant.

For testing purposes, when I predict on the training data itself, I get nearly accurate results, with an RMSE of about 1.
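One way to quantify "predicts the same value" is to measure the spread of the predictions and compare the model's test RMSE with a trivial baseline that always predicts the mean of the training labels. A minimal NumPy sketch; `rmse`, `baseline_rmse` and `predictions_are_collapsed` are hypothetical helper names, the tolerance is an arbitrary assumption, and the toy arrays are made up only to show the calls.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-squared error between two 1-D arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def baseline_rmse(train_y, test_y):
    """RMSE of a 'model' that always predicts the mean of the training labels."""
    constant = np.full(len(test_y), np.mean(train_y), dtype=float)
    return rmse(constant, test_y)


def predictions_are_collapsed(pred, rel_tol=1e-6):
    """True if the predictions are (nearly) constant relative to their magnitude."""
    pred = np.asarray(pred, dtype=float)
    scale = max(float(np.abs(pred).mean()), 1.0)
    return float(pred.std()) <= rel_tol * scale


# Hypothetical toy labels, only to show the calls.
train_y = np.array([100.0, 200.0, 300.0, 400.0])
test_y = np.array([150.0, 250.0, 350.0])
collapsed_pred = np.full(len(test_y), 295.72)

print(predictions_are_collapsed(collapsed_pred))  # True
print(rmse(collapsed_pred, test_y), baseline_rmse(train_y, test_y))
```

If the network's test RMSE is close to `baseline_rmse`, it has learned little beyond the label mean for unseen data, however well it fits the training set.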

Note: the task is to predict the remaining useful life of a machine from its time series. I used the tsfresh library to generate 1045 features from raw time-series data that originally has only 24 features.
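Because the 1045 generated features come from only 24 raw signals, their magnitudes may differ by many orders, and some columns may be constant or contain NaN/inf values. Summarizing the feature matrix before training is a cheap check. A minimal NumPy sketch; `summarize_feature_scales` is a hypothetical helper, and the toy matrix stands in for `train_X`.

```python
import numpy as np


def summarize_feature_scales(X):
    """Summarize per-column magnitudes of a 2-D feature matrix.

    Counts columns containing NaN/inf and zero-variance columns, and reports the
    smallest and largest per-column maximum absolute value (finite entries only).
    """
    X = np.asarray(X, dtype=float)
    finite = np.isfinite(X)
    X_finite = np.where(finite, X, np.nan)  # mask NaN/inf before computing statistics
    col_std = np.nanstd(X_finite, axis=0)
    col_abs_max = np.nanmax(np.abs(X_finite), axis=0)
    return {
        "n_features": X.shape[1],
        "n_nonfinite_columns": int((~finite).any(axis=0).sum()),
        "n_constant_columns": int(np.sum(col_std == 0)),
        "smallest_abs_max": float(np.nanmin(col_abs_max)),
        "largest_abs_max": float(np.nanmax(col_abs_max)),
    }


# Hypothetical 3x3 matrix: one normal column, one constant column, one huge column with an inf.
X = np.array([[1.0, 0.0, 1e6], [2.0, 0.0, -3e6], [3.0, 0.0, np.inf]])
print(summarize_feature_scales(X))
```

Called as `summarize_feature_scales(train_X)`, a `largest_abs_max` many orders of magnitude above the typical column would indicate inputs on very different scales.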

What could be causing this behavior? How should I visualize the training progress of a neural network model to make sure everything is going in the right direction?
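A common way to watch training progress is to hold out part of the training data and plot the training and validation loss after every epoch. In Keras, `fit()` returns a `History` object whose `history` attribute maps metric names to per-epoch lists (`'loss'`, plus `'val_loss'` when `validation_split` or `validation_data` is given). The sketch below assumes such a dict; `summarize_history` and `plot_history` are hypothetical helpers, and `plot_history` assumes matplotlib is installed.

```python
import numpy as np


def summarize_history(history):
    """Summarize a Keras-style history dict with per-epoch 'loss' and 'val_loss' lists.

    Returns the 1-based epoch with the lowest validation loss and the final gap
    between validation and training loss.
    """
    loss = np.asarray(history["loss"], dtype=float)
    val_loss = np.asarray(history["val_loss"], dtype=float)
    best_epoch = int(np.argmin(val_loss)) + 1
    final_gap = float(val_loss[-1] - loss[-1])
    return {"best_epoch": best_epoch, "final_gap": final_gap}


def plot_history(history, path="loss_curves.png"):
    """Plot training and validation loss per epoch and save the figure (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    epochs = range(1, len(history["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history["loss"], label="training loss")
    ax.plot(epochs, history["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.set_yscale("log")  # MSE values can span orders of magnitude
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical per-epoch values, only to show the call.
toy_history = {"loss": [10.0, 5.0, 2.0, 1.0], "val_loss": [12.0, 8.0, 9.0, 11.0]}
print(summarize_history(toy_history))  # {'best_epoch': 2, 'final_gap': 10.0}
```

With real data, `history` would be the `.history` attribute of what `model.fit(train_X, train_Y, validation_split=0.2, ...)` returns for a model built by `create_model()`; a validation curve that stalls or rises while the training loss keeps falling is the pattern to look for.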

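import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

# train_X, train_Y, test_X, test_Y are loaded earlier (not shown)
print("Shape of training_features is", train_X.shape)
print("Shape of train_labels is", train_Y.shape)
print("Shape of test_features is", test_X.shape)
print("shape of test_labels is", test_Y.shape)

input_dim = train_X.shape[1]

# Function to create the model, required by KerasRegressor.
# Keras 1-style argument names (init=, lr=); later Keras versions use kernel_initializer= and learning_rate=.
def create_model(h1=50, h2=50, act1='sigmoid', act2='sigmoid', init='he_normal', learn_rate=0.001, momentum=0.1, loss='mean_squared_error'):
    # Two hidden layers followed by a single linear output unit for regression
    model = Sequential()
    model.add(Dense(h1, input_dim=input_dim, init=init, activation=act1))
    model.add(Dense(h2, init=init, activation=act2))
    model.add(Dense(1, init=init))
    # Compile model; 'accuracy' is meaningless for a continuous target, so track MAE instead
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss=loss, optimizer=optimizer, metrics=['mean_absolute_error'])
    return model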

# ----- The real thing -----
# create model
model = KerasRegressor(build_fn=create_model, verbose=0)

# SCORING FUNCTION
grid_scorer = make_scorer(mean_squared_error, greater_is_better=False)
# Grid Search
batch_size = [8]
epochs = [500]
init_mode = ['glorot_uniform']
learn_rate = [0.0001]
momentum = [0.1]

hidden_layer_1 = [75]
activation_1 = ['sigmoid']
hidden_layer_2 = [15]
activation_2 = ['sigmoid']

# nb_epoch is the Keras 1 name of this fit argument; Keras 2 calls it epochs
param_grid = dict(batch_size=batch_size, nb_epoch=epochs, init=init_mode,
                  h1=hidden_layer_1, h2=hidden_layer_2,
                  act1=activation_1, act2=activation_2,
                  learn_rate=learn_rate, momentum=momentum)

print("\n...BEGIN SEARCH...")
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring=grid_scorer, verbose=1)

print("\nLet fit the training data...")
grid_result = grid.fit(train_X, train_Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

predicted = grid.predict(test_X)
print("\nPrediction array is\n", predicted)
rmse = numpy.sqrt(((predicted - test_Y) ** 2).mean(axis=0))
print("Test RMSE is", rmse)

Output:

Shape of training_features is (249, 1045)
Shape of train_labels is (249,)
Shape of test_features is (248, 1045)
shape of test_labels is (248,)

...BEGIN SEARCH...

Let fit the training data...
Fitting 5 folds for each of 1 candidates, totalling 5 fits
Best: -891.761863 using {'learn_rate': 0.0001, 'h2': 15, 'act1': 'sigmoid', 'act2': 'sigmoid', 'h1': 75, 'batch_size': 8, 'init': 'glorot_uniform', 'nb_epoch': 500, 'momentum': 0.1}
-891.761863 (347.253351) with: {'learn_rate': 0.0001, 'h2': 15, 'act1': 'sigmoid', 'act2': 'sigmoid', 'h1': 75, 'batch_size': 8, 'init': 'glorot_uniform', 'nb_epoch': 500, 'momentum': 0.1}

Prediction array is
[ 295.72067261  295.72067261  295.72067261  295.72067261  295.72067261
  295.72067261  295.72067261  ...
                              295.72067261  295.72067261  295.72067261
  295.72067261  295.72067261  295.72067261]
Test RMSE is 95.0019297411
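The grid-search score and the test RMSE are on different scales. Because the scorer was built with make_scorer(mean_squared_error, greater_is_better=False), scikit-learn sign-flips the metric, so best_score_ above is a negated mean squared error averaged over the 5 CV folds. Taking the square root puts it in the same units as the test RMSE. A small arithmetic sketch using only the numbers printed above:

```python
import math

# best_score_ reported by GridSearchCV above; the scorer was built with
# make_scorer(mean_squared_error, greater_is_better=False), so it is a negated MSE.
best_score = -891.761863
cv_rmse = math.sqrt(-best_score)  # square root of the fold-averaged MSE

test_rmse = 95.0019297411  # "Test RMSE" from the output above
print("CV RMSE ~ %.2f, test RMSE ~ %.2f" % (cv_rmse, test_rmse))  # CV RMSE ~ 29.86, test RMSE ~ 95.00
```

So the cross-validated error (about 29.9) is already far above the training RMSE of about 1, and the test error is larger still.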