H2OGeneralizedLinearEstimator() - prediction error

I am trying to predict test time in a Kaggle competition using H2OGeneralizedLinearEstimator. The model trains fine on line 3, and the numbers all look reasonable. However, at the prediction step I get an error, even though the test data frame matches the train data frame.

Has anyone seen this error before?

 h2o_glm = H2OGeneralizedLinearEstimator()

 h2o_glm.train(training_frame=train_h2o,y='y')

 h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()

 test_pred = pd.read_csv('test.csv')[['ID']]
 test_pred['y'] = h2o_glm_predictions
 test_pred.to_csv('h2o_glm_predictions.csv',index=False)

      

glm Model Build progress: |████████████████████████████████████████████████| 100%

glm prediction progress: | (failed)

 OSError                                   Traceback (most recent call last)
 in ()
       3 h2o_glm.train(training_frame=train_h2o,y='y')
       4 
 ----> 5 h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
       6 
       7 test_pred = pd.read_csv('test.csv')[['ID']]

 /Applications/anaconda/lib/python3.6/site-packages/h2o/model/model_base.py in predict(self, test_data)
     130         j = H2OJob(h2o.api("POST /4/Predictions/models/%s/frames/%s" % (self.model_id, test_data.frame_id)),
     131                    self._model_json["algo"] + " prediction")
 --> 132         j.poll()
     133         return h2o.get_frame(j.dest_key)
     134 

 /Applications/anaconda/lib/python3.6/site-packages/h2o/job.py in poll(self)
      71             if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
      72                 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
 ---> 73                                        "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
      74             else:
      75                 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))

 OSError: Job with key $03017f00000132d4ffffffff$_868312f4c32f683871930a1145c1476a failed with an exception: DistributedException from /127.0.0.1:54321: 'null', caused by java.lang.ArrayIndexOutOfBoundsException
 stacktrace:
 java.lang.ArrayIndexOutOfBoundsException
     at water.MRTask.getResult(MRTask.java:478)
     at water.MRTask.getResult(MRTask.java:486)
     at water.MRTask.doAll(MRTask.java:390)
     at water.MRTask.doAll(MRTask.java:396)
     at hex.glm.GLMModel.predictScoreImpl(GLMModel.java:1215)
     at hex.Model.score(Model.java:1077)
     at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:351)
     at water.H2O$H2OCountedCompleter.compute(H2O.java:1349)
     at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
     at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
     at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
     at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
     at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorker...



1 answer


To summarize the comments above, the current workaround is to add a response column (filled with fake data if it doesn't exist) to the test_data frame. However, this is a bug that needs to be fixed, and a JIRA ticket has been filed for it.
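A minimal sketch of that workaround, assuming the response column is named 'y' as in the question. The helper name add_placeholder_response and the fill value 0.0 are made up for illustration; the helper only relies on the frame exposing `columns` and accepting column assignment, which both an H2OFrame and a pandas DataFrame should do.

```python
def add_placeholder_response(frame, response="y", fill=0.0):
    """Add a dummy response column to `frame` if it is missing.

    Hypothetical helper, not part of the h2o API. The fill value is
    arbitrary: it only needs to exist so that scoring does not fail.
    """
    if response not in frame.columns:
        # Scalar assignment creates a constant column on the frame.
        frame[response] = fill
    return frame


# Usage with the names from the question (needs a running H2O cluster):
#   test_h2o = add_placeholder_response(test_h2o, response="y")
#   h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
```

Once the underlying bug is fixed in H2O, the extra column should no longer be necessary and the helper call can simply be dropped.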


