Spark: how to perform a prediction using a prepared dataset (MLLIB: SVMWithSGD)
I am new to Spark. I can train a model on a dataset, but I cannot work out how to use a prepared dataset for prediction.
Here is the code that trains the model on data forming an 1800x4000 matrix.
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/myfile.txt")
val parsedData = data.map { line =>
  // Assumes each line is "label f1 f2 ... fn", separated by single spaces
  val parts = line.split(' ')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}
val firstDataPoint = parsedData.take(1)(0)
// Building the model
val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
//val model = LinearRegressionWithSGD.train(parsedData,numIterations)
val labelAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
println("Training Error = " + trainErr)
Now I load the data that will be used for the prediction. The data is a vector of 1800 values:
val test = sc.textFile("data/mllib/ridge-data/data.txt")
But I am not sure how to make a prediction using this data. Please help.
First load the labeled points from the text file (remember that the RDD must have been saved with saveAsTextFile):
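import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import scala.Tuple2;

// Load the labeled test points, then pair each model score with the true label
JavaRDD<LabeledPoint> test = MLUtils.loadLabeledPoints(init.context, "hdfs://../test/", 30).toJavaRDD();
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(
    new Function<LabeledPoint, Tuple2<Object, Object>>() {
        public Tuple2<Object, Object> call(LabeledPoint p) {
            Double score = model.predict(p.features());
            return new Tuple2<Object, Object>(score, p.label());
        }
    }
);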
Now collect the scores and iterate over them:
List<Tuple2<Object, Object>> scores = scoreAndLabels.collect();
for (Tuple2<Object, Object> score : scores) {
    System.out.println(score._1 + " \t" + score._2);
}
It's in Java, but maybe you can convert it :)
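For the Scala setup in the question, a rough equivalent might look like the sketch below. It assumes that data.txt holds one unlabeled sample per line with whitespace-separated feature values, and that each sample has the same number of features as the training rows. parseFeatures is a hypothetical helper, not part of MLlib; sc and model are the SparkContext and the trained model from the question.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: turns one line of whitespace-separated numbers into a feature vector.
def parseFeatures(line: String): Vector =
  Vectors.dense(line.trim.split("\\s+").map(_.toDouble))

// `sc` and `model` come from the question's spark-shell session (assumption).
val test = sc.textFile("data/mllib/ridge-data/data.txt")
val features = test.filter(_.trim.nonEmpty).map(parseFeatures)

// predict accepts an RDD[Vector] and returns an RDD[Double], one prediction per sample.
val predictions = model.predict(features)
predictions.collect().foreach(println)
```

If the test file also carries labels, the parsing from the training code can be reused instead and the predictions compared with point.label.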
But the prediction values don't make sense:
-18.841544889249917 0.0
168.32916035523283 1.0
420.67763915879794 1.0
-974.1942589201286 0.0
71.73602841256813 1.0
233.13636224524993 1.0
-1000.5902168199027 0.0
Does anyone know what they mean?
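One possible reading, offered as a guess because the code that produced the model is not shown: the first column looks like the raw SVM margin rather than a class label. MLlib's SVMModel returns raw margins from predict once clearThreshold() has been called; otherwise it returns 1.0 when the margin exceeds the threshold (0.0 by default) and 0.0 if not. The plain-Scala sketch below, which needs no Spark, applies that rule to the quoted pairs; toLabel is a hypothetical helper.

```scala
// (score, label) pairs copied from the output quoted above.
val scoreAndLabels = Seq(
  (-18.841544889249917, 0.0),
  (168.32916035523283, 1.0),
  (420.67763915879794, 1.0),
  (-974.1942589201286, 0.0),
  (71.73602841256813, 1.0),
  (233.13636224524993, 1.0),
  (-1000.5902168199027, 0.0)
)

// Assumed decision rule: a margin above the threshold means class 1.0, otherwise 0.0.
val threshold = 0.0
def toLabel(score: Double): Double = if (score > threshold) 1.0 else 0.0

val errors = scoreAndLabels.count { case (score, label) => toLabel(score) != label }
println(s"misclassified: $errors of ${scoreAndLabels.size}") // prints "misclassified: 0 of 7"
```

Under that reading every positive score lines up with label 1.0 and every negative score with label 0.0, so the numbers are consistent. Calling model.setThreshold(0.0) before predicting should make predict return the 0.0/1.0 labels directly.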