Converting Spark Java to Spark scala
I am trying to convert Java to scala code in Spark but found it very difficult. Can the following Java code be converted to scala? Thank!
JavaPairRDD<String,Tuple2<String,String>> newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {
private static final long serialVersionUID = 1L;
@Override
public Tuple2<String, Tuple2<String, String>> call(String t) throws Exception {
MyPerson p = (new Gson()).fromJson(t, MyPerson.class);
String nameAgeKey = p.getName() + "_" + p.getAge() ;
Tuple2<String, String> value = new Tuple2<String, String>(p.getNationality(), t);
Tuple2<String, Tuple2<String, String>> kvp =
new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
return kvp;
}
});
I tried the following, but I'm sure I missed a lot. And actually it is not clear to me how to make the override function in scala ... Please suggest or share some examples. Thank!
val newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {
@Override
public val call(String t) throws Exception {
val p = (new Gson()).fromJson(t, MyPerson.class);
val nameAgeKey = p.getName() + "_" + p.getAge() ;
val value = new Tuple2<String, String>(p.getNationality(), t);
val kvp =
new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
return kvp;
}
});
source to share
Literal translations from Spark-Java to Spark-Scala usually don't work because Spark-Java introduces a lot of artifacts to deal with the constrained type system in Java. Examples in this case: mapToPair
in Java it's just map
in Scala. Tuple2
has a more concise syntax(a,b)
Applying this (and a few more) to a snippet:
val newDataPair = newRecords.map{t =>
val p = (new Gson()).fromJson(t, classOf[MyPerson])
val nameAgeKey = p.getName + "_" + p.getAge
val value = (p.getNationality(), t)
(nameAgeKey.toLowerCase(), value)
}
It can be made a little more concise, but I wanted to keep the same structure as the Java counterpart to make it easier to understand.
source to share