Converting Spark Java to Spark scala

I am trying to convert Java to scala code in Spark but found it very difficult. Can the following Java code be converted to scala? Thank!


JavaPairRDD<String,Tuple2<String,String>> newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Tuple2<String, String>> call(String t) throws Exception {
                MyPerson p = (new Gson()).fromJson(t, MyPerson.class);

                String nameAgeKey = p.getName() + "_" + p.getAge() ;

                Tuple2<String, String> value = new Tuple2<String, String>(p.getNationality(), t);
                Tuple2<String, Tuple2<String, String>> kvp =
                    new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
                return kvp;
            }
        });

      


I tried the following, but I'm sure I missed a lot. And actually it is not clear to me how to make the override function in scala ... Please suggest or share some examples. Thank!

val newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {

        @Override
        public val call(String t) throws Exception {
            val p = (new Gson()).fromJson(t, MyPerson.class);
            val nameAgeKey = p.getName() + "_" + p.getAge() ;
            val value = new Tuple2<String, String>(p.getNationality(), t);
            val kvp =
                new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
            return kvp;
        }
    });

      

+3


source to share


1 answer


Literal translations from Spark-Java to Spark-Scala usually don't work because Spark-Java introduces a lot of artifacts to deal with the constrained type system in Java. Examples in this case: mapToPair

in Java it's just map

in Scala. Tuple2

has a more concise syntax(a,b)

Applying this (and a few more) to a snippet:



val newDataPair = newRecords.map{t =>                
    val p = (new Gson()).fromJson(t, classOf[MyPerson])
    val nameAgeKey = p.getName + "_" + p.getAge
    val value = (p.getNationality(), t)
    (nameAgeKey.toLowerCase(), value)
}

      

It can be made a little more concise, but I wanted to keep the same structure as the Java counterpart to make it easier to understand.

+3


source







All Articles