Kryo in Apache Spark

The Spark documentation states that all you have to do is register your classes and set two properties on the SparkConf:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.spark.{SparkConf, SparkContext}

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)

I have implemented this in my code, but when trying to sort a key/value SequenceFile of (Text, Text) I still get serialization errors; a rough sketch of the job follows the registrator below. My version of MyRegistrator looks like this:

import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.io.Text
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Text])
  }
}
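
The sort itself is roughly of this shape (the paths, the app name, and the explicit Ordering[Text] are placeholders rather than my exact code). The shuffle behind sortByKey is where the Text objects get serialized, and that is where the errors appear, since Text is not java.io.Serializable:

import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD / ordered-RDD implicits needed in Spark 1.x

object SortSequenceFile {
  def main(args: Array[String]) {
    // Master is assumed to come from spark-submit.
    val conf = new SparkConf().setAppName("sort-sequence-file")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
    val sc = new SparkContext(conf)

    // Text has no implicit Ordering, so supply one for sortByKey.
    implicit val textOrdering: Ordering[Text] = Ordering.by(_.toString)

    val pairs = sc.sequenceFile("hdfs:///path/to/input", classOf[Text], classOf[Text])
    // sortByKey triggers a shuffle, serializing the Text keys and values.
    val sorted = pairs.sortByKey()
    sorted.map { case (k, v) => k.toString + "\t" + v.toString }
      .saveAsTextFile("hdfs:///path/to/output")

    sc.stop()
  }
}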

I also added a log statement to MyRegistrator (roughly reconstructed below) and I don't see it anywhere in the logs. I even deliberately misspelled the MyRegistrator class name in the configuration and the job didn't fail, which makes me think the registrator is never actually used. There must be more to this than what the documentation shows. Is there anything else I need to do?
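
The check with the log statement was roughly this (a reconstruction; the exact message text doesn't matter):

import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.io.Text
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // If Kryo is actually in use, this should show up in the driver/executor stdout.
    println("MyRegistrator.registerClasses called")
    kryo.register(classOf[Text])
  }
}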

I am using Apache Spark 1.0.2.

Thanks.

1 answer


I managed to figure out how to solve this problem. I upgraded my Apache Spark version to 1.1.0 and it started working. I didn't change any code at all; the only thing I changed was my POM. To prove that Kryo was actually doing the work, I commented out all the Kryo references in my code and reran the job, and it failed with a serialization error again.
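
For reference, the POM change is nothing more than bumping the Spark dependency version, roughly like this, assuming a Scala 2.10 build (the artifact id here is an assumption on my part):

<!-- only the version changes, from 1.0.2 to 1.1.0 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
</dependency>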
