Kryo in Apache Spark
The Spark documentation states that all you have to do is register your classes and set two properties on the conf variable:
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
I have implemented this in my code, but when I try to sort a key/value sequence file (Text, Text) I still get serialization errors. My version of MyRegistrator looks like this:
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Text])
  }
}
I also added a log statement to MyRegistrator, but I don't see any log output. I even deliberately misspelled the name MyRegistrator in the configuration and the job didn't fail. There must be more to it than what the documentation describes. Is there anything else I need to do?
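For reference, this is roughly what I mean by a registrator with a log statement: if Spark actually instantiates the class, the println should show up in the executor logs. The message text here is just a placeholder.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.io.Text
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // Debugging aid: should appear in executor logs if the
    // registrator is ever invoked by Spark.
    println("MyRegistrator.registerClasses called")
    kryo.register(classOf[Text])
  }
}
```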
I am using Apache Spark 1.0.2.
Thanks.
I managed to figure out how to solve this problem. I upgraded Apache Spark to version 1.1.0 and it started working. I didn't change any code at all; the only thing I changed was my POM. To prove that this was the fix, I commented out all the Kryo references in my code and reran the job, and I got the serialization error again.
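Assuming a standard Maven build against Scala 2.10 (the artifact name below is that assumption), the POM change amounts to bumping the Spark dependency version, roughly like this:

```xml
<!-- Upgrade Spark from 1.0.2 to 1.1.0; artifact suffix assumes Scala 2.10 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
</dependency>
```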