Why doesn't spark-shell load the imported RDD class file?
I am using Spark 2.1.1 with Scala 2.11.8.
Inside spark-shell, I am using the :load command to load a class with methods that operate on RDDs.
When I try to load the class, I get the following compilation error:
error: not found: type RDD
Why? I have an import statement.
This is the code I am working with (reproduced as hello.scala in the answer below).
This seems to be a feature of :load in spark-shell. The solution is to move import org.apache.spark.rdd.RDD (the plain import, with no trailing dot and underscore) inside your class definition.

This is not specific to the RDD class; it applies to all imported classes. The import simply won't take effect unless the import statement is defined inside the class itself.

With that said, the following won't work, because the import is outside the class:
import org.apache.spark.rdd.RDD

class Hello {
  def get(rdd: RDD[String]): RDD[String] = rdd
}
scala> :load hello.scala
Loading hello.scala...
import org.apache.spark.rdd.RDD
<console>:12: error: not found: type RDD
def get(rdd: RDD[String]): RDD[String] = rdd
^
<console>:12: error: not found: type RDD
def get(rdd: RDD[String]): RDD[String] = rdd
^
You can see what's going on under the covers using :load's -v flag.
scala> :load -v hello.scala
Loading hello.scala...
scala>
scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD
scala> class Hello {
| def get(rdd: RDD[String]): RDD[String] = rdd
| }
<console>:12: error: not found: type RDD
def get(rdd: RDD[String]): RDD[String] = rdd
^
<console>:12: error: not found: type RDD
def get(rdd: RDD[String]): RDD[String] = rdd
^
This got me wondering whether importing inside the class definition might help. And it did! (much to my surprise)
class Hello {
  import org.apache.spark.rdd.RDD
  def get(rdd: RDD[String]): RDD[String] = rdd
}
scala> :load -v hello.scala
Loading hello.scala...
scala> class Hello {
| import org.apache.spark.rdd.RDD
| def get(rdd: RDD[String]): RDD[String] = rdd
| }
defined class Hello
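As noted above, this rule is not specific to Spark's RDD; it holds for any import used by a class you :load. Here is a minimal non-Spark sketch (the Collector class and its file name are hypothetical, made up for illustration):

```scala
// collector.scala (hypothetical) -- with the import placed inside the
// class body, `:load collector.scala` can resolve ArrayBuffer when it
// compiles the class.
class Collector {
  // Import inside the class, as :load requires.
  import scala.collection.mutable.ArrayBuffer

  private val items = ArrayBuffer.empty[String]

  // Append an element and return this Collector for chaining.
  def add(s: String): this.type = { items += s; this }

  // Snapshot the collected elements as an immutable List.
  def all: List[String] = items.toList
}
```

Had the import been at the top of the file instead, :load would trip over it with the same `not found: type` error as above.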
You can also use the :paste command to paste a class into spark-shell. There's also a so-called raw mode (:paste -raw) in which you can define classes in their own package.
package mypackage

class Hello {
  import org.apache.spark.rdd.RDD
  def get(rdd: RDD[String]): RDD[String] = rdd
}

With :load, the package clause fails, since packages cannot be defined line by line in the REPL:
scala> :load -v hello.scala
Loading hello.scala...
scala> package mypackage
<console>:1: error: illegal start of definition
package mypackage
^
scala>
scala> class Hello {
| import org.apache.spark.rdd.RDD
| def get(rdd: RDD[String]): RDD[String] = rdd
| }
defined class Hello
With :paste -raw, however, the whole file, package clause included, is accepted:
scala> :paste -raw
// Entering paste mode (ctrl-D to finish)
package mypackage
class Hello {
import org.apache.spark.rdd.RDD
def get(rdd: RDD[String]): RDD[String] = rdd
}
// Exiting paste mode, now interpreting.