Why "error: not found: StructType value" when creating sql schema?

I have Spark 1.0.0 from CDH5 installed on CentOS 6.2, and it is working without error.

When I try to run some Spark SQL, I get an error. I start my Spark shell fine:

spark-shell --master spark://mysparkserver:7077

Then I run one of the sample Scala snippets from the Spark SQL Programming Guide.

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)

scala> val vehicle = sc.textFile("/tmp/scala.csv")

scala> val schemaString = "year manufacturer model class engine cylinders fuel consumption clkm hlkm cmpg hmpg co2lyr co2gkm"

scala> import org.apache.spark.sql._

scala> val schema =
  StructType(
    schemaString.split(" ").map(fieldName =>
      StructField(fieldName, StringType, true)))

But the import statement doesn't seem to have worked, because the last line gives this error:

scala> StructType
<console>:14: error: not found: value StructType
              StructType
              ^

I know there is org.apache.spark.sql.api.java.StructType, and if I replace StructType in the schema line with the fully qualified name, the error changes.

Has anyone else encountered this error? Is there an extra step I am missing?

1 answer


Your problem is that you are reading the programming guide for the latest version of Spark and testing against Spark 1.0.0. Alas, org.apache.spark.sql.api.java.StructType was introduced in Spark 1.1.0, as was the "Programmatically Specifying the Schema" section of the guide.

So, without upgrading, you cannot do this. What you can do instead is use the approach described in the 1.0.0 guide's "Running SQL on RDDs" section, which in 1.1.0 is called "Inferring the Schema Using Reflection". (Basically, if you can live with a fixed schema.)
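
For reference, here is a minimal sketch of that reflection-based approach as it works on 1.0.0, assuming the same /tmp/scala.csv file; the Vehicle case class, its three fields, and the comma delimiter are illustrative stand-ins for your real columns:

// The schema comes from the case class's field names and types,
// which is exactly why it is fixed: new columns mean a new class.
case class Vehicle(year: String, manufacturer: String, model: String)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

// Parse each CSV line into a Vehicle.
val vehicle = sc.textFile("/tmp/scala.csv")
  .map(_.split(","))
  .map(p => Vehicle(p(0), p(1), p(2)))

vehicle.registerAsTable("vehicle")  // renamed to registerTempTable in 1.1.0
sqlContext.sql("SELECT year, model FROM vehicle").collect().foreach(println)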



If you look at the various documentation URLs, you want to replace "latest" with "1.0.0". When in doubt, I like to bring up multiple versions of the API docs and compare. Note that scaladoc, like javadoc, supports an @since annotation that would make this kind of information clearer in the API docs, but it is not used in the Spark API docs.
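
For example, with the Spark SQL guide that substitution looks like this (assuming the standard Apache docs URL layout):

https://spark.apache.org/docs/latest/sql-programming-guide.html
https://spark.apache.org/docs/1.0.0/sql-programming-guide.html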
