enableHiveSupport throws an error in Java Spark code
I have a very simple application that tries to read an ORC file from /src/main/resources using Spark. I keep getting this error:
Unable to instantiate Hive-enabled SparkSession because no Hive classes were found.
I tried adding the dependency
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
as recommended here: Unable to instantiate Hive enabled SparkSession because no Hive classes were found
However, no matter what I add, I still get the same error.
I am running this on my local Windows machine through the NetBeans IDE.
My code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .enableHiveSupport()
                .appName("Java Spark SQL basic example")
                .getOrCreate();
        Dataset<Row> df = spark.read().orc("/src/main/resources/testdir");
        spark.close();
    }
}
If you are working in an IDE, I recommend adding .master("local") to your SparkSession builder.
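For example, a minimal sketch of the builder call (the question's code with only the master URL added):

SparkSession spark = SparkSession
        .builder()
        .master("local")              // run Spark in-process; no cluster needed
        .enableHiveSupport()
        .appName("Java Spark SQL basic example")
        .getOrCreate();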
The next important point is that the spark-hive version must match the spark-core and spark-sql versions. To be safe, you can define the dependencies with a shared version property:
<properties>
    <spark.version>2.0.0</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
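Putting both fixes together, the original program would look something like this (a sketch; the relative path without the leading slash is an assumption about the intended project layout, since a leading slash makes the path absolute):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .master("local")
                .enableHiveSupport()
                .appName("Java Spark SQL basic example")
                .getOrCreate();

        // A relative path resolving against the working directory is assumed here
        Dataset<Row> df = spark.read().orc("src/main/resources/testdir");
        df.show();   // print a few rows to confirm the read worked

        spark.close();
    }
}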