Accessing Hive tables in Spark

I have a Hive 0.13 installation with custom databases. I have a Spark 1.1.0 single-node cluster built with Maven using the -Phive option. I want to access the tables in these databases from a Spark app using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have copied hive-site.xml into the spark/conf directory.
Do I need any other configuration?



1 answer

Step 1: Build the latest version of Spark with Hive support:

$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive -Phive-thriftserver assembly


Running this downloads several jar files and adds them by default, so there is no need to add them yourself.

Step 2:
Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory, then edit the file and add the properties listed below:

    <description>JDBC connect string for a JDBC metastore</description>
    <description>Driver class name for a JDBC metastore</description>
    <description>Username to use against metastore database</description>
    <description>Password to use against metastore database</description>
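The four descriptions above belong to the standard Hive metastore properties `javax.jdo.option.ConnectionURL`, `ConnectionDriverName`, `ConnectionUserName` and `ConnectionPassword`. Here is a minimal sketch, using only Python's standard library, that writes them as complete `<property>` entries. The MySQL host, database name, user and password are hypothetical placeholders, and `com.mysql.jdbc.Driver` assumes the MySQL Connector/J 5.x driver class.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder values: replace the host, database name, user and
# password with the details of your own MySQL-backed Hive metastore.
METASTORE_PROPS = {
    "javax.jdo.option.ConnectionURL": (
        "jdbc:mysql://localhost:3306/metastore",
        "JDBC connect string for a JDBC metastore",
    ),
    "javax.jdo.option.ConnectionDriverName": (
        "com.mysql.jdbc.Driver",
        "Driver class name for a JDBC metastore",
    ),
    "javax.jdo.option.ConnectionUserName": (
        "hiveuser",
        "Username to use against metastore database",
    ),
    "javax.jdo.option.ConnectionPassword": (
        "hivepassword",
        "Password to use against metastore database",
    ),
}


def build_hive_site(props):
    """Return hive-site.xml text with one <property> element per entry."""
    root = ET.Element("configuration")
    for name, (value, description) in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the generated XML; redirect it into $SPARK_HOME/conf/hive-site.xml
    # or merge the <property> elements into the file you copied.
    print(build_hive_site(METASTORE_PROPS))
```

If you copied an existing hive-site.xml, merging the generated `<property>` elements into it is safer than overwriting the whole file.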


Step 3: Download the MySQL JDBC connector and add it to the Spark classpath. Run this command in bin/ and add the following line to the script:
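In Spark 1.x one common way to put the connector jar on the classpath is an `export SPARK_CLASSPATH=...` line in `$SPARK_HOME/conf/spark-env.sh`. This is an assumption, not necessarily the exact line the answer intended. A small Python sketch that builds that line and appends it only once; the jar path in the usage note is hypothetical.

```python
import shlex
from pathlib import Path


def spark_env_classpath_line(jar_path):
    """Return a spark-env.sh line that appends jar_path to SPARK_CLASSPATH."""
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return "export SPARK_CLASSPATH=$SPARK_CLASSPATH:" + shlex.quote(jar_path)


def add_jar_to_spark_env(spark_home, jar_path):
    """Append the export line to $SPARK_HOME/conf/spark-env.sh unless it is already there."""
    env_file = Path(spark_home) / "conf" / "spark-env.sh"
    line = spark_env_classpath_line(jar_path)
    existing = env_file.read_text() if env_file.exists() else ""
    if line not in existing.splitlines():
        with env_file.open("a") as fh:
            # Keep the new line separate if the file lacks a trailing newline.
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(line + "\n")
    return env_file
```

For example, `add_jar_to_spark_env("/usr/local/spark", "/usr/local/jars/mysql-connector-java-bin.jar")` would add the (hypothetical) jar once, even if you run it repeatedly.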



How to get data from Hive into Spark:

Step 1:
Start all daemons with the following command:


Step 2:
Start HiveServer2 (the Thrift server) with the following command:

hive --service hiveserver2 & 


Step 3:
Start the Spark server with the following command:


Finally, check whether they are running with the following command:
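As an alternative to that check (not the author's command), here is a small Python sketch that tests whether the expected ports accept TCP connections. The ports are assumptions based on the defaults used elsewhere in this answer: 10000 for HiveServer2 (the beeline URL), 7077 for the Spark master (`spark.master`) and 8080 for the master web UI.

```python
import socket

# Assumed default host/ports; adjust them to match your setup.
DEFAULT_SERVICES = {
    "HiveServer2": ("localhost", 10000),
    "Spark master": ("localhost", 7077),
    "Spark master web UI": ("localhost", 8080),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or host unreachable.
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'running' if up else 'not reachable'}")
```

An open port only shows that something is listening there. It does not confirm that the listener is the expected service.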



Step 4:
Start the master with the following command:



To stop the master, use the command below:



Step 5:
Open a new terminal and run beeline from the following path:

hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline 


When it prompts for input, enter the following:

!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver 
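The same connection can also be opened non-interactively by passing the URL, user, password and driver as beeline command-line options (`-u`, `-n`, `-p`, `-d`). A minimal Python sketch that builds that argument list from the values in the `!connect` line; the function name is hypothetical.

```python
import shlex


def beeline_command(url="jdbc:hive2://localhost:10000", user="hadoop", password="",
                    driver="org.apache.hive.jdbc.HiveDriver"):
    """Build a beeline argument list equivalent to the !connect line above."""
    return ["beeline", "-u", url, "-n", user, "-p", password, "-d", driver]


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to launch it
    # (requires beeline on your PATH).
    print(shlex.join(beeline_command()))
```

Keeping the arguments in a list avoids quoting problems with the empty password when you launch beeline from a script.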


After that, configure Spark with the following commands.
Note: you can set these properties in the configuration file so you don't have to run them every time:

set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
-- set the memory according to your server
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
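To avoid retyping these settings, you can keep them in one place and render them as `set` statements to save in a script file. A minimal Python sketch, assuming the property names `hive.execution.engine` and `org.apache.spark.serializer.KryoSerializer` (the spellings Hive and Spark expect). The `2g` memory value is only an example, and the output file name is hypothetical.

```python
# Settings from the answer; size spark.executor.memory for your own server.
SETTINGS = {
    "spark.master": "spark://localhost:7077",
    "hive.execution.engine": "spark",
    "spark.executor.memory": "2g",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def to_set_statements(settings):
    """Render each key/value pair as a 'set key=value;' line."""
    return "\n".join(f"set {key}={value};" for key, value in settings.items())


def write_settings_script(settings, path="spark_settings.sql"):
    """Write the set statements to a file (hypothetical default name)."""
    with open(path, "w") as fh:
        fh.write(to_set_statements(settings) + "\n")
```

You can paste the saved file into a beeline session, or put the same properties into the configuration file as the note above suggests.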


After that, submit the query you want to run, then open http://localhost:8080 in a browser. There you can see the running and completed jobs.
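The same information can be read programmatically. The sketch below assumes that the Spark standalone master serves a JSON view of its status at `http://localhost:8080/json` with `activeapps` and `completedapps` lists. Treat those field names as assumptions and check them against your Spark version.

```python
import json
from urllib.request import urlopen

# Assumption: standalone master web UI JSON endpoint on the default port.
MASTER_JSON_URL = "http://localhost:8080/json"


def fetch_master_state(url=MASTER_JSON_URL, timeout=5):
    """Download and decode the master's JSON status page."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_apps(state):
    """Return (name, state) pairs for running and completed applications."""
    running = [(app.get("name"), app.get("state")) for app in state.get("activeapps", [])]
    completed = [(app.get("name"), app.get("state")) for app in state.get("completedapps", [])]
    return {"running": running, "completed": completed}
```

Usage: `summarize_apps(fetch_master_state())` while the master is running. Separating the parsing from the HTTP call lets you check `summarize_apps` against a saved JSON sample.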


