Accessing Hive tables from Spark
I have a Hive 0.13 installation with custom databases created. I have a Spark 1.1.0 single-node cluster built with the Maven -Phive option. I want to access tables in these databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have copied hive-site.xml into the Spark conf/ directory.
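For reference, this is roughly what I am running (a minimal sketch; the database and table names are placeholders):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableAccess {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableAccess"))
    val hiveContext = new HiveContext(sc)
    // "mydb" and "mytable" are placeholders for my custom database and table.
    // They are only found if Spark picks up hive-site.xml; otherwise Spark
    // creates a local Derby metastore in its working directory.
    hiveContext.sql("USE mydb")
    hiveContext.sql("SELECT * FROM mytable LIMIT 10").collect().foreach(println)
  }
}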
Do I need to do any other configuration?
Step 1: Build the latest version of SPARK with Hive support ....
$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
Running these commands pulls in the required jar files automatically, so you do not need to add them by hand ....
Step 2:
Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory, then edit the XML file and add the properties listed below:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>XXXXXXXX</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>XXXXXXXX</value>
  <description>Password to use against metastore database</description>
</property>
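A quick way to verify that Spark is actually using this metastore (and not a local Derby one) is to list the databases from spark-shell; this is only a sketch, run after the configuration above:
// Inside spark-shell, where sc already exists.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// If hive-site.xml is picked up, this lists the databases from the
// MySQL-backed metastore (including any custom ones); with a local
// Derby metastore you would only see "default".
hiveContext.sql("SHOW DATABASES").collect().foreach(println)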
Step 3: Download the MySQL JDBC connector and add it to the SPARK CLASSPATH: open the script bin/compute-classpath.sh
and add the following line to it:
CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
How to get data from HIVE to SPARK ....
Step 1:
Start all the daemons with the following command ...
start-all.sh
Step 2:
Start HiveServer2 with the following command ....
hive --service hiveserver2 &
Step 3:
Start the Spark server with the following command ....
start-spark.sh
Finally, check whether they are all running with the jps command ....; the output should list:
RunJar
ResourceManager
Master
NameNode
SecondaryNameNode
Worker
Jps
JobHistoryServer
DataNode
NodeManager
Step 4:
Start the master with the following command ....
./sbin/start-master.sh
To stop the master, use the command below.
./sbin/stop-master.sh
Step 5:
Open a new terminal ....
Run beeline from the following path ....
hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline
When it asks for input, enter the connection command listed below ....
!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver
After that, configure SPARK with the following commands ....
Note: put these settings in the conf file so that you do not need to run them every time ...
set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g; -- adjust the memory to your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
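If you are submitting your own Spark application instead of going through beeline, the equivalent Spark settings can go on the SparkConf; this is only a sketch, adjust the values to your cluster:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveOnSpark {
  def main(args: Array[String]): Unit = {
    // Equivalent configuration for a standalone Spark application.
    val conf = new SparkConf()
      .setAppName("HiveOnSpark")
      .setMaster("spark://localhost:7077")  // same master as above
      .set("spark.executor.memory", "2g")   // adjust to your server
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.io.compression.codec", "org.apache.spark.io.LZFCompressionCodec")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").collect().foreach(println)
  }
}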
Back in beeline, submit the query you want to run .... Then open a browser and check the URL localhost:8080; you can see the Running Jobs and Completed Jobs there ....