Accessing Hive tables from Spark

I have a Hive 0.13 installation with custom databases created. I built a Spark 1.1.0 single-node cluster using the mvn -Phive option. I want to access the tables in these databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have already copied hive-site.xml into the spark/conf directory.
Do I need to do any other configuration?
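
For reference, this is roughly how I create and use the HiveContext in the application (a minimal sketch; the database and table names below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTablesExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTablesExample"))
    // HiveContext should pick up hive-site.xml from $SPARK_HOME/conf
    val hiveContext = new HiveContext(sc)
    // "my_db" and "my_table" stand in for the custom database and table
    hiveContext.sql("USE my_db")
    hiveContext.sql("SELECT * FROM my_table LIMIT 10").collect().foreach(println)
    sc.stop()
  }
}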


1 answer


Step 1: Build Spark (latest version) with Hive support:

$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive -Phive-thriftserver assembly

      

Running this downloads the jar files Spark needs and adds them automatically, so you do not need to add them yourself.

Step 2:
Copy hive-site.xml from your Hive cluster into your $SPARK_HOME/conf/ directory, then edit the file and add the properties listed below:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>XXXXXXXX</value>
    <description>Username to use against metastore database</description>
</property> 
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>XXXXXXXX</value>
    <description>Password to use against metastore database</description>
</property>

      

Step 3: Download the MySQL JDBC connector and add it to the Spark classpath: open bin/compute-classpath.sh
  and add the following line to the script.

CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar

      


How to get data from Hive into Spark:

Step 1:
Start all the Hadoop daemons with the following command:

start-all.sh

      

Step 2:
Start HiveServer2 (the Hive Thrift server) with the following command:

hive --service hiveserver2 & 

      

Step 3:
Start the Spark server with the following command:

start-spark.sh 

      

Finally, check that they are all running with the jps command; the output should list processes like these:

RunJar 
ResourceManager 
Master 
NameNode 
SecondaryNameNode 
Worker 
Jps 
JobHistoryServer 
DataNode 
NodeManager

      

Step 4:
Start the Spark master with the following command:

./sbin/start-master.sh 

      

To stop the master, use the command below.

./sbin/stop-master.sh

      

Step 5:
Open a new terminal and run beeline from the following path:

hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline 

      

When it prompts for input, pass the connect string listed below:

!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver 
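
If you want to hit HiveServer2 programmatically instead of through beeline, a minimal JDBC sketch (assuming the Hive JDBC driver jar is on the classpath) could look like this:

import java.sql.DriverManager

object HiveServer2Check {
  def main(args: Array[String]): Unit = {
    // Same driver and connect string as the beeline !connect line above
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "hadoop", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SHOW DATABASES")
    while (rs.next()) println(rs.getString(1))
    rs.close(); stmt.close(); conn.close()
  }
}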

      

After that, set the Spark configuration with the following commands.
Note: put these settings in the conf file so that you do not have to run them every time:

set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g; -- set the memory depending on your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
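
If you are configuring a standalone Spark application rather than a beeline session, a hypothetical programmatic equivalent of the Spark-side settings above (hive.execution.engine stays in the Hive configuration) could look like this:

import org.apache.spark.SparkConf

// Sketch only: mirrors the set commands above for a Spark application
val conf = new SparkConf()
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "2g") // adjust to your server
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.io.compression.codec", "org.apache.spark.io.LZFCompressionCodec")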

      

Once the configuration is set, submit the query you want to run, then open a browser and go to localhost:8080. You can see the running jobs and completed jobs in that UI.
