Spark & HCatalog?

I'm comfortable loading data from HCatalog with Pig and was wondering whether Spark could be used instead of Pig. Unfortunately, I'm pretty new to Spark ...
Can you give me any pointers on how to get started? Are there Spark libraries for this? Any examples? I've done all the exercises at http://spark.apache.org/ but they focus on RDDs and don't go any further.

Any help would be greatly appreciated ...
Regards, Pawel



3 answers


You can refer to the following gist, which provides an HCatalog InputFormat wrapper for use with Spark. Note that it was written before Spark SQL existed.
https://gist.github.com/granturing/7201912





You can use Spark SQL to read directly from Hive tables instead of going through HCatalog.

https://spark.apache.org/sql/
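
As a rough illustration of reading a Hive table through Spark SQL, here is a minimal sketch. It assumes Spark 2.0 or later built with Hive support and a Hive metastore reachable from the driver. The database `sales_db`, the table `orders`, the column names, and the helper functions are all hypothetical placeholders, not anything from the original question.

```python
def build_query(database, table, columns):
    """Build a simple SELECT for a Hive table (all names are placeholders)."""
    return "SELECT {} FROM {}.{}".format(", ".join(columns), database, table)


def read_hive_table(spark, database, table, columns):
    """Run the query through Spark SQL and return the resulting DataFrame."""
    return spark.sql(build_query(database, table, columns))


def main():
    # Imported here so the helpers above can be imported without Spark installed.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark SQL see tables registered in the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hive-read-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )
    # Hypothetical database, table, and columns; replace with your own.
    df = read_hive_table(spark, "sales_db", "orders", ["order_id", "amount"])
    df.show(5)
    spark.stop()
```

To try it, call `main()` at the end of the script and launch it with `spark-submit` on a cluster where the Hive configuration is available to Spark.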



You can then apply the same kinds of transformations you would write in Pig (filter, union, group by, and so on) using the Spark API in Java, Scala, or Python.
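
A small sketch of what those Pig-style transformations might look like with the PySpark DataFrame API. The function name, the column names `customer_id` and `amount`, and the sample rows are made up for illustration; both inputs are assumed to share the same schema.

```python
def large_order_totals(orders_a, orders_b, min_amount):
    """Union two DataFrames, keep large orders, and total them per customer."""
    combined = orders_a.union(orders_b)                         # roughly Pig UNION
    large = combined.filter(combined["amount"] >= min_amount)   # roughly Pig FILTER
    return large.groupBy("customer_id").sum("amount")           # roughly Pig GROUP + SUM


def main():
    # Imported here so the function above can be imported without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("pig-style-sketch")
        .getOrCreate()
    )
    # Hypothetical in-memory sample data standing in for two Hive tables.
    schema = ["customer_id", "amount"]
    orders_a = spark.createDataFrame([("c1", 120.0), ("c2", 15.0)], schema)
    orders_b = spark.createDataFrame([("c1", 80.0), ("c3", 300.0)], schema)

    large_order_totals(orders_a, orders_b, 50.0).show()
    spark.stop()
```

In a real job the two inputs would more likely come from `spark.sql(...)` or `spark.table(...)` than from in-memory rows.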



We have both installed on our systems. Spark lets you use the features of whichever language you drive it from (Scala, Python, ...). For example, when using Spark with Python, you can call many Python libraries from inside your Spark code.
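
To make the Python point concrete, here is a minimal sketch that uses the standard `json` module inside a Spark transformation. The function `parse_event`, the field names `user` and `action`, and the sample lines are hypothetical.

```python
import json


def parse_event(line):
    """Parse one JSON line; return (user, action), or None if the line is unusable."""
    try:
        record = json.loads(line)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    if not isinstance(record, dict):
        return None
    return (record.get("user"), record.get("action"))


def main():
    # Imported here so parse_event can be used and tested without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("python-libs-sketch")
        .getOrCreate()
    )
    # Hypothetical input lines; in practice these might come from sc.textFile(...).
    lines = spark.sparkContext.parallelize(
        ['{"user": "a", "action": "click"}', "not json"]
    )
    # The plain Python function runs on the workers for every element.
    events = lines.map(parse_event).filter(lambda e: e is not None)
    print(events.collect())
    spark.stop()
```

Third-party libraries can be used the same way, provided they are installed on every worker node.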
