Get the latest value of a Hive table with PySpark
I query a Hive table to find the last value of the unique column id. I do it as below:
frame=sqlContext.sql("select max(id) from database.table")
When I do frame.show():
+------+
| _c0|
+------+
|276308|
+------+
Now I want to get it as lastval. For this I do:
frame1=frame.map(lambda row: [str(c) for c in row]).collect()
lastval =''.join(frame1[0][0])
print(lastval)
276308
I am getting the expected result, but I am wondering if there is a better way to do this?
user7590556
1 answer
If I understand you correctly:
Prepare some data:
import pandas as pd
pdf = pd.DataFrame({"id": [1, 2, 3]})
df = sqlContext.createDataFrame(pdf)
df.registerTempTable("tbl")
sqlContext.sql("select * from tbl").show()
+---+
| id|
+---+
| 1|
| 2|
| 3|
+---+
Select "as is":
sqlContext.sql("select max(id) from tbl").show()
+-------+
|max(id)|
+-------+
| 3|
+-------+
Select "pretty" via SQL, aliasing the column:
sqlContext.sql("select max(id) as lastVal from tbl").show()
+-------+
|lastVal|
+-------+
| 3|
+-------+
Select "pretty" from the Spark DataFrame:
from pyspark.sql import functions as F
df.select(F.max("id").alias("lastVal")).show()
+-------+
|lastVal|
+-------+
| 3|
+-------+
If you want to transfer your data to pure Python for further use or analysis, you can proceed as follows:
lv = sqlContext.sql("select max(id) as lastVal from tbl").collect()
print(lv[0]["lastVal"])
3
lv = df.select(F.max("id").alias("lastVal")).collect()
print(lv[0]["lastVal"])
3