Get the latest value of a Hive table with PySpark
I query a Hive table to find the last value of the unique column id. I do it as below:
frame=sqlContext.sql("select max(id) from database.table")
When I do frame.show():
+------+
| _c0|
+------+
|276308|
+------+
Now I want to get it as lastval. For this I do:
frame1=frame.map(lambda row: [str(c) for c in row]).collect()
lastval =''.join(frame1[0][0])
print(lastval)
276308
I am getting the expected result, but I am wondering if there is a better way to do this?
user7590556
1 answer
If I understand you correctly:
Prepare some data:
import pandas as pd
pdf = pd.DataFrame({"id": [1, 2, 3]})
df = sqlContext.createDataFrame(pdf)
df.registerTempTable("tbl")
sqlContext.sql("select * from tbl").show()
+---+
| id|
+---+
| 1|
| 2|
| 3|
+---+
Select "as is":
sqlContext.sql("select max(id) from tbl").show()
+-------+
|max(id)|
+-------+
| 3|
+-------+
Select "pretty" via SQL, aliasing the column:
sqlContext.sql("select max(id) as lastVal from tbl").show()
+-------+
|lastVal|
+-------+
| 3|
+-------+
Select "pretty" from the Spark DataFrame:
from pyspark.sql import functions as F
df.select(F.max("id").alias("lastVal")).show()
+-------+
|lastVal|
+-------+
| 3|
+-------+
If you want to transfer your data to pure Python for further use or analysis, you can proceed as follows:
lv = sqlContext.sql("select max(id) as lastVal from tbl").collect()
print(lv[0]["lastVal"])
3
lv = df.select(F.max("id").alias("lastVal")).collect()
print(lv[0]["lastVal"])
3