Skip / Take with Spark SQL
How would one implement a skip / take query (typical server-side paging) using Spark SQL? I have scoured the web and can only find basic examples like this one: https://databricks-training.s3.amazonaws.com/data-exploration-using-spark-sql.html
I don't see any equivalent of ROW_NUMBER() or OFFSET / FETCH as in T-SQL. Does anyone know how to do this?
Something like:
scala> csc.sql("select * from users skip 10 limit 10").collect()
I found that neither Spark SQL nor the DataFrame API has a limit with an offset. This is probably because rows in distributed data have no fixed order, so a limit with an offset is only meaningful once the data is ordered. We can use a window function to implement it:
1. Suppose we want to get the products whose revenue rank is from 2 to 5.
2. Implementation:
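from pyspark.sql.window import Window
from pyspark.sql.functions import col, dense_rank, row_number

# Order all rows by revenue; the empty partitionBy() keeps them in a single partition
windowSpec = Window.partitionBy().orderBy(df.revenue.asc())
start, end = 2, 5  # rank range from step 1
result = df.select(
    "product", "category", "revenue",
    row_number().over(windowSpec).alias("row_number"),
    dense_rank().over(windowSpec).alias("rank"))
result.show()
# Keep only the rows whose rank falls inside the requested range
result = result.filter((col("rank") >= start) & (col("rank") <= end))
result.show()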
Refer to https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
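For paging, it matters which of the two window columns you filter on. The following is a plain-Python sketch (no Spark needed) that mimics what row_number and dense_rank compute over an ordered list. The sample rows, the helper name `number_rows`, and the `skip` / `take` values are made up for illustration and are not part of the Spark API.

```python
# Plain-Python illustration of row_number vs dense_rank semantics.
# Everything here (data, helper name) is hypothetical and only mirrors the idea.

def number_rows(rows, key):
    """Sort rows by `key` and attach 1-based row_number and dense_rank."""
    ordered = sorted(rows, key=lambda r: r[key])
    numbered = []
    rank = 0
    previous = object()  # sentinel that equals no real value
    for position, row in enumerate(ordered, start=1):
        if row[key] != previous:
            rank += 1  # dense_rank only advances when the value changes
            previous = row[key]
        numbered.append({**row, "row_number": position, "rank": rank})
    return numbered

products = [
    {"product": "A", "revenue": 100},
    {"product": "B", "revenue": 200},
    {"product": "C", "revenue": 200},  # ties with B
    {"product": "D", "revenue": 300},
    {"product": "E", "revenue": 400},
]

numbered = number_rows(products, "revenue")

skip, take = 1, 2

# Paging on row_number yields at most `take` rows.
page = [r["product"] for r in numbered if skip < r["row_number"] <= skip + take]

# Filtering on dense_rank keeps every row of a tie, so it can exceed `take`.
by_rank = [r["product"] for r in numbered if skip < r["rank"] <= skip + take]

print(page)
print(by_rank)
```

So for strict skip / take paging, filtering on the row_number column is usually the better fit, while the rank column answers "which rows are in rank positions 2 to 5". One caveat: Python's sort is stable, but in Spark the order of tied rows under row_number is not guaranteed, so adding a unique tie-breaker column to the orderBy is advisable if pages must be repeatable.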