Skip / Take with Spark SQL

How would one implement a skip / take request (typical server-side paging) using Spark SQL? I've scoured the web and can only find basic examples like this one: https://databricks-training.s3.amazonaws.com/data-exploration-using-spark-sql.html

I don't see any equivalent of ROW_NUMBER() or OFFSET / FETCH as in T-SQL. Does anyone know how to do this?

Something like:

scala> csc.sql("select * from users skip 10 limit 10").collect()
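For clarity, the paging semantics being asked for (skip 10 rows, then take 10) can be sketched over a plain Python list; `skip_take` and the sample data are illustrative, not part of any Spark API:

```python
def skip_take(rows, skip, take):
    """Return `take` rows starting after the first `skip` rows."""
    return rows[skip:skip + take]

# Page 2 of a 100-row result set, 10 rows per page.
page = skip_take(list(range(100)), skip=10, take=10)
print(page)  # rows 10 through 19
```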



2 answers


Try something like this:



val df = csc.sql("select * from <keyspace>.<table>")
// Pair each row with its index, then keep only the index range for the page.
// (.rdd converts a DataFrame to an RDD; on old SchemaRDDs it isn't needed.)
val indexed = df.rdd.zipWithIndex()
indexed.filter { case (_, i) => i >= 5 && i < 10 }.map(_._1).collect()
indexed.filter { case (_, i) => i >= 10 && i < 12 }.map(_._1).collect()
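As a sanity check on the indexing logic, the zipWithIndex-and-filter pattern can be mimicked locally in plain Python (note `enumerate` yields `(index, row)` while Spark's `zipWithIndex` yields `(row, index)`; the function name is illustrative):

```python
def page_by_index(rows, start, end):
    """Keep rows whose zero-based index i satisfies start <= i < end,
    mirroring the zipWithIndex + filter pattern above."""
    return [row for i, row in enumerate(rows) if start <= i < end]

print(page_by_index(list("abcdefgh"), 2, 5))  # ['c', 'd', 'e']
```

Be aware that this only gives stable pages if the row order is deterministic, which is exactly the caveat the second answer addresses with an ordered window.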

      



I found that neither Spark SQL nor DataFrames support an offset with limit. Because rows can be distributed randomly across partitions, a limit with an offset is only meaningful on ordered data. We can use a window function to implement it:

1. Suppose we want the products whose revenue rank is between 2 and 5.

2. Order a window by revenue, compute each row's rank, and filter on the rank range:



from pyspark.sql.functions import col, dense_rank, row_number
from pyspark.sql.window import Window

windowSpec = Window.partitionBy().orderBy(df.revenue.asc())

result = df.select(
    "product", "category", "revenue",
    row_number().over(windowSpec).alias("row_number"),
    dense_rank().over(windowSpec).alias("rank"))
result.show()

result = result.filter((col("rank") >= start) & (col("rank") <= end))
result.show()

Refer to https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
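The rank-based filtering can be checked locally as well; this plain-Python helper mirrors what `dense_rank()` produces over an ascending window (the function name and revenue figures are made up for illustration):

```python
def dense_rank_asc(values):
    """Dense rank (1-based, ties share a rank, no gaps) in ascending order,
    like dense_rank() over a window ordered ascending."""
    rank_of = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [rank_of[v] for v in values]

revenues = [3000, 5000, 3000, 6000, 4000]
ranks = dense_rank_asc(revenues)          # [1, 3, 1, 4, 2]
start, end = 2, 3
page = [r for r, rk in zip(revenues, ranks) if start <= rk <= end]
print(page)  # [5000, 4000]
```

Because ties share a rank, a rank-range page can contain more (or fewer) rows than `end - start + 1`; use `row_number()` instead of `dense_rank()` when exact page sizes matter.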







