Skip / Take with Spark SQL

How would one implement a skip / take request (typical server-side paging) using Spark SQL? I've scoured the web and can only find basic examples like this one: https://databricks-training.s3.amazonaws.com/data-exploration-using-spark-sql.html

I don't see any equivalent of ROW_NUMBER() or OFFSET / FETCH as in T-SQL. Does anyone know how to do this?

Something like:

scala> csc.sql("select * from users skip 10 limit 10").collect()
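For clarity, the paging semantics being asked for (skip 10 rows, then take 10) can be sketched over a plain Python list; `skip_take` and the sample data are illustrative, not part of any Spark API:

```python
def skip_take(rows, skip, take):
    """Return `take` rows starting after the first `skip` rows."""
    return rows[skip:skip + take]

# Page 2 of a 100-row result set, 10 rows per page.
page = skip_take(list(range(100)), skip=10, take=10)
print(page)  # rows 10 through 19
```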



2 answers


Try something like this:



val df = csc.sql("select * from <keyspace>.<table>")
// Pair each row with its index, then keep only the index range for the page.
// (.rdd converts a DataFrame to an RDD; on old SchemaRDDs it isn't needed.)
val indexed = df.rdd.zipWithIndex()
indexed.filter { case (_, i) => i >= 5 && i < 10 }.map(_._1).collect()
indexed.filter { case (_, i) => i >= 10 && i < 12 }.map(_._1).collect()
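As a sanity check on the indexing logic, the zipWithIndex-and-filter pattern can be mimicked locally in plain Python (note `enumerate` yields `(index, row)` while Spark's `zipWithIndex` yields `(row, index)`; the function name is illustrative):

```python
def page_by_index(rows, start, end):
    """Keep rows whose zero-based index i satisfies start <= i < end,
    mirroring the zipWithIndex + filter pattern above."""
    return [row for i, row in enumerate(rows) if start <= i < end]

print(page_by_index(list("abcdefgh"), 2, 5))  # ['c', 'd', 'e']
```

Be aware that this only gives stable pages if the row order is deterministic, which is exactly the caveat the second answer addresses with an ordered window.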

      



I found that neither Spark SQL nor DataFrames support an offset with limit. Because rows can be distributed randomly across partitions, a limit with an offset is only meaningful on ordered data. We can use a window function to implement it:

1. Suppose we want the products whose revenue rank is between 2 and 5.

2. Order a window by revenue, compute each row's rank, and filter on the rank range:



from pyspark.sql.functions import col, dense_rank, row_number
from pyspark.sql.window import Window

windowSpec = Window.partitionBy().orderBy(df.revenue.asc())

result = df.select(
    "product", "category", "revenue",
    row_number().over(windowSpec).alias("row_number"),
    dense_rank().over(windowSpec).alias("rank"))
result.show()

result = result.filter((col("rank") >= start) & (col("rank") <= end))
result.show()

Refer to https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
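The rank-based filtering can be checked locally as well; this plain-Python helper mirrors what `dense_rank()` produces over an ascending window (the function name and revenue figures are made up for illustration):

```python
def dense_rank_asc(values):
    """Dense rank (1-based, ties share a rank, no gaps) in ascending order,
    like dense_rank() over a window ordered ascending."""
    rank_of = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [rank_of[v] for v in values]

revenues = [3000, 5000, 3000, 6000, 4000]
ranks = dense_rank_asc(revenues)          # [1, 3, 1, 4, 2]
start, end = 2, 3
page = [r for r, rk in zip(revenues, ranks) if start <= rk <= end]
print(page)  # [5000, 4000]
```

Because ties share a rank, a rank-range page can contain more (or fewer) rows than `end - start + 1`; use `row_number()` instead of `dense_rank()` when exact page sizes matter.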







