Scala / Spark: how to rank rows of a dataframe with SQLContext in version 1.5
I have sample data as shown below:
Input:
accountNumber assetValue
A100 1000
A100 500
B100 600
B100 200
Expected output:
accountNumber assetValue rank
A100 1000 1
A100 500 2
B100 600 1
B100 200 2
Now my question is: how do I add this rank column to a dataframe sorted by accountNumber? I don't expect a huge number of rows, so I am open to ideas that compute the rank outside of the dataframe if necessary.
I am on Spark version 1.5 with SQLContext, hence I cannot use window functions.
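Since the row count is small, the per-group ranking can also be done outside Spark entirely. A minimal sketch in plain Scala collections (the grouping key and sample values are taken from the question; the intermediate names are mine):

```scala
// Plain-Scala sketch of the desired ranking, assuming the data fits in memory.
// Rank = 1-based position within each accountNumber group, by descending assetValue.
val rows = Seq(("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200))

val ranked = rows
  .groupBy(_._1)                                // partition by accountNumber
  .toSeq
  .flatMap { case (acct, group) =>
    group
      .sortBy(-_._2)                            // order by assetValue desc
      .zipWithIndex                             // 0-based position within the group
      .map { case ((_, value), i) => (acct, value, i + 1) } // 1-based rank
  }
  .sortBy(r => (r._1, r._3))                    // stable display order

ranked.foreach(println)
```

The result matches the expected output: (A100,1000,1), (A100,500,2), (B100,600,1), (B100,200,2). This could be applied via collect() on the driver, or per-partition on an RDD grouped by accountNumber.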
2 answers
Raw SQL (note that in Spark 1.5, window functions in SQL require a HiveContext rather than a plain SQLContext):
import sqlContext.implicits._

// Build the sample dataframe and register it as a temporary table
val df = sc.parallelize(Seq(
  ("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200)
)).toDF("accountNumber", "assetValue")
df.registerTempTable("df")

// RANK() restarts at 1 within each accountNumber partition
sqlContext.sql("SELECT accountNumber, assetValue, RANK() OVER (PARTITION BY accountNumber ORDER BY assetValue DESC) AS rank FROM df").show
+-------------+----------+----+
|accountNumber|assetValue|rank|
+-------------+----------+----+
| A100| 1000| 1|
| A100| 500| 2|
| B100| 600| 1|
| B100| 200| 2|
+-------------+----------+----+
You can use the row_number function together with a Window expression, on which you specify the partition and ordering columns:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import sqlContext.implicits._

val df = Seq(("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200)).toDF("accountNumber", "assetValue")

// row_number() restarts at 1 within each accountNumber partition
df.withColumn("rank", row_number().over(Window.partitionBy($"accountNumber").orderBy($"assetValue".desc))).show
+-------------+----------+----+
|accountNumber|assetValue|rank|
+-------------+----------+----+
| A100| 1000| 1|
| A100| 500| 2|
| B100| 600| 1|
| B100| 200| 2|
+-------------+----------+----+