Add Number of Days in a Date Column to the Date Column in the Same Dataframe in a Spark Scala App

I have a dataframe with the columns

("id", "current_date", "days")

and I am trying to add "days" to "current_date" to create a new dataframe with a new column called "new_date", using the Spark Scala function date_add():

val newDF = df.withColumn("new_Date", date_add(df("current_date"), df("days").cast("Int")))


But it looks like date_add() only takes an Int value, not a column. How do I get the desired result in this case? Are there alternative functions I can use?

Spark version: 1.6.0, Scala version: 2.10.6

+3




2 answers


A small custom UDF can be used to make this date arithmetic possible.

import org.apache.spark.sql.functions.udf
import java.util.concurrent.TimeUnit
import java.util.Date
import java.text.SimpleDateFormat

val date_add = udf((x: String, y: Int) => {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val result = new Date(sdf.parse(x).getTime() + TimeUnit.DAYS.toMillis(y))
  sdf.format(result)
})




Usage:

scala> val df = Seq((1, "2017-01-01", 10), (2, "2017-01-01", 20)).toDF("id", "current_date", "days")
df: org.apache.spark.sql.DataFrame = [id: int, current_date: string, days: int]

scala> df.withColumn("new_Date", date_add($"current_date", $"days")).show()
+---+------------+----+----------+
| id|current_date|days|  new_Date|
+---+------------+----+----------+
|  1|  2017-01-01|  10|2017-01-11|
|  2|  2017-01-01|  20|2017-01-21|
+---+------------+----+----------+
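The arithmetic inside the UDF can be sanity-checked outside Spark. Below is a plain-Scala sketch of the same logic (the helper name addDays is made up for illustration); note that adding whole days in milliseconds assumes no DST transition falls inside the interval in the JVM's default time zone, which can otherwise shift the result by a day.

```scala
import java.util.concurrent.TimeUnit
import java.util.Date
import java.text.SimpleDateFormat

// Same steps as the UDF body: parse the string, shift by whole days, re-format.
def addDays(date: String, days: Int): String = {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val shifted = new Date(sdf.parse(date).getTime() + TimeUnit.DAYS.toMillis(days))
  sdf.format(shifted)
}

println(addDays("2017-01-01", 10)) // prints 2017-01-11
```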


+3




No need to use a UDF; you can do it with a SQL expression, since the SQL form of date_add accepts a column for the number of days:



val newDF = df.withColumn("new_date", expr("date_add(current_date,days)"))
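Assuming the same sample dataframe as in the first answer, a sketch of how this version fits together (in Spark 1.6 this requires a SQLContext with its implicits imported for toDF):

```scala
import org.apache.spark.sql.functions.expr

// Sample data mirroring the first answer's usage section.
val df = Seq((1, "2017-01-01", 10), (2, "2017-01-01", 20))
  .toDF("id", "current_date", "days")

// date_add here is Spark SQL's built-in, resolved inside expr(),
// so both arguments can be columns.
val newDF = df.withColumn("new_date", expr("date_add(current_date, days)"))
newDF.show()
```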


+5



