How can I split a comma-separated string and get n values in a Spark Scala DataFrame?
How can I take only the first 2 elements of an ArrayType column in Spark Scala? I load the data with val df = spark.sqlContext.sql("select col1, col2 from test_tbl").
I have the following data:
col1 | col2
--- | ---
a | [test1,test2,test3,test4,.....]
b | [a1,a2,a3,a4,a5,.....]
I want to get the following data:
col1| col2
----|----
a | test1,test2
b | a1,a2
When I do df.withColumn("test", col("col2").take(5)), it doesn't work. It gives this error:
value take is not a member of org.apache.spark.sql.ColumnName
How can I get the data in the format shown above?
Inside withColumn you can call a UDF, getPartialstring, which uses the slice (or take) method, as in the example below:
import sqlContext.implicits._
import org.apache.spark.sql.functions._

// Returns the elements in [fromIndex, toIndex) of the array, joined with commas
val getPartialstring = udf((array: Seq[String], fromIndex: Int, toIndex: Int) =>
  array.slice(fromIndex, toIndex).mkString(","))
Your call would then look like this (keeping the first two elements):
df.withColumn("test", getPartialstring(col("col2"), lit(0), lit(2)))
col("col2").take(5) fails because Column has no take(..) method, which is why your error message says:
error: value take is not a member of org.apache.spark.sql.ColumnName
You can use the UDF approach to solve this problem.
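Here is a minimal, self-contained sketch of the UDF approach, assuming Spark 2.x or later running locally and an array<string> column like col2 in the question. The object name PartialStringExample, its compute method and the sample rows are hypothetical; getPartialstring is the UDF from this answer, called with lit(0) and lit(2) to keep the first two elements.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

// Hypothetical example object; names and sample data are illustrative only.
object PartialStringExample {
  // Joins the array elements in [fromIndex, toIndex) with commas.
  // Seq.slice clamps out-of-range indices, so short arrays just yield fewer items.
  // Note: a null array in col2 would make this UDF fail; guard it if that can happen.
  val getPartialstring = udf((array: Seq[String], fromIndex: Int, toIndex: Int) =>
    array.slice(fromIndex, toIndex).mkString(","))

  def compute(): Seq[(String, String)] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partial-string-udf")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("a", Seq("test1", "test2", "test3", "test4")),
      ("b", Seq("a1", "a2", "a3", "a4", "a5"))
    ).toDF("col1", "col2")

    // Replace col2 with a string made of its first two elements.
    val result = df.withColumn("col2", getPartialstring(col("col2"), lit(0), lit(2)))

    result.select("col1", "col2").as[(String, String)].collect().toSeq
  }

  def main(args: Array[String]): Unit =
    compute().foreach { case (c1, c2) => println(s"$c1 | $c2") }
}
```

Running main should print one line per row in the col1 | col2 layout from the question.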
You can use Column's apply method to get each individual element up to a specific index, and then build a new array from those elements with the array function:
import spark.implicits._
import org.apache.spark.sql.functions._
// Sample data:
val df = Seq(
("a", Array("a1", "a2", "a3", "a4", "a5", "a6")),
("a", Array("b1", "b2", "b3", "b4", "b5")),
("c", Array("c1", "c2"))
).toDF("col1", "col2")
val n = 4
val result = df.withColumn("col2", array((0 until n).map($"col2"(_)): _*))
result.show(false)
// +----+--------------------+
// |col1|col2 |
// +----+--------------------+
// |a |[a1, a2, a3, a4] |
// |a |[b1, b2, b3, b4] |
// |c |[c1, c2, null, null]|
// +----+--------------------+
Note that this pads the result with null for records whose arrays have fewer than n elements.
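 
If you are on Spark 2.4 or later, here is a sketch of a no-UDF alternative, assuming the built-in slice function (its start index is 1-based) and concat_ws, which accepts array<string> columns and skips nulls. Unlike the apply-based approach, slice returns a shorter array instead of padding with null, and concat_ws turns the result into the comma-separated string the question asks for. The object name SliceConcatExample and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, slice}

// Hypothetical example object; requires Spark 2.4+ for functions.slice.
object SliceConcatExample {
  def compute(n: Int): Seq[(String, String)] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("slice-concat-ws")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("a", Seq("a1", "a2", "a3", "a4", "a5", "a6")),
      ("b", Seq("b1", "b2", "b3", "b4", "b5")),
      ("c", Seq("c1", "c2"))
    ).toDF("col1", "col2")

    // slice(column, start, length): start is 1-based; arrays shorter than n
    // come back as-is rather than padded with null.
    // concat_ws then joins the remaining elements with commas.
    val result = df.withColumn("col2", concat_ws(",", slice(col("col2"), 1, n)))

    result.select("col1", "col2").as[(String, String)].collect().toSeq
  }

  def main(args: Array[String]): Unit =
    compute(2).foreach { case (c1, c2) => println(s"$c1 | $c2") }
}
```

On the padded result of the apply approach, concat_ws(",", $"col2") would likewise produce a comma-separated string, because concat_ws skips null elements. Also, with spark.sql.ansi.enabled set to true, out-of-range apply indices may raise an error instead of returning null, which is another reason to consider slice there.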