Apache Spark Dataset API: head(n: Int) vs take(n: Int)

The Apache Spark Dataset API has two methods, head(n: Int) and take(n: Int).

The Dataset.scala source contains:

    def take(n: Int): Array[T] = head(n)

I couldn't find any difference in execution between the two functions. Why does the API have two different methods to get the same result?
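
For example, in a quick local spark-shell check (the tiny Dataset below is only an illustration), both calls come back with the same array:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Small Dataset[String] just to compare the two methods.
    val ds = Seq("a", "b", "c", "d").toDS()

    val byHead = ds.head(2) // Array("a", "b")
    val byTake = ds.take(2) // Array("a", "b")

    byHead.sameElements(byTake) // true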



3 answers


I think this is because the Spark developers tend to provide a rich API; there are also two methods, where and filter, that do exactly the same thing (see the sketch below).
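
For illustration, here is a minimal spark-shell sketch (the toy Dataset and the default column name value are just assumptions for the example). In Dataset.scala, where(condition) is defined as a plain call to filter(condition), so both expressions build the same plan and return the same rows:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy Dataset[Int]; the encoder names its single column "value".
    val nums = Seq(1, 2, 3, 4, 5).toDS()

    // where(condition) delegates to filter(condition) in Dataset.scala,
    // so these two calls are interchangeable.
    val viaFilter = nums.filter($"value" > 2).collect() // Array(3, 4, 5)
    val viaWhere  = nums.where($"value" > 2).collect()  // Array(3, 4, 5)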





The reason, in my opinion, is that the Apache Spark Dataset API is trying to mimic the pandas DataFrame API, which also has a head method: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html





I experimented and found that head(n) and take(n) give exactly the same result. Both return a list of Row objects.

    DF.head(2)

    [Row(Transaction_date=u'1/2/2009 6:17', Product=u'Product1', Price=u'1200', Payment_Type=u'Mastercard', Name=u'carolina', City=u'Basildon', State=u'England', Country=u'United Kingdom'), Row(Transaction_date=u'1/2/2009 4:53', Product=u'Product2', Price=u'1200', Payment_Type=u'Visa', Name=u'Betina', City=u'Parkville', State=u'MO', Country=u'United States')]

    DF.take(2)

    [Row(Transaction_date=u'1/2/2009 6:17', Product=u'Product1', Price=u'1200', Payment_Type=u'Mastercard', Name=u'carolina', City=u'Basildon', State=u'England', Country=u'United Kingdom'), Row(Transaction_date=u'1/2/2009 4:53', Product=u'Product2', Price=u'1200', Payment_Type=u'Visa', Name=u'Betina', City=u'Parkville', State=u'MO', Country=u'United States')]
