SparkR collect() and head() error for Spark DataFrame: arguments imply differing number of rows
I read a Parquet file from HDFS:
path<-"hdfs://part_2015"
AppDF <- parquetFile(sqlContext, path)
printSchema(AppDF)
root
|-- app: binary (nullable = true)
|-- category: binary (nullable = true)
|-- date: binary (nullable = true)
|-- user: binary (nullable = true)
class(AppDF)
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
collect(AppDF)
.....error:
arguments imply differing number of rows: 46021, 39175, 62744, 27137
head(AppDF)
.....error:
arguments imply differing number of rows: 36, 30, 48
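The message above is the one base R's data.frame() raises when its columns have unequal lengths. A plausible reading, given the binary columns in the schema, is that each cell reaches R as a raw byte vector and the cells of a column get concatenated, so a column's length tracks its total bytes rather than its row count. The sketch below reproduces the message in plain R without Spark. The two sample rows and all variable names are made up for illustration, and SparkR's actual internals may differ.

```r
# Illustration in base R only; this is not SparkR's actual collect() code.
# Two hypothetical rows whose cells are raw byte vectors,
# as the cells of a binary column might arrive in R.
rows <- list(
  list(app = charToRaw("aaa"),  date = charToRaw("20150101")),
  list(app = charToRaw("aaaa"), date = charToRaw("20150104"))
)

# Build each column by flattening its cells; raw vectors are concatenated byte by byte.
colNames <- c("app", "date")
cols <- lapply(colNames, function(name) {
  unlist(lapply(rows, function(row) row[[name]]))
})
names(cols) <- colNames

# Column lengths now count bytes, not rows: 3 + 4 for app, 8 + 8 for date.
colLengths <- sapply(cols, length)
print(colLengths)

# data.frame() rejects columns of unequal length; capture the message instead of stopping.
msg <- tryCatch({
  do.call(data.frame, cols)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

In the head() error, 48 would match six rows of 8-character dates, which is consistent with this reading but does not prove it.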
I have read several articles about this issue, but none of them match my case. All I do is read the table from the Parquet file and call head() or collect() on it. My Parquet table looks like this:
app category date user
aaa test 20150101 123
aaa test 20150102 345
aaa test 20150103 678
aaaa testA 20150104 123
aaaa testA 20150105 234
aaaa testA 20150106 4345
bbbb testB 20150101 5435
I am using spark-1.4.0-bin-hadoop2.6, and I am running this on a cluster using
./sparkR --master yarn-client
I also tried it in local mode and hit the same problem.
showDF(AppDF)
+-----------+-----------+-----------+-----------+
| app| category| date| user|
+-----------+-----------+-----------+-----------+
|[B@217fa749|[B@43bfbacd|[B@60810b7a|[B@3818a815|
|[B@5ac31778|[B@3e39f5d5|[B@4f3a92dd| [B@e8013ce|
|[B@7a9440d1|[B@1b2b9836|[B@4b160f29|[B@153d7342|
|[B@7559fcf2|[B@66edb00e|[B@7ec19bec|[B@58e3e3f7|
|[B@598b9ab8|[B@5c5ad3f5|[B@4f11a931|[B@107af885|
|[B@7951ec36|[B@716b0b73|[B@2abce531|[B@576b09e2|
|[B@34560144|[B@7a6d3233|[B@16faf110|[B@34e85d39|
| [B@3406452|[B@787a4528|[B@235282e3|[B@7e0f1732|
|[B@10bc1446|[B@2bd7083f|[B@325e7695|[B@57bb4a08|
|[B@48f98037|[B@7450c04e|[B@61817c8a|[B@7c177a08|
|[B@694ce2dd|[B@36c2512d| [B@f5f7d71|[B@46248d99|
|[B@479dee25|[B@517de3de|[B@1ffb2d9e|[B@236ff079|
|[B@52ac196f|[B@20b9f0d0| [B@f70f879|[B@41c8d7da|
|[B@68d34af3| [B@7ddcd49|[B@72d077a7|[B@545fafd4|
|[B@5610b292|[B@623bbb62|[B@3f8b5150|[B@53877bc7|
|[B@63cf70a8|[B@47ed58c9|[B@2f601903|[B@4e0a2c41|
|[B@7ddf876d|[B@5e3445aa|[B@39c9cc37|[B@6f7e4c84|
|[B@4cd1a74b|[B@583e5453|[B@64124267|[B@6ac5ab84|
|[B@577f9ddf|[B@7b55c859|[B@3cd48a51|[B@25c4eb0a|
|[B@2322f0e5|[B@4af55c68|[B@3285d64a|[B@70b7ae2f|
+-----------+-----------+-----------+-----------+
I also read this Parquet file in Scala and ran collect() on it, and everything worked. So this appears to be a SparkR-specific issue.
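One possible workaround, not verified against this data set, is to cast the binary columns to strings on the Spark side before bringing rows into R. The sketch below assumes the sparkR shell with SparkR attached and the AppDF from the question; castAppColumnsToString is a hypothetical helper name, while select, cast and alias are SparkR functions.

```r
# Sketch only: assumes the sparkR shell (SparkR attached) and the AppDF from the question.
# castAppColumnsToString is a hypothetical helper name, not part of SparkR.
castAppColumnsToString <- function(df) {
  # Cast each binary column to string and keep its original name.
  select(df,
         alias(cast(df$app, "string"), "app"),
         alias(cast(df$category, "string"), "category"),
         alias(cast(df$date, "string"), "date"),
         alias(cast(df$user, "string"), "user"))
}

# Usage inside the sparkR shell:
# strDF <- castAppColumnsToString(AppDF)
# printSchema(strDF)
# head(strDF)
```

Spark SQL also documents a spark.sql.parquet.binaryAsString option for Parquet files whose writers store strings as binary; whether it helps in this case is untested.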