I have no problem reading a tar.gz file, but I see a lot of gibberish in the final output. This is what I used (on PySpark):
from operator import add  # needed by reduceByKey(add) below
lines = sc.textFile("abc.tar.gz")
count = lines.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)
print(count.collect())
My output contains a lot of \x00\x00 characters. Any ideas?
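One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: sc.textFile decompresses the gzip layer of a .gz file but does not unpack the tar container inside it, so the tar headers and their NUL padding end up in the lines being split. The snippet below reproduces that effect with only the Python standard library. The helper names build_tar_gz and words_from_tar_gz and the sample file names are made up for illustration.

```python
import gzip
import io
import tarfile


def build_tar_gz(files):
    """Build an in-memory .tar.gz from a {name: text} dict (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def words_from_tar_gz(raw):
    """Unpack both the gzip and the tar layer, then split member text into words."""
    words = []
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                text = tar.extractfile(member).read().decode("utf-8")
                words.extend(text.split())
    return words


# Hypothetical sample archive standing in for abc.tar.gz.
archive = build_tar_gz({"a.txt": "hello world", "b.txt": "hello spark"})

# Undoing only the gzip layer leaves tar headers and NUL padding in the data.
gunzipped_only = gzip.decompress(archive)
nul_count = gunzipped_only.count(b"\x00")

# Undoing both layers yields only the text of the archived files.
clean_words = words_from_tar_gz(archive)
print(nul_count > 0, clean_words)
```

In PySpark, the same helper could probably be applied to the raw bytes returned by sc.binaryFiles, for example sc.binaryFiles("abc.tar.gz").flatMap(lambda kv: words_from_tar_gz(kv[1])), followed by the original map and reduceByKey steps. I have not tested this against the archive in question, and it loads each archive fully into memory on one executor, so it only suits archives that fit there.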