Is there a way for Spark to read AWS S3 files without using Hadoop?
Standalone programs can read / write AWS S3 files without Hadoop by using the AWS jar files. Spark programs can read / write ordinary files without Hadoop. However, Spark programs that read / write AWS S3 files do require Hadoop. And even then, Spark 1.4 with Hadoop 2.6 or 2.7 throws runtime errors about missing Hadoop classes for S3, even when the Hadoop directory is set.
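For context, a minimal sketch of such a standalone read, assuming the AWS SDK for Java (v1) is on the classpath; the bucket and key names here are placeholders:

```scala
import com.amazonaws.services.s3.AmazonS3Client
import scala.io.Source

object StandaloneS3Read {
  def main(args: Array[String]): Unit = {
    // Credentials come from the SDK's default provider chain
    // (environment variables, ~/.aws/credentials, or an instance profile).
    val s3 = new AmazonS3Client()
    // Placeholder bucket and key.
    val obj = s3.getObject("my-bucket", "path/to/file.txt")
    val text = Source.fromInputStream(obj.getObjectContent).mkString
    println(text)
    obj.close()
  }
}
```

No Hadoop classes are involved anywhere in this program, which is what motivates the question below.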
- Is there a way for Spark programs to read / write S3 files without Hadoop, using only the AWS jar files?
- If not, how do I fix Spark's missing Hadoop classes for S3 at runtime? (See the sketch after this list for the kind of code that triggers it.)
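To illustrate the second question, here is a hedged sketch of the kind of Spark code involved; the bucket, key, and credential lookup are placeholders. Reading an s3n:// path goes through Hadoop's FileSystem API, so the S3 filesystem class must be on Spark's classpath at runtime:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkS3Read {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkS3Read"))
    // s3n:// URIs are resolved through Hadoop's FileSystem API; the
    // implementing class lives in the hadoop-aws jar, which is what goes
    // missing at runtime when that jar is not on the classpath.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
    val lines = sc.textFile("s3n://my-bucket/path/to/file.txt") // placeholder path
    println(lines.count())
    sc.stop()
  }
}
```

One commonly suggested remedy (not confirmed by the question itself) is to put the hadoop-aws jar, in a version matching the Hadoop build, together with its AWS SDK dependency on the driver and executor classpaths, for example via spark-submit's --jars option.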