Is there a way for Spark to read AWS S3 files without using Hadoop?

Standalone programs can read / write AWS S3 files without Hadoop by using the AWS SDK jar files, and Spark programs can read / write ordinary files without Hadoop. However, Spark requires the Hadoop classes in order to read / write S3 files. Worse, Spark 1.4 with Hadoop 2.6 or 2.7 throws runtime errors about missing Hadoop classes for S3 even when the Hadoop directory is set.

  • Is there a way for Spark programs to read / write S3 files using the AWS jar files alone, without Hadoop?

  • If not, how do I fix Spark's missing Hadoop classes for S3 at runtime?
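For the second question, the usual fix is to put the Hadoop S3 connector and the matching AWS SDK jar on Spark's classpath at submit time. A minimal sketch, assuming a Hadoop 2.7 build (the jar versions, application class, jar name, and bucket path below are all placeholders to adapt):

```shell
# Supply the S3 filesystem classes that the Spark distribution itself lacks.
# hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4; match versions to your Hadoop build.
spark-submit \
  --jars hadoop-aws-2.7.7.jar,aws-java-sdk-1.7.4.jar \
  --class com.example.MyApp \
  my-app.jar \
  s3n://my-bucket/input.txt
```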

1 answer


Spark uses Hadoop classes to read S3, but it does not require a Hadoop installation (we use the distribution pre-built for Hadoop 2.4). Just make sure you use the s3n:// prefix.
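A minimal sketch of such a read, using the Spark 1.x RDD API. The bucket path is a placeholder, credentials are assumed to be in environment variables, and the hadoop-aws / aws-java-sdk jars must already be on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3ReadExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("S3ReadExample"))

    // s3n:// credentials; these can also come from core-site.xml or IAM roles.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // The s3n:// prefix selects Hadoop's NativeS3FileSystem for this path.
    val lines = sc.textFile("s3n://my-bucket/path/data.txt")
    println(lines.count())

    sc.stop()
  }
}
```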
