Spark 1.3.1: Unable to read file from S3 bucket, org/jets3t/service/ServiceException

I'm on an AWS EC2 VM (Ubuntu 14.04), wanting to do some basic Spark-to-RDD work with my S3 files. This quick-and-dirty command works (I'm not using sparkContext.hadoopConfiguration for the moment):

scala> val distFile = sc.textFile("s3n://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@bucketname/folder1/folder2/file.csv")

Then I get the following error when I run distFile.count():

java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:334)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:324)
         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)


I previously:

  • defined an AWS IAM user with corresponding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • added export lines for both keys as environment variables in .bashrc
  • built Spark 1.3.1 with SPARK_HADOOP_VERSION=2.6.0-cdh5.4.1 sbt/sbt assembly
  • have Hadoop 2.6-cdh5.4.1 installed and running (pseudo-distributed)

Is it a syntax issue with textFile("s3n://...")? I have tried other schemes, including s3://, without success.
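
For reference, the sparkContext.hadoopConfiguration route I mentioned deferring would look roughly like this (an untested sketch; the fs.s3n.* property names are the standard Hadoop ones, and the keys come from the env variables I exported in .bashrc):

scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
scala> val distFile = sc.textFile("s3n://bucketname/folder1/folder2/file.csv")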

Thanks



3 answers


Include the JetS3t jar on your classpath, using a version compatible with your current Hadoop setup. That jar is what provides the missing org.jets3t.service.ServiceException class.
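
For example, you can pass it when launching the shell (a sketch; the jar path and the 0.9.0 version are placeholders, so pick whatever matches your Hadoop build):

spark-shell --jars /path/to/jets3t-0.9.0.jar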





You need the hadoop-mapreduce-client jars on your CLASSPATH. In my case, I made my own distro with these dependencies.

I put the following files in the lib folder:



  • hadoop-mapreduce-client-jobclient-2.6.0.jar
  • hadoop-mapreduce-client-hs-plugins-2.6.0.jar
  • hadoop-mapreduce-client-shuffle-2.6.0.jar
  • hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
  • hadoop-mapreduce-client-common-2.6.0.jar
  • hadoop-mapreduce-client-app-2.6.0.jar
  • hadoop-mapreduce-client-hs-2.6.0.jar
  • hadoop-mapreduce-client-core-2.6.0.jar
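
A minimal sketch of getting them into place, assuming a standard Hadoop 2.6.0 tarball layout (adjust the paths to your install):

# copy the mapreduce client jars into Spark's lib folder so they land on the classpath
cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-*.jar $SPARK_HOME/lib/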


I had the same problem. In my case it happened with Spark v2.1.0 on a Hadoop v2.7.2 environment, but I'm leaving this here because the cause is the same. Here's what I got:

A needed class was not found. This could be due to an error in your runpath. Missing class: org/jets3t/service/ServiceException
java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:342)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:332)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)


This is because the classpath ended up with a lower version of the net.java.dev.jets3t:jets3t dependency than org.apache.hadoop:hadoop-aws requires.

I solved the problem by adding net.java.dev.jets3t:jets3t:0.9.0 to my build.sbt.
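
In build.sbt that is a one-liner (a sketch, using sbt's standard dependency syntax):

// pin jets3t explicitly so hadoop-aws sees a compatible version on the classpath
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0"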







