Spark 1.3.1: Unable to read file from S3 bucket, org / jets3t / service / ServiceException
I'm on an AWS EC2 VM (Ubuntu 14.04) and want to do some basic Spark RDD work on my S3 files. This quick-and-dirty command succeeds (I'm not using sparkContext.hadoopConfiguration at the moment):
scala> val distFile = sc.textFile("s3n://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@bucketname/folder1/folder2/file.csv")
but I then get the following error when I run distFile.count():
java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:334)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:324)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
I previously:
- defined an AWS IAM user with a corresponding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- added export lines for both keys as environment variables in .bashrc
- built Spark 1.3.1 with SPARK_HADOOP_VERSION=2.6.0-cdh5.4.1 sbt/sbt assembly
- installed Hadoop 2.6-cdh5.4.1, up and running in pseudo-distributed mode
Is it related to the textFile("s3n://...") syntax? I have tried other schemes, including s3://, without success.
Thanks.
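One genuine pitfall with the credentials-in-URI form (separate from the missing-class error above): if the secret key contains characters like / or +, the plain s3n:// string breaks. A minimal sketch of URL-encoding the credentials first, with fake placeholder values; the helper name is mine, not from any API:

```scala
// Hypothetical helper (not from the original post): build the
// credentials-in-URI form with URL-encoding, because secret keys
// containing '/' or '+' break the plain s3n:// string.
import java.net.URLEncoder

def s3nUri(accessKey: String, secretKey: String, path: String): String = {
  def enc(s: String) = URLEncoder.encode(s, "UTF-8")
  s"s3n://${enc(accessKey)}:${enc(secretKey)}@$path"
}

// Placeholder (fake) credentials for illustration only.
println(s3nUri("AKIAEXAMPLE", "abc/def+ghi", "bucketname/folder1/folder2/file.csv"))
// -> s3n://AKIAEXAMPLE:abc%2Fdef%2Bghi@bucketname/folder1/folder2/file.csv
```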
You need to add the hadoop-mapreduce-client jars to the CLASSPATH. In my case, I built my own distribution with these dependencies and put the following files in the lib folder:
- hadoop-mapreduce-client-jobclient-2.6.0.jar
- hadoop-mapreduce-client-hs-plugins-2.6.0.jar
- hadoop-mapreduce-client-shuffle-2.6.0.jar
- hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
- hadoop-mapreduce-client-common-2.6.0.jar
- hadoop-mapreduce-client-app-2.6.0.jar
- hadoop-mapreduce-client-hs-2.6.0.jar
- hadoop-mapreduce-client-core-2.6.0.jar
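To verify the jars actually made it onto the classpath, a quick check from the same JVM is possible; this snippet is my own sketch, not part of the answer:

```scala
// Hypothetical check (not from the answer): list which
// hadoop-mapreduce-client jars this JVM can actually see, to verify
// the lib-folder jars made it onto the classpath.
val mrJars = System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .filter(_.contains("hadoop-mapreduce-client"))

if (mrJars.isEmpty) println("no hadoop-mapreduce-client jars on the classpath")
else mrJars.foreach(println)
```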
I had the same problem. Although in my case it happened with Spark v2.1.0 in a Hadoop v2.7.2 environment, I'm leaving this here because the root cause is the same. Here's what I got:
A needed class was not found. This could be due to an error in your runpath. Missing class: org/jets3t/service/ServiceException
java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:342)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:332)
at
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
This happens because the classpath ends up with a lower version of the net.java.dev.jets3t:jets3t dependency than org.apache.hadoop:hadoop-aws requires.
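To confirm you are hitting this, you can probe whether the class from the stack trace is loadable at all from the shell's classpath; a small sketch of my own, not from the answer:

```scala
// Hypothetical probe (not from the answer): check whether the class the
// stack trace complains about is loadable from the current classpath.
def classPresent(name: String): Boolean =
  try { Class.forName(name); true }
  catch { case _: ClassNotFoundException => false }

if (classPresent("org.jets3t.service.ServiceException"))
  println("jets3t is on the classpath")
else
  println("jets3t is missing from the classpath")
```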
I solved the problem by adding net.java.dev.jets3t:jets3t:0.9.0 to my build.sbt.
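As a sketch, the relevant build.sbt lines might look like this; the hadoop-aws version shown is an assumption on my part, and should match your Hadoop version:

```scala
// build.sbt (sketch) -- pin jets3t explicitly so a lower transitive
// version cannot win. Versions here are illustrative assumptions.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws" % "2.7.2",
  "net.java.dev.jets3t" % "jets3t" % "0.9.0"
)
```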