Spark 1.3.1: Unable to read file from S3 bucket, org/jets3t/service/ServiceException

I'm on an AWS EC2 VM (Ubuntu 14.04), wanting to do some basic Spark-to-RDD work with my S3 files. This quick-and-dirty command works (I'm not using sparkContext.hadoopConfiguration for the moment):

scala> val distFile = sc.textFile("s3n://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@bucketname/folder1/folder2/file.csv")

Then I get the following error when I run distFile.count():

java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:334)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:324)
         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)


I previously:

  • defined an AWS IAM user with corresponding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • added export lines for both keys as environment variables in .bashrc
  • built Spark 1.3.1 with SPARK_HADOOP_VERSION=2.6.0-cdh5.4.1 sbt/sbt assembly
  • have Hadoop 2.6-cdh5.4.1 installed and running (pseudo-distributed)

Is it a syntax issue with textFile("s3n://...")? I have tried other schemes, including s3://, without success.
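
For reference, the sparkContext.hadoopConfiguration route I mentioned deferring would look roughly like this (an untested sketch; the fs.s3n.* property names are the standard Hadoop ones, and the keys come from the env variables I exported in .bashrc):

scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
scala> val distFile = sc.textFile("s3n://bucketname/folder1/folder2/file.csv")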

Thanks



3 answers


Include the JetS3t jar on your classpath, using a version compatible with your current Hadoop setup. That jar is what provides the missing org.jets3t.service.ServiceException class.
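
For example, you can pass it when launching the shell (a sketch; the jar path and the 0.9.0 version are placeholders, so pick whatever matches your Hadoop build):

spark-shell --jars /path/to/jets3t-0.9.0.jar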





You need the hadoop-mapreduce-client jars on your CLASSPATH. In my case, I made my own distro with these dependencies.

I put the following files in the lib folder:



  • hadoop-mapreduce-client-jobclient-2.6.0.jar
  • hadoop-mapreduce-client-hs-plugins-2.6.0.jar
  • hadoop-mapreduce-client-shuffle-2.6.0.jar
  • hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
  • hadoop-mapreduce-client-common-2.6.0.jar
  • hadoop-mapreduce-client-app-2.6.0.jar
  • hadoop-mapreduce-client-hs-2.6.0.jar
  • hadoop-mapreduce-client-core-2.6.0.jar
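
A minimal sketch of getting them into place, assuming a standard Hadoop 2.6.0 tarball layout (adjust the paths to your install):

# copy the mapreduce client jars into Spark's lib folder so they land on the classpath
cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-*.jar $SPARK_HOME/lib/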


I had the same problem. In my case it happened with Spark v2.1.0 on a Hadoop v2.7.2 environment, but I'm leaving this here because the cause is the same. Here's what I got:

A needed class was not found. This could be due to an error in your runpath. Missing class: org/jets3t/service/ServiceException
java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:342)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:332)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)


This is because the classpath ended up with a lower version of the net.java.dev.jets3t:jets3t dependency than org.apache.hadoop:hadoop-aws requires.

I solved the problem by adding net.java.dev.jets3t:jets3t:0.9.0 to my build.sbt.
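
In build.sbt that is a one-liner (a sketch, using sbt's standard dependency syntax):

// pin jets3t explicitly so hadoop-aws sees a compatible version on the classpath
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0"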







