ClassNotFoundException when dependent libraries are in the "lib" folder of the Hadoop job jar

I have always packaged the dependent libraries into a "lib" folder inside the Hadoop job jar. This has worked great before, but this time everything went wrong. Can someone give me some ideas to solve the problem? The problem is this:

When I package the job jar using the Eclipse Export function with the "Extract required libraries into generated JAR" option, the generated jar works fine.

But if I package the job jar with an Ant script that puts the dependent libraries into a "lib" folder inside the jar, I run into a ClassNotFoundException:

java.io.IOException: Split class cascading.tap.hadoop.MultiInputSplit not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:340)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:365)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.MultiInputSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:338)
        ... 7 more


Can anyone provide some ideas? Thanks!
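
For reference, a minimal sketch of the kind of Ant target that packages dependencies into a "lib" folder inside the job jar (the jar name, paths, and main class here are placeholders, not my actual build file):

```xml
<!-- Hypothetical Ant target; adjust basedir, lib path, and main class to your project. -->
<target name="jar" depends="compile">
  <jar destfile="build/myjob.jar" basedir="build/classes">
    <!-- Copy the dependency jars into a lib/ folder inside the job jar -->
    <zipfileset dir="lib" includes="*.jar" prefix="lib"/>
    <manifest>
      <attribute name="Main-Class" value="com.example.MyMain"/>
    </manifest>
  </jar>
</target>
```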



3 answers


This post should solve your problem.





JARs are just zip files under the covers. So why don't you:

  • rename both jars to .zip
  • extract them
  • compare the extracted folders with a file-and-folder comparison tool (Beyond Compare, WinMerge, etc.)


The difference may also be in the manifest.

Once you know what the difference is, it would be easier to tweak the build tools to generate the correct jar files.



If you are using Cascading, make sure the JAR containing the class you set in

// AppProps lives in cascading.property (Cascading 2.x)
import java.util.Properties;
import cascading.property.AppProps;

_props = new Properties();
AppProps.setApplicationJarClass(_props, MyMain.class);


is the JAR that has the "lib" folder with all the dependencies.

It sometimes happens that the JAR containing MyMain.class (call it MyWorkflow.jar) is its own jar without a lib folder, and there is another "module" that does ten different things besides calling the Cascading workflow. This master module (call it MasterModule.jar) has the MyWorkflow module declared as a Maven dependency. So when you try to run

hadoop jar MasterModule.jar <options> 


one would expect all the jars in MasterModule.jar's lib folder to be added to the TaskTracker classpath... but Cascading detects that MyMain.class belongs to MyWorkflow.jar, sees no lib folder inside MyWorkflow.jar, and you start to see the ClassNotFoundException...
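
A quick way to see which situation you are in is to list each jar's contents and check for a lib/ folder. The two stand-in jars below just mirror the scenario described above (the names are the hypothetical ones from this answer):

```shell
# Build two tiny stand-in jars mirroring the scenario above:
# MasterModule.jar bundles its dependencies in lib/, MyWorkflow.jar does not.
mkdir -p master/lib workflow
echo dep > master/lib/dependency.jar
echo cls > master/MasterMain.class
echo cls > workflow/MyMain.class
(cd master && zip -qr ../MasterModule.jar .)
(cd workflow && zip -qr ../MyWorkflow.jar .)

# The check itself: does each jar carry a lib/ folder of dependencies?
unzip -l MasterModule.jar | grep 'lib/' && echo "MasterModule.jar bundles dependencies"
unzip -l MyWorkflow.jar | grep 'lib/' || echo "MyWorkflow.jar has no lib/ folder"
```

If the jar that contains your application class (the one passed to setApplicationJarClass) comes up empty, that is exactly the situation producing the ClassNotFoundException.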

Also note that, per the Cloudera blog, lib-within-lib dependencies are not supported since CDH5.


