Can't get parquet tools working from command line
I am trying to run the newest version of parquet-tools, but I am having some problems: for some reason, org.apache.hadoop.conf.Configuration is not in the shaded jar. (I have the same problem with v1.6.0 as well.)
Is there something beyond mvn package or mvn install that I should be doing? The actual mvn call I'm using is this:
mvn install -DskipTests -pl \!parquet-thrift,\!parquet-cascading,\!parquet-pig-bundle,\!parquet-pig,\!parquet-scrooge,\!parquet-hive,\!parquet-protobuf
This works really well, and the tests pass if I want to run them.
The error I am getting is below. (You can see that I tried adding a Hadoop jar from an old version, which looked like it would put the class on the classpath; I get the same result with or without it.)
> java -classpath /path/to/hadoop-core-1.1.0.jar -jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at parquet.tools.command.ShowMetaCommand.execute(ShowMetaCommand.java:59)
at parquet.tools.Main.main(Main.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
org/apache/hadoop/conf/Configuration
This set of steps from the parquet-mr issues list fixed the same problem for me:
mvn install
cd parquet-tools
mvn clean package -Plocal
mvn install
mvn dependency:copy-dependencies
# replace 1.8.2 in the next step with the version you're using
cp target/parquet-tools-1.8.2-SNAPSHOT.jar target/dependency/
mkdir -p ~/local/bin/lib
cp target/dependency/* ~/local/bin/lib/
cp src/main/scripts/* ~/local/bin/
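# quote the line so $PATH is expanded at login, not now, and write to the profile in your home directory
echo 'export PATH="$PATH:$HOME/local/bin"' >> ~/.profile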
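A side note on why the command in the question could not have worked even with the Hadoop jar added: when java is started with -jar, it takes all user classes from that jar and ignores -classpath. A minimal sketch of launching the tools without -jar, using the lib directory populated above, might look like the following. The function names are made up for illustration and are not part of parquet-tools, and you have to supply the main class yourself (the stack trace in the question shows parquet.tools.Main for 1.7.0; other versions may use a different package).

```shell
# Hypothetical helpers, not part of parquet-tools.

# Turn a directory into a wildcard classpath entry ("dir/*").
# The JVM itself expands the wildcard to every .jar in that directory.
lib_classpath() {
  printf '%s/*\n' "${1%/}"
}

# Run a main class with every jar in a lib directory on the classpath.
# Usage: run_with_lib <lib-dir> <main-class> [args...]
run_with_lib() {
  lib_dir=$1
  main_class=$2
  shift 2
  # No -jar here, so -cp is honored.
  java -cp "$(lib_classpath "$lib_dir")" "$main_class" "$@"
}
```

With that in place, something like run_with_lib ~/local/bin/lib parquet.tools.Main meta part-r-00000.gz.parquet should pick up the Hadoop classes from the copied dependencies, assuming the main class name matches your version.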
I ran into a similar problem and fixed it by specifying a "local" profile:
mvn clean package -Plocal
I originally skipped over this paragraph in the README, but it explains that the "local" profile is what bundles the Hadoop dependencies into the jar. The default build leaves them out, because it expects you to run the tools somewhere Hadoop is already installed and present on your classpath:
https://github.com/Parquet/parquet-mr/tree/master/parquet-tools
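To confirm whether a given build actually bundled the Hadoop classes, you can list the jar's contents and look for the missing class. This is only a rough sketch: the helper names are hypothetical, it assumes the unzip command is installed, and the jar path in the usage line is just the one from the question.

```shell
# Hypothetical helpers for inspecting a jar; not part of parquet-tools.

# Convert a fully qualified class name to its entry path inside a jar,
# e.g. a.b.C becomes a/b/C.class.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Succeed if the jar's listing contains the given class.
# Returns 2 if the jar file does not exist. Requires `unzip`.
# The match is a plain substring match on the listing, so a relocated
# (shaded) copy under another prefix would also count as a hit.
jar_has_class() {
  jar_file=$1
  entry=$(class_to_entry "$2")
  [ -f "$jar_file" ] || return 2
  unzip -l "$jar_file" | grep -qF "$entry"
}
```

For example, jar_has_class target/parquet-tools-1.7.0-incubating-SNAPSHOT.jar org.apache.hadoop.conf.Configuration && echo bundled should print "bundled" only for a jar that includes the class, which is what I would expect after building with -Plocal.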