Can't get parquet tools working from command line

I am trying to run the newest version of parquet-tools, but I am running into a problem: org.apache.hadoop.conf.Configuration is not in the shaded jar for some reason. (I have the same problem with v1.6.0 as well.)

Is there something beyond mvn package or mvn install that I should be doing? The actual mvn call I'm using is this:

mvn install -DskipTests -pl \!parquet-thrift,\!parquet-cascading,\!parquet-pig-bundle,\!parquet-pig,\!parquet-scrooge,\!parquet-hive,\!parquet-protobuf

The build itself works fine, and the tests pass if I choose to run them.

The error I am getting is below. (As you can see, I tried adding a Hadoop jar, taken from an old version of parquet, to the classpath; I get the same result with or without it.)

> java -classpath /path/to/hadoop-core-1.1.0.jar -jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet

java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at parquet.tools.command.ShowMetaCommand.execute(ShowMetaCommand.java:59)
    at parquet.tools.Main.main(Main.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more
org/apache/hadoop/conf/Configuration

      


5 answers


On macOS using homebrew, this is the easiest way to get started:



$ brew install parquet-tools

      



If you have Hadoop installed, run the tool through the hadoop launcher instead, which puts the Hadoop classes on the classpath:

hadoop jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
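A note on why the command in the question fails even with hadoop-core on -classpath: the java launcher ignores -classpath whenever -jar is given, so the extra jar never reaches the class loader. If the hadoop launcher is not available, a rough sketch of the alternative is to put both jars on -cp and name the main class explicitly. The main class parquet.tools.Main is taken from the stack trace in the question; the helper names and the TOOLS_JAR and HADOOP_JAR variables are made up for illustration, and hadoop-core may itself need further jars (its own dependencies) on the classpath.

```shell
# Hypothetical locations; adjust to your build output and Hadoop install.
TOOLS_JAR=parquet-tools-1.7.0-incubating-SNAPSHOT.jar
HADOOP_JAR=/path/to/hadoop-core-1.1.0.jar

# Join all arguments with ':' to form a classpath string.
# The function body runs in a subshell so the IFS change does not leak.
join_classpath() (
  IFS=:
  printf '%s\n' "$*"
)

# Run parquet-tools with an explicit classpath instead of -jar,
# because java ignores -classpath/-cp when -jar is used.
run_parquet_tools() {
  java -cp "$(join_classpath "$TOOLS_JAR" "$HADOOP_JAR")" parquet.tools.Main "$@"
}

# Example (not executed here):
#   run_parquet_tools meta --debug part-r-00000.gz.parquet
```

If this still reports a missing class, the missing jar is most likely one of Hadoop's own dependencies, which is why going through hadoop jar is usually the simpler route.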





This set of steps from the parquet-mr issue tracker fixed the same problem for me:

mvn install
cd parquet-tools
mvn clean package -Plocal
mvn install
mvn dependency:copy-dependencies
# replace 1.8.2 in the next step with the version you're using
cp target/parquet-tools-1.8.2-SNAPSHOT.jar target/dependency/
mkdir -p ~/local/bin/lib
cp target/dependency/* ~/local/bin/lib/
cp src/main/scripts/* ~/local/bin/
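# single quotes defer $PATH expansion until login; append to the profile in your home directory
echo 'export PATH="$PATH:$HOME/local/bin"' >> ~/.profile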

      



You can also bundle the Hadoop dependencies into the built jar:

mvn clean package -Plocal -DskipTests -Dhadoop.scope=compile
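To confirm that a rebuilt jar really bundles the Hadoop classes before running it, one can list the archive contents and look for the class named in the error message. This is only a sketch: the function names are invented here, it assumes unzip is installed, and the jar path in the example is hypothetical (use whatever your build produced).

```shell
# Convert a dotted class name to its path inside a jar, e.g.
# org.apache.hadoop.conf.Configuration -> org/apache/hadoop/conf/Configuration.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Succeed (exit 0) if jar $1 lists an entry for class $2; fail otherwise,
# including when the jar does not exist or cannot be read.
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_entry "$2")"
}

# Example (not executed here):
#   jar_has_class target/parquet-tools-1.7.0-incubating-SNAPSHOT.jar \
#     org.apache.hadoop.conf.Configuration && echo bundled || echo missing
```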



I ran into a similar problem and fixed it by specifying a "local" profile:

mvn clean package -Plocal

      

I had originally skimmed past this paragraph in the README, but it explains that the "local" profile is what mixes in the Hadoop dependencies. They are not included by default, because the jar is expected to be used somewhere Hadoop is already installed and present on the classpath:

https://github.com/Parquet/parquet-mr/tree/master/parquet-tools
