org.apache.nutch.crawl.Crawl missing in Nutch 1.9 on Hadoop 1.2.1

Question

I have installed a fully distributed Hadoop 1.2.1 cluster and tried to integrate Nutch with the steps below:

Download apache-nutch-1.9-src.zip
Add a value for http.agent.name to nutch-site.xml
Copy hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters and slaves to $NUTCH_HOME/conf
Compile with ant runtime
Create urls/seed.txt and put it on the Hadoop DFS
Modify $NUTCH_HOME/conf/regex-urlfilter.txt

I then tested a crawl with this command:

bin/hadoop jar nutch-1.9.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5

and get this error:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

I extracted nutch-1.9.job and could not find the Crawl class under org/apache/nutch/crawl.

Do I need to configure something?

hadoop nutch

fluke-ng 08 Sep At 14:06

1 answer

Talat · Answer 1 · 2014-09-15T08:15:14+0000

Crawl.java was removed in version 1.8. Use the bin/crawl wrapper script to run the whole crawl cycle instead.

The removal of the deprecated o.a.n.crawl.Crawler class is tracked in https://issues.apache.org/jira/browse/NUTCH-1621
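
To illustrate the wrapper-script route, here is a minimal sketch that only prints the bin/crawl command line one might use in place of the old Crawl invocation. It assumes that ant runtime produced a runtime/deploy directory containing bin/crawl next to the .job file, that hadoop is on the PATH, and that the 1.9 script takes its arguments as seedDir, crawlDir, solrURL, numberOfRounds (running bin/crawl with no arguments should print the actual usage). The /opt/nutch fallback, the crawl output directory, and the Solr URL are placeholders, not values from the question.

```shell
#!/bin/sh
# Sketch only: builds the bin/crawl command that stands in for the removed Crawl class.
# Every value below is an assumption or placeholder; adjust to your cluster.
NUTCH_DEPLOY="${NUTCH_HOME:-/opt/nutch}/runtime/deploy"  # assumed output of `ant runtime`
SEED_DIR="urls"                          # seed directory already uploaded to HDFS
CRAWL_DIR="crawl"                        # hypothetical crawl output directory on HDFS
SOLR_URL="http://localhost:8983/solr/"   # hypothetical Solr endpoint
ROUNDS=1                                 # number of rounds, the role -depth 1 played before

# Print the command instead of running it, so it can be reviewed first.
crawl_cmd() {
  printf '%s %s %s %s %s\n' \
    "$NUTCH_DEPLOY/bin/crawl" "$SEED_DIR" "$CRAWL_DIR" "$SOLR_URL" "$ROUNDS"
}

crawl_cmd
```

Once the printed command looks right, it can be run as is from a shell on a node where the Hadoop client is configured. The script chains the individual inject, generate, fetch, parse and updatedb jobs itself, so no single Crawl class is needed.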
