How To Handle Too Much Redirection With HtmlUnit
I am trying to parse a site, but I run into a "Too much redirect" exception. Here is my code:
WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
HtmlPage homePage = null;
String url = "http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio";
try {
    client.getOptions().setUseInsecureSSL(true);
    client.setAjaxController(new NicelyResynchronizingAjaxController());
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.waitForBackgroundJavaScript(30000);
    client.waitForBackgroundJavaScriptStartingBefore(30000);
    client.getOptions().setCssEnabled(false);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setRedirectEnabled(true);
    homePage = client.getPage(url);
    synchronized (homePage) {
        homePage.wait(25000);
    }
    System.out.println(homePage.asXml());
} catch (Exception e) {
    e.printStackTrace();
}
The exception is listed below:
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: Too much redirect for http://www.freelake.org/resolver/2345183424.20480.0000/route.00/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1353)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1371)
Is there a way to solve this problem?
3 answers
The page http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio sends 2 redirects:
- http://www.freelake.org/GroupHome.page then
- http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio
Use the 2nd URL and it should work. Alternatively, find a way to tell the library to allow a certain number of redirects (2 in this case).
Edit: This might help. I don't use this library myself:
client.getOptions().setRedirectEnabled(true);
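To see which redirects a page actually sends before deciding which URL to request, one option is to follow the chain by hand outside HtmlUnit. The sketch below is only an illustration using the plain JDK: the names `RedirectTracer`, `Trace`, `trace` and `httpNextHop` are made up for this example, and the limit of 10 hops is arbitrary. It sends no cookies, so a server that redirects differently once a cookie is set may behave differently here than in a browser or in HtmlUnit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper (not part of HtmlUnit): follows redirects one hop at a time.
public class RedirectTracer {

    /** Result of following a redirect chain. */
    public static final class Trace {
        public final List<String> hops;     // every URL seen, in order
        public final boolean loopDetected;  // a redirect pointed back to an earlier URL
        public final boolean limitReached;  // still being redirected after maxRedirects hops

        Trace(List<String> hops, boolean loopDetected, boolean limitReached) {
            this.hops = hops;
            this.loopDetected = loopDetected;
            this.limitReached = limitReached;
        }
    }

    /**
     * Follows up to maxRedirects redirects starting at startUrl.
     * nextHop returns the redirect target of a URL, or empty if the URL does not redirect.
     */
    public static Trace trace(String startUrl,
                              Function<String, Optional<String>> nextHop,
                              int maxRedirects) {
        List<String> hops = new ArrayList<>();
        hops.add(startUrl);
        String current = startUrl;
        for (int i = 0; i < maxRedirects; i++) {
            Optional<String> next = nextHop.apply(current);
            if (!next.isPresent()) {
                return new Trace(hops, false, false); // reached a non-redirecting page
            }
            String target = next.get();
            boolean seenBefore = hops.contains(target);
            hops.add(target);
            if (seenBefore) {
                return new Trace(hops, true, false);  // chain points back to an earlier URL
            }
            current = target;
        }
        // Budget used up: check whether the last URL would redirect yet again.
        boolean stillRedirecting = nextHop.apply(current).isPresent();
        return new Trace(hops, false, stillRedirecting);
    }

    /** Asks the server for one URL without following redirects; returns the Location target, if any. */
    public static Optional<String> httpNextHop(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see each hop ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status >= 300 && status < 400 && location != null) {
                // Location may be relative, so resolve it against the current URL.
                return Optional.of(URI.create(url).resolve(location).toString());
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java RedirectTracer <url>");
            return;
        }
        Trace t = trace(args[0], RedirectTracer::httpNextHop, 10);
        for (String hop : t.hops) {
            System.out.println(hop);
        }
        if (t.loopDetected) {
            System.out.println("-> loop: a redirect points back to an earlier URL");
        }
        if (t.limitReached) {
            System.out.println("-> gave up after 10 redirects");
        }
    }
}
```

Run it with the URL from the question as the only argument. If the printed chain ends on a URL that already appeared earlier, raising the redirect limit alone will probably not help, and the site may be expecting a cookie or a particular header before it serves the page.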