How To Handle Too Much Redirection With HtmlUnit

I am trying to parse this site, but I ran into a "Too much redirect" exception. Here is my code:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {
    public static void main(String[] args) {
        WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
        HtmlPage homePage = null;
        String url = "http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio";
        try {
            client.getOptions().setUseInsecureSSL(true);
            client.setAjaxController(new NicelyResynchronizingAjaxController());
            client.getOptions().setThrowExceptionOnFailingStatusCode(false);
            client.getOptions().setThrowExceptionOnScriptError(false);
            client.getOptions().setCssEnabled(false);
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setRedirectEnabled(true);
            // The exception below is thrown from this call
            homePage = client.getPage(url);
            // Wait for background JavaScript only after the page has loaded
            client.waitForBackgroundJavaScript(30000);
            System.out.println(homePage.asXml());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            client.closeAllWindows();
        }
    }
}

Here is the exception:

com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: Too much redirect for http://www.freelake.org/resolver/2345183424.20480.0000/route.00/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1353)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1371)

Is there a way to solve this problem?


3 answers


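This happens because HtmlUnit caches the response: the site redirects to another page, which redirects back to the original, and the cached redirect keeps sending HtmlUnit around the same loop until it gives up.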

I tested the following and it works:



// Disable HtmlUnit's response cache before calling client.getPage(url)
client.getCache().setMaxSize(0);
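A toy model may make this explanation easier to follow. The sketch below is plain Java with no HtmlUnit dependency; the URLs `A` and `R`, the cookie flag, and the `FakeServer` class are hypothetical stand-ins for the freelake.org resolver redirect (assumed, not verified), and `fetch` is not HtmlUnit's real code. It only shows how caching a redirect response can turn a two-hop redirect into an endless A -> R -> A loop.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (NOT HtmlUnit code) of the loop described above. The hypothetical
 * server redirects A -> R -> A, and only a request to A made after R has set
 * a cookie returns the real page.
 */
class RedirectCacheDemo {

    /** Minimal stand-in for an HTTP response: a redirect or a final body. */
    record Response(int status, String location, String body) {}

    /** Hypothetical server: the resolver URL R sets a cookie, then sends the client back to A. */
    static class FakeServer {
        boolean cookieSet = false;

        Response handle(String url) {
            if (url.equals("A")) {
                return cookieSet ? new Response(200, null, "real page")
                                 : new Response(302, "R", null);
            }
            if (url.equals("R")) {
                cookieSet = true;                     // the resolver sets the cookie
                return new Response(302, "A", null);
            }
            return new Response(404, null, "not found");
        }
    }

    /** Follows up to maxRedirects redirects, optionally reusing cached responses. */
    static String fetch(String url, boolean useCache, int maxRedirects) {
        FakeServer server = new FakeServer();
        Map<String, Response> cache = new HashMap<>();
        for (int hops = 0; hops <= maxRedirects; hops++) {
            Response r = useCache && cache.containsKey(url) ? cache.get(url) : server.handle(url);
            if (useCache) {
                cache.put(url, r);                    // the redirect itself gets cached
            }
            if (r.status() != 302) {
                return r.body();
            }
            url = r.location();
        }
        throw new IllegalStateException("Too much redirect for " + url);
    }

    public static void main(String[] args) {
        // Cache off: A -> R -> A, and the second request to A returns the page
        System.out.println(fetch("A", false, 20));
        try {
            // Cache on: the cached 302 for A is replayed, so A -> R -> A -> R ... until the limit
            fetch("A", true, 20);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the cache disabled, the second request to `A` reaches the server, which now sees the cookie and returns the page; with the cache enabled, the client never asks the server again and runs into the redirect limit.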



I had the same problem, but I use HtmlUnit through Selenium. There you cannot access the WebClient directly, because HtmlUnitDriver.getWebClient() is protected.

I worked around it by creating an anonymous subclass whose instance initializer calls that method:



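import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

// Anonymous subclass: its instance initializer may call the protected getWebClient()
WebDriver driver = new HtmlUnitDriver(true) {
    {
        this.getWebClient().getCache().setMaxSize(0);
    }
};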
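This workaround relies on standard Java initialization order: an anonymous subclass's instance initializer runs after the superclass constructor has finished, so the web client already exists and the protected getter can be called. The plain-Java sketch below shows the pattern with hypothetical `Driver` and `Client` classes standing in for `HtmlUnitDriver` and `WebClient`; they are not Selenium types. Because everything here sits in one package, `protected` does not actually block access in this demo; in a real project the driver lives in another package (`org.openqa.selenium.htmlunit`), where only subclass code may call the getter.

```java
/** Hypothetical stand-in for WebClient; 25 is an arbitrary illustrative default. */
class Client {
    int cacheMaxSize = 25;
}

/** Hypothetical stand-in for HtmlUnitDriver. */
class Driver {
    private final Client client;

    Driver() {
        client = new Client();                  // created in the superclass constructor
    }

    protected Client getClient() {              // protected, like HtmlUnitDriver.getWebClient()
        return client;
    }
}

class AnonymousSubclassDemo {
    static Driver newDriverWithoutCache() {
        // The instance initializer below runs after Driver() has returned,
        // so getClient() already returns a fully constructed Client.
        return new Driver() {
            {
                getClient().cacheMaxSize = 0;
            }
        };
    }

    public static void main(String[] args) {
        Driver d = newDriverWithoutCache();
        System.out.println(d.getClient().cacheMaxSize); // prints 0
    }
}
```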



The page http://www.freelake.org/pages/Freetown-Lakeville_RSD/Departments/Director_of_Financial_Operatio sends 2 redirects.

Request the URL of the 2nd redirect directly and it should work. Alternatively, find a way to tell the library to allow a certain number of redirects (2 in this case).
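Allowing a fixed number of redirects can be sketched with the JDK's `HttpURLConnection` by turning off its automatic redirect handling and following `Location` headers yourself up to a limit. This is a swapped-in, HtmlUnit-free technique, not an HtmlUnit API; `RedirectFollower`, `Hop`, `fetchOnce`, and `resolveFinalUrl` are hypothetical names. The fetcher is passed in as a function so the loop can be exercised without network access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.Function;

/** Hypothetical helper: follows at most N redirects by hand. */
class RedirectFollower {

    /** One HTTP exchange reduced to its status code and Location header (may be null). */
    record Hop(int status, String location) {}

    /** Sends a single GET without letting HttpURLConnection follow redirects itself. */
    static Hop fetchOnce(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            Hop hop = new Hop(conn.getResponseCode(), conn.getHeaderField("Location"));
            conn.disconnect();
            return hop;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Follows at most maxRedirects 3xx responses and returns the final URL.
     * Relative Location headers are resolved against the current URL.
     */
    static String resolveFinalUrl(String url, int maxRedirects, Function<String, Hop> fetcher) {
        String start = url;
        for (int i = 0; i <= maxRedirects; i++) {
            Hop hop = fetcher.apply(url);
            boolean isRedirect = hop.status() >= 300 && hop.status() < 400 && hop.location() != null;
            if (!isRedirect) {
                return url;
            }
            url = URI.create(url).resolve(hop.location()).toString();
        }
        throw new IllegalStateException("More than " + maxRedirects + " redirects for " + start);
    }
}
```

For the page in the question, something like `resolveFinalUrl(url, 2, RedirectFollower::fetchOnce)` could return the URL the redirects end at, which could then be passed to `client.getPage(...)`. If the site depends on a cookie set during the redirect (an assumption, not verified), call `CookieHandler.setDefault(new CookieManager())` first so that `HttpURLConnection` sends the cookie back.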

Edit: This might help (I haven't used this library myself):

client.getOptions().setRedirectEnabled(true);