Web scraping with Jsoup and Selenium

I want to extract some information from this dynamic website using Selenium and Jsoup. To reach the information, I have to click the "Details öffnen" button. The first screenshot shows the website before the button is clicked, and the second shows it afterwards. The information I want to extract is marked in red.

[Screenshot: the website before the button is clicked]

[Screenshot: the website after the button is clicked, with the wanted information marked in red]

At first I tried to fetch the information with Jsoup alone, but I was told that Jsoup cannot handle dynamic content, so now I am trying to fetch it with Selenium and Jsoup, as you can see in the source code below. However, I'm not sure whether Selenium is the right tool for this. There may be other ways to extract the information I need, but it is important that it can be done in Java.

The next two pictures show the HTML before and after the button is clicked.

[Screenshot: the HTML before the button is clicked]

[Screenshot: the HTML after the button is clicked]

public static void main(String[] args) {

    WebDriver driver = new FirefoxDriver(createFirefoxProfile());
    driver.get("http://www.seminarbewertung.de/seminar-bewertungen?id=3448");
    //driver.findElement(By.cssSelector("input[type='button'][value='Details öffnen']")).click();
    WebElement webElement = driver.findElement(By.cssSelector("input[type='submit'][value='Details öffnen'][rating_id='2318']"));
    JavascriptExecutor executor = (JavascriptExecutor)driver;
    executor.executeScript("arguments[0].click();", webElement);
    String html_content = driver.getPageSource();
    //driver.close();


    Document doc1 = Jsoup.parse(html_content);
    System.out.println("Hallo");

    Elements elements = doc1.getAllElements();
    for (Element element : elements) {
        System.out.println(element);
    }

}

private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}

      

With this source code, I cannot find the div element with the information I want to extract.

It would be great if someone could help me with this.



1 answer


It is true that Jsoup cannot handle dynamic content that is generated by JavaScript, but in your case the button simply makes an Ajax request, and Jsoup can reproduce that request directly.

I would suggest making one call to retrieve the buttons and their ids, and then making successive calls (the Ajax POSTs) to get the details (comments or whatever else you need).

The code could be:

    // Requires org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
    Document document = Jsoup.connect("http://www.seminarbewertung.de/seminar-bewertungen?id=3448").get();
    // Retrieve all "Details öffnen" buttons
    Elements buttons = document.select("input.rating_expand");
    // Take the first one
    Element firstButton = buttons.get(0);
    // Read the id that the button would send
    String ratingId = firstButton.attr("rating_id");

    // Reproduce the Ajax call the button makes
    Document details = Jsoup.connect("http://www.seminarbewertung.de/bewertungs-details-abfragen")
            .header("Accept", "*/*")
            .header("X-Requested-With", "XMLHttpRequest")
            .data("rating_id", ratingId)
            .post();

    // Find the comment in the response
    // Note: this selector is only a demo, feel free to adjust it to your needs
    Elements comment = details.select("div.ratingbox div.panel-body.text-center");
    System.out.println(comment.text());

      



This code will print what you want:

Das Eingehen auf individuelle Bedurfnisse eines jeden einzelnen Teilnehmer scheint mir ein Markenzeichen von Fromm zu sein. Bei einem früheren Seminar habe ich dies also schon so erlebt!
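
If it helps to see what that Ajax call amounts to at the HTTP level, here is a minimal sketch that builds the same POST with the JDK's own `java.net.http` classes (Java 11+) instead of Jsoup. The class name `RatingDetailsRequest` and the helpers `formBody` and `buildRequest` are made up for illustration. The explicit `Content-Type` header is an assumption about what the site expects for a form post, and the id `2318` is the one from the question's selector. The sketch only builds and prints the request; it does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RatingDetailsRequest {

    // Endpoint taken from the answer's Jsoup code.
    static final String DETAILS_URL =
            "http://www.seminarbewertung.de/bewertungs-details-abfragen";

    // Encodes one form field as application/x-www-form-urlencoded.
    static String formBody(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) the POST that the "Details öffnen" button triggers.
    static HttpRequest buildRequest(String ratingId) {
        return HttpRequest.newBuilder(URI.create(DETAILS_URL))
                .header("Accept", "*/*")
                .header("X-Requested-With", "XMLHttpRequest")
                // Assumption: the endpoint accepts a standard form-encoded body.
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(formBody("rating_id", ratingId)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("2318");
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```

Sending it would be a matter of passing the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and handing the returned body to `Jsoup.parse`, so the selector from the code above can be applied unchanged.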

Hope this helps.

Happy New Year!
