Parsing HTML to Formatted Plaintext using jsoup

I am working on a Maven project that parses HTML data from a website. I was able to parse it using the following code:

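import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public void parseData() {
    String url = "";
    try {
        Document doc = Jsoup.connect(url).get();
        // first <div class="col-section"> on the page
        Element essay = doc.select("div.col-section").first();
        String essayText = essay.text();
    } catch (IOException ex) {
        Logger.getLogger(formAdem.class.getName()).log(Level.SEVERE, null, ex);
    }
}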


So far there are no problems: the HTML is parsed. I used jsoup's select method with the query "div.col-section", which looks for a div element with the class col-section, and I want to display the result in a text box. However, the result is one huge paragraph, although the content on the website is split into several paragraphs. How can I extract the text so that it keeps the formatting it has on the website?



Answer

The text is not formatted because the formatting lives in the HTML markup: tags such as <p> and <ol>. Calling .text() on a block element collapses all of its contents into a single string, so that formatting is lost.
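A minimal sketch of the effect, using a made-up HTML fragment with two paragraphs (TextFlatteningDemo and divText are hypothetical names, not part of the question's code). Calling text() on the enclosing div returns both paragraphs joined by a single space:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TextFlatteningDemo {

    // Returns the text() of the first <div> in the given HTML string.
    static String divText(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        return div.text();
    }

    public static void main(String[] args) {
        // Two separate paragraphs in the markup...
        String html = "<div><p>First paragraph.</p><p>Second paragraph.</p></div>";
        // ...are joined with a single space by text(), so the paragraph break disappears.
        System.out.println(divText(html));
    }
}
```

This prints "First paragraph. Second paragraph." on one line, which is the "huge paragraph" described in the question.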

jsoup ships a sample HTML-to-plain-text converter (the HtmlToPlainText example) that you can tailor to your needs, for instance by passing it the div element instead of the whole document.
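A converter of this kind walks the DOM and emits line breaks at block-level tags. The following is a minimal sketch of that idea, not the sample's actual code: BlockAwareText, toPlainText and fromHtml are hypothetical names, and the set of tags treated as line-breaking (p, h1 to h6, div, ul, ol, li, br) is an assumption you would adjust to the markup of the target page. It uses jsoup's NodeVisitor via Node.traverse, starting from the div rather than the whole document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

public class BlockAwareText {

    // Convenience wrapper: parse HTML and convert the first element matching cssQuery.
    static String fromHtml(String html, String cssQuery) {
        Element root = Jsoup.parse(html).select(cssQuery).first();
        return root == null ? "" : toPlainText(root);
    }

    // Walks root's subtree and writes its text, breaking lines at (assumed) block-level tags.
    static String toPlainText(Element root) {
        StringBuilder out = new StringBuilder();
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                String name = node.nodeName();
                if (node instanceof TextNode) {
                    // text() collapses whitespace runs; drop leading whitespace at a line start
                    String text = ((TextNode) node).text();
                    if (atLineStart(out)) {
                        text = text.replaceFirst("^\\s+", "");
                    }
                    out.append(text);
                } else if (name.equals("br")) {
                    out.append('\n');
                } else if (name.equals("li")) {
                    endLine(out);
                    out.append("* "); // simple bullet marker for list items
                } else if (isBlock(name)) {
                    endLine(out);
                }
            }

            @Override
            public void tail(Node node, int depth) {
                String name = node.nodeName();
                if (name.equals("p") || name.matches("h[1-6]")) {
                    endLine(out);
                    out.append('\n'); // blank line after paragraphs and headings
                } else if (name.equals("li") || isBlock(name)) {
                    endLine(out);
                }
            }
        });
        return out.toString().trim();
    }

    private static boolean isBlock(String name) {
        return name.equals("p") || name.equals("div") || name.equals("ul")
                || name.equals("ol") || name.matches("h[1-6]");
    }

    private static boolean atLineStart(StringBuilder sb) {
        return sb.length() == 0 || sb.charAt(sb.length() - 1) == '\n';
    }

    // Appends a newline unless the buffer is empty or already ends with one.
    private static void endLine(StringBuilder sb) {
        if (!atLineStart(sb)) {
            sb.append('\n');
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"col-section\"><p>First paragraph.</p>"
                + "<p>Second <b>bold</b> paragraph.</p><ol><li>One</li><li>Two</li></ol></div>";
        System.out.println(fromHtml(html, "div.col-section"));
    }
}
```

In parseData this would replace essay.text() with BlockAwareText.toPlainText(essay). Inline tags such as <b> stay on the same line, while paragraphs are separated by a blank line and list items each get their own line.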

Alternatively, you can simply select "div.col-section > *" (every direct child of the div), iterate over the matched elements, and print each element's text followed by a new line.
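A minimal sketch of this approach, assuming the page content sits directly inside div.col-section (ChildBlocksText, childLines and childLinesFromHtml are hypothetical names). Each direct child becomes one line of output:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChildBlocksText {

    // Returns the text of each direct child of div.col-section, one child per line.
    static String childLines(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element child : doc.select("div.col-section > *")) {
            lines.add(child.text());
        }
        return String.join("\n", lines);
    }

    // Convenience overload for parsing an HTML string directly.
    static String childLinesFromHtml(String html) {
        return childLines(Jsoup.parse(html));
    }

    public static void main(String[] args) {
        String html = "<div class=\"col-section\"><p>First paragraph.</p><p>Second paragraph.</p></div>";
        System.out.println(childLinesFromHtml(html));
    }
}
```

With a live page you would call childLines(Jsoup.connect(url).get()). Note that each child is still flattened with text(), so a nested <ol> ends up on a single line ("One Two"); if that matters, a tree-walking converter handles nesting better. The target text component also has to be multi-line to show the line breaks (for example a Swing JTextArea rather than a JTextField, assuming a Swing form).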


