How to select leaf tags of an HTML document using jsoup
I am using jsoup to parse an HTML document. I need to extract all leaf div elements, that is, div tags that contain no nested div tags. I used the following Java code to select them:
Elements bodyTag = document.select("div:not(div>div)");
Here's an example:
<div id="header">
  <div class="container">
    <div id="header-logo">
      <a href="/" title="mekay.com">
        <div id="logo">
        </div>
      </a>
    </div>
    <div id="header-banner">
      <div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
      </div>
    </div>
  </div>
</div>
I only need to extract the following:
<div id="logo">
</div>
<div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
</div>
Instead, the above code snippet returns all div tags. Could you please help me figure out what is wrong with this selector?
If you only want div tags that don't have any children, use this:
Elements emptyDivs = document.select("div:empty");
The selector you are using now means "fetch all the divs that are not direct children of another div". It is expected that it returns the outermost div, because div id="header" is not a direct child of a div; most likely its parent is body.
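As a complement to div:empty, here is a minimal sketch of another way to express "divs without nested divs", which may help when the leaf divs contain whitespace, text, or non-div children and div:empty does not match them. It combines jsoup's :not and :has pseudo-selectors. The class name LeafDivs, the helper leafDivs, and the constant SAMPLE_HTML are hypothetical names used only for illustration, and the code assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LeafDivs {
    // The example markup from the question.
    public static final String SAMPLE_HTML = String.join("\n",
        "<div id=\"header\">",
        "<div class=\"container\">",
        "<div id=\"header-logo\">",
        "<a href=\"/\" title=\"mekay.com\">",
        "<div id=\"logo\">",
        "</div> </a>",
        "</div>",
        "<div id=\"header-banner\">",
        "<div data-type=\"ad\" data-publisher=\"lqm.j2ee.site\" data-zone=\"ron\">",
        "</div>",
        "</div>",
        "</div>",
        "</div>");

    // Selects every div that has no div among its descendants.
    // ":has(div)" matches elements containing a descendant div;
    // ":not(...)" inverts that match.
    public static Elements leafDivs(String html) {
        Document document = Jsoup.parse(html);
        return document.select("div:not(:has(div))");
    }

    public static void main(String[] args) {
        for (Element div : leafDivs(SAMPLE_HTML)) {
            System.out.println(div.outerHtml());
        }
    }
}
```

For the markup in the question, this should select only the div with id="logo" and the div with data-type="ad", since every other div contains at least one nested div.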