Node inner text ignoring children inner text
Excuse me if this is too simple a question, but this is my first day with html-agility-pack and I cannot find a way to select the inner text of a node while ignoring the inner text of its child nodes.
For example:
<div id="div1">
<div class="h1"> this needs to be selected
<small> and not this</small>
</div>
</div>
I am currently trying this:
HtmlDocument page = new HtmlWeb().Load(url);
var s = page.DocumentNode.SelectSingleNode("//div[@id='div1']//div[@class='h1']");
string selText = s.InnerText;
which returns all of the text (that is, "this needs to be selected and not this"). Any suggestions?
2 answers
A div can have multiple text nodes if there is text both before and after its child elements. As I also pointed out here, I think the best way to get the text that belongs directly to a node, without the text of its children, is to select its text() nodes and concatenate them:
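HtmlDocument page = new HtmlWeb().Load(url);
// text() selects only the text nodes that are direct children of the matched div
var nodes = page.DocumentNode.SelectNodes("//div[@id='div1']//div[@class='h1']/text()");
StringBuilder sb = new StringBuilder();
// SelectNodes returns null when nothing matches, so guard before iterating
if (nodes != null)
{
    foreach (var node in nodes)
    {
        sb.Append(node.InnerText);
    }
}
string content = sb.ToString();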
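If you would rather not rely on the text() XPath step, the same idea can be sketched by filtering the node's direct children by node type. This is only a minimal sketch: the class name DirectTextExample and the helper GetOwnText are made up for illustration, the HTML is adapted from the question and loaded from a string instead of a URL, and the Trim() call is my own assumption about what you want to do with the surrounding whitespace.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DirectTextExample
{
    // Hypothetical helper (not part of HtmlAgilityPack): concatenates only the
    // text nodes that are direct children of the given node, then trims the result.
    public static string GetOwnText(HtmlNode node)
    {
        var parts = node.ChildNodes
            .Where(child => child.NodeType == HtmlNodeType.Text)
            .Select(child => child.InnerText);
        return string.Concat(parts).Trim();
    }

    public static void Main()
    {
        // Sample markup adapted from the question, loaded from a string.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div id=\"div1\"><div class=\"h1\"> this needs to be selected <small> and not this</small></div></div>");

        var h1 = doc.DocumentNode.SelectSingleNode("//div[@id='div1']//div[@class='h1']");
        if (h1 != null)
        {
            Console.WriteLine(GetOwnText(h1)); // prints: this needs to be selected
        }
    }
}
```

Because only direct children are inspected, the text inside small is skipped, while text that appears after small (if any) would still be included.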