How can I access the content of multiple <div> tags using HTMLAgilityPack?

I can't find any documentation for HTMLAgilityPack on the CodePlex website. I currently want to access a div on an Amazon page and clean up its text for use in a WPF application.

var getWeb = new HtmlWeb();                     
var doc = getWeb.Load(uri);
HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//div[@id = 'zg_centerListWrapper']");

      

This div contains about 12 other divs, each of which is an item in a category's best-sellers list.

Accessing the properties of each one seems painstaking (and at first glance I'm not entirely sure how I would do it). So should I use DocumentNode.SelectNodes()? And how should I implement it? I also find it hard to believe that after all this time there is no documentation for HTMLAgilityPack... Maybe I was looking in the wrong places, because YouTube seems to be my best source at the moment.



2 answers


In fact, the parameter of SelectNodes() and SelectSingleNode() is an XPath expression, XPath version 1.0 to be precise (see the XPath 1.0 specification).

XPath is a separate technology with its own specification, documentation and discussions. You can usually search for XPath tutorials or articles, rather than HtmlAgilityPack (HAP) specifics, to work out which expression to pass to HAP to get particular HTML elements.

For example, let's say your HTML looks like this:



<div id="zg_centerListWrapper">
    <div>I want this</div>
    <div>..and this</div>
    <div>..and this one too</div>
</div>

      

Given that the div elements you are interested in are direct children of div[@id = 'zg_centerListWrapper'], you can use the following XPath to get them:

var xpath = "//div[@id = 'zg_centerListWrapper']/div";
HtmlNodeCollection ourNodes = doc.DocumentNode.SelectNodes(xpath);
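To then read the text of each child div, something along these lines should work. This is only a sketch: DivReader and ReadChildDivs are names made up for illustration, and it parses an HTML string with LoadHtml() instead of downloading the page. The null check is there because SelectNodes() returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class DivReader
{
    public static List<string> ReadChildDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same XPath as above: direct div children of the wrapper div.
        var nodes = doc.DocumentNode.SelectNodes("//div[@id = 'zg_centerListWrapper']/div");

        var result = new List<string>();
        if (nodes == null)
        {
            // SelectNodes() found nothing, so return an empty list.
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops the markup; DeEntitize turns entities such as &amp; back into characters.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return result;
    }
}
```

From each node you could also run a relative XPath (one starting with "./" or ".//") through node.SelectSingleNode() to pick out nested elements.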

      



You can use DocumentNode.Descendants("div") and then filter with something like:

.Where(div => div.Attributes.Contains("class") && div.Attributes["class"].Value.Contains("best category"))
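Put together, that might look roughly like the sketch below. ClassFilter and DivsWithClass are hypothetical names, and the class name you pass in is whatever the real page uses. Note that it deliberately swaps the substring check for a whole-token check on the class attribute, so a search for "item" would not also match "items".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ClassFilter
{
    public static List<HtmlNode> DivsWithClass(HtmlDocument doc, string className)
    {
        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue returns the default ("") when the attribute is missing.
            .Where(div => div.GetAttributeValue("class", "")
                .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .ToList();
    }
}
```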

      



But yes, proper documentation would definitely help.
