HtmlAgilityPack XPath not working

My XPath is not working.

I am trying to get the URL of the "Next" link at the bottom of a google.com results page, but I cannot reach it with XPath.

Please help me fix my XPath. What should it be instead?

HtmlWeb hw = new HtmlWeb();

HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.google.com/search?q=seo");
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//*[@id='pnnext']");

foreach (HtmlNode linkNode in linkNodes)
{
    HtmlAttribute link = linkNode.Attributes["href"];
    MessageBox.Show(link.Value );
}

      



1 answer


The weird thing here is that HtmlAgilityPack doesn't recognize the id attribute of the Next link.

This could be a bug in HtmlAgilityPack; you can report it on the HAP issue tracker.

In the meantime, I found this workaround:

  • find the table containing the paging elements (the table with id="nav"). The id of this element is recognized correctly
  • take the first (and only) tr in the table and the last td in it (using the XPath function last())
  • take the a element inside the td that we got in the previous step.

In short, here's the code:



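var doc = new HtmlWeb().Load("http://www.google.com/search?q=seo");

// The last cell of the paging table holds the "Next" link.
var nextLink = doc.DocumentNode
    .SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

// GetAttributeValue returns the fallback ("err") when href is missing.
Console.WriteLine(nextLink.GetAttributeValue("href", "err"));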

      


Update

After Simon's comment, I checked this again, and the conclusion is that this is not a bug in the HTML Agility Pack: the attribute id="pnnext" is only present when the request is made by a browser (possibly depending on the value of the User-Agent header). When the request is made with HttpWebRequest from code, the Next link appears in the output like this:

<a href="/search?q=seo&amp;hl=en&amp;ie=UTF-8&amp[...]" style="text-align:left">
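If the markup really does depend on the User-Agent header, one thing to try is sending a browser-like value from code. The following is only a sketch under that assumption: the User-Agent string is an arbitrary illustrative value, the class name is made up, and there is no guarantee that Google returns id="pnnext" for it, so the positional XPath from the workaround above is kept as a fallback.

```csharp
using System;
using HtmlAgilityPack;

class NextLinkDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        var web = new HtmlWeb
        {
            // Assumption: any browser-like User-Agent; this exact value is illustrative.
            UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
                        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
        };

        HtmlDocument doc = web.Load("http://www.google.com/search?q=seo");

        // SelectSingleNode returns null when nothing matches, so try the id
        // first and fall back to the positional XPath on the paging table.
        HtmlNode nextLink =
            doc.DocumentNode.SelectSingleNode("//a[@id='pnnext']")
            ?? doc.DocumentNode.SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

        if (nextLink == null)
        {
            Console.WriteLine("Next link not found; the page markup may differ.");
            return;
        }

        // The raw attribute value keeps entities such as &amp;, so decode it
        // before using it as a URL.
        string href = HtmlEntity.DeEntitize(nextLink.GetAttributeValue("href", ""));
        Console.WriteLine(href);
    }
}
```

The null check matters for the code in the question as well: SelectNodes returns null rather than an empty collection when the XPath matches nothing, so the foreach there would throw instead of simply finding no links.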

      
