Get innerText from <div class> with <a href> child

I am working with the WebBrowser control in C# and I need to get the text of a link. The link is just an <a href> element without a class.

Like this:

<div class="class1" title="myfirstClass">
<a href="link.php">text I want read in C#
<span class="order-level"></span>

      

Shouldn't it be something like this?

        HtmlElementCollection theElementCollection = default(HtmlElementCollection);
        theElementCollection = webBrowser1.Document.GetElementsByTagName("div");
        foreach (HtmlElement curElement in theElementCollection)
        {
            if (curElement.GetAttribute("className").ToString() == "class1")
            {
                HtmlElementCollection childDivs = curElement.Children.GetElementsByName("a");
                foreach (HtmlElement childElement in childDivs)
                {
                    MessageBox.Show(childElement.InnerText);
                }

            }
        }

      


2 answers


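This is how you get the elements by tag name (the call returns an HtmlElementCollection, not a string):

HtmlElementCollection divs = webBrowser1.Document.GetElementsByTagName("div");

      

With that, you can extract the href value by parsing the markup of the first div. Note that XElement.Parse only accepts well-formed XML, so it throws an XmlException on HTML with unclosed tags:

var hrefLink = XElement.Parse(divs[0].OuterHtml)
     .Descendants("a")
     .Select(x => (string)x.Attribute("href"))
     .FirstOrDefault();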

      

If the div contains more than one <a> element, you can iterate over the results with a foreach loop instead of taking only the first one.
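
Such a loop can also be written directly against the WebBrowser API, without XElement. The following is only a sketch: the class name LinkTextReader and the method name GetLinkTexts are made up for illustration, and it assumes the document has finished loading (for example, it is called from the DocumentCompleted handler). It looks up child links with GetElementsByTagName, which matches on the tag, whereas GetElementsByName matches on the name attribute.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

static class LinkTextReader // hypothetical helper, not part of any library
{
    // Collects the text of every <a> found inside <div> elements
    // whose class attribute equals className.
    public static List<string> GetLinkTexts(HtmlDocument document, string className)
    {
        var texts = new List<string>();
        if (document == null)
        {
            return texts; // nothing loaded yet
        }

        foreach (HtmlElement div in document.GetElementsByTagName("div"))
        {
            // The WebBrowser control exposes the class attribute as "className".
            if (div.GetAttribute("className") != className)
            {
                continue;
            }

            // Look up descendants by tag name, not by the name attribute.
            foreach (HtmlElement anchor in div.GetElementsByTagName("a"))
            {
                texts.Add((anchor.InnerText ?? string.Empty).Trim());
            }
        }

        return texts;
    }
}
```

Usage, assuming a form with a control named webBrowser1 as in the question: `var texts = LinkTextReader.GetLinkTexts(webBrowser1.Document, "class1");`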

EDIT:

With XElement:

You can get the content including the outer node by calling element.ToString().

If you want to exclude the outer tag, call String.Concat(element.Nodes()).
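
To make the difference between these calls concrete, here is a small console sketch. The markup string is an invented, well-formed example (XElement.Parse would throw on unclosed tags), and element.Value is added for comparison because it returns only the concatenated text, which is closest to innerText.

```csharp
using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Illustrative, well-formed input; not taken from a real page.
        XElement element = XElement.Parse("<a href=\"link.php\">text <b>bold</b></a>");

        // Markup including the outer <a> tag.
        Console.WriteLine(element.ToString(SaveOptions.DisableFormatting));

        // Markup of the child nodes only, without the outer <a> tag.
        Console.WriteLine(string.Concat(element.Nodes()));

        // Text content only, with all tags stripped.
        Console.WriteLine(element.Value);
    }
}
```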

To get the innerHTML with HtmlAgilityPack:

  • Install HtmlAgilityPack from NuGet.
  • Use this code.

HtmlWeb web = new HtmlWeb();

HtmlDocument dc = web.Load("Your_Url");

var s = dc.DocumentNode.SelectSingleNode("//a[@name='a']").InnerHtml;
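
The link in the question has no name attribute, so an XPath that goes through the parent div may fit better. The sketch below assumes the HtmlAgilityPack NuGet package is installed and uses the markup from the question with closing tags added; it loads the HTML from a string rather than a URL so it can run on its own. SelectSingleNode returns null when nothing matches, hence the check.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Markup from the question, with closing tags added for this example.
        const string html =
            "<div class=\"class1\" title=\"myfirstClass\">" +
            "<a href=\"link.php\">text I want read in C#" +
            "<span class=\"order-level\"></span></a></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <a> that is a direct child of the div with class "class1".
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//div[@class='class1']/a");

        if (link != null)
        {
            // InnerText gives the text without tags; InnerHtml would keep the <span>.
            Console.WriteLine(link.InnerText.Trim());
        }
    }
}
```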

Hope this helps!


Here I have created a console application to extract the anchor text.

// Requires: using System; using System.Linq; and the HtmlAgilityPack NuGet package.
static void Main(string[] args)
{
    string input = "<div class=\"class1\" title=\"myfirstClass\"><a href=\"link.php\">text I want read in C#<span class=\"order-level\"></span>";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(input);
    foreach (HtmlAgilityPack.HtmlNode item in doc.DocumentNode.Descendants("div"))
    {
        // FirstOrDefault avoids an exception when a div contains no link.
        var link = item.Descendants("a").FirstOrDefault();
        if (link == null)
        {
            continue;
        }
        var text = link.InnerText.Trim();
        Console.Write(text);
    }
    Console.ReadKey();
}

      



Please note that this is an HtmlAgilityPack question, so please tag the question accordingly.
