C# regex

I have an HTML page with links like /with_us.php?page=digit and out.php?i=digit. How can I get all of these links from the page? It would be even better if I could get just the numbers from these links right away.

+2




2 answers


The HTML Agility Pack is perfect for this; this is pretty much the same as the example on the home page:

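// doc is an HtmlAgilityPack.HtmlDocument that has already been loaded.
// Note: SelectNodes returns null when no node matches the XPath.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // Read the href attribute of each anchor.
    string href = link.Attributes["href"].Value;
}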

      



Now just parse the "href"; maybe something like:

Match match = Regex.Match(href, @"[&?]\w+=(\d+)");
int i;
if (match.Success && int.TryParse(match.Groups[1].Value, out i))
{
    Console.WriteLine(i);
}
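
The pattern above accepts any numeric query parameter. If only the two link shapes from the question matter, the regex can name them explicitly. The following is a minimal sketch, not a definitive implementation: it assumes the links look like with_us.php?page=N and out.php?i=N, and the class name LinkNumbers, the method Extract, and the sample hrefs are all made up for illustration. It works on plain strings, so the hrefs can come from the HTML Agility Pack loop or anywhere else.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkNumbers
{
    // Assumed link shapes (taken from the question):
    // with_us.php?page=<digits> and out.php?i=<digits>.
    static readonly Regex Pattern = new Regex(
        @"(?:with_us\.php\?page|out\.php\?i)=(\d+)",
        RegexOptions.IgnoreCase);

    // Returns the number embedded in each matching href, in input order.
    // Hrefs that do not match either shape are skipped.
    public static List<int> Extract(IEnumerable<string> hrefs)
    {
        var numbers = new List<int>();
        foreach (string href in hrefs)
        {
            Match match = Pattern.Match(href);
            int n;
            if (match.Success && int.TryParse(match.Groups[1].Value, out n))
            {
                numbers.Add(n);
            }
        }
        return numbers;
    }

    static void Main()
    {
        // Hypothetical sample data; the last href is ignored on purpose.
        var hrefs = new[] { "/with_us.php?page=12", "out.php?i=7", "/about.php?id=3" };
        Console.WriteLine(string.Join(", ", Extract(hrefs)));
    }
}
```

Naming the parameters keeps unrelated links such as /about.php?id=3 out of the result, at the cost of having to update the pattern if the page names change.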

      

+3




You might want to try actually parsing the page and traversing the DOM.



Try: http://www.codeplex.com/htmlagilitypack

0








