C# regex
I have an HTML page with links like /with_us.php?page=digit and out.php?i=digit. How can I get all of these links from the page? Even better, can I get just the numbers from these links right away?
+2
kusanagi
2 answers
The HTML Agility Pack is perfect for this; this is pretty much the same as the example on the home page:
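// doc is an HtmlAgilityPack.HtmlDocument that has already been loaded.
// Note: SelectNodes returns null when no node matches the XPath.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    string href = link.Attributes["href"].Value;
}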
Now just parse the "href"; maybe something like:
// Capture the digits of the first query-string parameter with a numeric value.
Match match = Regex.Match(href, @"[&?]\w+=(\d+)");
int i;
if (match.Success && int.TryParse(match.Groups[1].Value, out i))
{
    Console.WriteLine(i);
}
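To make the parsing step concrete, here is a minimal sketch that runs a stricter pattern over the two URL shapes from the question. It assumes the parameter names are exactly `page` and `i`, as in the question's examples. `HrefNumbers` and `TryGetNumber` are illustrative names, not part of any library, and the sample hrefs are made up. Unlike the `\w+` pattern above, it ignores other numeric parameters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefNumbers
{
    // Matches "?page=123" or "&i=123". The parameter names are an assumption
    // taken from the URLs in the question.
    private static readonly Regex NumberParam =
        new Regex(@"[?&](?:page|i)=(\d+)", RegexOptions.IgnoreCase);

    // Returns true and sets number when href carries one of the expected parameters.
    public static bool TryGetNumber(string href, out int number)
    {
        number = 0;
        Match match = NumberParam.Match(href ?? string.Empty);
        return match.Success && int.TryParse(match.Groups[1].Value, out number);
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up hrefs in the shapes described in the question.
        string[] hrefs = { "/with_us.php?page=12", "out.php?i=7", "/about.php" };
        foreach (string href in hrefs)
        {
            if (HrefNumbers.TryGetNumber(href, out int n))
                Console.WriteLine($"{href} -> {n}");
        }
    }
}
```

Naming the parameters keeps unrelated links such as `?id=5` out of the results. If any numeric parameter should count, the looser pattern from the answer is the simpler choice.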
+3
Marc Gravell
You might want to try actually parsing the page and traversing the DOM.
Try: http://www.codeplex.com/htmlagilitypack
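A rough end-to-end sketch of that approach, assuming the HtmlAgilityPack package is referenced and the page's HTML is already available as a string. `LinkScraper` and `ExtractNumbers` are hypothetical names, the sample markup is invented, and the regex is the one from the other answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is installed

public static class LinkScraper
{
    // Hypothetical helper: collects the first numeric query value of every <a href>.
    public static List<int> ExtractNumbers(string html)
    {
        var numbers = new List<int>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return numbers;

        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", string.Empty);
            Match match = Regex.Match(href, @"[&?]\w+=(\d+)");
            if (match.Success && int.TryParse(match.Groups[1].Value, out int value))
                numbers.Add(value);
        }
        return numbers;
    }

    public static void Main()
    {
        // Invented sample markup using the URL shapes from the question.
        string html = "<a href=\"/with_us.php?page=3\">a</a>"
                    + "<a href=\"out.php?i=42\">b</a>"
                    + "<a href=\"/home\">c</a>";
        Console.WriteLine(string.Join(", ", ExtractNumbers(html)));
    }
}
```

Walking the parsed DOM means only real `href` attributes on anchor elements are considered, so URL-like text elsewhere in the page (comments, scripts, plain text) is not picked up.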
0
Christopher Tarquini