Get part of matched value with regex
I am trying to get a part of a string.
This expression is used:
@"<a .*href=""(?<Url>(.*))(?="")"""
Sample data to match:
var input = @"<html lang=""en"">
<head>
<link href=""http://www.somepage.com/c/main.css"" rel=""stylesheet"" type=""text/css"" />
<link rel=""canonical"" href=""http://www.somepage.com"" />
<script src=""http://www.somepage.com/professional/bower_components/modernizr/modernizr.js"" type=""text/javascript""></script>
</head>
<body>
<header>
<div>
<div>
<a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
</div>
</div>
</header>
</body>
</html>"
Now I managed to get this value:
http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
with this code:
var regexPattern = new Regex(PATTERN, RegexOptions.IgnoreCase);
var matches = regexPattern.Matches(httpResult);
foreach (Match match in matches)
{
    // here I'm getting this value
    var extractedValue = match.Groups["Url"].Value; // its value is http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
}
What I want to get from match.Groups["Url"].Value is simply http://www.somepage.com/someotherpage, without anything that follows the href attribute.
Is it possible to get only that part of the match without using Substring on extractedValue?
The following should work:
<a .*href=""(?<Url>(.+?))(?="")""
The problem was that the * in (.*) is greedy: it consumes as much as it can and only backs off as far as the last quote on the line. +? "matches the previous item one or more times, but as few times as possible", so it stops at the first quote. For more information on greediness in regexes, see the Regular Expression Tutorial, "Repetition with Star and Plus".
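To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the anchor line from the question. The class name GreedyVsLazyDemo, the helper ExtractUrl, and the shortened input line are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazyDemo
{
    // Hypothetical helper: returns the "Url" group of the first match, or "" if there is none.
    public static string ExtractUrl(string pattern, string input)
    {
        var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups["Url"].Value : "";
    }

    public static void Main()
    {
        // Shortened version of the <a> line from the question (assumed for this sketch).
        var line = @"<a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" /></a>";

        // Greedy: (.*) grabs the rest of the line, then backtracks only to the LAST quote.
        var greedy = @"<a .*href=""(?<Url>(.*))(?="")""";

        // Lazy: (.+?) grows one character at a time and stops at the FIRST quote.
        var lazy = @"<a .*href=""(?<Url>(.+?))(?="")""";

        Console.WriteLine(ExtractUrl(greedy, line)); // runs on past the href value into the <img> tag
        Console.WriteLine(ExtractUrl(lazy, line));   // http://www.somepage.com/someotherpage
    }
}
```

Note that the leading .* before href is still greedy in both patterns, so on a line with several href attributes the match would start at the last one.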
Use this pattern instead; there is much less backtracking if you don't use .* for the captured value (faster processing). The pattern also uses \x22 in place of " to make the pattern easier to work with, since it sidesteps the C# quote-escaping confusion.
Regex.Matches(input, @"<a.+href=\x22(?<Url>[^\x22]+).+/a>")
     .OfType<Match>()
     .Select(mt => mt.Groups["Url"].Value);
// Result = http://www.somepage.com/someotherpage
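For anyone who wants to run the snippet above as is, the following is a self-contained sketch with the using directives it needs (System.Linq for OfType and Select). The class name NegatedClassDemo, the method ExtractUrls, and the two-line sample input are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class NegatedClassDemo
{
    // Hypothetical wrapper around the pattern from the answer above.
    // [^\x22]+ matches everything up to (but not including) the next double quote,
    // so the capture cannot run past the end of the href value.
    public static List<string> ExtractUrls(string input)
    {
        return Regex.Matches(input, @"<a.+href=\x22(?<Url>[^\x22]+).+/a>")
                    .OfType<Match>()
                    .Select(mt => mt.Groups["Url"].Value)
                    .ToList();
    }

    public static void Main()
    {
        // Two lines taken from the question's sample (shortened): only the second is an <a> element.
        var input = @"<link rel=""canonical"" href=""http://www.somepage.com"" />
<a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" /></a>";

        foreach (var url in ExtractUrls(input))
        {
            Console.WriteLine(url); // http://www.somepage.com/someotherpage
        }
    }
}
```

The link element is skipped because the pattern requires the match to start with <a, and the img src is skipped because the capture is tied to href.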