
Get part of matched value with regex

I am trying to get a part of a string.

This expression is used:

@"<a .*href=""(?<Url>(.*))(?="")"""

      

Sample data to match:

var input = @"<html lang=""en"">
    <head>
        <link href=""http://www.somepage.com/c/main.css"" rel=""stylesheet"" type=""text/css"" />

        <link rel=""canonical"" href=""http://www.somepage.com"" />
        <script src=""http://www.somepage.com/professional/bower_components/modernizr/modernizr.js"" type=""text/javascript""></script>
    </head>
        <body>
            <header>
                <div>
                    <div>
                        <a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
                    </div>
                </div>
            </header>
        </body>
    </html>"

      

Now I managed to get this value:

http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>

      

with this code:

var regexPattern = new Regex(PATTERN, RegexOptions.IgnoreCase);
var matches = regexPattern.Matches(httpResult);
foreach (Match match in matches)
{
    // here I'm getting this value 
    var extractedValue = match.Groups["Url"].Value; // its value is http://www.somepage.com/someotherpage\"><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
}

      

What I want to get in match.Groups["Url"].Value is simply http://www.somepage.com/someotherpage, without anything that follows the href attribute value.

Is it possible to get only that part of the match without using Substring on extractedValue?

+3




4 answers


You were almost there. One small change to your regex keeps quotes out of the captured group:
<a .*href=""(?<Url>([^""]*))(?="")""
                  //^^^^^ This is what I changed (the quote is doubled, as elsewhere in the verbatim string).
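
For reference, here is a minimal sketch of the negated character class run against a shortened copy of the sample from the question. It assumes C# 9 or later (top-level statements); the helper name ExtractAnchorUrls and the variable names are made up for illustration and do not come from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the question): returns the "Url" group of every match.
static string[] ExtractAnchorUrls(string html)
{
    // [^""]* matches any run of characters that are not a double quote,
    // so the capture cannot run past the quote that closes the href value.
    var pattern = new Regex(@"<a .*href=""(?<Url>([^""]*))(?="")""", RegexOptions.IgnoreCase);
    return pattern.Matches(html)
                  .Cast<Match>()
                  .Select(m => m.Groups["Url"].Value)
                  .ToArray();
}

// Shortened copy of the sample data from the question.
var sample = @"<head>
    <link rel=""canonical"" href=""http://www.somepage.com"" />
</head>
<body>
    <a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>
</body>";

foreach (var url in ExtractAnchorUrls(sample))
{
    Console.WriteLine(url); // prints http://www.somepage.com/someotherpage
}
```

The link tag is skipped because the pattern requires the literal text "<a " before the href attribute.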

      



+2




This might work, but unfortunately I don't have time to test it right now:



"<a[^>]*href=\"(?<Url>([^\"]+))\"[^>]*>"
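
A quick way to check the pattern above is to run it on a line that contains more than one anchor. The sketch below does that and compares it with a pattern that starts with a greedy .* instead. The input line, the example.* URLs and the helper name Urls are all hypothetical, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: two anchors on a single line.
var twoLinks = @"<a href=""http://one.example/"">1</a> <a href=""http://two.example/"">2</a>";

// Hypothetical helper: collects the "Url" group of every match of the given pattern.
static string[] Urls(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups["Url"].Value).ToArray();

// [^>]* cannot cross the '>' that ends a tag, so each <a ...> tag is matched separately.
var perTag = Urls(twoLinks, "<a[^>]*href=\"(?<Url>([^\"]+))\"[^>]*>");

// The greedy .* skips ahead to the last href on the line, so the first link is lost.
var greedyPrefix = Urls(twoLinks, @"<a .*href=""(?<Url>([^""]*))(?="")""");

Console.WriteLine(string.Join(", ", perTag));       // http://one.example/, http://two.example/
Console.WriteLine(string.Join(", ", greedyPrefix)); // http://two.example/
```

One caveat: because there is no space after "<a", this pattern would also accept any tag whose name merely starts with the letter a, such as abbr, if it carried an href attribute.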

      

+1




The following should work:

<a .*href=""(?<Url>(.+?))(?="")""

      

The problem was that the * in (.*) is greedy. +? "matches the previous element one or more times, but as few times as possible", so it stops at the first quote. For more information on greediness in regexes, see the Regular Expression Tutorial - Repeating with Star and Plus.
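
The difference can be seen by running both quantifiers on the anchor line from the question. This is only a sketch, assuming C# 9 or later (top-level statements); the variable names line, greedy and lazy are chosen here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The anchor line from the question's sample data.
var line = @"<a aria-haspopup=""true"" href=""http://www.somepage.com/someotherpage""><img src=""http://www.somepage.com/i/sprite/logo.png"" alt=page"" /></a>";

// Greedy: (.*) grabs as much as it can, then backtracks only as far as the LAST quote on the line.
var greedy = Regex.Match(line, @"<a .*href=""(?<Url>(.*))(?="")""").Groups["Url"].Value;

// Lazy: (.+?) grows one character at a time and stops at the FIRST quote.
var lazy = Regex.Match(line, @"<a .*href=""(?<Url>(.+?))(?="")""").Groups["Url"].Value;

Console.WriteLine(greedy.Contains("<img")); // True: the capture ran on into the <img> tag
Console.WriteLine(lazy);                    // http://www.somepage.com/someotherpage
```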

+1




Use this pattern instead; it backtracks far less if you don't use .* (faster processing). The pattern also uses \x22 in place of " to make it easier to work with, since that avoids the C# quote-escaping confusion.

// Requires: using System.Linq; using System.Text.RegularExpressions;
Regex.Matches(input, @"<a.+href=\x22(?<Url>[^\x22]+).+/a>")
     .OfType<Match>()
     .Select(mt => mt.Groups["Url"].Value);
     // Result = http://www.somepage.com/someotherpage

      

0

