C# Regex does not match anything (possibly because it does not handle escape characters correctly)

I am creating a regex pattern and testing on this site: http://rubular.com/

I enter the following pattern in the first field on that site:

<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>

      

I left the second box empty.

My regex pattern works great with this site.

But I cannot get it to work in C#.

I am trying to do this:

WebClient client = new WebClient();

string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");

string ItemPattern = "<div class=\"product clearfix\">\\n+" +   //  <div class="product clearfix">\n
                "<div class=\"img\">\\n" +                  //  <div class="img">\n
                "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                "+<img class=\"lazyload\"" +                //  +<img class="lazyload"
                "id='.*' data-original=\"(.*?)\"" +         //  id='.*' data-original="(.*?)"
                "alt=\".*\" title=\"(.*?)\"\\/>";           //  alt=".*" title="(.*?)" \/>

MatchCollection matches = Regex.Matches(MainPage, ItemPattern);

foreach (Match match in matches)
{
    Console.WriteLine("Area Code:        {0}", match.Groups[1].Value);
    Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
    Console.WriteLine();
}

      

I escaped every " with \. I really don't understand why it doesn't work, and it is starting to drive me crazy.



2 answers


You need two layers of escaping: once for the C# string literal and again for the regex syntax.

If you want to escape a character for the regex, you need to escape the backslash as well, so you have to change \ to \\ for the escape sequence to reach the regex level.
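To make the two layers concrete, here is a minimal sketch that builds the same regex fragment twice: once as a regular literal with doubled backslashes, and once as a verbatim (@"...") literal, where backslashes are kept as typed and only quotes are doubled. The class name EscapeLayers and the sample markup are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeLayers
{
    // Regular literal: the compiler turns \\ into a single backslash and \" into a quote,
    // so the regex engine receives the two characters \ and n and reads them as "newline".
    public const string Regular = "<a href=\"(.*?)\">\\n+";

    // Verbatim literal: backslashes stay as typed; a quote is written as "".
    public const string Verbatim = @"<a href=""(.*?)"">\n+";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Regular == Verbatim);

        // Print the pattern exactly as the regex engine will see it.
        Console.WriteLine(Regular);

        // Hypothetical sample markup, only to show that the pattern matches.
        string html = "<a href=\"/phone-1\">\n\n<img />";
        Console.WriteLine(Regex.Match(html, Verbatim).Groups[1].Value);
    }
}
```

Printing the pattern before passing it to Regex is a cheap way to see which layer consumed a backslash.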



Use two \ for every single \ in your string, in addition to the escaping you already did for the quotes, since \ is an escape character. As far as I can see, this mainly concerns \n, which occurs 3 times.

Original string:

<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>

      

Alternatively, you can split this across multiple lines. C# ignores whitespace between tokens, so just close the quote, add + at the end of the line, and continue on the next line with a new opening quote.



C# string:

string ItemPattern = "<div class=\"product clearfix\">\\n" +   //  <div class="product clearfix">\n
                    "+<div class=\"img\">\\n" +                 //  +<div class="img">\n
                    "+<a href=\"(.*?)\">\\n" +                  //  +<a href="(.*?)">\n
                    "+<img class=\"lazyload\" " +               //  +<img class="lazyload"         (trailing space kept)
                    "id='.*' data-original=\"(.*?)\" " +        //  id='.*' data-original="(.*?)"  (trailing space kept)
                    "alt=\".*\" title=\"(.*?)\" \\/>";          //  alt=".*" title="(.*?)" \/>

      

If you still have a problem after this, something else is wrong, perhaps in the Regex.Matches(MainPage, ItemPattern) call. Judging by the debugging you did, the string is being built successfully but the MatchCollection comes back empty. So the problem is either in how you get the matches or in how you refer to them.
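One check that may help here is to compare the assembled C# string, character by character, with the pattern that worked on Rubular. The sketch below does that for the fragments in the question's code, where there is no space between lazyload" and id=, before alt=, or before \/>. The class name PatternCheck, the helper FirstDifference, and the sample markup are hypothetical; the real page is not used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern as it was entered on Rubular (copied from the question).
    public const string Expected =
        @"<div class=""product clearfix"">\n+<div class=""img"">\n+<a href=""(.*?)"">\n+" +
        @"<img class=""lazyload"" id='.*' data-original=""(.*?)"" alt="".*"" title=""(.*?)"" \/>";

    // The pattern as assembled from the fragments in the question's C# code.
    public const string Assembled =
        "<div class=\"product clearfix\">\\n+" +
        "<div class=\"img\">\\n" +
        "+<a href=\"(.*?)\">\\n" +
        "+<img class=\"lazyload\"" +
        "id='.*' data-original=\"(.*?)\"" +
        "alt=\".*\" title=\"(.*?)\"\\/>";

    // Hypothetical sample markup shaped like the pattern; not the real page.
    public const string Sample =
        "<div class=\"product clearfix\">\n<div class=\"img\">\n<a href=\"/phone-1\">\n" +
        "<img class=\"lazyload\" id='img1' data-original=\"/img/1.jpg\" alt=\"Phone\" title=\"Phone 1\" />";

    // Returns the index of the first differing character, or -1 if the strings are equal.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : shared;
    }

    public static void Main()
    {
        Console.WriteLine("First difference at index: " + FirstDifference(Expected, Assembled));
        Console.WriteLine("Matches with Rubular pattern:   " + Regex.Matches(Sample, Expected).Count);
        Console.WriteLine("Matches with assembled pattern: " + Regex.Matches(Sample, Assembled).Count);
    }
}
```

If the two strings turn out to be identical and there are still no matches, the input itself is the next suspect: \n+ does not match \r\n line endings or indentation between tags, so \s+ may be more forgiving. Whether that applies to the downloaded page is not confirmed anywhere in this thread.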
