C# Regex cannot match anything (possibly because I am not escaping characters correctly)
I am creating a regex pattern and testing it on this site: http://rubular.com/
I enter the pattern in the first field on that site, exactly like this:
<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>
I left the second box empty.
The pattern works great on that site.
But I cannot get it to work in C#.
This is what I am trying:
WebClient client = new WebClient();
string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");
string ItemPattern = "<div class=\"product clearfix\">\\n+" + // <div class="product clearfix">\n
"<div class=\"img\">\\n" + // <div class="img">\n
"+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
"+<img class=\"lazyload\"" + // +<img class="lazyload"
"id='.*' data-original=\"(.*?)\"" + // id='.*' data-original="(.*?)"
"alt=\".*\" title=\"(.*?)\"\\/>"; // alt=".*" title="(.*?)" \/>
MatchCollection matches = Regex.Matches(MainPage, ItemPattern);
foreach (Match match in matches)
{
    Console.WriteLine("Area Code: {0}", match.Groups[1].Value);
    Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
    Console.WriteLine();
}
I just escaped every " with a \. I really don't understand why it doesn't work, and it is starting to drive me crazy.
Use TWO \ for every single \ in your string, in addition to the escaping you already did for the quotes, since \ is the escape character in a C# string literal. It looks like this mostly concerns the "\n", which occurs 3 times.
Original pattern:
<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>
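To see what the doubling does, here is a small sketch (the sample input is made up for illustration). In a regular C# literal, "\\n" ends up as the two characters \ and n, which the regex engine then reads as "newline"; a verbatim literal @"\n" produces the same two characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: the compiler turns \\ into one backslash, so the regex engine sees \n.
string escaped = "<div>\\n+<a>";
// Verbatim literal: backslashes are kept as written (a quote would be written as "").
string verbatim = @"<div>\n+<a>";

// Both literals produce the same pattern text.
bool samePattern = escaped == verbatim;

// Hypothetical sample input: two tags separated by a real newline character.
string sample = "<div>\n<a>";
bool matches = Regex.IsMatch(sample, escaped);

Console.WriteLine(samePattern); // True
Console.WriteLine(matches);     // True
```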
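Alternatively, you can split the pattern across multiple lines. C# ignores whitespace between tokens, so just close the quote, add a + at the end of the line, and continue with a new quote on the next line. Keep in mind that + joins the pieces exactly as written, so any space the pattern needs between two pieces (for example between the attributes of the img tag) must be inside the quotes.
C# string:
string ItemPattern = "<div class=\"product clearfix\">\\n" + // <div class="product clearfix">\n
    "+<div class=\"img\">\\n" + // +<div class="img">\n
    "+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
    "+<img class=\"lazyload\" " + // +<img class="lazyload" (trailing space kept inside the quotes)
    "id='.*' data-original=\"(.*?)\" " + // id='.*' data-original="(.*?)" (trailing space kept inside the quotes)
    "alt=\".*\" title=\"(.*?)\" \\/>"; // alt=".*" title="(.*?)" \/>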
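A minimal sketch of how a space inside or outside the quotes changes the concatenated pattern. It uses a made-up one-line tag rather than the real page, and the variable names are only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample tag for illustration; not taken from the real page.
string html = "<img class=\"lazyload\" id='x1' alt=\"phone\" />";

// No space at the end of the first literal:
// the joined pattern reads class="lazyload"id='.*' ...
string withoutSpace = "<img class=\"lazyload\"" +
                      "id='.*' alt=\"(.*?)\" \\/>";

// Trailing space kept inside the first literal:
// the joined pattern reads class="lazyload" id='.*' ...
string withSpace = "<img class=\"lazyload\" " +
                   "id='.*' alt=\"(.*?)\" \\/>";

bool brokenMatches = Regex.IsMatch(html, withoutSpace);
Match fixedMatch = Regex.Match(html, withSpace);

Console.WriteLine(brokenMatches);              // False
Console.WriteLine(fixedMatch.Success);         // True
Console.WriteLine(fixedMatch.Groups[1].Value); // phone
```

Printing the joined pattern with Console.WriteLine and comparing it character by character with the one tested on rubular is a cheap way to catch this kind of difference.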
If you still have a problem after this, something else is wrong, perhaps in Regex.Matches(MainPage, ItemPattern). According to the debugging you did, it looks like the string is being generated successfully but no matches come back. So the problem is either in how you obtain the matches or in how you refer to them.
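One way to narrow this down is to print the match count and to try the pattern piece by piece against the downloaded text. The sketch below assumes, without having checked the actual page, that the server may send \r\n line endings, which \n+ alone would not match. The sample string and the \s+ variant are illustrative assumptions, not something confirmed for this site.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up stand-in for the downloaded page, using \r\n line endings.
string page = "<div class=\"product clearfix\">\r\n<div class=\"img\">\r\n";

// The original style: only \n is allowed between the tags.
string strict = "<div class=\"product clearfix\">\\n+<div class=\"img\">";
// A more tolerant variant: \s+ accepts any run of whitespace (\r, \n, spaces, tabs).
string tolerant = "<div class=\"product clearfix\">\\s+<div class=\"img\">";

int strictCount = Regex.Matches(page, strict).Count;
int tolerantCount = Regex.Matches(page, tolerant).Count;

// Printing the counts shows whether the pattern or the loop over the matches is at fault.
Console.WriteLine("strict: {0}", strictCount);     // strict: 0
Console.WriteLine("tolerant: {0}", tolerantCount); // tolerant: 1
```

If the count is already 0 for the first fragment of the pattern, the problem lies in the pattern or the input text rather than in the foreach loop.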