Using regex to get the value of an HTML tag attribute

I am trying to get a value from inside a certain piece of HTML, so far without success. I cannot use HTML Agility Pack, as it only gives me the data present between HTML tags.

public static string[] split_comments(string html)
{
    html = html.ToLower();
    html = html.Replace(@"""", " ");

      

The actual line in the HTML is:

<meta itemprop="rating" content="4.7">

The 4.7 value changes every time, and I need to get this value.

Match match = Regex.Match(html, @"<meta itemprop=rating content=([A-Za-z0-9\-]+)\>$");
if (match.Success)
{
    // Finally, we get the Group value and display it.
    string key = match.Groups[1].Value;
}

      

So I am trying to find this HTML tag and read the data inside it, which is different every time.

+3




5 answers


string html = "<meta itemprop=\"rating\" content=\"4.7\">";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var content = doc.DocumentNode
                .Element("meta")
                .Attributes["content"].Value;

      

- EDIT -

Since you first accepted and then unaccepted this answer, I think you took the code, ran it against your real HTML, and saw that it returned the wrong result.

That does not show that the answer is wrong, as it works correctly with the posted snippet.



So, making a wild guess and assuming there are other meta tags with itemprop attributes in your real HTML, like

<meta itemprop="rating" content="4.7">
<meta itemprop="somekey" content="somevalue">

      

the code would look like this:

var content = doc.DocumentNode
                .Descendants("meta")
                .Where(n => n.Attributes["itemprop"] != null && n.Attributes["itemprop"].Value == "rating")
                .Select(n => n.Attributes["content"].Value)
                .First();

      

+4




First you have to replace this:

html = html.Replace(@""""," ");

      

with this:

html = html.Replace(@"""","");

      

and change your regex to:



Match match = Regex.Match(html, @"<meta itemprop=rating content=([A-Za-z0-9\-.]+)\>$");

      

otherwise the condition in your if will always be false. After that, you can extract the value with Substring:

 html = html.Substring(html.IndexOf("content=") + 8);

 html = html.Substring(0, html.Length - 1);
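As a minimal sketch of the two fixes above put together: once the regex matches, the captured group already holds the value, so the Substring step is optional. The class and method names here are hypothetical, and the trailing $ is dropped on the assumption that the tag may appear anywhere in the page rather than at the very end.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RatingFromNormalizedHtml
{
    // Hypothetical helper: mirrors the question's preprocessing with both fixes applied.
    // Returns the captured value as text, or null when the tag is not found.
    public static string GetRating(string html)
    {
        html = html.ToLower();
        html = html.Replace(@"""", "");   // remove the quotes instead of turning them into spaces

        // The character class now includes '.', so "4.7" can be captured.
        // Assumption: no trailing '$', so the tag does not have to end the input.
        Match match = Regex.Match(html, @"<meta itemprop=rating content=([A-Za-z0-9\-.]+)>");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Called with the line from the question, GetRating returns the text 4.7, which you can then parse into a number.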

      

I hope this helps

+2




Here

html = html.Replace(@""""," "); 

      

you are replacing double quotes with spaces. So your example line looks like this:

<meta itemprop= rating  content= 4.7 > 

      

Your regex, however, expects the text without any extra whitespace, so it cannot match this line. Note also that \> is just an escaped >, and that the trailing $ means the tag must be the last thing in the input.

+1




Your regex should be something like @"<meta.+?content=""(.+?)"">" (inside a verbatim string, a double quote is written as ""). Parsing HTML with regex is a bad idea, though.
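Here is a minimal sketch of that idea applied to the original, unmodified HTML (no ToLower or Replace beforehand). The class and method names are hypothetical, and the pattern assumes that itemprop comes directly before content and that both values are in double quotes, as in the posted line. An HTML parser remains the more robust choice.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RatingExtractor
{
    // Hypothetical helper: returns the rating, or null when no matching tag
    // is found or the content attribute is not a number.
    public static double? ExtractRating(string html)
    {
        // Assumption: itemprop precedes content and both values are double-quoted.
        Match match = Regex.Match(
            html,
            @"<meta\s+itemprop\s*=\s*""rating""\s+content\s*=\s*""([^""]*)""",
            RegexOptions.IgnoreCase);

        if (!match.Success)
        {
            return null;
        }

        // Parse with the invariant culture so "4.7" is read the same way on every machine.
        double rating;
        if (double.TryParse(match.Groups[1].Value, NumberStyles.Float,
                            CultureInfo.InvariantCulture, out rating))
        {
            return rating;
        }
        return null;
    }
}
```

Capturing with [^""]* instead of .+ stops at the closing quote of the attribute, so other tags on the same line do not leak into the result.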

+1




Try this:

double searchedValue;
Regex reg = new Regex(@"content= (?<groupname>.*?) >");
var matches = reg.Match(@"<meta itemprop= rating  content= 4.7 >");
var value = matches.Groups["groupname"].Value;
// double.TryParse uses the current culture; depending on it,
// you may need something like value.Replace('.', ',')
double.TryParse(value, out searchedValue);

      

(?<groupname> ... ) defines a named group. You can access its value with matches.Groups["groupname"].Value

.*? reads up to the next occurrence of " >".

If you leave out the "?", it will read up to the last occurrence of " >" in the text.
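The difference between the two quantifiers can be sketched on an input that contains two tags. The class, method, and group names below are made up for illustration, and the input is in the same space-separated form the question's preprocessing produces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Lazy: stop at the first " >" after "content= ".
    public const string LazyPattern = @"content= (?<value>.*?) >";

    // Greedy: run on to the last " >" on the line.
    public const string GreedyPattern = @"content= (?<value>.*) >";

    // Hypothetical helper: returns the text captured by the named group "value".
    public static string Capture(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["value"].Value;
    }
}
```

For the input <meta content= 4.7 > <meta content= 9 >, Capture with LazyPattern returns 4.7, while GreedyPattern returns 4.7 > <meta content= 9, because the greedy version only gives back characters until the final " >" fits.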

Good luck =)

+1

