Simple regex question?

I have a stringstream that contains many strings like this one:

  <A style="FONT-WEIGHT: bold" id=thread_title_559960       href="http://microsoft.com/forum/f80/topicName-1234/">Beautiful Topic Name</A> </DIV> 

      

I'm trying to get relevant links that start with:

style="FONT-WEIGHT: bold

      

From each match, I want to extract the link:

http://microsoft.com/forum/f80/topicName-1234/

Topic Id:
    1234

Topic Display Name:
    Beautiful Topic Name

I am using this pattern right now, but it is not specific enough:
    "href=\"(?<url>.*?)\">(?<title>.*?)</A>"

      

It also matches other links, because they contain href too.

Also, in order to use Regex, I joined all the lines into a single line. Does regex support newlines? That is, can a pattern match text that spans multiple lines?

Please help me with the pattern.


2 answers


In a regular expression, the dot wildcard does not match newline characters by default. If you want to match any character, including newlines, use [^\x00] instead of the dot (.). This matches every character except the null character, which in practice means it matches everything.

Try the following:

<A\s+style="FONT-WEIGHT: bold"\s+id=(\S+)\s+href="([^"]*)">([^\x00]*?)</A>

      



If you are assigning this to a string in double quotes, you will need to escape the quotes and backslashes. It will look something like this:

myVar = "<A\\s+style=\"FONT-WEIGHT: bold\"\\s+id=(\\S+)\\s+href=\"([^\"]*)\">([^\\x00]*?)</A>";
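As a rough sketch of how that escaped pattern might be used in C#, the snippet below runs it against the sample line from the question and pulls out the URL, the topic ID, and the display name. The class and method names (TopicLinks, Extract) are made up for illustration, and the second regex assumes the topic ID is always the run of digits after the last hyphen in the URL, which the question implies but does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TopicLinks
{
    // Pattern from the answer: group 1 = id attribute, group 2 = href, group 3 = link text.
    const string Pattern =
        "<A\\s+style=\"FONT-WEIGHT: bold\"\\s+id=(\\S+)\\s+href=\"([^\"]*)\">([^\\x00]*?)</A>";

    // Hypothetical helper: returns (url, topicId, title) for every bold thread link.
    public static List<(string Url, string TopicId, string Title)> Extract(string html)
    {
        var results = new List<(string Url, string TopicId, string Title)>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            string url = m.Groups[2].Value;
            string title = m.Groups[3].Value;

            // Assumption: the topic id is the digits after the last hyphen in the URL.
            Match idMatch = Regex.Match(url, @"-(\d+)/?$");
            string topicId = idMatch.Success ? idMatch.Groups[1].Value : "";

            results.Add((url, topicId, title));
        }
        return results;
    }

    public static void Main()
    {
        string html = "<A style=\"FONT-WEIGHT: bold\" id=thread_title_559960       " +
                      "href=\"http://microsoft.com/forum/f80/topicName-1234/\">Beautiful Topic Name</A> </DIV>";

        foreach (var link in Extract(html))
        {
            Console.WriteLine(link.Url);
            Console.WriteLine(link.TopicId);
            Console.WriteLine(link.Title);
        }
    }
}
```

Because the pattern anchors on the style attribute, plain links that only have an href are skipped.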

      


You can make the dot (.) match newlines in the pattern by using the RegexOptions.Singleline option:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).



So, if your title spans multiple lines, with this option enabled the (?<title>.*?) portion of the pattern will continue across lines while trying to find a match.
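To illustrate the difference, here is a small C# sketch that applies the pattern from the question to a link whose title contains a line break, once without and once with RegexOptions.Singleline. The class name and the sample string are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Pattern from the question, with named groups for the URL and the title.
    const string Pattern = "href=\"(?<url>.*?)\">(?<title>.*?)</A>";

    // Returns the captured title, or null when the pattern does not match.
    public static string ExtractTitle(string html, RegexOptions options)
    {
        Match m = Regex.Match(html, Pattern, options);
        return m.Success ? m.Groups["title"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical input: the title is split across two lines.
        string html = "<A href=\"http://microsoft.com/forum/f80/topicName-1234/\">Beautiful\nTopic Name</A>";

        // Default options: the dot stops at the newline, so there is no match.
        Console.WriteLine(ExtractTitle(html, RegexOptions.None) == null);       // True

        // Singleline: the dot also matches the newline, so the title is captured.
        Console.WriteLine(ExtractTitle(html, RegexOptions.Singleline) != null); // True
    }
}
```

Note that the captured title still contains the line break, so you may want to normalize whitespace before displaying it.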
