Regex options for matching across multiple lines and ignoring case

I have some poorly formed HTML: sometimes the quotes (") are missing. Also, sometimes the tags are in uppercase and other times in lowercase:

<DIV class="main">
    <DIV class="subsection1">
   <H2>
   <DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>

I would like to match across multiple lines and also ignore case, but the following pattern doesn't seem to work. (To combine the options, I also tried & instead of |.)

const string pattern = @"<div class=""?main""?><div class=""?subsection1""?><h2><div class=""?subwithoutquote""?>(.+?)</div>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase & RegexOptions.Singleline);

Or do I need to add \n* to the pattern to handle the multi-line problem?

2 answers


The first problem is that your regex doesn't allow for whitespace between the tags. The corrected regex (tested in Rubular):

<div class="?main"?>\s*<div class="?subsection1"?>\s*<h2>\s*<div class="?subwithoutquote"?>(.+?)<\/div>\s*

Note the added \s* between the tags, which lets each tag be followed by any amount of whitespace, including line breaks.
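A small sketch of why the \s* matters (the input string and the WhitespaceDemo class are made up for illustration). \s matches spaces, tabs and line breaks, so it also covers the indentation after each newline, which a \n* as suggested in the question would not:

```csharp
using System;
using System.Text.RegularExpressions;

class WhitespaceDemo
{
    // Hypothetical input shaped like the question's HTML: a newline plus indentation between two tags.
    public const string Input = "<H2>\n   <DIV class=subwithoutquote>";

    // Without \s*, nothing in the pattern can consume the "\n   " between the tags.
    public static bool MatchesWithout() =>
        Regex.IsMatch(Input, @"<h2><div class=""?subwithoutquote""?>", RegexOptions.IgnoreCase);

    // With \s*, the newline and the spaces are consumed before the next tag.
    public static bool MatchesWith() =>
        Regex.IsMatch(Input, @"<h2>\s*<div class=""?subwithoutquote""?>", RegexOptions.IgnoreCase);

    static void Main()
    {
        Console.WriteLine(MatchesWithout()); // False
        Console.WriteLine(MatchesWith());    // True
    }
}
```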

The second problem is that you are not combining the options correctly.



Your code:

Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase & RegexOptions.Singleline);

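Since these are bit flags, the bitwise AND operator (&) gives you the wrong value: IgnoreCase and Singleline do not share any bits, so IgnoreCase & Singleline evaluates to RegexOptions.None. What you want is the bitwise OR operator (|).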

Bitwise AND means "if a bit is set in both of them, leave it set; otherwise, unset it." You need bitwise OR, which means "if a bit is set in either of them, set it; otherwise, leave it unset."
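To see the flag arithmetic concretely, here is a minimal sketch (FlagsDemo is a hypothetical name) that prints the underlying values. RegexOptions is a [Flags] enum in which IgnoreCase is 1 and Singleline is 16, so ANDing them yields 0, which is why the original call behaved as if no options had been passed:

```csharp
using System;
using System.Text.RegularExpressions;

class FlagsDemo
{
    // AND keeps only the bits set in both operands; these two share none.
    public static RegexOptions Anded() => RegexOptions.IgnoreCase & RegexOptions.Singleline;

    // OR keeps every bit set in either operand.
    public static RegexOptions Ored() => RegexOptions.IgnoreCase | RegexOptions.Singleline;

    static void Main()
    {
        Console.WriteLine((int)RegexOptions.IgnoreCase); // 1
        Console.WriteLine((int)RegexOptions.Singleline); // 16
        Console.WriteLine(Anded());                      // None
        Console.WriteLine(Ored());                       // IgnoreCase, Singleline
        Console.WriteLine((int)Ored());                  // 17
    }
}
```

Note that RegexOptions.Multiline is a different option: it makes ^ and $ match at line boundaries, whereas Singleline makes . match \n as well.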

In this case, you need to OR them together:

const string pattern = @"<div class=""?main""?><div class=""?subsection1""?><h2><div class=""?subwithoutquote""?>(.+?)</div>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

Edit: change your regex to the following:

const string pattern = @"<div class=""?main""?>\s*<div class=""?subsection1""?>\s*<h2>\s*<div class=""?subwithoutquote""?>(.+?)</div>";
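
Putting both fixes together, here is a sketch of the full call. The ExtractDemo class, the Extract helper and the sample string are assumptions for illustration; the pattern is the one from the edit above, written as a valid C# verbatim string (quotes doubled, \s* between every pair of tags):

```csharp
using System;
using System.Text.RegularExpressions;

class ExtractDemo
{
    // Optional quotes around attribute values, \s* between tags, lazy capture of the inner text.
    const string Pattern =
        @"<div class=""?main""?>\s*<div class=""?subsection1""?>\s*<h2>\s*<div class=""?subwithoutquote""?>(.+?)</div>";

    // Hypothetical helper: returns the captured text, or null when the pattern does not match.
    public static string Extract(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Sample shaped like the question's HTML: uppercase tags, one unquoted attribute.
        string html =
            "<DIV class=\"main\">\n" +
            "    <DIV class=\"subsection1\">\n" +
            "   <H2>\n" +
            "   <DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>";

        Console.WriteLine(Extract(html)); // StackOverflow
    }
}
```

With \s* between the tags, Singleline is not strictly required for this input, because \s already matches line breaks; it only matters if the captured text itself spans several lines.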
