Regex options: matching across multiple lines and ignoring case
I have some poorly formed HTML in which the quotes around attribute values are sometimes missing. The tags are also sometimes in upper case and sometimes in lower case:
<DIV class="main">
<DIV class="subsection1">
<H2>
<DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>
I would like to match across multiple lines and also ignore case, but the following pattern doesn't seem to work. (For concatenation, I also tried & instead of)
const string pattern = @"<div class=""?main""?><div class=""?subsection1""?><h2><div class=""?subwithoutquote""?>(.+?)</div>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase & RegexOptions.Singleline);
Or do I need to add \n* to the pattern to solve the multi-line problem?
The first problem is that your regex doesn't allow whitespace between the tags. Corrected regex (tested in Rubular; inside a C# verbatim string, double each "):
<div class="?main"?>\s*<div class="?subsection1"?>\s*<h2>\s*<div class="?subwithoutquote"?>(.+?)<\/div>\s*
Note the added \s* between the tags.
The second problem is that you are not combining the options correctly.
Your code:
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase & RegexOptions.Singleline);
Since these are bit flags, combining them with the bitwise AND operator (&) clears every bit, because IgnoreCase and Singleline share none, and you end up with RegexOptions.None. What you want is the bitwise OR operator (|).
Bitwise AND means "if a bit is set in both operands, leave it set; otherwise clear it." You need bitwise OR, which means "if a bit is set in either operand, set it; otherwise clear it."
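To make the difference concrete, here is a minimal sketch that prints what each operator produces for these two options. The class name FlagsDemo is only illustrative; the enum values in the comments are those of System.Text.RegularExpressions.RegexOptions.

```csharp
using System;
using System.Text.RegularExpressions;

class FlagsDemo
{
    static void Main()
    {
        // IgnoreCase is 1 (binary 00001) and Singleline is 16 (binary 10000),
        // so the two values have no bits in common.
        RegexOptions anded = RegexOptions.IgnoreCase & RegexOptions.Singleline;
        RegexOptions ored = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // AND keeps only the bits set in both operands: nothing survives.
        Console.WriteLine(anded == RegexOptions.None);            // True

        // OR keeps the bits set in either operand: both options are active.
        Console.WriteLine(ored.HasFlag(RegexOptions.IgnoreCase)); // True
        Console.WriteLine(ored.HasFlag(RegexOptions.Singleline)); // True
    }
}
```

So with & the call behaves as if no options had been passed at all, which is why the upper-case tags were not matched.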
In this case, you need to OR them together.
const string pattern = @"<div class=""?main""?><div class=""?subsection1""?><h2><div class=""?subwithoutquote""?>(.+?)</div>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Edit: also change your regex to the following, so that it allows whitespace between the tags:
const string pattern = @"<div class=""?main""?>\s*<div class=""?subsection1""?>\s*<h2>\s*<div class=""?subwithoutquote""?>(.+?)</div>";
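Putting both fixes together, a self-contained sketch might look like the following. HeadingExtractor and Extract are hypothetical names introduced for illustration, and the input string is the sample HTML from the question with \n standing in for the line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

static class HeadingExtractor
{
    // Whitespace-tolerant pattern; quotes are doubled because this is a verbatim string.
    const string Pattern =
        @"<div class=""?main""?>\s*<div class=""?subsection1""?>\s*<h2>\s*<div class=""?subwithoutquote""?>(.+?)</div>";

    // Hypothetical helper: returns the captured text, or an empty string if nothing matched.
    public static string Extract(string html)
    {
        Match m = Regex.Match(html, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}

class Program
{
    static void Main()
    {
        // Sample input from the question: upper-case tags, one unquoted attribute.
        string html =
            "<DIV class=\"main\">\n<DIV class=\"subsection1\">\n<H2>\n" +
            "<DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>";

        Console.WriteLine(HeadingExtractor.Extract(html)); // StackOverflow
    }
}
```

Because \s already matches newline characters, no separate \n* is needed in the pattern. Singleline only changes what . matches, so here it matters only if the captured text itself spans several lines.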