I have some ill-formed HTML: sometimes the quotation marks (") around an attribute value are missing, and the tag names are sometimes uppercase and sometimes lowercase:
<DIV class="main">
<DIV class="subsection1">
<H2>
<DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>
I would like the match to span multiple lines and to ignore case, but the following pattern does not seem to work. (To combine the options, I also tried | instead of &.)
const string pattern = @"<div class=""?main""?><div class=""?subsection1""?><h2><div class=""?subwithoutquote""?>(.+?)</div>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase & RegexOptions.Singleline);
Or should I add \n* to the pattern to handle the line breaks?
The first problem is that your regex does not allow for whitespace (including line breaks) between the tags. The correct regex (tested in Rubular) adds several \s* entries so that whitespace is accepted there.
The second problem is that you are not combining the options properly.
Your code:
RegexOptions.IgnoreCase & RegexOptions.Singleline
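Since these are bit flags, bitwise AND (the & operator) is the wrong operator: IgnoreCase & Singleline evaluates to RegexOptions.None because the two flags share no bits, so neither option takes effect. What you want is bitwise OR (the | operator):
RegexOptions.IgnoreCase | RegexOptions.Singleline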
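Putting both fixes together, a minimal runnable sketch might look like the following. It assumes C# 9 or later (top-level statements). The html string reproduces the sample from the question, and the exact placement of \s* is this sketch's own assumption (one between each pair of adjacent tags). \s* is used rather than \n* because it also matches \r\n line endings and any indentation.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question: mixed-case tags, one unquoted attribute
// value, and line breaks between the tags.
string html = "<DIV class=\"main\">\n<DIV class=\"subsection1\">\n<H2>\n"
            + "<DIV class=subwithoutquote>StackOverflow</DIV></H2></DIV></DIV>";

// Fix 1: \s* between adjacent tags accepts newlines (\n or \r\n) and indentation.
const string pattern =
    @"<div class=""?main""?>\s*<div class=""?subsection1""?>\s*<h2>\s*" +
    @"<div class=""?subwithoutquote""?>(.+?)</div>";

// Fix 2: combine the flags with | so both bits are set.
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
Console.WriteLine(options); // IgnoreCase, Singleline

// With &, the two flags share no bits, so the result is RegexOptions.None.
RegexOptions wrong = RegexOptions.IgnoreCase & RegexOptions.Singleline;
Console.WriteLine(wrong);   // None

// Group 1 holds the text inside the innermost <div>.
Match m = Regex.Match(html, pattern, options);
Console.WriteLine(m.Success ? m.Groups[1].Value : "no match"); // StackOverflow

// With None, matching is case-sensitive and fails on the uppercase <DIV> tags.
Console.WriteLine(Regex.IsMatch(html, pattern, wrong)); // False
```

Note that Singleline only changes what . matches (it lets . match \n); it is the \s* entries that let the pattern cross the line breaks between tags.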