I don't understand this behavior. I have the following example, where I need to match an HTML comment:
var str = '.. <!--My -- comment test--> ';
var regex1 = /<!--[.]*-->/g;
var regex2 = /<!--.*-->/g;
alert(str.match(regex1)); // null
alert(str.match(regex2)); // <!--My -- comment test-->
The second regex, regex2, works fine and outputs exactly what's needed. The first returns null, and I don't understand the difference. As I read them, the regular expressions <!--[.]*--> and <!--.*--> mean the same thing: "after <!--, take ANY character except newline, zero or more times, and finish with -->". But the second works and the first does not. Why?
UPD. I've read the comments and have an update.
var str3 = '.. <!--Mycommenttest--> ';
var str4 = '.. <!--My comment test--> ';
var regex3 = /<!--[\w]*-->/g;
var regex4 = /<!--[\s\S]*-->/g;
alert(str3.match(regex3)); // <!--Mycommenttest-->
alert(str4.match(regex4)); // <!--My comment test-->
So character classes in brackets can match the comment too. Which is the right way to write the regex, then: with [] or without? I can't see the difference; both give the right output.
Character class shorthands like \w, \d and \s mean exactly the same inside character classes as outside them, but metacharacters like . typically lose their special meaning inside character classes. That's why /<!--[.]*-->/ didn't work as you expected: [.] matches a literal dot.
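To make the difference concrete, here is a small sketch. The sample strings and variable names are made up for illustration; only the two regexes come from the question.

```javascript
// Inside a character class, "." is a literal dot, not "any character".
const literalDot = /<!--[.]*-->/;
const anyChar = /<!--.*-->/;

// Hypothetical inputs: one comment made only of dots, one with ordinary text.
const withDots = '.. <!--...--> ';
const withText = '.. <!--My -- comment test--> ';

const results = {
  literalDotOnDots: literalDot.test(withDots), // true: the body is only dots
  literalDotOnText: literalDot.test(withText), // false: "M" is not a dot
  anyCharOnText: anyChar.test(withText),       // true: "." matches any non-newline char
  // Shorthands keep their meaning inside brackets: [\w]+ behaves like \w+.
  wordClassSame: /[\w]+/.exec('ab_1 ?')[0] === /\w+/.exec('ab_1 ?')[0],
};

console.log(results);
```

So [\w] and [\s\S] behave as expected in brackets, while [.] only ever matches the dot character itself.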
But /<!--.*-->/ doesn't really work either, since . doesn't match newlines. In most regex flavors you would use single-line mode to let the dot match all characters including newlines, like this: /<!--.*-->/s or this: (?s)<!--.*-->. JavaScript historically didn't support that feature (the s flag was only added in ES2018), so most people use [\s\S] instead, meaning "any whitespace character or any character that's not whitespace"; in other words, any character.

But that's not right either, since (as Jason pointed out in his comment) it will greedily match everything from the first <!-- to the last -->, which could encompass several individual comments and all the non-comment material between them. Making it truly correct is probably not worth the effort. When using regexes to match HTML, you have to make many simplifying assumptions anyway; if you can't assume a certain level of well-formedness, you might as well give up. In this case, it should suffice to make the quantifier non-greedy:
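As a rough illustration of the greedy versus non-greedy behavior, assuming reasonably well-formed input (the sample string html and the variable names are invented for this sketch):

```javascript
// Hypothetical sample input: two comments, the second spanning two lines.
const html = '<!-- a --><p>text</p><!-- b\nc -->';

const greedy = /<!--[\s\S]*-->/g; // runs from the first <!-- to the last -->
const lazy = /<!--[\s\S]*?-->/g;  // stops at the nearest -->
const dotLazy = /<!--.*?-->/g;    // lazy, but "." cannot cross the newline

const greedyMatches = html.match(greedy);
const lazyMatches = html.match(lazy);
const dotLazyMatches = html.match(dotLazy);

console.log(greedyMatches.length);  // 1 (the whole string, markup included)
console.log(lazyMatches.length);    // 2 (each comment separately)
console.log(dotLazyMatches.length); // 1 (the multi-line comment is missed)
```

This is still not an HTML parser: unterminated comments or a "-->" inside a script or attribute value would be matched incorrectly.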