Why doesn't the RegExp "greedy" mode work?

Question

584 Views Asked by Green At 17 August 2025 at 00:27

I don't understand this behavior. In the following example, I need to match an HTML comment.

var str = '.. <!--My -- comment test--> ';

var regex1 = /<!--[.]*-->/g;
var regex2 = /<!--.*-->/g;

alert(str.match(regex1));      // null
alert(str.match(regex2));      // <!--My -- comment test-->

The second regex, regex2, works fine and outputs exactly what's needed. The first returns null, and I don't understand the difference. To me, the regular expressions /<!--[.]*-->/ and /<!--.*-->/ mean the same thing. But the second works and the first does not. Why?

UPD. I've read comments and have an update.

var str3 = '.. <!--Mycommenttest--> ';
var str4 = '.. <!--My comment test--> ';

var regex3 = /<!--[\w]*-->/g;
var regex4 = /<!--[\s\S]*-->/g;

alert(str3.match(regex3));        // <!--Mycommenttest-->
alert(str4.match(regex4));        // <!--My comment test-->

So it's possible to use a character class (a limited set of matching characters) to match the comment. Which is the right way to write the regex, with [] or without? I can't see the difference; both give the right output.

There are 4 best solutions below

sidyll On 03 February 2012 at 17:48

The dot (.) does not mean "anything" inside a character class. Why would you need a character class to match anything?

Wes Hardaker On 03 February 2012 at 17:49

The first doesn't work because the two don't mean the same thing. The first matches only the literal period character. The period isn't a generic match when put inside a [] set. (And if you think about it, this makes sense: why would you want a match-anything symbol inside a set of limited matching characters?)

beerbajay On 03 February 2012 at 18:22

"Regular expressions /<!--[.]*-->/ and /<!--.*-->/ mean the same"

This is not correct.

The brackets [] indicate a character class, where any character in the class may be matched. [.] is the character class which contains the '.' character. Contrast this with ., which is a pre-defined character class taken to mean "any character" (except for line-terminators).

So what you're matching with <!--[.]*--> is either an empty comment or a comment consisting wholly of '.' characters. And what you're matching with <!--.*--> is either an empty comment or a comment filled with any characters except line breaks.
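A quick way to check this is to run both patterns against a few inputs. This is only a sketch: the original string comes from the question, while dotsOnly and empty are made-up test inputs, and the g flag is left off so that test() keeps no state between calls.

```javascript
// `original` is the string from the question; the other two are hypothetical inputs.
const original = '.. <!--My -- comment test--> ';
const dotsOnly = '.. <!--...--> ';
const empty = '.. <!----> ';

const literalDot = /<!--[.]*-->/; // [.] is a class containing only the '.' character
const anyChar = /<!--.*-->/;      // . matches any character except line terminators

const results = {
  literalOnOriginal: literalDot.test(original), // false: 'M' is not a '.'
  literalOnDots: literalDot.test(dotsOnly),     // true: the body is all '.'
  literalOnEmpty: literalDot.test(empty),       // true: * allows zero repetitions
  anyOnOriginal: anyChar.test(original),        // true
};
console.log(results);
```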

**Alan Moore** · Accepted Answer

Character class shorthands like \w, \d and \s mean exactly the same inside character classes as out, but metacharacters like . typically lose their special meanings inside character classes. That's why /<!--[.]*-->/ didn't work as you expected: [.] matches a literal '.'.
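The difference between shorthands and metacharacters can be seen on a small input. The sample string here is an assumption chosen for illustration; it is not from the question.

```javascript
// Hypothetical sample: a letter, a digit, a space, and a period.
const sample = 'a1 .';

const wordOutside = sample.match(/\w/g);  // shorthand outside a class
const wordInside = sample.match(/[\w]/g); // same shorthand inside a class
const dotInside = sample.match(/[.]/g);   // '.' inside a class is a literal period
const dotOutside = sample.match(/./g);    // '.' outside a class is "any character"

console.log(wordOutside, wordInside); // [ 'a', '1' ] [ 'a', '1' ]
console.log(dotInside);               // [ '.' ]
console.log(dotOutside);              // [ 'a', '1', ' ', '.' ]
```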

But /<!--.*-->/ doesn't really work either, since . doesn't match newlines. In most regex flavors you would use single-line mode to let the dot match all characters including newlines, like this: /<!--.*-->/s or this: (?s)<!--.*-->. But JavaScript did not support that feature until ES2018 added the s (dotAll) flag, so most people use [\s\S] instead, meaning "any whitespace character or any character that's not whitespace"; in other words, any character.
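The newline behavior is easy to confirm with a comment that spans two lines. This sketch uses a made-up multi-line string, and the third pattern assumes an engine that implements the ES2018 s (dotAll) flag; on older engines that literal is a syntax error and [\s\S] is the portable choice.

```javascript
// Hypothetical input: a comment whose body contains a line break.
const multiline = '<!--line one\nline two-->';

const dotPattern = /<!--.*-->/;        // . stops at the line terminator
const anyClass = /<!--[\s\S]*-->/;     // whitespace or non-whitespace: any character
const dotAllPattern = /<!--.*-->/s;    // assumes ES2018 dotAll support

const dotMatches = dotPattern.test(multiline);       // false
const classMatches = anyClass.test(multiline);       // true
const dotAllMatches = dotAllPattern.test(multiline); // true
console.log(dotMatches, classMatches, dotAllMatches);
```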

But that's not right either, since (as Jason pointed out in his comment) it will greedily match everything from the first <!-- to the last -->, which could encompass several individual comments and all the non-comment material between them. To make it truly correct is probably not worth the effort. When using regexes to match HTML, you have to make many simplifying assumptions anyway; if you can't assume a certain level of well-formedness, you might as well give up. In this case, it should suffice to make the quantifier non-greedy:

var regex5 = /<!--[\s\S]*?-->/g;
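To see what the non-greedy quantifier changes, compare both forms on input that holds two comments. The html string below is a hypothetical example, not taken from the question.

```javascript
// Hypothetical input with two comments and markup between them.
const html = '<!--first--> <p>text</p> <!--second-->';

// Greedy: * takes as much as possible, so the match runs to the last -->.
const greedy = html.match(/<!--[\s\S]*-->/g);

// Non-greedy: *? takes as little as possible, so each comment matches on its own.
const lazy = html.match(/<!--[\s\S]*?-->/g);

console.log(greedy); // [ '<!--first--> <p>text</p> <!--second-->' ]
console.log(lazy);   // [ '<!--first-->', '<!--second-->' ]
```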
