Is it possible to recursively capture a FINITE number of matches (all of the same format) using ECMA RegEx?


For instance, say we're looking at a query string that is all lowercase, non-numeric, and free of special characters (just [a-z] and =):

?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches

Let us take as a given we know there will be three key-value pairs we wish to capture, and even that they are located at the beginning of said string:

  • some=querystring
  • ssembly=containing
  • n=indeterminate

Now, intuitively, it seems like I should be able to use something like...

^\?(&?[a-z=]+){3}.*$

...or possibly...

^\?(?:&?([a-z=]+)){3}.*$

...but, of course, the only capture this yields is

n=indeterminate
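To make that behaviour concrete, here is a minimal JavaScript sketch (the variable names are illustrative only) showing that a capture group inside a quantifier keeps just its last iteration:

```javascript
// Sample query string from above.
const query = '?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches';

// The capture group sits inside a {3} quantifier, so each iteration
// overwrites whatever the previous iteration captured.
const repeated = /^\?(?:&?([a-z=]+)){3}.*$/;

const result = query.match(repeated);
const captures = result.slice(1); // drop the full match, keep the groups

console.log(captures); // [ 'n=indeterminate' ]
```
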

Is there a syntax that would allow me to capture all three groups (as independent, accessible values, natch) without having to resort to the following?

^\?([a-z=]+)&([a-z=]+)&([a-z=]+).*$

I know there's no way to capture n instances (an arbitrarily-large set), but, given this is a finite number of captures I wish to obtain from my finite automata...

I know full well there are any number of ways to accomplish this in Javascript, or any other language for that matter. I'm specifically trying to ascertain if I'm stuck with the WET expression above.

sln On BEST ANSWER

It would take a long explanation to cover the nuances of how the various regex engine flavors
can capture the FIRST fixed number of contiguous, unbroken segments.

The JavaScript regex below does that job, and it is really the only way to do it in JS.
Note that this regex fails (matches nothing) when there are fewer than 3 contiguous segments.
You can test it here: https://regex101.com/r/3oWjwz/1

Other engines have different tools for accomplishing this task.
For example, the .NET engine has by far the most comprehensive toolset for this kind of thing (capture collections):
^\?(?:&?([a-z=]+)(?![a-z=])){3}.*$

var input = '?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches'

var regex = RegExp("(?<=^\\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+", 'g');

console.log( input.match(regex) );
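
As a rough illustration of the "fails below 3 segments" note, the sketch below runs the same pattern, written as a regex literal, against a longer and a shorter input. The variable names and the two sample inputs are assumptions made for this example, and it needs an engine with lookbehind support.

```javascript
// Same pattern as above, as a literal instead of a RegExp() string.
const firstThree = /(?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+/g;

// Hypothetical inputs: one with four segments, one with only two.
const longInput = '?some=querystring&ssembly=containing&n=indeterminate&mount=of';
const shortInput = '?some=querystring&ssembly=containing';

const longResult = longInput.match(firstThree);   // only the first three segments
const shortResult = shortInput.match(firstThree); // null: fewer than 3 segments

console.log(longResult);
console.log(shortResult);
```
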

Here is how this applies to the different quantifier forms.

Quantifier Unlimited +
https://regex101.com/r/bOGyJy/1

# Quantifier Unlimited +  
# (?<=^\?(?:&?[a-z=]+(?![a-z=]))*&?)[a-z=]+
# https://regex101.com/r/bOGyJy/1

(?<=
   ^ \? 
   (?:
      &? [a-z=]+ 
      (?! [a-z=] )
   )*
   &?
)
[a-z=]+ 

Quantifier Exact {3}
https://regex101.com/r/3oWjwz/1

# Quantifier Exact {3}  
# (?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+
# https://regex101.com/r/3oWjwz/1

(?<=
   ^ \? 
   (?=
      (?:
         &? [a-z=]+ 
         (?! [a-z=] )
      ){3}                     # Exact range 3
   )
   (?: &? [a-z=]+ ){0,2}       # Zero to one less than max range
   &?
)
[a-z=]+ 

Quantifier Range {2,4}
https://regex101.com/r/D1NrLQ/1

# Quantifier Range {2,4}  
# (?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){2,4})(?:&?[a-z=]+){0,3}&?)[a-z=]+
# https://regex101.com/r/D1NrLQ/1

(?<=
   ^ \? 
   (?=
      (?:
         &? [a-z=]+ 
         (?! [a-z=] )
      ){2}                     # 2, the minimum of the range
   )
   (?: &? [a-z=]+ ){0,3}       # Zero to one less than max range
   &?
)
[a-z=]+ 
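
For what it's worth, the range form can be tried in JavaScript as well. This is only a sketch: it uses the expanded {2} lookahead from the listing above, the variable names and the shorter inputs are made up for the example, and it assumes an engine with lookbehind support.

```javascript
// Range {2,4}: require at least 2 segments, return at most the first 4.
const rangePattern = /(?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){2})(?:&?[a-z=]+){0,3}&?)[a-z=]+/g;

const sixSegments = '?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches';
const twoSegments = '?some=querystring&ssembly=containing'; // hypothetical shorter input
const oneSegment = '?some=querystring';                     // hypothetical, below the minimum

const fromSix = sixSegments.match(rangePattern); // first four segments
const fromTwo = twoSegments.match(rangePattern); // both segments
const fromOne = oneSegment.match(rangePattern);  // null: minimum of 2 not met

console.log(fromSix, fromTwo, fromOne);
```
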
Barmar On

There's no recursion in ECMAScript regular expressions. If you look through the reference documentation, you'll see there's no recursion operator. You can also check regular-expressions.info, which lists the engines that support recursion: Perl 5.10, PCRE 4.0, Ruby 2.0, Delphi, PHP, and R.

trincot On

JavaScript has no concept of recursion in its regex syntax. The example you have given is not about recursion, though, but about adjacent repetition of the same pattern.

In that case I would suggest using a regex that just matches one occurrence of that pattern, but with the g flag, and use it with matchAll. This returns an iterator, and so you just consume the part that you need.

If it is guaranteed that you will have three matches, you can do:

const input = "?some=querystring&assembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";
const [[a],[b],[c]] = input.matchAll(/\w+=\w+/g);
console.log(a, b, c);

This is just an example that targets your sample input. Since matchAll returns an iterator, you can use any JS feature that works with iterators (a for...of loop, destructuring assignment, spread syntax, etc.).
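
For instance, a for...of loop can stop consuming the iterator once enough matches have been collected. A minimal sketch, where the `wanted` limit and the variable names are just placeholders for this example:

```javascript
const query = "?some=querystring&assembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";
const wanted = 3; // hypothetical: how many pairs we care about

const pairs = [];
for (const [pair] of query.matchAll(/[a-z]+=[a-z]+/g)) {
  pairs.push(pair);                   // pair is the full text of one match
  if (pairs.length === wanted) break; // stop early; later matches are never computed
}

console.log(pairs); // [ 'some=querystring', 'assembly=containing', 'n=indeterminate' ]
```

Unlike the destructuring version, this does not throw when fewer than three matches exist; `pairs` simply ends up shorter.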

Alternative: dynamically built regex

The repetitive nature of the regex you are troubled about can be taken over by the repeat() method:

const input = "?some=querystring&assembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";
const regex = RegExp("([a-z]+=[a-z]+)&?".repeat(3));
const [, ...matches] = input.match(regex);
console.log(matches);