Extracting all overlapping substrings between nested matching parentheses with a .NET regex


I'm trying to parse mathematical expressions with nested brackets:

(1 * (2 - 3)) + 4

I want to get every expression in brackets, like this:

  • (1 * (2 - 3))
  • (2 - 3)

Using this expression: (.*?\))(?=($|[^(]+)) I'm getting this result:

(1 * (2 - 3)

)

And using this expression: \(.*?\) I'm getting this result:

(1 * (2 - 3) 

Neither pattern works correctly. How can I also match the expressions nested inside other brackets?

There are 2 answers below.

Best answer

You can use

(?=(\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)))

See the regex demo. Details:

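  • (?= - start of a positive lookahead. It consumes no characters, so the regex engine attempts a match at every position, which is what lets overlapping (nested) matches be found:
    • (\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)) - Group 1:
      • \( - a ( char
      • (?>[^()]+|(?<c>)\(|(?<-c>)\))* - zero or more repetitions of: one or more chars other than ( and ), or a ( char (with a value pushed onto the Group "c" stack), or a ) char (with a value popped from the Group "c" stack; this alternative fails if the stack is empty)
      • (?(c)(?!)) - if the Group "c" stack is not empty, fail and backtrack
      • \) - a ) char
  • ) - end of the lookahead.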

See the C# demo:

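using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "(1 * (2 - 3)) + 4";
// The whole pattern is a lookahead, so each match itself is empty;
// the balanced substring is captured in Group 1.
var pattern = @"(?=(\((?>[^()]+|(?<c>)\(|(?<-c>)\))*(?(c)(?!))\)))";
var results = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
Console.WriteLine(String.Join(", ", results));
// => (1 * (2 - 3)), (2 - 3)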

Second answer

The usual approach would be a recursive regular expression, but unfortunately C#'s Regex does not support recursion. Alternatively, you can parse the string manually (there is C# code provided in this PAQ to do that).
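
For illustration, here is a minimal sketch of what such manual parsing might look like. It is not the code from the linked question; the class name ParenExtractor and the method name Extract are made up for this example. It assumes that an unmatched ( or ) should simply be skipped, and it returns the results ordered by the position of the opening parenthesis.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(string.Join(", ", ParenExtractor.Extract("(1 * (2 - 3)) + 4")));
// => (1 * (2 - 3)), (2 - 3)

// Hypothetical helper, named for this example only.
static class ParenExtractor
{
    // Returns every balanced "( ... )" substring of the input,
    // ordered by the index of its opening parenthesis.
    public static List<string> Extract(string text)
    {
        var open = new Stack<int>();                        // indexes of '(' not yet closed
        var found = new List<(int Start, string Value)>();

        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                open.Push(i);
            }
            else if (text[i] == ')' && open.Count > 0)      // assumption: a stray ')' is ignored
            {
                int start = open.Pop();
                found.Add((start, text.Substring(start, i - start + 1)));
            }
        }

        // Inner groups close first, so sort to list outer groups before inner ones.
        found.Sort((a, b) => a.Start.CompareTo(b.Start));
        return found.ConvertAll(f => f.Value);
    }
}
```

The stack plays the same role as the Group "c" stack in the balancing-group pattern: every ( pushes, every ) pops, and a substring is reported only when a ) has a matching (. The scan is a single pass over the input, followed by a sort of the results.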