Capture text in quotes, including nested quotes

I want a regex pattern that captures the text between single or double quotes. The closing quote must be the same character as the opening quote, and any quotes of the other kind nested inside must be kept as part of the captured text.

Input
'\'this text must be captured\' "this one too" \'and this "nested" too\' \'this should not be captured"'
Expected
['this text must be captured', 'this one too', 'and this "nested" too']

I have tried a few patterns, but each of them has a problem.

pattern 1
pattern = r'"(.*?)"|\'(.*?)\''
pattern = r'"([^"]*)"|\'([^\']*)\''

result: [('', 'this text must be captured'), ('this one too', ''), ('', 'and this "nested" too')]

Here one of the two alternatives captures the text correctly, but the group for the other alternative comes back as an empty string.

pattern 2
pattern = r'(?P<unquoted>(?:"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'))'

result: ["'this text must be captured'", '"this one too"', '\'and this "nested" too\'']

This captures a single group, but the match includes the surrounding quotes, which should not be included.

There is 1 best solution below

Rene Veerman On

The closest I could get is this:

Input:

'this text must be captured' "this one too" 'and this "nested" too' 'this should not be captured"'

Regex pattern:

/['](.*?)[']|"(.*?)"/gm

The drawback of this regex is that it doesn't handle text containing a lone apostrophe, and for the last part of the input it returns the string with an extra double quote. You can filter that out of your result set by checking whether a match contains only one single or double quote character.
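
As a rough Python sketch of that filtering idea (the question uses Python's `re` module), the pattern above can be applied with `finditer`, taking whichever group actually matched and discarding any result that contains an odd number of quote characters. The name `extract_quoted` is made up for illustration, and the odd-count check is only one possible reading of "has only 1 single/double quote character":

```python
import re

# The pattern from this answer, without the JavaScript-style /.../gm wrapper.
QUOTED = re.compile(r'''['](.*?)[']|"(.*?)"''')


def extract_quoted(text):
    """Return quoted substrings, dropping ones with an unpaired inner quote."""
    results = []
    for match in QUOTED.finditer(text):
        # Only one of the two groups takes part in a match; lastindex
        # says which, so no empty placeholder strings are returned.
        content = match.group(match.lastindex)
        # Assumption: an odd number of either quote character inside the
        # match means a stray quote, so the match is discarded.
        if content.count('"') % 2 or content.count("'") % 2:
            continue
        results.append(content)
    return results


text = """'this text must be captured' "this one too" 'and this "nested" too' 'this should not be captured"'"""
print(extract_quoted(text))
```

With the input shown above this returns the three expected strings. Note that the filter also throws away legitimate text with one apostrophe inside double quotes, such as "don't".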

For escaped quotes, see the question "Regex for quoted string with escaping quotes".
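
A minimal sketch of one way to handle escapes in Python, assuming a backslash escapes whatever character follows it: the backreference `\1` forces the closing quote to be the same character as the opening one, and group 2 holds the text without the surrounding quotes. This is a variation on pattern 2 from the question, not necessarily the approach from the linked question; `extract_escaped_quoted` and its `unescape` flag are hypothetical names.

```python
import re

# Group 1: the opening quote. Group 2: the content. A backslash can only be
# consumed together with the character after it, so \" and \' never close
# the string; any other character is accepted unless it is the opening quote.
QUOTED = re.compile(r'''(["'])((?:\\.|(?!\1)[^\\])*)\1''', re.DOTALL)
ESCAPE = re.compile(r'\\(.)', re.DOTALL)


def extract_escaped_quoted(text, unescape=True):
    """Return the contents of quoted strings, honouring backslash escapes."""
    contents = [m.group(2) for m in QUOTED.finditer(text)]
    if unescape:
        # Assumption: "\x" simply stands for "x" for any character x.
        contents = [ESCAPE.sub(r'\1', c) for c in contents]
    return contents


sample = r'''"say \"hi\"" 'it\'s' 'broken"'''
print(extract_escaped_quoted(sample))
```

Because the closing quote has to equal group 1, the last item in `sample` (opened with a single quote, closed with a double quote) is not matched at all, so no filtering step is needed afterwards.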