I'm trying to write a Prettify-style syntax highlighter for Qiskit Terra (which closely follows Python syntax). Apparently, Prettify uses JavaScript-flavored regexes. For instance, /^\"(?:[^\"\\]|\\[\s\S])*(?:\"|$)/, null, '"' is the regex corresponding to valid strings in Q#. Basically, I'm trying to put together the equivalent regex for Python.
Now, I know that Python supports strings within triple quotes i.e. '''<string>''' and """<string>""" are valid strings (this format is especially used for docstrings). To deal with this case I wrote the corresponding capturing group as:
(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))
Here is the regex101 link.
This works okay except in some cases like:
''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''
Clearly, ''' 'This "is" my' && "first 'regex' sentence." ''' should be matched as one string and ''' 'This "is" the second.' ''' as another. Instead, the regex I wrote groups the whole thing together as a single string (check the regex101 link). That is, it doesn't end the string when it encounters the closing ''' (the one matching the ''' at the beginning).
How should I modify the regex (^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$)) to take into account this case? I'm aware of this: How to match “anything up until this sequence of characters” in a regular expression? but it doesn't quite answer my question, at least not directly.
I don't know what else you want to use this for, but with the MULTILINE flag on, the following regex does what you want for the example given.
Output is,
You can also see it working here https://regex101.com/r/k4adk2/11
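For reference, here is a minimal sketch of one pattern that behaves this way; it is not necessarily the same as the one in the link above. It assumes two changes to the regex from the question: the body quantifier is made lazy (`*?` instead of `*`), so matching stops at the first closing delimiter, and the trailing `$` is dropped, so the string may end mid-line. The backreference `\1` also lets one pattern cover both `'''` and `"""`. The names `TRIPLE_QUOTED` and `find_triple_quoted` are made up for illustration, and Python's `re` is used only to test the idea; the constructs involved (`*?`, `[\s\S]`, `\1`) also exist in JavaScript regexes.

```python
import re

# Opening delimiter (''' or """) is captured in group 1; the body is matched
# lazily (*?) so the match ends at the FIRST closing delimiter; \1 requires the
# closing delimiter to be the same kind as the opening one.
TRIPLE_QUOTED = re.compile(r"""('{3}|"{3})(?:[^\\]|\\[\s\S])*?\1""")


def find_triple_quoted(source):
    """Return every triple-quoted string found in source, in order."""
    return [m.group(0) for m in TRIPLE_QUOTED.finditer(source)]


# The two-line example from the question.
sample = (
    "''' 'This \"is\" my' && \"first 'regex' sentence.\" ''' &&\n"
    "''' 'This \"is\" the second.' '''"
)

if __name__ == "__main__":
    for found in find_triple_quoted(sample):
        print(found)
```

For Prettify itself, the pattern would presumably need a leading `^`, as in the Q# example quoted in the question. If backreferences turn out to be awkward there, two separate patterns, one per quote style, should work as well.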