How to deal with multi-character commenter when lexing with shlex in Python?

I want to parse a string with some inline comment blocks like /* comments */, and I want to ignore those comments.

For example,

>>> import shlex
>>> string = "foo /bar /* this is a test */"
>>> lex = shlex.shlex(string)
>>> list(lex)
['foo', '/', 'bar', '/', '*', 'this', 'is', 'a', 'test', '*', '/']

If I set the commenters to /*, shlex treats it as two separate single-character comment markers, / and *, so everything from the first / onward is dropped and I lose /bar:

>>> import shlex
>>> string = "foo /bar /* this is a test */"
>>> lex = shlex.shlex(string)
>>> lex.commenters = '/*'
>>> list(lex)
['foo']

Ideally, I want the result as

['foo', '/bar']

P.S. Regarding the comment by @Chillie asking why I don't use a regular expression: I chose shlex because I want whitespace inside quoted substrings to be preserved instead of being treated as a token separator.

For example, the real use case can be like

>>> import shlex
>>> string = "foo /bar \"value string\" /* this is a test */"

and I want to capture the quoted value as a single token, without splitting on the whitespace inside the quotes:

['foo', '/bar', '"value string"']

I am not a regex expert, so designing a good pattern for this case is a bit difficult for me. If someone can come up with a regex solution, I am fine with that too.
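
One possible direction, offered as a rough sketch and not a definitive answer: strip the /* ... */ blocks with a small regex that skips over double-quoted strings, then let shlex split only on whitespace. The helper names strip_block_comments and tokenize are made up for illustration. The sketch assumes that only double quotes delimit strings, that quotes are never escaped, and that every comment is properly closed.

```python
import re
import shlex

# Alternation order matters: a double-quoted string is matched first and kept,
# so a /* ... */ sequence inside quotes is never seen as a comment.
_QUOTED_OR_COMMENT = re.compile(r'"[^"]*"|/\*.*?\*/', re.DOTALL)


def strip_block_comments(text):
    """Replace each /* ... */ block outside double quotes with a space.

    Hypothetical helper; assumes unescaped double quotes and closed comments.
    """
    def keep_or_drop(match):
        matched = match.group(0)
        return matched if matched.startswith('"') else ' '

    return _QUOTED_OR_COMMENT.sub(keep_or_drop, text)


def tokenize(text):
    """Split on whitespace with shlex, keeping quoted strings as one token."""
    lex = shlex.shlex(strip_block_comments(text), posix=False)
    lex.whitespace_split = True  # only whitespace separates tokens, so "/bar" stays whole
    lex.commenters = ''          # turn off the default '#' comment character
    return list(lex)


print(tokenize('foo /bar "value string" /* this is a test */'))
# ['foo', '/bar', '"value string"']
```

With posix=False the surrounding quotes stay in the token, which matches the desired output above; passing posix=True instead would strip them.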
