I ran into this problem while improving my semi-automated library management for an ECAD package (namely KiCad). What follows is only a simplified example, which I hope represents the issue.
The library file contains multiple lines, and a line can have double quotes embedded in one of its fields; these are escaped with a backslash (\). For instance:
string = "\"This is a \\\"difficult\\\" problem\" please help"
print(f'string = {string}')
will output:
string = "This is a \"difficult\" problem" please help
I need to split this string using shlex (it is my solution of choice and I would like to keep it), subject to these two conditions:
- "This is ... problem" needs to be a single list item
- the double quotes around both "This is ... problem" and "difficult" need to be preserved.
Note: In this example, the two other words, please and help, do not require special treatment.
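To make the two conditions concrete, they can be written as a small check. This is only a sketch of one reading of them: the function name meets_conditions is hypothetical, the strings are hard-coded for this example, and the inner quotes are accepted with or without their backslash because the conditions do not say which form is wanted.

```python
def meets_conditions(tokens):
    """Return True if one token is the whole quoted field with all quotes kept."""
    for tok in tokens:
        # Condition 1: the field is a single item, outer quotes included.
        whole_field = tok.startswith('"This is a ') and tok.endswith(' problem"')
        # Condition 2: "difficult" still carries its quotes (escaped or not).
        inner_quotes = '"difficult"' in tok.replace('\\"', '"')
        if whole_field and inner_quotes:
            return True
    return False


# A result of this shape would satisfy both conditions.
print(meets_conditions(['"This is a \\"difficult\\" problem"', 'please', 'help']))
```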
I've tried with both posix=False and posix=True:
- Using posix=False:
import shlex

s1 = shlex.shlex(string, posix=False)
slist1 = list(s1)
print(f'slist1 = {slist1}')
Output:
slist1 = ['"This is a \\"', 'difficult', '\\', '" problem"', 'please', 'help']
- Using posix=True:
s2 = shlex.shlex(string, posix=True)
slist2 = list(s2)
print(f'slist2 = {slist2}')
Output:
slist2 = ['This is a "difficult" problem', 'please', 'help']
In the first case, you can immediately see that it does not meet condition #1.
In the second case, it almost meets both conditions, but fails to preserve the double quotes around "This is ... problem". I cannot just add them back after the split, because I do not know where the quoted string sits in the list and I do not want to add double quotes to every entry.
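One possible workaround, offered as a sketch rather than a definitive answer: hide each escaped quote behind a placeholder character before splitting in non-POSIX mode (which keeps the surrounding quotes on a token), then restore it afterwards. The helper name split_keep_quotes and the NUL placeholder are assumptions made for illustration. The placeholder must never occur in the real library file, and a field containing an escaped backslash right before a closing quote would not be handled correctly.

```python
import shlex

# Hypothetical placeholder: assumed never to appear in the library file.
PLACEHOLDER = "\x00"


def split_keep_quotes(line):
    """Split on whitespace, keeping quoted fields whole and their quotes intact."""
    # Hide escaped quotes so non-POSIX mode does not treat them as delimiters.
    protected = line.replace('\\"', PLACEHOLDER)
    lexer = shlex.shlex(protected, posix=False)
    lexer.whitespace_split = True  # split on whitespace only, not punctuation
    lexer.commenters = ""          # do not treat '#' as a comment start
    # Put the escaped quotes back in every token.
    return [token.replace(PLACEHOLDER, '\\"') for token in lexer]


string = "\"This is a \\\"difficult\\\" problem\" please help"
print(split_keep_quotes(string))
```

For the example string this prints ['"This is a \\"difficult\\" problem"', 'please', 'help'], so the backslashes stay in the token. Restoring the placeholder to a plain " instead would give the unescaped form that posix=True produces, with the outer quotes still attached.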
Is there a trick for solving this or am I just running into a wall?
I'd really appreciate the help!