I'm writing a small program that converts a reduced Markdown syntax to HTML (as a learning exercise), but I'm having trouble getting the spacing correct:
from pyparsing import *
strong = QuotedString("**")
text = Word(printables)
tokens = strong | text
grammar = OneOrMore(tokens)
strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])
A = "The **cat** in the **hat**."
print(' '.join(grammar.parseString(A)))
What I get:
The <strong>cat</strong> in the <strong>hat</strong> .
What I would like:
The <strong>cat</strong> in the <strong>hat</strong>.
Yes, this can be done without pyparsing, and other utilities exist to do the exact same thing (e.g. pandoc), but I would like to know how to do this using pyparsing.
I'm not very skilled with pyparsing, but I would try using transformString() instead of parseString(), and leaveWhitespace() for the matched tokens.
UPDATE: Improved version pointed out by Paul McGuire (see comments):