Markdown syntax with pyparsing, getting spaces correct

As a learning exercise, I'm writing a small program that converts a reduced Markdown syntax to HTML, but I'm having trouble getting the spacing correct:

from pyparsing import *

strong  = QuotedString("**")
text    = Word(printables)
tokens  = strong | text
grammar = OneOrMore(tokens)

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(' '.join(grammar.parseString(A)))

What I get:

The <strong>cat</strong> in the <strong>hat</strong> .

What I would like:

The <strong>cat</strong> in the <strong>hat</strong>.

Yes, this can be done without pyparsing, and other utilities (e.g. pandoc) already do exactly this, but I would like to know how to do it with pyparsing.

Best answer

I'm not very skilled with pyparsing, but I would try transformString() instead of parseString(), with leaveWhitespace() on the matched tokens, like this:

from pyparsing import *

strong  = QuotedString("**").leaveWhitespace()
text    = Word(printables).leaveWhitespace()
tokens  = strong | text
grammar = OneOrMore(tokens)

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(grammar.transformString(A))

It yields:

The <strong>cat</strong> in the <strong>hat</strong>.

UPDATE: Improved version pointed out by Paul McGuire (see comments):

from pyparsing import *

strong  = QuotedString("**")

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(strong.transformString(A))