tokenize math expressions

I am working on a Shunting Yard program in Python and am running into a problem with math expressions such as "3x + 4 * y" or "3 sin(x)".

The problem is that Python's tokenize module doesn't know that I need "3x" to become ["3", "*", "x"] instead of just ["3", "x"]. I can, of course, stipulate that the user must enter 3*x instead of just 3x, but that's so lame. There should be a cleaner way to get around this.

Here's the tokenizing code I am using (copied from Stack Overflow posts):

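import tokenize
from io import StringIO

expression = u"3x + 4*y"
# Collect the text of every token, skipping empty strings such as the end marker.
print([token[1] for token in tokenize.generate_tokens(StringIO(expression).readline) if token[1]])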

which gives me:

[u'3', u'x', u'+', u'4', u'*', u'y']

but I need:

[u'3', u'*', u'x', u'+', u'4', u'*', u'y']

in order for the Shunting Yard code to work properly.

Thank you for helping.

There is 1 best solution below

You'll need to pay attention to the types of the tokens returned, not just their text, and preprocess the token list to get what you want. In your example, 3x results in a NUMBER token followed by a NAME token. Wherever that pair occurs, insert the implied OP token ("*") between them and you're away.
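
The preprocessing step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small regex-based scanner (the hypothetical names TOKEN_RE, scan, insert_implicit_multiplication and tokenize_expression are not from the question) instead of the tokenize module, so that it does not depend on how a particular Python version's tokenize treats input such as 3x. It labels tokens NUMBER, NAME or OP in the same spirit and applies only the one rule mentioned here: a NUMBER directly followed by a NAME gets a "*" inserted between them.

```python
import re

# Hypothetical token pattern for this sketch: a number, a name, or any single
# non-space character (treated as an operator or parenthesis).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+\.?\d*|\.\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>\S))"
)

def scan(expression):
    """Yield (type, text) pairs with type NUMBER, NAME or OP."""
    expression = expression.rstrip()
    pos = 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        kind = match.lastgroup          # name of the alternative that matched
        yield kind, match.group(kind)
        pos = match.end()

def insert_implicit_multiplication(tokens):
    """Insert ("OP", "*") wherever a NUMBER is directly followed by a NAME."""
    result = []
    for kind, text in tokens:
        if result and result[-1][0] == "NUMBER" and kind == "NAME":
            result.append(("OP", "*"))
        result.append((kind, text))
    return result

def tokenize_expression(expression):
    """Return only the token texts, ready to feed to a Shunting Yard parser."""
    return [text for _, text in insert_implicit_multiplication(scan(expression))]

print(tokenize_expression("3x + 4*y"))
print(tokenize_expression("3 sin(x)"))
```

If you stay with the tokenize module, the same pass should apply to its output by comparing each token's type (token[0]) against tokenize.NUMBER and tokenize.NAME. Other implied products, such as a number followed by an opening parenthesis, could be handled by adding further conditions to the same loop.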