I'm working on upgrading a LogisticRegression text classifier from single-word features to bigrams (two-word features). However, when I include a two-word feature in the formula passed to patsy.dmatrices, I receive the following error:
y, X = dmatrices("is_host ~ dedicated + hosting + dedicated hosting", df, return_type="dataframe")
File "<string>", line 1
dedicated hosting
^
SyntaxError: unexpected EOF while parsing
I've looked around online for any examples on how to approach this and haven't found anything. I tried throwing a few different syntax options at the formula and none seem to work.
"is_host ~ dedicated + hosting + {dedicated hosting}"
"is_host ~ dedicated + hosting + (dedicated hosting)"
"is_host ~ dedicated + hosting + [dedicated hosting]"
What is the proper way to include multi-word features in the formula passed to dmatrices?
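You want:
y, X = dmatrices("is_host ~ dedicated + hosting + Q('dedicated hosting')", df, return_type="dataframe")
Q is short for quote. It is a patsy built-in that lets a formula refer to a column whose name is not a valid Python identifier, such as one containing a space.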
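For a self-contained illustration, here is a minimal sketch of quoting a column name that contains a space with patsy's Q(). The DataFrame contents are made-up toy values, assumed only so the snippet runs on its own. The column names mirror the ones in the question.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data: "dedicated hosting" contains a space, so it
# cannot be written directly as a variable name in a patsy formula.
df = pd.DataFrame({
    "is_host": [1, 0, 1, 0],
    "dedicated": [1, 0, 1, 1],
    "hosting": [1, 1, 1, 0],
    "dedicated hosting": [1, 0, 1, 0],
})

# Q('...') looks the column up by its literal name instead of parsing
# the name as Python code.
formula = "is_host ~ dedicated + hosting + Q('dedicated hosting')"
y, X = dmatrices(formula, df, return_type="dataframe")

# Inspect the design matrix columns (intercept plus the three features).
print(list(X.columns))
```

The bare form fails because patsy evaluates each term as a Python expression, and `dedicated hosting` is not valid Python. Renaming the column to something like dedicated_hosting before building the formula would avoid the need for Q() altogether.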