I am trying to use the statsmodels linear regression functions with formulas. My sample data comes from a Pandas data frame. I am having a slight problem with column names within the formula. Due to downstream processes, I have hyphens in my column names. For example:
+------+-------+-------+
+ VOLT + B-NN + B-IDW +
+------+-------+-------+
One of the reasons for keeping the hyphen is that it allows Python to split the string for other analysis, so I have to keep it. As you can see, when I want to regress VOLT on B-NN using VOLT ~ B-NN, I encounter a problem: patsy parses the formula as B minus NN and cannot find B.
Is there a way to tell Patsy that B-NN is a variable name and not B minus NN?
Thanks.
BJR
patsy uses Q for quoting names, e.g. Q('B-IDW')
http://patsy.readthedocs.io/en/latest/builtins-reference.html#patsy.builtins.Q
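
For illustration, here is a minimal sketch of how Q might be used in a statsmodels formula for the case in the question. The column names come from the question; the numbers in the data frame are made up purely so that the example runs.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up sample values; only the column names are taken from the question.
df = pd.DataFrame({
    "VOLT": [1.0, 2.1, 2.9, 4.2, 5.1],
    "B-NN": [0.5, 1.0, 1.5, 2.0, 2.5],
    "B-IDW": [0.4, 1.1, 1.4, 2.2, 2.4],
})

# Q("B-NN") makes patsy look up the column literally named "B-NN"
# instead of evaluating the expression B minus NN.
model = smf.ols('VOLT ~ Q("B-NN")', data=df).fit()
print(model.params)
```

The same quoting should work for any column whose name is not a valid Python identifier, and several quoted names can be combined in one formula, e.g. 'VOLT ~ Q("B-NN") + Q("B-IDW")'.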