I have the following statement in R:
library(plyr)
filteredData <- ddply(data, .(ID1, ID2), businessrule)
I am trying to replicate this in Python with pandas. I have tried:
data['judge'] = data.groupby(['ID1','ID2']).apply(lambda x: businessrule(x))
But this raises the following error:
incompatible index of inserted column with frame index
The error message can be reproduced with a small toy example.
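A minimal sketch of such a reproduction follows. The frame, its column names and the use of sum() as a stand-in for businessrule are assumptions made for illustration. What matters is only that the right-hand side carries a 2-level MultiIndex while df keeps its default single-level index. On the pandas versions I know of, the assignment raises TypeError with the message quoted above; the sketch also catches ValueError in case another version reports the failure differently.

```python
import pandas as pd

# Hypothetical toy frame; it has the default single-level RangeIndex.
df = pd.DataFrame({
    "ID1": ["a", "a", "b", "b"],
    "ID2": [1, 2, 1, 1],
    "value": [1, 2, 3, 4],
})

# One result per (ID1, ID2) group; sum() stands in for the question's businessrule.
rhs = df.groupby(["ID1", "ID2"])["value"].sum()
print(rhs.index.nlevels)  # 2
print(df.index.nlevels)   # 1

# The 2-level index of rhs cannot be aligned with df's 1-level index,
# so the column assignment fails and df is left unchanged.
error = None
try:
    df["new"] = rhs
except (TypeError, ValueError) as err:
    error = err
    print(type(err).__name__, err)
```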
It is likely that your code raises the error for the same reason this toy example does. The right-hand side is a Series with a 2-level MultiIndex.
df['new'] = ...
tells pandas to assign this Series to a column in df. But df has a single-level index. Because the single-level index is incompatible with the 2-level MultiIndex, the assignment fails. In general, it is never correct to assign the result of groupby/apply to a column of df unless the columns or levels you group by also happen to be valid index keys in the original DataFrame, df.
Instead, assign the Series to a new variable, just as the R code does:
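A minimal sketch of that approach, assuming a hypothetical data frame and a hypothetical businessrule that returns one scalar per (ID1, ID2) group (the question shows neither):

```python
import pandas as pd

# Hypothetical sample data; the question's real `data` is not shown.
data = pd.DataFrame({
    "ID1": ["a", "a", "b", "b", "b"],
    "ID2": [1, 1, 1, 2, 2],
    "value": [10, 20, 30, 40, 50],
})

def businessrule(group):
    # Hypothetical stand-in for the question's businessrule:
    # returns a single scalar for each (ID1, ID2) group.
    return group["value"].max() - group["value"].min()

# Keep the per-group result in its own variable instead of forcing it
# into a column of `data`; it is indexed by the (ID1, ID2) pairs.
filteredData = data.groupby(["ID1", "ID2"]).apply(businessrule)
print(filteredData)
```

The result holds one row per (ID1, ID2) group, similar in spirit to what ddply returns; if you prefer ID1 and ID2 as ordinary columns, filteredData.reset_index() turns the index levels back into columns.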
Note that lambda x: businessrule(x) can be replaced with just businessrule, because the lambda does nothing but pass its argument through.
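To illustrate the exception mentioned earlier, here is a sketch (with hypothetical data) in which the grouping keys are also the index levels of df, so the per-group Series shares df's index levels and pandas can align it when assigning it as a column:

```python
import pandas as pd

# Hypothetical data; ID1 and ID2 are moved into the index so that the
# grouping keys are also valid index keys of the DataFrame.
df = pd.DataFrame({
    "ID1": ["a", "a", "b", "b"],
    "ID2": [1, 1, 1, 2],
    "value": [1, 2, 3, 4],
}).set_index(["ID1", "ID2"])

# Group by the index levels: the result is indexed by the same two levels.
per_group = df.groupby(level=["ID1", "ID2"])["value"].sum()

# Both sides now use (ID1, ID2) keys, so pandas aligns them and copies
# each group's result to every row of that group.
df["group_total"] = per_group
print(df)
```

Because the key (a, 1) appears twice in df's index, its group total is broadcast to both rows.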