Within a pandas method chain, how can I call merge with a lambda that refers to the current DataFrame state, without using pipe?
The code below works, but it depends on the pipe method.
import pandas as pd

(pd.DataFrame(
    [{'YEAR':2013,'FK':1, 'v':1},
     {'YEAR':2013,'FK':2, 'v':2},
     {'YEAR':2014,'FK':1, 'v':3},
     {'YEAR':2014,'FK':2, 'v':4}
    ])
 .pipe(lambda w: w.merge(w.query('YEAR==2013')[['FK','v']],
                         on='FK',
                         how='left'
                         ))
)
The code below doesn't work.
(pd.DataFrame(
    [{'YEAR':2013,'FK':1, 'v':1},
     {'YEAR':2013,'FK':2, 'v':2},
     {'YEAR':2014,'FK':1, 'v':3},
     {'YEAR':2014,'FK':2, 'v':4}
    ])
 .merge(lambda w: w.query('YEAR==2013'),
        on='FK',
        how='left'
        )
)
It raises:
TypeError: Can only merge Series or DataFrame objects, a <class 'function'> was passed
You can't; this is precisely why the pipe method exists.

For completeness, DataFrame methods/accessors that accept a callable (as primary parameter and as of pandas 2.0.3) are:

- loc/iloc
- mask/where
- assign
- apply/applymap

For other cases, you need to use pipe.
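To illustrate the distinction, here is a minimal sketch that mixes callable-accepting steps (assign, loc) with a pipe step for merge. The column names v_double and v_2013 and the filter on FK are made up for the example; the data is the same as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"YEAR": 2013, "FK": 1, "v": 1},
        {"YEAR": 2013, "FK": 2, "v": 2},
        {"YEAR": 2014, "FK": 1, "v": 3},
        {"YEAR": 2014, "FK": 2, "v": 4},
    ]
)

result = (
    df
    # assign accepts a callable: w is the DataFrame at this point in the chain
    .assign(v_double=lambda w: w["v"] * 2)
    # loc accepts a callable too: w already contains v_double
    .loc[lambda w: w["FK"] == 1]
    # merge does not evaluate callables, so the current state is reached via pipe
    .pipe(
        lambda w: w.merge(
            w.query("YEAR == 2013")[["FK", "v"]],
            on="FK",
            how="left",
            suffixes=("", "_2013"),  # hypothetical suffix to keep the columns apart
        )
    )
)

print(result)
```

In each lambda, w is the intermediate DataFrame produced by the previous step, which is what merge has no way to receive on its own.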