I'm facing a problem I cannot understand. Consider this scenario:
import pandas as pd

df_mock = pd.DataFrame({'v': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

class O:
    def __init__(self, row):
        self.row = row  # keeps a reference to the row passed in by apply

    def calc(self):
        self.v = self.row.v

df_mock['obj'] = df_mock.apply(lambda row: O(row), axis=1)
df_mock['obj'].apply(lambda o: o.calc())
print(df_mock['obj'].apply(lambda o: o.v))
When I run this, I get:
0    [7, 8, 9]
1    [7, 8, 9]
2    [7, 8, 9]
Name: obj, dtype: object
I expected each O object to hold a reference to its own row in O.row. Instead, after the apply, every object ends up referring to the last row.
Why does this happen? Does DataFrame.apply(axis=1) create a single row object and reuse it for every row, so that each call receives the same reference?
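To make the question concrete, the behavior I suspect can be mimicked in plain Python. This is only an analogy under my assumption that one row object is reused and its contents swapped; `Row` and `reusing_rows` are hypothetical names, not pandas internals:

```python
class Row:
    """Hypothetical stand-in for a row object; not a pandas class."""
    def __init__(self):
        self.v = None


def reusing_rows(data):
    # Yield the SAME Row instance every time, only swapping its contents.
    row = Row()
    for values in data:
        row.v = values
        yield row


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kept = list(reusing_rows(data))  # keep a reference to every yielded row

same_object = all(r is kept[0] for r in kept)
seen = [r.v for r in kept]
print(same_object)  # True
print(seen)         # [[7, 8, 9], [7, 8, 9], [7, 8, 9]]
```

Every stored reference points at the one shared object, so all of them show whatever was written to it last.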
The same effect shows up more directly if you just run:
df_mock.apply(id, axis=1)
This outputs the same id for every row:
0    139938239801360
1    139938239801360
2    139938239801360
dtype: int64
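If the row object is indeed reused, storing an explicit copy of each row should give every object its own data. Here is a minimal sketch of that check, assuming `Series.copy()` returns an independent Series (`OCopy` is a hypothetical variant of `O`, introduced only for this test):

```python
import pandas as pd

df_mock = pd.DataFrame({'v': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})


class OCopy:
    """Hypothetical variant of O that stores its own copy of the row."""
    def __init__(self, row):
        # Copy the Series so data from later rows cannot replace what we hold.
        self.row = row.copy()

    def calc(self):
        self.v = self.row.v


objs = df_mock.apply(lambda row: OCopy(row), axis=1)
for o in objs:
    o.calc()

values = [o.v for o in objs]
print(values)
```

With the copy in place, I would expect each object to report its own list rather than the last one, which would support the idea that the original objects all share one row.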