I am looking for a clear and efficient way to append rows to a DataFrame in-place within a function in pandas. Here's what I got so far:
pd.concat Approach (Not In-Place):
def append_rows_concat(df, new_rows):
    return pd.concat([df, new_rows], ignore_index=True)
- iirc this creates a copy of the DataFrame, requiring reassignment. Not truly in-place/by-reference.
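To make the "requires reassignment" point concrete, here is a minimal sketch. The small frames are made up for illustration only; the check simply compares object identity and row counts before and after the call.

```python
import pandas as pd

def append_rows_concat(df, new_rows):
    # pd.concat builds and returns a new DataFrame; the input is left untouched
    return pd.concat([df, new_rows], ignore_index=True)

# Illustrative data only (not the test frames used further down)
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
new_rows = pd.DataFrame({'col1': [11, 22], 'col2': ['aa', 'bb']})

result = append_rows_concat(df, new_rows)

print(result is df)          # False: a different object came back
print(len(df), len(result))  # 3 5: the caller's df still has its 3 rows
```

So unless the caller writes df = append_rows_concat(df, new_rows), the original df never sees the new rows.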
.loc Approach (In-Place, but Clunky):
def append_rows_loc(df, new_rows):
    start_index = len(df)
    for i, row in new_rows.iterrows():
        df.loc[start_index + i] = row
- Modifies DataFrame directly (in-place/by-reference), avoids copies
- Less readable, maybe less efficient for large DFs due to row-by-row iteration
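One caveat with the loop above is that it adds the index labels of new_rows to start_index, so it only lines up when new_rows has a default 0..n-1 index. A hedged variant that counts positions instead might look like the sketch below. The name append_rows_loc_positional and the sample frames are assumptions for illustration, and it still assumes df itself has a default integer index so that len(df) is an unused label.

```python
import pandas as pd

def append_rows_loc_positional(df, new_rows):
    # Hypothetical variant: use the position of each row, not its index label
    start = len(df)
    for offset, (_, row) in enumerate(new_rows.iterrows()):
        # Assigning to a label that does not exist yet enlarges df
        df.loc[start + offset] = row

# Illustrative data; new_rows deliberately has a non-default index
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
new_rows = pd.DataFrame({'col1': [11, 22], 'col2': ['aa', 'bb']}, index=[10, 11])

alias = df  # second reference to the same object
append_rows_loc_positional(df, new_rows)

print(alias is df)     # True: the same DataFrame object was modified
print(list(df.index))  # [0, 1, 2, 3, 4]
```

Keeping the same object is not proof that no data is copied internally: the underlying arrays cannot grow in place, so each enlargement still has to allocate new storage.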
Question
As per the test code below, both options work as expected. Is there an elegant and reliable approach (like 1.) that is also efficient and modifies the DataFrame truly in-place (like 2.)? I am aiming for direct modification and avoiding the copies made by pd.concat, mostly to learn how it's done. Specifically, I want to keep various functions consistent in that they modify the main df by-reference.
How to append efficiently has been discussed before, e.g. here; however, DataFrame.append has been removed (as of pandas 2.0), and I really want to understand how to do this without re-assignment, even though in practice the df could easily be copied.
The question is how to do it inside a function (and by-reference, without reassignment of df via return df), not so much how it would be done in main.
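To show why this matters inside a function specifically, here is a small sketch of the pitfall. The helper name append_rows_rebind and the frames main_df and extra are hypothetical, used only for this illustration. Assigning to the parameter name rebinds a local variable and leaves the caller's object alone.

```python
import pandas as pd

def append_rows_rebind(df, new_rows):
    # Hypothetical helper: this only rebinds the local name `df`
    # to the new object returned by pd.concat
    df = pd.concat([df, new_rows], ignore_index=True)
    # nothing is returned, so the enlarged frame is lost here

main_df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
extra = pd.DataFrame({'col1': [11, 22], 'col2': ['aa', 'bb']})

append_rows_rebind(main_df, extra)
print(len(main_df))  # 3: the caller's DataFrame is unchanged
```

This is ordinary Python name binding rather than anything pandas-specific, which is why a by-reference append has to mutate the object it was handed instead of creating a new one.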
python version: 3.11.7
pandas version: 2.1.4
Test code
Setup
import pandas as pd
# some sample DFs
df = pd.DataFrame({
'col1': [1, 2, 3],
'col2': ['a', 'b', 'c'],
'col3': [4, 5, 6]})
df_update = pd.DataFrame({
'col1': [11, 22],
'col2': ['aa', 'bb']})
print(df)
print(df_update)
col1 col2 col3
0 1 a 4
1 2 b 5
2 3 c 6
col1 col2
0 11 aa
1 22 bb
pd.concat Approach
print(append_rows_concat(df, df_update))
print(df)
col1 col2 col3
0 1 a 4.0
1 2 b 5.0
2 3 c 6.0
3 11 aa NaN
4 22 bb NaN
col1 col2 col3
0 1 a 4
1 2 b 5
2 3 c 6
.loc Approach
append_rows_loc(df, df_update) # Modifies df in place
print(df)
col1 col2 col3
0 1 a 4.0
1 2 b 5.0
2 3 c 6.0
3 11 aa NaN
4 22 bb NaN