Append DataFrame Rows In-Place Within a Function in Pandas: Elegant Solutions?

I am looking for a clear and efficient way to append rows to a DataFrame in-place within a function in pandas. Here's what I have so far:

  1. pd.concat Approach (Not In-Place):

    def append_rows_concat(df, new_rows):
        return pd.concat([df, new_rows], ignore_index=True)
    
    • iirc this creates a copy of the DataFrame, requiring reassignment. Not truly in-place/by-reference.
  2. .loc Approach (In-Place, but Clunky):

    def append_rows_loc(df, new_rows):
        start_index = len(df)
        for i, row in new_rows.iterrows():
            df.loc[start_index + i] = row
    
    • Modifies DataFrame directly (in-place/by-reference), avoids copies
    • Less readable, maybe less efficient for large DFs due to row-by-row iteration

Question

As the test code below shows, both options work as expected. Is there an approach that is as elegant and reliable as 1., efficient, and truly modifies the DataFrame in place like 2.? I am aiming for direct modification and want to avoid the copy that pd.concat makes, mostly to learn how it's done. Specifically, I want to keep various functions consistent in that they all modify the main df by reference.

How to append efficiently has been discussed before, e.g. here. However, DataFrame.append has been removed (as of pandas 2.0), and I really want to understand how to do this without reassignment, even though in practice the df could easily be copied.

The question is how to do it inside a function (and by-reference, without reassignment of df via return df), not so much how it would be done in main.
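To make the distinction concrete, here is a minimal sketch of why reassigning df inside a function does not reach the caller while mutating it through .loc does. The function names append_rebind and append_mutate and the variables main_df and extra are hypothetical, used only for illustration. The sketch assumes df has a default RangeIndex and that new_rows contains every column of df; it is not meant as the efficient solution being asked for.

```python
import pandas as pd


def append_rebind(df, new_rows):
    # pd.concat returns a new DataFrame. Assigning it to the local name
    # `df` only rebinds that name; the caller's object is untouched.
    df = pd.concat([df, new_rows], ignore_index=True)


def append_mutate(df, new_rows):
    # Assumption: df has a default RangeIndex (0..n-1) and new_rows has
    # all of df's columns. Each assignment enlarges the object that the
    # caller also holds, so no return value is needed.
    for values in new_rows[df.columns].itertuples(index=False):
        df.loc[len(df)] = list(values)


main_df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})
extra = pd.DataFrame({"col1": [11, 22], "col2": ["aa", "bb"]})

append_rebind(main_df, extra)
rows_after_rebind = len(main_df)  # still 2: the caller's df is unchanged

append_mutate(main_df, extra)
rows_after_mutate = len(main_df)  # now 4: the caller's df was modified
```

Using len(df) as the next label, rather than adding the labels of new_rows to a start offset, keeps the sketch independent of whatever index new_rows happens to carry.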

python version: 3.11.7

pandas version: 2.1.4

Test code

Setup

import pandas as pd

# some sample DFs
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [4, 5, 6]})
df_update = pd.DataFrame({
    'col1': [11, 22],
    'col2': ['aa', 'bb']})
print(df)
print(df_update)
   col1 col2  col3
0     1    a     4
1     2    b     5
2     3    c     6
   col1 col2
0    11   aa
1    22   bb

pd.concat Approach

print(append_rows_concat(df, df_update))
print(df)
   col1 col2  col3
0     1    a   4.0
1     2    b   5.0
2     3    c   6.0
3    11   aa   NaN
4    22   bb   NaN
   col1 col2  col3
0     1    a     4
1     2    b     5
2     3    c     6

.loc Approach

append_rows_loc(df, df_update)  # Modifies df in place
print(df)
   col1 col2  col3
0     1    a   4.0
1     2    b   5.0
2     3    c   6.0
3    11   aa   NaN
4    22   bb   NaN