Python Pandas: Find the difference of two dataframe

54 Views Asked by warem At 17 March 2024 at 23:25

import pandas as pd

a = [0.0, 0.6900000000000001, 1.3800000000000001, 1.3800000000000001, 2.0700000000000003, 2.7600000000000002, 2.7600000000000002, 3.45, 4.1262, 4.140000000000001, 4.140000000000001, 4.83, 5.5200000000000005, 5.5200000000000005, 5.9754000000000005, 6.4308000000000005, 6.6654, 6.9]
b = [0.0, 0.6900000000000001, 1.3800000000000001, 1.3800000000000001, 2.0700000000000003, 2.7600000000000002, 2.7600000000000002, 3.45, 4.1262, 4.140000000000001, 4.140000000000001, 4.83, 5.5200000000000005, 5.5200000000000005, 5.9754000000000005, 6.4308000000000005, 6.4308000000000005, 6.6654, 6.9]
df1 = pd.DataFrame(a, columns=['x(m)'])
df2 = pd.DataFrame(b, columns=['x(m)'])

df1:

df2:

The difference of the two dataframes is row 16 in df2 is not in df1. How can I get the difference? The output can be the row index 16 plus the value 6.4308 , or a combined dataframe with the third column specifying not in df1 or anything similar. Thanks.

Original Q&A

There are 1 best solutions below

Scott Boston On 18 March 2024 at 01:06 BEST ANSWER

IIUC, do you want something like this?

df1['seq'] = df1.groupby('x(m)').cumcount()
df2['seq'] = df2.groupby('x(m)').cumcount()
df1.merge(df2, how="outer", indicator="in_df")\
   .sort_values("x(m)")\
   .replace({"right_only": "not in df1", "left_only": "not in df2"})

Output:

      x(m)  seq       in_df
0   0.0000    0        both
1   0.6900    0        both
2   1.3800    0        both
3   1.3800    1        both
4   2.0700    0        both
5   2.7600    0        both
6   2.7600    1        both
7   3.4500    0        both
8   4.1262    0        both
10  4.1400    1        both
9   4.1400    0        both
11  4.8300    0        both
12  5.5200    0        both
13  5.5200    1        both
14  5.9754    0        both
15  6.4308    0        both
18  6.4308    1  not in df1
16  6.6654    0        both
17  6.9000    0        both

Note: pd.DataFrame.merge without the on defined will merge dataframes on all identical columns in the dataframe. If you want to keep the indexing in both dataframes, reset_index first.

And, if you only one the difference then add .query('in_df != "both"')

Python Pandas: Find the difference of two dataframe

There are 1 best solutions below

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in COMPARE

Related Questions in DIFFERENCE

Trending Questions

Popular # Hahtags

Popular Questions