How to retain rows from multiple dataframes with common column value?

Question

How to retain rows from multiple dataframes with common column value?

617 Views Asked by R_Dax At 17 August 2025 at 10:56

Suppose I have multiple dataframes:

print (df1)
                datetime    A
0       2012-08-14 07:00    1
1       2012-08-14 07:01    2
2       2012-08-14 08:15    3
                     ...  ...
192908  2013-08-14 16:00  600
192948  2013-08-14 16:15  700
192949  2013-08-14 16:57  900

print (df2)
               datetime    B    
0      2012-08-14 07:00  100
1      2012-08-14 07:15  200
2      2012-08-14 07:30  300
                    ...  ...
12140  2013-09-24 15:45   50
12141  2013-09-24 16:00   60
12142  2013-09-24 16:15   70

How do I create a new df containing only the rows for which there is a value in columns A and B at the same datetime? I tried using the isin function:

df1 = df1[df1['date'].isin(df2['date'])]

but this only does a one-way check, i.e. only values of A for which there exists a value of B at the same datetime are retained, but if there are extra values in B for datetimes that do not exist in A then these are left in df2.

I can repeat the operation in the opposite direction to resolve this:

df2 = df2[df2['date'].isin(df1['date'])]

but for >2 dataframes (I have around fifty in my present work) this becomes extremely long and inefficient because it would be necessary to go through every possible paired combination between the full set of dataframes. For example, a third dataframe, df3, would first need to be checked against df1 and df2, but if it contained datetimes that exist neither in df1 nor df2 then df1 and df2 would in turn need to be re-checked back against df3.

Desired output is to have re-defined all of the dataframes such that they contain only values of A, B, etc. with a matching datetime value.

Original Q&A

There are 1 best solutions below

**Rob Raymond** · Accepted Answer

This is a join / merge operation. Standard Codd relational theory/algebra.

import io
df1 = pd.read_csv(io.StringIO("""                datetime    A
0       2012-08-14 07:00    1
1       2012-08-14 07:01    2
2       2012-08-14 08:15    3
192908  2013-08-14 16:00  600
192948  2013-08-14 16:15  700
192949  2013-08-14 16:57  900"""), sep="\s\s+", engine="python")

df2 = pd.read_csv(io.StringIO("""               datetime    B    
0      2012-08-14 07:00  100
1      2012-08-14 07:15  200
2      2012-08-14 07:30  300
12140  2013-09-24 15:45   50
12141  2013-09-24 16:00   60
12142  2013-09-24 16:15   70"""), sep="\s\s+", engine="python")

pd.merge(df1,df2, on="datetime", how="inner")

output

    datetime    A   B
0   2012-08-14 07:00    1   100

want to merge many data frames

import io, random, functools

# generate a list of dataframes for merge... start with two sample ones
dfs = [df1, df2]
# generate longer list of dataframes, rename columns to add some interest for merge :-)
dfs = [dfs[random.randint(0, len(dfs)-1)].pipe(lambda d: d.rename(columns={d.columns[1]:f"{d.columns[1]}_{i}"})) for i in range(8)]

# and one line merge the whole list of dataframes
functools.reduce(lambda left,right: pd.merge(left,right,on='datetime'), dfs)

	datetime	A_0	B_1	B_2	B_3	A_4	B_5	B_6	A_7
0	2012-08-14 07:00	1	100	100	100	1	100	100	1

How to retain rows from multiple dataframes with common column value?

There are 1 best solutions below

output

want to merge many data frames

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in FILTER

Related Questions in ISIN

Trending Questions

Popular # Hahtags

Popular Questions