I have two separate lists, each of n elements, with one being ID numbers and the second being pandas dataframes. I will define them as id and dfs. The dataframes in dfs have the same format with columns A, B, and C but with different numerical values. I have zipped the two lists together as such:
df_groups = list(zip(id, dfs))
With this list I am trying to locate any instances where id is the same and then add columns A and B together for those dataframes and merge into one dataframe. For an example, I will use the following:
id = ['a','b','c','a','d']
The corresponding dataframes I have may look as such:
dfs[0]
A B C
0 0 1
0 0 1
dfs[1]
A B C
0 1 1
0 1 1
dfs[2]
A B C
1 1 1
1 2 1
dfs[3]
A B C
5 6 1
11 8 1
dfs[4]
A B C
3 5 2
3 18 2
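For reproducibility, the example above could be built roughly as follows. This is only a sketch: the list is named ids rather than id here (my choice, to avoid shadowing Python's built-in id()), and a default integer row index is assumed for every dataframe.

```python
import pandas as pd

# Example ids and dataframes, copied from the tables above.
# Named `ids` instead of `id` to avoid shadowing the built-in id().
ids = ['a', 'b', 'c', 'a', 'd']
dfs = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [1, 1]}),
    pd.DataFrame({'A': [1, 1], 'B': [1, 2], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
    pd.DataFrame({'A': [3, 3], 'B': [5, 18], 'C': [2, 2]}),
]

# Pair each id with its dataframe.
df_groups = list(zip(ids, dfs))
```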
Then, as can be seen above, id[0] is the same as id[3]. As such, I want to create a new list of tuples such that dfs[0]['A'] and dfs[3]['A'] are added together (similarly also for column B) and the duplicate id value is dropped.
Thus, it should look like this:
id = ['a','b','c','d']
dfs[0]
A B C
5 6 1
11 8 1
dfs[1]
A B C
0 1 1
0 1 1
dfs[2]
A B C
1 1 1
1 2 1
dfs[3]
A B C
3 5 2
3 18 2
The following worked for removing the duplicate values of id, but I am not quite sure how to go about the column operations on dfs. I will of course need to add the columns A and B first before running the below:
from itertools import groupby
# itertools.groupby only groups adjacent items, so sort by id first
df_groups_b = [next(b) for a, b in groupby(sorted(df_groups, key=lambda x: x[0]), key=lambda x: x[0])]
Any assistance would be much appreciated, thank you!
Edit: to clarify, column C from the original dataframe would be retained as is. In the case where the first tuple elements match, column C from the corresponding dataframes will be identical.
You can write a custom summarising function to go through all dataframes in a group and return a sum.
I don't entirely love this solution, because it converts C to float, but you can play with it further if needed.
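A minimal sketch of what such a summarising function might look like, under assumptions of mine that are not stated above: dataframes in a group share the same row index, A and B are summed, and C is averaged. The names summarise, groups and summed are hypothetical. Averaging is one way C ends up as float:

```python
from collections import defaultdict

import pandas as pd

ids = ['a', 'b', 'a']
dfs = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
]


def summarise(frames):
    """Sum A and B row-wise across frames; average C (assumed behaviour)."""
    stacked = pd.concat(frames)  # rows keep their original index labels
    # Rows with the same index label are aggregated together.
    return stacked.groupby(level=0).agg({'A': 'sum', 'B': 'sum', 'C': 'mean'})


# Collect the dataframes per id, keeping the first-seen order of the ids.
groups = defaultdict(list)
for group_id, df in zip(ids, dfs):
    groups[group_id].append(df)

summed = {group_id: summarise(frames) for group_id, frames in groups.items()}
print(summed['a'])
print(summed['a'].dtypes)  # C is float64 here because of the mean
```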
UPD: I just noticed your update.
You can do this instead, and the column types will be unchanged.
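Since C is identical within a group, one option is to copy C from the first dataframe rather than aggregating it, which leaves the dtypes alone. Again only a sketch: summarise is a hypothetical name, and the dataframes are assumed to share the same row index.

```python
import pandas as pd


def summarise(frames):
    """Sum A and B across frames; C is copied from the first frame."""
    result = frames[0].copy()  # copy so the input dataframe is not modified
    for other in frames[1:]:
        for col in ('A', 'B'):
            # Element-wise addition, aligned on the row index.
            result[col] = result[col] + other[col]
    return result


first = pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]})
second = pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]})

combined = summarise([first, second])
print(combined)
print(combined.dtypes)
```

Adding integer columns to integer columns keeps them integer, so no cast back is needed afterwards.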
UPD2: Returning group_id and unified dataframe together as a tuple.
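To get each id together with its unified dataframe, something along these lines should work. It is a sketch, assuming df_groups is the zipped list from the question and that dataframes sharing an id have the same row index; merge_groups and its sum_cols parameter are hypothetical names.

```python
from collections import defaultdict

import pandas as pd


def merge_groups(df_groups, sum_cols=('A', 'B')):
    """Return [(group_id, dataframe)] with sum_cols added up per id."""
    buckets = defaultdict(list)  # dicts keep insertion order (Python 3.7+)
    for group_id, df in df_groups:
        buckets[group_id].append(df)

    merged = []
    for group_id, frames in buckets.items():
        total = frames[0].copy()  # remaining columns (e.g. C) come from here
        for other in frames[1:]:
            for col in sum_cols:
                total[col] = total[col] + other[col]
        merged.append((group_id, total))
    return merged


ids = ['a', 'b', 'c', 'a', 'd']
dfs = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [1, 1]}),
    pd.DataFrame({'A': [1, 1], 'B': [1, 2], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
    pd.DataFrame({'A': [3, 3], 'B': [5, 18], 'C': [2, 2]}),
]

result = merge_groups(list(zip(ids, dfs)))
print([group_id for group_id, _ in result])
print(result[0][1])
```

Unlike itertools.groupby, collecting into a dict does not require the ids to be sorted or adjacent, and the ids come back in first-seen order.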