I have two separate lists of n elements each: one holds ID numbers and the other holds pandas DataFrames. I will call them id and dfs. The DataFrames in dfs all have the same format, with columns A, B, and C, but different numerical values. I have zipped the two lists together like this:
df_groups = list(zip(id, dfs))
With this list, I am trying to find every id that appears more than once, add columns A and B element-wise across the corresponding dataframes, and merge them into a single dataframe. As an example, I will use the following:
id = ['a','b','c','a','d']
The corresponding dataframes might look like this:
dfs[0]
A  B  C
0  0  1
0  0  1
dfs[1]
A  B  C
0  1  1
0  1  1
dfs[2]
A  B  C
1  1  1
1  2  1
dfs[3]
A  B  C
5  6  1
11 8  1
dfs[4]
A  B  C
3  5  2
3  18 2
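To make the example reproducible, the sample input above can be built roughly as follows. This is a sketch: it names the id list `ids` (an assumption, to avoid shadowing Python's built-in `id`) and copies the values from the tables above.

```python
import pandas as pd

# The five ids from the example; `ids` is used instead of `id` to avoid shadowing the builtin
ids = ['a', 'b', 'c', 'a', 'd']

# The five example dataframes, copied from the tables above
dfs = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [1, 1]}),
    pd.DataFrame({'A': [1, 1], 'B': [1, 2], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
    pd.DataFrame({'A': [3, 3], 'B': [5, 18], 'C': [2, 2]}),
]

# Pair each id with its dataframe, as in the question
df_groups = list(zip(ids, dfs))
```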
As can be seen above, id[0] is the same as id[3]. I therefore want to create a new list of tuples in which dfs[0]['A'] and dfs[3]['A'] are added together (and likewise for column B), and the duplicate id value is dropped.
Thus, it should look like this:
id = ['a','b','c','d']
dfs[0]
A  B  C
5  6  1
11 8  1
dfs[1]
A  B  C
0  1  1
0  1  1
dfs[2]
A  B  C
1  1  1
1  2  1
dfs[3]
A  B  C
3  5  2
3  18 2
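The following removes the duplicate values of id, but I am not sure how to handle the column operations on dfs. I will of course need to add columns A and B first, before running the code below. Note that itertools.groupby only groups consecutive items, so df_groups has to be sorted by id first:
from itertools import groupby
# groupby only merges consecutive equal ids, so sort the (id, dataframe) tuples by id first
df_groups_b = [next(b) for a, b in groupby(sorted(df_groups, key=lambda x: x[0]), key=lambda x: x[0])]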
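The reason sorting matters: `itertools.groupby` starts a new group whenever the key changes, so on the unsorted example ids the two 'a' entries end up in separate groups. A small sketch using plain id strings in place of the (id, dataframe) tuples:

```python
from itertools import groupby

ids = ['a', 'b', 'c', 'a', 'd']

# groupby only merges *consecutive* equal keys, so the second 'a' starts a new group
unsorted_keys = [key for key, _ in groupby(ids)]

# Sorting first brings equal ids next to each other
sorted_keys = [key for key, _ in groupby(sorted(ids))]

print(unsorted_keys)  # ['a', 'b', 'c', 'a', 'd']
print(sorted_keys)    # ['a', 'b', 'c', 'd']
```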
Any assistance would be much appreciated, thank you!
Edit: to clarify, column C from the original dataframes should be kept as is. Whenever two tuples share the same id, column C of the corresponding dataframes is identical.
                        
You can write a custom summarising function to go through all dataframes in a group and return a sum.
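The answer's original code did not survive here, so the following is only a minimal sketch of such a function, not the answerer's exact code. It assumes every frame in a group has the same row labels and columns A, B and C; the names `summarise` and `SUM_COLS` are hypothetical.

```python
import pandas as pd

SUM_COLS = ['A', 'B']  # hypothetical: columns to add up across a group


def summarise(frames):
    # Start from a copy of the first frame so C (and the index) are kept as-is
    result = frames[0].copy()
    for col in SUM_COLS:
        # Element-wise sum of this column over every frame in the group (aligned on row labels)
        result[col] = sum(frame[col] for frame in frames)
    return result


first = pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]})
second = pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]})
merged = summarise([first, second])
print(merged)
```

Because C is copied rather than summed, this version does not double C for groups with more than one frame.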
I don't entirely love this solution, because it converts C to float, but you can play with it further if needed.
UPD: I just noticed your edit about keeping column C as is.
You can do this instead, and the column types will be unchanged.
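The updated code is also missing from this post. As a hedged alternative that keeps the original dtypes, one option is to stack a group's frames with `pd.concat` and aggregate per row label: `'sum'` for A and B, and `'first'` for C (which the question's edit says is identical within a group). This assumes each frame uses the same row labels (here the default 0, 1).

```python
import pandas as pd

# The two frames that share id 'a' in the example
frames = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
]

# Stack the group's frames; rows with the same original index label line up
stacked = pd.concat(frames)

# Sum A and B per row label, and take C from the first frame
merged = stacked.groupby(level=0).agg({'A': 'sum', 'B': 'sum', 'C': 'first'})

print(merged)
print(merged.dtypes)
```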
UPD2: This version returns group_id and the unified dataframe together as a tuple.
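A sketch of the whole pipeline that returns a list of `(group_id, dataframe)` tuples, assuming every frame uses the same row labels and that C is identical within a group. Grouping with a plain dict avoids the sorting requirement of `groupby` and keeps ids in first-seen order; `ids`, `summarise`, `grouped` and `df_groups_merged` are hypothetical names, not the answer's original code.

```python
import pandas as pd

ids = ['a', 'b', 'c', 'a', 'd']
dfs = [
    pd.DataFrame({'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}),
    pd.DataFrame({'A': [0, 0], 'B': [1, 1], 'C': [1, 1]}),
    pd.DataFrame({'A': [1, 1], 'B': [1, 2], 'C': [1, 1]}),
    pd.DataFrame({'A': [5, 11], 'B': [6, 8], 'C': [1, 1]}),
    pd.DataFrame({'A': [3, 3], 'B': [5, 18], 'C': [2, 2]}),
]


def summarise(frames):
    # Keep C and the index from the first frame; add up A and B across the group
    result = frames[0].copy()
    for col in ['A', 'B']:
        result[col] = sum(frame[col] for frame in frames)
    return result


# Collect frames per id; dicts keep insertion order, so ids stay in first-seen order
grouped = {}
for group_id, frame in zip(ids, dfs):
    grouped.setdefault(group_id, []).append(frame)

# One (group_id, merged dataframe) tuple per distinct id
df_groups_merged = [(group_id, summarise(frames)) for group_id, frames in grouped.items()]

for group_id, frame in df_groups_merged:
    print(group_id)
    print(frame)
```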