My hypothetical dataframe is
df = pd.DataFrame({'col1':[91,91,91,91,92,92,92,92],
'col2':[91,92]*4,'value':[10]*8})
df
col1 col2 value
0 91 91 10
1 91 92 10
2 91 91 10
3 91 92 10
4 92 91 10
5 92 92 10
6 92 91 10
7 92 92 10
Grouping over the two columns produces those groups:
grouped = df.groupby(['col1','col2'])
grouped.groups
{(91, 91): Int64Index([0, 2], dtype='int64'),
(91, 92): Int64Index([1, 3], dtype='int64'),
(92, 91): Int64Index([4, 6], dtype='int64'),
(92, 92): Int64Index([5, 7], dtype='int64')}
I want to expand this set of groups, so that I can aggregate over the extended selection of groups.
Let’s say I want to add groups produced by
groupedall = df.groupby(['col1'])
groupedall.groups
{91: Int64Index([0, 1, 2, 3], dtype='int64'),
92: Int64Index([4, 5, 6, 7], dtype='int64')}
This is what I try: I substitute 99 for col2 value (where 99 will mean ‘any’),
groupedall.groups[(91, 99)] = groupedall.groups.pop(91)
groupedall.groups[(92, 99)] = groupedall.groups.pop(92)
And then add those new groups to my original group dict.
grouped.groups.update(groupedall.groups)
grouped.groups
{(91, 91): Int64Index([0, 2], dtype='int64'),
(91, 92): Int64Index([1, 3], dtype='int64'),
(91, 99): Int64Index([0, 1, 2, 3], dtype='int64'),
(92, 91): Int64Index([4, 6], dtype='int64'),
(92, 92): Int64Index([5, 7], dtype='int64'),
(92, 99): Int64Index([4, 5, 6, 7], dtype='int64')}
But when I try to group over the grouped object, those newly added groups are omitted.
grouped.sum()
value
col1 col2
91 91 20
92 20
92 91 20
92 20
And I want the output to include groups that I’ve just added:
value
col1 col2
91 91 20
92 20
99 40
92 91 20
92 20
99 40
What am I missing here?
Option 1
Option 2