Pandas data frame processing limit?


I am setting up a data set in pandas, which involves concatenating and then merging tables together. This all works fine until I come to aggregate the data frame to show the sum of two columns for distinct values. My desired output contains 27 grouping columns and 2 summed columns.

data.groupby(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "aa"]).agg({"1":'sum', "2":'sum'}) 

This returns an empty data frame - all column headers are displayed with no data beneath. I don't believe this is a display issue: running "pd.options.display.max_columns = 60" makes no difference, and "data.empty" returns "True".

I have found, however, that when I group by fewer columns, I get the desired output (a populated data frame). I have tried this using every available column, so I can be sure the problem isn't intrinsic to any particular column's values:

data.groupby(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v"]).agg({"1":'sum', "2":'sum'}) #- **WORKS**
data.groupby(["w", "x", "y", "z", "aa"]).agg({"1":'sum', "2":'sum'}) #- **WORKS**

The problem only arises when I try to run all columns together. In short: is there a limit to how many columns a pandas groupby/aggregation can process? If so, is there a way to bypass it?

Any help is greatly appreciated :)

I have tried aggregating the entire data frame and the result is empty.


There is 1 answer below

Answer by Karan Shishoo

There is nothing wrong with your code per se. The likely cause is that, when you group by that many columns, at least one of the key columns contains NA values in every row, and by default groupby (dropna=True) silently drops any group whose key contains an NA. Grouping by a subset of columns can still work because the NA-bearing columns aren't in that subset. Since this can't be confirmed without seeing your data, it is something you will need to verify.
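The behaviour can be sketched with a minimal, made-up data frame (the column names below are hypothetical, not the asker's actual data) where one grouping key is entirely NaN. With the default dropna=True, every row is discarded and the result is empty; passing dropna=False (available since pandas 1.1) keeps NaN as a valid group key:

```python
import pandas as pd
import numpy as np

# Hypothetical reproduction: grouping key "b" is NaN in every row.
data = pd.DataFrame({
    "a": ["x", "y"],
    "b": [np.nan, np.nan],  # NA in a grouping key
    "1": [10, 20],
    "2": [1, 2],
})

# Default behaviour: any group whose key contains NA is dropped,
# so the result here is completely empty.
empty = data.groupby(["a", "b"]).agg({"1": "sum", "2": "sum"})
print(empty.empty)  # True

# dropna=False treats NaN as its own group value, so the rows survive.
kept = data.groupby(["a", "b"], dropna=False).agg({"1": "sum", "2": "sum"})
print(kept)
```

To diagnose the real data set, checking data[group_cols].isna().any() would show which of the 27 key columns actually contain NA.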

This specific issue is covered in detail in this Stack Overflow question.