I'd like to count the size of groups after grouping with groupby(), i.e. the number of occurrences of each value. In pandas this can be done with GroupBy.size():
>>> pd.DataFrame({'my_column': [1, 1, 1, 2, 2, 3]}).groupby(by='my_column').size()
my_column
1 3
2 2
3 1
dtype: int64
NumPy supports something similar using np.unique():
>>> np.unique([1, 1, 1, 2, 2, 3], return_counts=True)[1]
array([3, 2, 1])
Using xarray I can find only very awkward ways to achieve the same, e.g. converting the DataArray object to a Pandas DataFrame:
>>> d = xr.DataArray([1, 1, 1, 2, 2, 3], name='my_column')
>>> d.to_dataframe().groupby(by='my_column').size()
my_column
1 3
2 2
3 1
dtype: int64
...or do very unreadable things like:
>>> xr.ones_like(d).groupby(d).sum(dim='dim_0')
<xarray.DataArray 'my_column' (my_column: 3)>
array([3, 2, 1])
Coordinates:
* my_column (my_column) int64 1 2 3
Is there a better way to get a reduced DataArray object with correct coordinates and dimensions? Is there a reason for not introducing a DataArrayGroupBy.size() method similar to pandas?
(I was using xarray version 0.15.0 when writing this question.)
The answer here is to use GroupBy.count():
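A minimal sketch of that approach, reusing the DataArray from the question and grouping it by its own values. The variable name `counts` is only for illustration. Keep in mind that count() tallies the non-missing values in each group, so it should agree with a pandas-style size() only when the data contains no NaN.

```python
import xarray as xr

# Same data as in the question; the name becomes the grouped coordinate.
d = xr.DataArray([1, 1, 1, 2, 2, 3], name='my_column')

# Group the array by its own values and count the non-missing
# entries that fall into each group.
counts = d.groupby(d).count()

print(counts)
```

The result is a DataArray indexed by the unique values of my_column, so there is no need for the detour through a pandas DataFrame or the ones_like() workaround.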