I have a dataset with a single variable and the dimensions time, latitude, and longitude. The dataset looks like the following, and several grid cells contain NaN values:
<xarray.Dataset>
Dimensions: (time: 6300, latitude: 300, longitude: 360)
Coordinates:
* latitude (latitude) float64 49.62 49.88 50.12 50.38 ... 70.88 71.12 71.38
* longitude (longitude) float64 -9.875 -9.625 -9.375 ... 39.38 39.62 39.88
* time (time) datetime64[ns] 1950-06-01 1950-06-02 ... 2018-08-31
Data variables:
precip (time, latitude, longitude) float32 dask.array<chunksize=(6300, 300, 360)
I want to calculate the percentile of each value within the dataset. The desired xarray dataset would look like the following. Note that the percentile is calculated along the time axis: for each pixel, I want to use that pixel's own time series when calculating the percentile of its values:
<xarray.Dataset>
Dimensions: (time: 6300, latitude: 300, longitude: 360)
Coordinates:
* latitude (latitude) float64 49.62 49.88 50.12 50.38 ... 70.88 71.12 71.38
* longitude (longitude) float64 -9.875 -9.625 -9.375 ... 39.38 39.62 39.88
* time (time) datetime64[ns] 1950-06-01 1950-06-02 ... 2018-08-31
Data variables:
precip_percentile (time, latitude, longitude) float32 dask.array<chunksize=(6300, 300, 360)
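To make the target quantity concrete, here is a minimal sketch for a single pixel's time series. It assumes a rank-based percentile (average rank divided by the number of valid values) and leaves NaN entries as NaN. The function name `percentile_of_each_value` and the sample values are hypothetical, and `scipy.stats.rankdata` is used here as a stand-in for calling `percentileofscore` once per value:

```python
import numpy as np
from scipy import stats

def percentile_of_each_value(series):
    """Percentile rank (0-100) of every element within its own 1-D series.

    NaN entries are ignored when ranking and stay NaN in the output.
    """
    series = np.asarray(series, dtype=float)
    out = np.full(series.shape, np.nan)
    valid = ~np.isnan(series)
    if valid.any():
        # Ties receive their average rank; dividing by the number of
        # valid values and scaling by 100 gives a percentile.
        ranks = stats.rankdata(series[valid], method="average")
        out[valid] = ranks / valid.sum() * 100.0
    return out

# Hypothetical time series for one pixel, with one missing value
pixel = np.array([1.0, np.nan, 3.0, 2.0, 3.0])
result = percentile_of_each_value(pixel)
print(result)
```

The output has the same length as the input, which is the property the full dataset result needs along the time axis.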
After some exploring, I am using the following code to calculate the percentile with xarray's apply_ufunc:
import xarray as xr
from scipy import stats

def percentileofscore_weak(x):
    return stats.percentileofscore(x, x, kind='rank')

# Apply percentileofscore_weak along the time axis using apply_ufunc
percentiles = xr.apply_ufunc(
    percentileofscore_weak,
    mean_month,
    input_core_dims=[['time']],
    output_core_dims=[[]],
    dask='parallelized',  # Enable parallelization for large datasets
    dask_gufunc_kwargs={'allow_rechunk': False}
)
The above code produces a result like the following:
xarray.DataArray 'percentile' lat: 300, lon: 360
In other words, the time dimension has been dropped.
How can I calculate the percentile of each individual value using its grid cell's own time series, and produce an xarray dataset like the following:
<xarray.Dataset>
Dimensions: (time: 6300, latitude: 300, longitude: 360)
Coordinates:
* latitude (latitude) float64 49.62 49.88 50.12 50.38 ... 70.88 71.12 71.38
* longitude (longitude) float64 -9.875 -9.625 -9.375 ... 39.38 39.62 39.88
* time (time) datetime64[ns] 1950-06-01 1950-06-02 ... 2018-08-31
Data variables:
precip_percentile (time, latitude, longitude) float32 dask.array<chunksize=(6300, 300, 360)
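One possible direction, offered as a sketch rather than a definitive answer: the empty `output_core_dims=[[]]` tells `apply_ufunc` that the function returns a scalar per pixel, so declaring `time` as an output core dimension should keep it in the result. The example below assumes a rank-based percentile computed with `scipy.stats.rankdata`, uses a small synthetic array in place of the real data, and relies on `vectorize=True` to call the function once per pixel. The names `rank_percentile` and `precip` are hypothetical.

```python
import numpy as np
import xarray as xr
from scipy import stats

def rank_percentile(series):
    """Percentile rank (0-100) of each element of a 1-D series; NaNs stay NaN."""
    out = np.full(series.shape, np.nan)
    valid = ~np.isnan(series)
    if valid.any():
        out[valid] = stats.rankdata(series[valid]) / valid.sum() * 100.0
    return out

# Small synthetic stand-in for the real dataset (hypothetical values)
rng = np.random.default_rng(0)
data = rng.random((10, 3, 4))
data[2, 0, 0] = np.nan  # one missing value, as in the real data
precip = xr.DataArray(
    data, dims=("time", "latitude", "longitude"), name="precip"
)

percentiles = xr.apply_ufunc(
    rank_percentile,
    precip,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],  # keep the time dimension in the output
    vectorize=True,               # loop over pixels, passing 1-D series
    dask="parallelized",
    output_dtypes=[float],
)

# Core dimensions are moved to the end, so restore the original order
percentiles = percentiles.transpose("time", "latitude", "longitude")
percentiles = percentiles.rename("precip_percentile")
print(percentiles.dims, percentiles.shape)
```

For dask-backed input, this sketch assumes `time` sits in a single chunk (as in the chunking shown above), since a core dimension cannot be split across chunks without allowing rechunking.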