Access and work with dimension label within xr.apply_ufunc

132 Views Asked by At

I was wondering whether there is a way to access and work with dimension labels when using the apply_ufunc function of xarray and @guvectorize of numba. From my understanding, having labeled dimensions is one of the advantages of xarray but I wasn't really able to use them within the functions.

In my case, I want to store valid values to fill NaN values along the dimension time for a certain max. amount of days, so the most convenient way would be using the xarray function ffill for propagating the values forward. But I also want to use the timestamp labels of the dimension time to keep track of the original date of the value that is being propagated. I was able doing that by iterating over each cell of my 3-d array (x,y,time), but performance-wise this approach is not working for my large dataset which is why I turned to apply_ufunc.

Some dummy data:

values = np.random.rand(2,2,10)*10
values.ravel()[np.random.choice(values.size, 30, replace=False)] = np.nan
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset({"value_day": (["x", "y", "time"], values),},
    coords={"lon": (["x", "y"], lon),"lat": (["x", "y"], lat), "time": pd.date_range("2018-01-01", periods=10),},)

<xarray.Dataset>
Dimensions:    (x: 2, y: 2, time: 10)
Coordinates:
    lon        (x, y) float64 -99.83 -99.32 -99.79 -99.23
    lat        (x, y) float64 42.25 42.21 42.63 42.59
  * time       (time) datetime64[ns] 2018-01-01 2018-01-02 ... 2018-01-10
Dimensions without coordinates: x, y
Data variables:
    value_day  (x, y, time) float64 nan nan 7.261 nan ... nan nan 7.411 nan

I have a working function for propagating the values, but I don't know, if possible at all, how to access the time label and return it. I commented it where I would use the time label. It would also be fine having two functions (value propagating + time propagating) and use a wrapper afterwards.

@guvectorize(
    "(float64[:], float64[:])",
    "(n) -> (n)",
    nopython=True,
)
def bookkeeping(ds_xy, out): #(ds_xy, ds_xy_time, out)
    limit=5
    #variables for first iteration
    stored_value = ds_xy[0]  
    #stored_value_time = ds_xy_time[0]
    
    ds_days_collect = []
    #ds_days_time_collect = []
    
    counter_stored_days = 0
    #iterate over each date of timeseries
    for value_current in ds_xy:    
    #for value_current,time_current in zip(ds_xy,ds_xy_time):                            
        if np.isnan(value_current) == False:
            #overwrite stored value and corresponding date if new value for pixel
            stored_value = value_current
            #stored_value_time = time_current
        else:
            #keep stored value for bookkeeping up to 3 days
            if counter_stored_days > limit:
                #use nan value of day
                stored_value = value_current
                #stored_value_time = time_current
                counter_stored_days = 0
            else:              
                counter_stored_days += 1

        ds_days_collect.append(stored_value) 
        #ds_days_time_collect.append(stored_value_time)
        
    out[:] = np.asarray(ds_days_collect) #,ds_days_time_collect
    
def bookkeeping_ufunc(ds):
    return xr.apply_ufunc(
    bookkeeping,
    ds,
    input_core_dims= [["time"]],
    output_core_dims=[["time"]]       
) 

bookkeeping_ufunc(ds)
0

There are 0 best solutions below