How can I efficiently calculate the first instance of a value in an axis in Dask/xarray?

I have some input atmospheric model data of total water in each grid box. I'm trying to calculate cloud top height from this input data; so for each column I need to find the highest instance where this input data is greater than a threshold.

My input data is 100 x 900 x 900 for nz x ny x nx. My data is loaded into xarray via dask with chunks of 100 x 50 x 50. Traditionally, I would do this like so:

import numpy as np

# data is the (nz, ny, nx) input array; threshold is a scalar
cloud_top_height = np.full((ny, nx), np.nan)  # NaN where no cloud is found
for y in range(ny):
    for x in range(nx):
        # scan downward from the top level and stop at the first exceedance
        for z in range(nz - 1, -1, -1):
            if data[z, y, x] > threshold:
                cloud_top_height[y, x] = z
                break

However, this is really inefficient with dask/numpy/xarray. I've struggled to find a replacement, though. I've seen various suggestions that I use argmax with 3D boolean indexing, but I think that will give me the opposite of what I want, and xarray doesn't support 3D boolean indexing anyway.

What is the best way to calculate the largest index in an axis with a value greater than a threshold using xarray/dask?

There is 1 best solution below

How about:

In [1]: import numpy as np
   ...: import xarray as xr

In [2]: da = xr.DataArray(np.random.rand(5, 5, 5),
   ...:                   dims=list('abc'),
   ...:                   coords=dict(c=range(5)))

In [3]: (
    ...:     da
    ...:     .where(lambda x: x>0.8)
    ...:     .idxmax(dim='c')
    ...: )
Out[3]:
<xarray.DataArray 'c' (a: 5, b: 5)>
array([[ 4.,  2.,  1.,  1.,  1.],
       [nan,  1., nan,  0., nan],
       [ 1.,  1.,  2., nan,  1.],
       [nan, nan,  2.,  1.,  2.],
       [ 2.,  0., nan,  2.,  1.]])
Dimensions without coordinates: a, b

If that's not it, could you post a reproducible example?
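
One caveat: idxmax returns the coordinate of the largest value along the dimension, which is not necessarily the largest index that exceeds the threshold. If the latter is what you need, here is a minimal sketch of one possible approach. It assumes the vertical dimension is named "z" and that index 0 is the lowest level; the function name cloud_top_index and the toy data are made up for illustration. The idea is to keep the level index wherever the mask is True, put NaN elsewhere, and take the maximum along z.

```python
import numpy as np
import xarray as xr


def cloud_top_index(total_water, threshold, dim="z"):
    """Largest index along `dim` where total_water > threshold (NaN if none).

    Hypothetical helper for illustration, not part of xarray.
    """
    mask = total_water > threshold
    # Positional indices 0..n-1 along `dim` as a 1-D DataArray.
    levels = xr.DataArray(np.arange(total_water.sizes[dim]), dims=dim)
    # Broadcast the indices against the mask (NaN where the mask is False),
    # then reduce over `dim`; NaNs are skipped by the default skipna behaviour.
    return levels.where(mask).max(dim)


# Toy data: 4 levels, 2 x 2 columns (assumed layout: z, y, x).
values = np.zeros((4, 2, 2))
values[1, 0, 0] = 1.0  # column (y=0, x=0): cloud at level 1 only
values[0, 0, 1] = 1.0  # column (y=0, x=1): cloud at levels 0 and 2
values[2, 0, 1] = 1.0
values[3, 1, 1] = 1.0  # column (y=1, x=1): cloud at the top level
# column (y=1, x=0) stays below the threshold everywhere

da = xr.DataArray(values, dims=("z", "y", "x"))
tops = cloud_top_index(da, threshold=0.5)
print(tops)
```

These are all ordinary xarray operations, so with a dask-backed DataArray the result should stay lazy until you call .compute(). Since your chunks already span the full z axis (100 x 50 x 50), the reduction along z would not need to combine data across chunks.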