I have some input atmospheric model data giving the total water in each grid box. I'm trying to calculate cloud top height from it, so for each column I need to find the highest level at which this value exceeds a threshold.
My input data is 100 x 900 x 900 (nz x ny x nx). It is loaded into xarray via dask with chunks of 100 x 50 x 50. Traditionally, I would do this like so:
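```python
import numpy as np

# data: array of total water with shape (nz, ny, nx); threshold: scalar
nz, ny, nx = data.shape
cloud_top_height = np.full((ny, nx), np.nan)  # stays NaN where no cloud is found
for y in range(ny):
    for x in range(nx):
        # Search downward from the top level (nz - 1) to the bottom level (0)
        for z in range(nz - 1, -1, -1):
            if data[z, y, x] > threshold:
                cloud_top_height[y, x] = z
                break  # highest cloudy level found; stop searching this column
```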
However, this is very slow with dask/numpy/xarray, and I've struggled to find a vectorized replacement. I've seen various suggestions to use argmax with 3D boolean indexing, but I think that would give me the opposite of what I want (argmax returns the first, i.e. lowest, matching index rather than the highest), and xarray doesn't support 3D boolean indexing anyway.
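The argmax concern can be handled by flipping the vertical axis first. On a boolean array, `np.argmax` returns the index of the first `True`, so applied to the reversed array it finds the topmost exceedance, which can then be converted back to an index in the original ordering. This is a minimal sketch in plain NumPy, assuming axis 0 is z with index 0 at the bottom (as the downward loop implies); `highest_index_above` is a hypothetical helper name. Because `argmax` also returns 0 for a column with no `True` at all, those columns are set to NaN separately. It uses no 3D boolean indexing, only element-wise comparisons and axis reductions, which `dask.array` also provides (`flip`, `argmax`, `any`, `where`).

```python
import numpy as np

def highest_index_above(data, threshold, axis=0):
    """Largest index along `axis` where data > threshold; NaN if there is none.

    Hypothetical helper. Assumes index 0 on `axis` is the lowest model level,
    so the largest index is the cloud top.
    """
    mask = data > threshold
    n = mask.shape[axis]
    # argmax returns the FIRST True, so search the flipped array to find the
    # topmost True, then convert back to an index in the original ordering.
    idx_from_top = np.argmax(np.flip(mask, axis=axis), axis=axis)
    top = n - 1 - idx_from_top
    # argmax also returns 0 for an all-False column; mark those columns as NaN.
    return np.where(mask.any(axis=axis), top, np.nan)


# Toy data: nz = 4 levels, ny = 1, nx = 3 (made-up values)
data = np.array([
    [[0.5, 0.0, 0.9]],  # z = 0 (bottom)
    [[0.5, 0.0, 0.0]],
    [[0.0, 0.0, 0.9]],
    [[0.0, 0.0, 0.0]],  # z = 3 (top)
])
print(highest_index_above(data, 0.1))  # levels 1, NaN (no cloud), 2
```

The `axis` parameter lets the same helper work if the vertical dimension is not the first one in memory.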
What is the best way to calculate the largest index in an axis with a value greater than a threshold using xarray/dask?
How about:
If that's not it, could you post a reproducible example?
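Another vectorized formulation avoids argmax entirely: replace each exceeding grid box with its level index and everything else with a sentinel, then take the maximum along z. This is a minimal NumPy sketch, assuming (as in the question's loop) that axis 0 is z with index 0 at the bottom; `cloud_top_level` is a hypothetical name, and -1 is used internally as a "no cloud" marker before being converted to NaN.

```python
import numpy as np

def cloud_top_level(total_water, threshold):
    """Highest level index (axis 0) where total_water > threshold; NaN if none.

    Hypothetical helper; assumes a 3D (nz, ny, nx) array with index 0 at the bottom.
    """
    nz = total_water.shape[0]
    # Level index shaped (nz, 1, 1) so it broadcasts against (nz, ny, nx).
    levels = np.arange(nz).reshape(nz, 1, 1)
    # Keep the level index where the threshold is exceeded, -1 elsewhere ...
    masked_levels = np.where(total_water > threshold, levels, -1)
    # ... so the column maximum is the highest cloudy level (or -1 if none).
    top = masked_levels.max(axis=0).astype(float)
    top[top < 0] = np.nan
    return top


rng = np.random.default_rng(0)
total_water = rng.random((100, 4, 5))  # small stand-in for the 100 x 900 x 900 field
print(cloud_top_level(total_water, 0.99).shape)  # (4, 5)
```

With xarray on dask-backed data, the same idea should carry over to something like `xr.DataArray(np.arange(nz), dims="z").where(da > threshold).max("z")`, assuming the total water is in `da` and its vertical dimension is named `z` (both hypothetical names). `where` broadcasts the level index against the mask and leaves NaN outside cloud, and `max` skips NaN by default for float data, so cloud-free columns come out as NaN. Since each 100 x 50 x 50 chunk already spans every level, the reduction over `z` stays within each chunk.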