I can't calculate parcel_profile (1D function) on an xarray dataset (segmented with dask) in 4D.
Hello,
I really need help, I'm working on ERA5 hourly data on pressure levels. I've extracted relative humidity and temperature on several atmospheric pressure levels. I use Metpy's 'dewpoint_from_relative_humidity' function to calculate dewpoint. The table was quite large, so I use dask to create several chunks. I have the following table:
import numpy as np
import metpy as mp
import metpy.calc as mpcalc
import xarray as xr
import metpy.units as mpunit
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
ncin_1 = xr.open_dataset(ncfname_1, engine='netcdf4').chunk('auto')
ncin_1['t'] = ncin_1['t'] - 273.15
ncin_1['r'] = ncin_1['r'] / 100.0
ncin_1['r'] = ncin_1['r'].clip(min=0.01, max=1)
# variable reprocessing with metpy unit registry
ncin_1['level'].attrs['units'] = 'hPa'
ncin_1 = ncin_1.metpy.quantify()
ncin_1['time'] = ncin_1['time'].metpy.time
ncin_1['latitude'] = ncin_1['latitude'].metpy.latitude
ncin_1['longitude'] = ncin_1['longitude'].metpy.longitude
ncin_1['level'] = ncin_1['level'].metpy.vertical
ncin_1['t'] = ncin_1['t'] * mpunit.units.degC
ncin_1['dewpoint'] = ncin_1['dewpoint'] * mpunit.units.degC
ncin_1['r'] = ncin_1['r'] * mpunit.units.dimensionless
# pressures levels in descending order
ncin_1= ncin_1.where((ncin_1['level'] >= 100) & (test['level'] <= 1000), drop=True)
ncin_1= ncin_1.sortby('level', ascending=False)
# create a profile variable to be filled in
ncin_1['profil'] = (('time', 'level', 'latitude', 'longitude'),np.full_like(ncin_1['t'], fill_value=np.nan))
<xarray.Dataset>
Dimensions: (longitude: 60, latitude: 41, level: 21, time: 24836)
Coordinates:
* longitude (longitude) float32 -5.02 -4.77 -4.52 -4.27 ... 9.23 9.48 9.73
* latitude (latitude) float32 51.15 50.9 50.65 50.4 ... 41.65 41.4 41.15
* level (level) int32 20 50 100 150 200 250 ... 750 800 850 900 950 1000
* time (time) datetime64[ns] 2006-01-01 ... 2022-12-31T18:00:00
Data variables:
r (time, level, latitude, longitude) float32 0.03323 ... 0.7702
t (time, level, latitude, longitude) float32 -69.07 ... 14.25
dewpoint (time, level, latitude, longitude) float32 -90.23 ... 10.28
profil (time, level, latitude, longitude) float32 nan nan ... nan nan
Attributes:
Conventions: CF-1.6
history: 2023-07-19 22:32:41 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
I'd like to calculate the lifted index (using metpy's lifted_index function), but first I need to calculate the parcel_profile variable. The problem is that this function is a 1D function, according to the documentation. I've made several scripts using either xarray.apply_ufunc or xarray.map_blocks.
- whit xarray.apply_ufunc :
def wrapper_parcel_profile(pressure, temperature, dewpoint):
return mpcalc.parcel_profile(pressure * units.hPa , temperature * units.degC , dewpoint * units.degC ).to('degC')
t_1000 = ncin_1['t'].metpy.sel(level=1000)
dewpoint_1000 = ncin_1['dewpoint'].metpy.sel(level=1000)
pressure = ncin_1['level']
ncin_1['profil'] = xr.apply_ufunc(
wrapper_parcel_profile,
pressure, t_1000, dewpoint_1000,
input_core_dims=[['level'], ['time', 'latitude', 'longitude'], ['time', 'latitude', 'longitude']],
output_core_dims=[['time', 'level' , 'latitude', 'longitude']],
vectorize=True,
dask='parallelized',
output_dtypes=[float],
dask_gufunc_kwargs={'allow_rechunk': True}
)
This script runs because I'm using dask, so if I've understood correctly, as long as I don't run the ncin_1.compute() command, nothing is calculated directly. I get this error message:
ValueError: operands could not be broadcast together with shapes (19,) (24836,41,60) (It must have something to do with ncin_1['level']?)
- with xarray.map_block :
def wrapper_parcel_profile(pressure, temperature, dewpoint):
return mpcalc.parcel_profile( pressure * units.hPa , temperature * units.degC , dewpoint * units.degC).to('degC')
pressure = test['level']
t_1000 = test['t'].metpy.sel(level=1000) * units.degC
dewpoint_1000 = test['dewpoint'].metpy.sel(level=1000) * units.degC
test['profil'] = xr.map_blocks(wrapper_parcel_profile, test ,template= test['t'])
I get this error message when I use "ncin_1.compute()" in my database:
TypeError: Mixing chunked array types is not supported, but received multiple types: {, }
Is my approach the right one? Is it possible to do this simply by staying in an xarray dataset? Are the solutions I've found appropriate? Thanks in advance for your help
I think you're on the right path.
I'm not familiar with
parcel_profile, but it looks like it consumes and produces 1D arrays? If so,xr.apply_ufuncwith just one core dimension (presumablylevels) should work.apply_ufuncis a powerful function, with many options. The trick to getting it to work is to start with a very simple case that you understand, then increase the generality until your code can handle your full problem. e.g.One resource that is very helpful is the xarray tutorial documentation on
apply_ufunchere.Other comments:
dask_gufunc_kwargs={'allow_rechunk': True},vectorized=Truefor applying a function that already understands arrays, that option is for scalar operations.