xarray wave period in seconds ingested as timedelta64

918 Views Asked by At

The "units" attribute of a variable measuring ocean wave period is in "seconds". This is not a datetime field, but xarray is automatically ingesting this variable as timedelta64. Since the units are not "seconds since...", I would have assumed that xarray should treat this as a normal float32 data array, but apparently not so. Is there a way for me to tell xarray to ingest the wave period variables as float32 or to convert them from timedelta64 back to their original values after ingest? I still want it to convert the "time" variable to timedelta64, so I don't want to turn off translation for the entire dataset, just for specific variables (Tper, sper, wper).

Here is the base OPeNDAP URL that I am working with from a wave forecast in our TDS server:

http://oos.soest.hawaii.edu/thredds/dodsC/hioos/model/wav/ww3/hawaii/WaveWatch_III_Hawaii_Regional_Wave_Model_best.ncd

Suggestions? Thanks!

You can see the "ncdump"-like output using the OPeNDAP page here:

http://oos.soest.hawaii.edu/thredds/dodsC/hioos/model/wav/ww3/hawaii/WaveWatch_III_Hawaii_Regional_Wave_Model_best.ncd.html

Or you can run ncdump on the OPeNDAP URL like so:

ncdump -h http://oos.soest.hawaii.edu/thredds/dodsC/hioos/model/wav/ww3/hawaii/WaveWatch_III_Hawaii_Regional_Wave_Model_best.ncd

This results in the following:

netcdf WaveWatch_III_Hawaii_Regional_Wave_Model_best {
dimensions:
        lat = 101 ;
        lon = 141 ;
        time = 54453 ;
        z = 1 ;
variables:
        float lon(lon) ;
                lon:units = "degrees_east" ;
                lon:long_name = "longitude" ;
                lon:standard_name = "longitude" ;
                lon:short_name = "lon" ;
                lon:axis = "x" ;
                lon:_CoordinateAxisType = "Lon" ;
        float lat(lat) ;
                lat:units = "degrees_north" ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
                lat:short_name = "lat" ;
                lat:axis = "y" ;
                lat:_CoordinateAxisType = "Lat" ;
        float z(z) ;
                z:units = "meters" ;
                z:long_name = "depth below mean sea level" ;
                z:standard_name = "depth" ;
                z:short_name = "depth" ;
                z:axis = "z" ;
                z:_CoordinateAxisType = "Height" ;
        double time(time) ;
                time:long_name = "Forecast time for ForecastModelRunCollection" ;
                time:standard_name = "time" ;
                time:calendar = "proleptic_gregorian" ;
                time:units = "hours since 2011-06-21 00:00:00.000 UTC" ;
                time:missing_value = NaN ;
                time:_CoordinateAxisType = "Time" ;
        double time_run(time) ;
                time_run:long_name = "run times for coordinate = time" ;
                time_run:standard_name = "forecast_reference_time" ;
                time_run:calendar = "proleptic_gregorian" ;
                time_run:units = "hours since 2011-06-21 00:00:00.000 UTC" ;
                time_run:missing_value = NaN ;
                time_run:_CoordinateAxisType = "RunTime" ;
        double time_offset(time) ;
                time_offset:long_name = "offset hour from start of run for coordinate = time" ;
                time_offset:standard_name = "forecast_period" ;
                time_offset:calendar = "proleptic_gregorian" ;
                time_offset:units = "hours since 2011-06-21T00:00:00Z" ;
                time_offset:missing_value = NaN ;
        float Thgt(time, z, lat, lon) ;
                Thgt:units = "meters" ;
                Thgt:long_name = "significant wave height" ;
                Thgt:standard_name = "sea_surface_wave_significant_height" ;
                Thgt:short_name = "Thgt" ;
                Thgt:valid_range = 0.f, 60.f ;
                Thgt:_FillValue = NaNf ;
                Thgt:coordinates = "time_run time z lat lon " ;
        float Tper(time, z, lat, lon) ;
                Tper:units = "seconds" ;
                Tper:long_name = "peak wave period" ;
                Tper:standard_name = "sea_surface_wave_period_at_variance_spectral_density_maximum" ;
                Tper:short_name = "Tper" ;
                Tper:valid_range = 0.f, 60.f ;
                Tper:_FillValue = NaNf ;
                Tper:coordinates = "time_run time z lat lon " ;
        float Tdir(time, z, lat, lon) ;
                Tdir:units = "degrees" ;
                Tdir:long_name = "peak wave direction" ;
                Tdir:standard_name = "sea_surface_wave_from_direction" ;
                Tdir:short_name = "Tdir" ;
                Tdir:valid_range = 0.f, 360.f ;
                Tdir:_FillValue = NaNf ;
                Tdir:coordinates = "time_run time z lat lon " ;
        float shgt(time, z, lat, lon) ;
                shgt:units = "meters" ;
                shgt:long_name = "swell significant wave height" ;
                shgt:standard_name = "sea_surface_swell_wave_significant_height" ;
                shgt:short_name = "shgt" ;
                shgt:valid_range = 0.f, 60.f ;
                shgt:_FillValue = NaNf ;
                shgt:coordinates = "time_run time z lat lon " ;
        float sper(time, z, lat, lon) ;
                sper:units = "seconds" ;
                sper:long_name = "swell peak wave period" ;
                sper:standard_name = "sea_surface_swell_wave_period" ;
                sper:short_name = "sper" ;
                sper:valid_range = 0.f, 60.f ;
                sper:_FillValue = NaNf ;
                sper:coordinates = "time_run time z lat lon " ;
        float sdir(time, z, lat, lon) ;
                sdir:units = "degrees" ;
                sdir:long_name = "swell peak wave direction" ;
                sdir:standard_name = "sea_surface_swell_wave_from_direction" ;
                sdir:short_name = "sdir" ;
                sdir:valid_range = 0.f, 360.f ;
                sdir:_FillValue = NaNf ;
                sdir:coordinates = "time_run time z lat lon " ;
        float whgt(time, z, lat, lon) ;
                whgt:units = "meters" ;
                whgt:long_name = "wind significant wave height" ;
                whgt:standard_name = "sea_surface_wind_wave_significant_height" ;
                whgt:short_name = "whgt" ;
                whgt:valid_range = 0.f, 60.f ;
                whgt:_FillValue = NaNf ;
                whgt:coordinates = "time_run time z lat lon " ;
        float wper(time, z, lat, lon) ;
                wper:units = "seconds" ;
                wper:long_name = "wind peak wave period" ;
                wper:standard_name = "sea_surface_wind_wave_period" ;
                wper:short_name = "wper" ;
                wper:valid_range = 0.f, 60.f ;
                wper:_FillValue = NaNf ;
                wper:coordinates = "time_run time z lat lon " ;
        float wdir(time, z, lat, lon) ;
                wdir:units = "degrees" ;
                wdir:long_name = "wind peak wave direction" ;
                wdir:standard_name = "sea_surface_wind_wave_from_direction" ;
                wdir:short_name = "wdir" ;
                wdir:valid_range = 0.f, 360.f ;
                wdir:_FillValue = NaNf ;
                wdir:coordinates = "time_run time z lat lon " ;

// global attributes:
                :title = "WaveWatch III (WW3) Hawaii Regional Wave Model" ;
                :_CoordSysBuilder = "ucar.nc2.dataset.conv.CF1Convention" ;
                :Conventions = "CF-1.6, ACDD-1.3" ;
                :cdm_data_type = "Grid" ;
                :featureType = "GRID" ;
                :location = "Proto fmrc:WaveWatch_III_Hawaii_Regional_Wave_Model" ;
                :id = "ww3_hawaii" ;
                :naming_authority = "org.pacioos" ;
                :Metadata_Link = "http://pacioos.org/metadata/ww3_hawaii.html" ;
                :ISO_Topic_Categories = "oceans" ;
                :summary = "Through a collaborative effort with NOAA/NCEP and NWS Honolulu, the University of Hawaii has implemented a global-scale WaveWatch III (WW3) model, which in turn provides boundary conditions for this Hawaii regional WW3: a 7-day model with a 5-day hourly forecast at approximately 5-km or 0.05-deg resolution. The primary purpose of this regional model is to capture island effects such as island shadowing, refraction, and accurate modeling of local wind waves. Hawaii WW3 is forced with winds from the University of Hawaii Meteorology Department\'s operational mesoscale model, which has a more suitable spatial resolution than the global scale wind field. The Hawaii regional WW3 also provides boundary conditions for nearshore island-scale models via Simulating WAves Nearshore (SWAN). While considerable effort has been made to implement all model components in a thorough, correct, and accurate manner, numerous sources of error are possible. As such, please use these data with the caution appropriate for any ocean related activity." ;
                :keywords = "Earth Science Services > Models > Ocean General Circulation Models (OGCM)/Regional Ocean Models, Earth Science Services > Models > Weather Research/Forecast Models, Earth Science > Oceans > Ocean Waves > Significant Wave Height, Earth Science > Oceans > Ocean Waves > Wave Period, Earth Science > Oceans > Ocean Waves > Wave Speed/Direction" ;
                :keywords_vocabulary = "GCMD Science Keywords" ;
                :platform = "Models/Analyses > > Operational Models" ;
                :platform_vocabulary = "GCMD Platform Keywords" ;
                :instrument = "Not Applicable > Not Applicable" ;
                :instrument_vocabulary = "GCMD Instrument Keywords" ;
                :locations = "Continent > North America > United States Of America > Hawaii, Ocean > Pacific Ocean > Central Pacific Ocean > Hawaiian Islands" ;
                :locations_vocabulary = "GCMD Location Keywords" ;
                :standard_name_vocabulary = "CF Standard Name Table v39" ;
                :comment = "Model runs produced by Dr. Kwok Fai Cheung ([email protected])." ;
                :geospatial_lat_min = 18. ;
                :geospatial_lat_max = 23. ;
                :geospatial_lon_min = 199. ;
                :geospatial_lon_max = 206. ;
                :geospatial_vertical_min = 0. ;
                :geospatial_vertical_max = 0. ;
                :geospatial_bounds = "POLYGON ((18 -161.0, 23 -161.0, 23 -154.0, 18 -154.0, 18 -161.0))" ;
                :geospatial_bounds_crs = "EPSG:4326" ;
                :time_coverage_start = "2011-06-21T21:00:00Z" ;
                :geospatial_lat_units = "degrees_north" ;
                :geospatial_lat_resolution = 0.05 ;
                :geospatial_lon_units = "degrees_east" ;
                :geospatial_lon_resolution = 0.05 ;
                :geospatial_vertical_units = "meters" ;
                :geospatial_vertical_positive = "up" ;
                :geospatial_vertical_resolution = 0. ;
                :time_coverage_resolution = "PT1H" ;
                :creator_email = "[email protected]" ;
                :creator_name = "Kwok Fai Cheung" ;
                :creator_type = "person" ;
                :creator_url = "http://www.ore.hawaii.edu/OE/cheung_research.htm" ;
                :creator_institution = "University of Hawaii" ;
                :date_created = "2011-06-22" ;
                :date_issued = "2011-06-22" ;
                :date_modified = "2014-06-23" ;
                :date_metadata_modified = "2017-01-30" ;
                :institution = "University of Hawaii" ;
                :project = "Pacific Islands Ocean Observing System (PacIOOS)" ;
                :program = "Pacific Islands Ocean Observing System (PacIOOS)" ;
                :contributor_name = "Jim Potemra" ;
                :contributor_role = "distributor" ;
                :publisher_email = "[email protected]" ;
                :publisher_name = "Pacific Islands Ocean Observing System (PacIOOS)" ;
                :publisher_url = "http://pacioos.org" ;
                :publisher_institution = "University of Hawaii" ;
                :publisher_type = "group" ;
                :license = "The data may be used and redistributed for free but is not intended for legal use, since it may contain inaccuracies. Neither the data Contributor, University of Hawaii, PacIOOS, NOAA, State of Hawaii nor the United States Government, nor any of their employees or contractors, makes any warranty, express or implied, including warranties of merchantability and fitness for a particular purpose, or assumes any legal liability for the accuracy, completeness, or usefulness, of this information." ;
                :acknowledgement = "The Pacific Islands Ocean Observing System (PacIOOS) is funded through the National Oceanic and Atmospheric Administration (NOAA) as a Regional Association within the U.S. Integrated Ocean Observing System (IOOS). PacIOOS is coordinated by the University of Hawaii School of Ocean and Earth Science and Technology (SOEST)." ;
                :source = "WaveWatch III (WW3) numerical wave model" ;
                :references = "http://pacioos.org/waves/model-hawaii/, http://polar.ncep.noaa.gov/waves/wavewatch/" ;
                :history = "FMRC Best Dataset" ;
2

There are 2 best solutions below

5
On

Would you mind showing the output of ncdump -h file.nc if possible?

I did a quick check of reading in some ocean wave period data of a WaveWatchIII experiment I ran and everything looked fine.

ncdump -h ww3.20140101_20141231.nc
netcdf ww3.20140101_20141231 {
dimensions:
    longitude = 46 ;
    latitude = 17 ;
    time = UNLIMITED ; // (365 currently)
variables:
    float longitude(longitude) ;
            longitude:units = "degree_east" ;
            longitude:long_name = "longitude" ;
            longitude:standard_name = "longitude" ;
            longitude:valid_min = -180.f ;
            longitude:valid_max = 180.f ;
            longitude:axis = "X" ;
    float latitude(latitude) ;
            latitude:units = "degree_north" ;
            latitude:long_name = "latitude" ;
            latitude:standard_name = "latitude" ;
            latitude:valid_min = -90.f ;
            latitude:valid_max = 90.f ;
            latitude:axis = "Y" ;
    double time(time) ;
            time:long_name = "julian day (UT)" ;
            time:standard_name = "time" ;
            time:units = "days since 1850-01-01T00:00:00Z" ;
            time:conventions = "relative julian days with decimal part (as parts of the day )" ;
            time:axis = "T" ;
...
    short t01(time, latitude, longitude) ;
            t01:long_name = "mean period T01" ;
            t01:standard_name = "sea_surface_wind_wave_mean_period_from_variance_spectral_density_first_frequency_moment" ;
            t01:globwave_name = "mean_period_t01" ;
            t01:units = "s" ;
            t01:_FillValue = -32767s ;
            t01:scale_factor = 0.01f ;
            t01:add_offset = 0.f ;
            t01:valid_min = 0 ;
            t01:valid_max = 5000 ;

...

>>> import xarray as xr
>>> ds = xr.open_dataset('ww3.20140101_20141231.nc')
>>> da = ds['t01']
>>> da
<xarray.DataArray 't01' (time: 365, latitude: 17, longitude: 46)>
[285430 values with dtype=float64]
Coordinates:
  * longitude  (longitude) float32 -5.0 -4.0 -3.0 -2.0 -1.0 0.0 1.0 2.0 3.0 ...
  * latitude   (latitude) float32 30.0 31.0 32.0 33.0 34.0 35.0 36.0 37.0 ...
  * time       (time) datetime64[ns] 2014-01-01 2014-01-02 2014-01-03 
...
Attributes:
    long_name:      mean period T01
    standard_name:  
sea_surface_wind_wave_mean_period_from_variance_spectral_...
globwave_name:  mean_period_t01
units:          s
valid_min:      0
valid_max:      5000    
>>> da[10,10,10]
<xarray.DataArray 't01' ()>
array(2.9799999333918095)
Coordinates:
    longitude  float32 5.0
    latitude   float32 40.0
    time       datetime64[ns] 2014-01-11
Attributes:
    long_name:      mean period T01
    standard_name:  sea_surface_wind_wave_mean_period_from_variance_spectral_...
    globwave_name:  mean_period_t01
    units:          s
    valid_min:      0
    valid_max:      5000

The only thing I remember doing differently is I changed L2374 in WW3v4.18/ftn/ww3_ounf.ftn from Jday0=julday(1,1,1990) to Jday0=julday(1,1,1850) and similar lines in WW3v4.18/ftn/ww3_ounp.ftn as i'm doing runs before 1990 and didn't want to get weird negative times.

I also did a quick check on reading in some (climate model) data using OPeNDAP:

>>> import xarray as xr
>>> remote_data = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.forecast/.ts/dods')
    <xarray.Dataset>
Dimensions:  (L: 45, M: 9, S: 15, X: 360, Y: 181)
Coordinates:
  * S        (S) datetime64[ns] 2017-06-25 2017-07-02 2017-07-09 2017-07-16 ...
  * M        (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
  * X        (X) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * L        (L) timedelta64[ns] 0 days 12:00:00 1 days 12:00:00 ...
  * Y        (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 -85.0 -84.0 -83.0 ...
Data variables:
    ts       (M, L, S, Y, X) float64 ...
Attributes:
    Conventions:  IRIDL

S: start date; M: ensemble member; X: lon; Y: lat; L: lead-time

>>> ts = remote_data['ts'][0, 0, -1, 45, 80]
>>> ts
<xarray.DataArray 'ts' ()>
array(281.383544921875)
Coordinates:
    S        datetime64[ns] 2017-10-01
    M        float32 1.0
    X        float32 80.0
    L        timedelta64[ns] 12:00:00
    Y        float32 -45.0
Attributes:
    pointwidth:     0.0
    standard_name:  surface_temperature
    long_name:      Surface Temperature
    level_type:     surface
    units:          Kelvin_scale
    cell_method:    time: mean

In my case I should combine S and L to make a T (time) coordinate. Then append the T coord to the DataArray and drop the S and L coordinate. Unfortunately, I don't know how to do this part beyond this:

>>> T = ts.coords['S'].values + ts.coords['L'].values
>>> T
numpy.datetime64('2017-10-01T12:00:00.000000000')
1
On

By default, xarray converts variables with a units attribute of the form "seconds" to np.timedelta64 and variables with units of the form "seconds since ..." to np.datetime64.

This is convenient for some applications. To make this more concrete, let me highlight a section from ncdump -h on your netCDF file:

netcdf WaveWatch_III_Hawaii_Regional_Wave_Model_best {
variables:
        double time(time) ;
                time:long_name = "Forecast time for ForecastModelRunCollection" ;
                time:standard_name = "time" ;
                time:calendar = "proleptic_gregorian" ;
                time:units = "hours since 2011-06-21 00:00:00.000 UTC" ;
                time:missing_value = NaN ;
                time:_CoordinateAxisType = "Time" ;
        double time_run(time) ;
                time_run:long_name = "run times for coordinate = time" ;
                time_run:standard_name = "forecast_reference_time" ;
                time_run:calendar = "proleptic_gregorian" ;
                time_run:units = "hours since 2011-06-21 00:00:00.000 UTC" ;
                time_run:missing_value = NaN ;
                time_run:_CoordinateAxisType = "RunTime" ;
        double time_offset(time) ;
                time_offset:long_name = "offset hour from start of run for coordinate = time" ;
                time_offset:standard_name = "forecast_period" ;
                time_offset:calendar = "proleptic_gregorian" ;
                time_offset:units = "hours since 2011-06-21T00:00:00Z" ;
                time_offset:missing_value = NaN ;
        float Tper(time, z, lat, lon) ;
                Tper:units = "seconds" ;
                Tper:long_name = "peak wave period" ;
                Tper:standard_name = "sea_surface_wave_period_at_variance_spectral_density_maximum" ;
                Tper:short_name = "Tper" ;
                Tper:valid_range = 0.f, 60.f ;
                Tper:_FillValue = NaNf ;
                Tper:coordinates = "time_run time z lat lon " ;

My understanding of CF standard names is that forecast_period should be equal to the difference between time and forecast_reference_time, i.e., forecast_period = time - forecast_reference_time. If you specified your time_offset variable with units in the form "hours", then it would be decoded to timedelta64, along with datetime64 for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly.

On the other hand, you probably don't want to convert the Tper variable to timedelta64. Technically, it is also a time period, but it's not a variable that makes sense to compare to time.

Unfortunately, xarray is currently unable to distinguish between these two cases from looking at the variable metadata, so it converts everything to timedelta64 (though I would be happy to discuss ideas you might have on GitHub). For now, the best work-around is to remove the units attribute from these variables before decoding with xarray, e.g.,

raw = xr.open_dataset(url, decode_cf=False)
del raw.Tper.attrs['units']
ds = xr.decode_cf(raw)