How do I drop NA values in a netcdf file?

I'm working with weather data and have a netcdf file of wave height that I pulled from the ERA5 data store. When requesting the data, you specify the bounds by providing the latitude and longitude of your area of interest. My goal is to match ports along a coast to the closest grid point and then use that data in R for an analysis. Right now I'm doing this with the functions in the ncdf4 package (code below). The problem is that this is wave data, so any grid cell that does not predominantly overlap the ocean is NA in the data. When I match a port to its nearest grid cell, I therefore receive NAs instead of a grid cell with a relevant value. What I'd like to do is drop all NA values in the netcdf file so that matching to the nearest grid cell returns a value.

The code I'm currently running:


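#load the packages used below
library(ncdf4)      #nc_open(), ncvar_get(), ncatt_get(), nc_close()
library(lubridate)  #as_datetime()
library(sf)         #st_as_sf(), st_point(), st_sfc(), st_set_crs(), st_distance()
library(dplyr)      #mutate() and %>%

#open the connection with the netcdf file
nc <- nc_open("swell_bilmap.nc")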

#extract lon and lat
lat <- ncvar_get(nc,'lat')
lon <- ncvar_get(nc,'lon')
dim(lat);dim(lon)

#extract the time
t <- ncvar_get(nc, "time")

#time unit
ncatt_get(nc,'time')

#convert the hours into date + hour
#as_datetime() function of the lubridate package needs seconds
timestamp <- as_datetime(c(t*60*60),origin="1900-01-01")

#import the data, in this case the data var is shts
data_shts <- ncvar_get(nc,"shts")

#close the connection with the ncdf file
nc_close(nc)


#create all the combinations of lon-lat
lonlat <- expand.grid(lon=lon,lat=lat)

#then I match a given lat and long with the nearest, distance based, grid cell
#we must convert the coordinates in a spatial object 
coord_brookings <- st_as_sf(lonlat,coords=c("lon","lat"))%>%
  st_set_crs(4326)

#we do the same with the coordinates of our port of interest (PORTS is the data frame that contains the point lats and longs)
psj_brookings <- st_point(c(PORTS[1,5],PORTS[1,4]))%>%
  st_sfc()%>%
  st_set_crs(4326)


#add the distance to the points
coord_brookings <- mutate(coord_brookings,dist=st_distance(coord_brookings,psj_brookings))

#create a distance matrix with the same dimensions as our data
dist_mat_brookings_shts <- matrix(coord_brookings$dist,dim(data_shts)[-3])

#the arrayInd function is useful to obtain the row and column indexes
mat_index_brookings_shts <- as.vector(arrayInd(which.min(dist_mat_brookings_shts), dim(dist_mat_brookings_shts)))

#extract the time series
df_brookings_shts <- data.frame(shts=data_shts[mat_index_brookings_shts[1],mat_index_brookings_shts[2],],time=timestamp)

This gives me a data frame with the values of this variable from the nearest grid cell for each date and time step.

A visual representation of the problem I'm running into:

[image not included]

The points that I'm matching to grid cells lie on the land surface. At first glance it might look as if there is no data where we see no color, but the cells are there; they hold NA values, so nothing is drawn. When I run my code in R, I get NAs. I want only the colored areas (the non-NA cells) to be matched.

I'm new to working with netcdf files, so thanks for all the help! (I'd like to do this in R, CDO, or ArcGIS Pro, with preference in that order.)

There is 1 best solution below

I am not sure that removing missing values helps here. If there is no spatial overlap between the wave data and parts of your grid, then removing the NAs will not improve the situation.

The only solution is to extrapolate the values into the missing cells. Since this is wave data, it is probably safe to fill each missing coastal cell with the value of its nearest neighbour that has data.

This can be done easily with CDO:

cdo setmisstonn infile.nc outfile.nc
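
If you would rather stay in R than fill the file first, the same nearest-neighbour idea can be applied at lookup time: skip the NA cells when searching for the closest grid point. The following is a minimal sketch under stated assumptions, not a tested drop-in for the code in the question. The toy lon, lat and data_shts objects stand in for what ncvar_get() returns, the helper names haversine_km() and nearest_valid_cell() are made up for illustration, and the port coordinates are arbitrary. It uses base R only.

```r
# Toy grid standing in for the objects read with ncvar_get() (assumed layout: [lon, lat, time])
lon <- seq(-125, -123, by = 0.5)
lat <- seq(42, 43, by = 0.5)
n_time <- 3
data_shts <- array(seq_len(length(lon) * length(lat) * n_time) / 10,
                   dim = c(length(lon), length(lat), n_time))
# Pretend the two easternmost longitudes are land: NA at every time step
data_shts[lon >= -123.5, , ] <- NA

# Great-circle distance in km (haversine formula), vectorised over the first point
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: row/column index of the closest cell that has data
nearest_valid_cell <- function(lon, lat, data, port_lon, port_lat) {
  # lon varies fastest, which matches the column-major layout of data[lon, lat, ]
  grid <- expand.grid(lon = lon, lat = lat)
  dist <- haversine_km(grid$lon, grid$lat, port_lon, port_lat)
  # A cell counts as valid if it has at least one non-NA value over time
  valid <- apply(!is.na(data), c(1, 2), any)
  # Push land cells out of the running instead of dropping them,
  # so the indexes still line up with the data array
  dist[!as.vector(valid)] <- Inf
  as.vector(arrayInd(which.min(dist), dim(valid)))
}

# Arbitrary port location that sits over a "land" (NA) cell
idx <- nearest_valid_cell(lon, lat, data_shts, port_lon = -123.4, port_lat = 42.1)
series <- data_shts[idx[1], idx[2], ]
```

With the real data you would pass the lon, lat and data_shts objects from the question, together with the port's longitude and latitude from PORTS. Setting the distance of NA cells to Inf rather than removing them keeps the grid rectangular, so the existing arrayInd() indexing continues to work.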