I'm using nc_open to get a DatasetNode from a THREDDS Data Server, and reading a subset of the data with ncvar_get by specifying start and count. Reproducible example below:
library(thredds)
library(ncdf4)
Top <- CatalogNode$new("https://oceanwatch.pifsc.noaa.gov/thredds/catalog.xml")
DD <- Top$get_datasets()
dnames <- names(DD)
dname <- dnames[4] # "Chlorophyll a Concentration, Aqua MODIS - Monthly, 2002-present. v.2018.0"
D <- DD[[dname]]
dl_url <- file.path("https://oceanwatch.pifsc.noaa.gov/thredds/dodsC", D$url)
dataset <- nc_open(dl_url)
dataset_lon <- ncvar_get(dataset, "lon") # Get longitude values
dataset_lat <- ncvar_get(dataset, "lat") # Get latitude values
dataset_time <- ncvar_get(dataset, "time") # get time values in tidy format
# specify lon/lat boundaries for data subset:
lonmin = 160
lonmax = 161
latmin = -1
latmax = 0
LonIdx <- which(dataset_lon >= lonmin & dataset_lon <= lonmax)
LatIdx <- which(dataset_lat >= latmin & dataset_lat <= latmax)
# read the data for first 10 timesteps:
dataset_array <- ncvar_get(dataset, varid = "chlor_a",
  # start at the first lon/lat index inside the bounding box, first timestep
  start = c(min(LonIdx), min(LatIdx), 1),
  count = c(length(LonIdx), length(LatIdx), 10), verbose = TRUE)
Is there a way to calculate the approximate file size for the ncvar_get request before reading the data?
Many thanks to both @michael-delgado and @robert-wilson for the above. I've edited the original post to include a reproducible example and answered my own question in case it helps anyone else later down the line.
If I understand correctly, the values are stored and transferred as 32-bit floats (4 bytes each), even though R itself holds numeric values in memory as 64-bit doubles. Using the example Aqua MODIS Chlorophyll dataset in the post above:
An upper bound on the file size (assuming no NA values) before downloading the data with ncvar_get would be 23,040 bytes (5,760 cells at 4 bytes each), which is confirmed by the dimensions of the data after downloading.
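A minimal sketch of that calculation in base R. The helper name estimate_bytes and the 24 x 24 x 10 count vector are assumptions for illustration (the shape is chosen to match the 5,760 cells reported in this post). With ncdf4 the stored type should be available as dataset$var[["chlor_a"]]$prec, which is worth checking rather than assuming 4 bytes.

```r
# Bytes per value for common NetCDF storage types. The names follow the
# precision strings used by ncdf4; treat the mapping as an assumption to verify.
bytes_per_type <- c(byte = 1, char = 1, short = 2, int = 4, integer = 4,
                    float = 4, double = 8)

# Hypothetical helper: upper bound on the size of a start/count request.
estimate_bytes <- function(count, prec = "float") {
  stopifnot(prec %in% names(bytes_per_type))
  prod(count) * bytes_per_type[[prec]]
}

count <- c(24, 24, 10)                 # lon x lat x time (assumed subset shape)
upper_bound <- estimate_bytes(count)   # cells * bytes per cell
print(upper_bound)

# After reading, the array dimensions should reproduce the same figure.
dataset_array <- array(NA_real_, dim = count)  # stand-in for ncvar_get() output
stopifnot(prod(dim(dataset_array)) * 4 == upper_bound)
```

In the real example, count would be c(length(LonIdx), length(LatIdx), 10), so the estimate is available before any data is requested.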
Writing the output array to disk produces a 20,444-byte file, which is close to the calculated upper limit (23,040 bytes). For me this approach is useful for obtaining an upper limit and approximate size before downloading the data using ncvar_get. Many thanks to both of you.
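The post does not show how the array was written to disk, so the following is only a sketch of one way to check an on-disk size. It uses a random stand-in array (an assumption, not the chlorophyll data) and writeBin with size = 4, which writes raw, uncompressed 4-byte floats. Compressed formats such as saveRDS() will generally give a different figure.

```r
# Stand-in array with the same number of cells as the subset (assumed shape).
x <- array(runif(24 * 24 * 10), dim = c(24, 24, 10))

path <- tempfile(fileext = ".bin")
writeBin(as.vector(x), path, size = 4)  # 4 bytes per value, no compression
bytes_on_disk <- file.size(path)
print(bytes_on_disk)

unlink(path)  # clean up the temporary file
```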
(Out of interest, excluding NA values in the above example leaves 4,559 out of 5,760 cells:
sum(!is.na(dataset_array)) * 4
which gives 18,236 bytes, smaller than the actual file size (20,444 bytes).)
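The same NA-excluding count can be tried on a synthetic array. In this sketch the array contents and the positions of the missing cells are made up; only the totals (5,760 cells, 4,559 of them non-missing) are taken from the post.

```r
set.seed(1)
x <- array(runif(5760), dim = c(24, 24, 10))  # assumed subset shape
x[sample(length(x), 5760 - 4559)] <- NA       # blank out 1,201 random cells

total_bytes  <- length(x) * 4                 # every cell at 4 bytes
non_na_bytes <- sum(!is.na(x)) * 4            # only cells holding a value
print(c(total = total_bytes, non_na = non_na_bytes))
```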