Downloading NOAA data


I'm trying to download NOAA data using the rnoaa package and I'm running into a bit of trouble.

I took a vector from a dataframe and it looks like this:

df <- dataframe$ghcnd  ## grabbing the necessary column

This gives me an output like:

[1] "GHCND:US1AKAB0058" "GHCND:US1AKAB0015" "GHCND:US1AKAB0021" "GHCND:US1AKAB0061"
 [5] "GHCND:US1AKAB0055" "GHCND:US1AKAB0038" "GHCND:US1AKAB0051" "GHCND:US1AKAB0052"
 [9] "GHCND:US1AKAB0060" "GHCND:US1AKAB0065" "GHCND:US1AKAB0062" "GHCND:US1AKFN0016"
[13] "GHCND:US1AKFN0018" "GHCND:US1AKFN0015" "GHCND:US1AKFN0011" "GHCND:US1AKFN0013"
[17] "GHCND:US1AKFN0030" "GHCND:US1AKJB0011" "GHCND:US1AKJB0014" "GHCND:US1AKKP0005"
[21] "GHCND:US1AKMS0011" "GHCND:US1AKMS0019" "GHCND:US1AKMS0012" "GHCND:US1AKMS0020"
[25] "GHCND:US1AKMS0018" "GHCND:US1AKMS0014" "GHCND:US1AKPW0001" "GHCND:US1AKSH0002"
[29] "GHCND:US1AKVC0006" "GHCND:US1AKWH0012" "GHCND:US1AKWP0001" "GHCND:US1AKWP0002"
[33] "GHCND:US1ALAT0014" "GHCND:US1ALAT0013" "GHCND:US1ALBW0095" "GHCND:US1ALBW0087"
[37] "GHCND:US1ALBW0020" "GHCND:US1ALBW0066" "GHCND:US1ALBW0031" "GHCND:US1ALBW0082"
[41] "GHCND:US1ALBW0099" "GHCND:US1ALBW0040" "GHCND:US1ALBW0004" "GHCND:US1ALBW0085"
[45] "GHCND:US1ALBW0009" "GHCND:US1ALBW0001" "GHCND:US1ALBW0094" "GHCND:US1ALBW0013"
[49] "GHCND:US1ALBW0079" "GHCND:US1ALBW0060"

In reality, I have about 22,000 weather stations. This is just showing the first 50.

rnoaa code

library(rnoaa)
options("noaakey" = Sys.getenv("noaakey"))
Sys.getenv("noaakey")

weather <- ncdc(datasetid = 'GHCND', stationid = df, var = 'PRCP', startdate = "2020-05-30",
                enddate = "2020-05-30", add_units = TRUE)

Which produces the following error: Error: Request-URI Too Long (HTTP 414)

However, when I subset df to, say, the first 100 entries, I still can't get data for more than the first 25 stations, even though the package details say I should be able to run 10,000 queries a day.

Loop Attempt

df1 <- df[1:125]  ## subsetting the vector; the full set is too big for one request

for (i in 1:length(df1)) {
  weather2 <- ncdc(datasetid = 'GHCND', stationid = df1[i], var = 'PRCP',
                   startdate = '2020-06-30', enddate = '2020-06-30',
                   add_units = TRUE)
}

But this just produces a data frame of a single row, that row being the 125th weather station.

If anyone could give advice on what to try next, that would be great :)

Also, cross-linked: https://discuss.ropensci.org/t/rnoaa-getting-county-level-rain-data/2403


2 Answers

Accepted Answer

Figured it out, with a lot of help from @Dave2e and a bud on the ropensci link above.

library(dplyr)  ## for bind_rows()

df <- cleaned_emshr$ghcnd  ## grabbing the necessary column

## split the station IDs into chunks of at most 100,
## matching the `limit` passed to ncdc() below
z <- split(df, ceiling(seq_along(df) / 100))
out <- list()
for (i in seq_along(z)) {
  out[[i]] <- ncdc(datasetid = 'GHCND', stationid = z[[i]], var = 'PRCP',
                   startdate = "2020-05-30", enddate = "2020-05-30",
                   add_units = TRUE, limit = 100)
}

## pull the $data element from each response and stack them
weather <- bind_rows(lapply(out, "[[", "data"))
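The chunking step can be checked on its own, without an API key. A minimal sketch with made-up station IDs (the `ids` vector below is hypothetical, just to show the bucket sizes):

```r
# split() buckets the IDs into groups of at most 100;
# ceiling(seq_along(ids) / 100) produces the group labels 1, 1, ..., 2, 2, ...
ids <- sprintf("GHCND:US1AKAB%04d", 1:250)  # 250 made-up station IDs
chunks <- split(ids, ceiling(seq_along(ids) / 100))

length(chunks)   # 3 chunks
lengths(chunks)  # 100 100 50
```

Each chunk is then small enough to fit in one request URL, which avoids the HTTP 414 error.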
Second Answer

In your loop attempt, weather2 is overwritten on each iteration of the loop.

Since the number of requests and the length of each return is unknown, one way to solve this problem is to wrap the call to ncdc inside an lapply statement and save each response in a list. Then, at the end, merge all of the responses into one large data frame.

library(rnoaa)
library(dplyr)

stationlist <- ghcnd_stations() %>% filter(state == "DE")
df <- paste0("GHCND:", stationlist$id[1:10])

## request data for each station and store the individual results in a list
output <- lapply(df, function(station) {
  weather <- ncdc(datasetid = 'GHCND', stationid = station, var = 'PRCP',
                  startdate = "2020-05-30", enddate = "2020-05-30",
                  add_units = TRUE)
  # weather$data                  # the records alone
  # to include the metadata as well:
  data.frame(t(unlist(weather$meta)), weather$data)
})

## merge into one data frame
answer <- bind_rows(output)
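The `data.frame(t(unlist(...)), ...)` idiom can be seen in isolation with a mock response (the `mock` list below is made up; a real `ncdc()` response has the same `$meta`/`$data` shape):

```r
# a fake ncdc()-style response: a metadata list plus a data frame of records
mock <- list(
  meta = list(totalCount = 2, pageCount = 25, offset = 1),
  data = data.frame(station = c("GHCND:A", "GHCND:B"), value = c(10, 20))
)

# t(unlist(meta)) flattens the metadata into a one-row matrix, which
# data.frame() then recycles alongside every record row
combined <- data.frame(t(unlist(mock$meta)), mock$data)
names(combined)  # "totalCount" "pageCount" "offset" "station" "value"
nrow(combined)   # 2
```

So every record row carries the request metadata with it, which is handy after `bind_rows()` stacks responses from many stations.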

I would verify this process on a small subset of stations, as the calls to NOAA can be slow. I would also try to reduce the number of stations searched to the area of interest and to the ones still actively collecting data.

Also, concerning the record limit, from the help page: "Note that the default limit (no. records returned) is 25. Look at the metadata in $meta to see how many records were found. If more were found than 25, you could set the parameter limit to something higher than 25."
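Putting that together, a hedged sketch (it needs a valid NOAA key to actually run; `df1` is the station vector from the question) that raises `limit` and inspects `$meta` to see whether more paging is needed:

```r
library(rnoaa)

# raise the per-request limit above the default of 25
# (the NCDC web service caps a single request at 1000 records)
res <- ncdc(datasetid = 'GHCND', stationid = df1[1:25], var = 'PRCP',
            startdate = '2020-05-30', enddate = '2020-05-30',
            add_units = TRUE, limit = 1000)

# total matching records; if this exceeds the limit, page through the
# remainder with the `offset` argument on follow-up calls
res$meta$totalCount
```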