Why doesn't rbind() work inside a for loop?


I have a range of dates:

date_rng <- seq( as.Date("2008-01-01"), as.Date("2008-12-31"), by="+1 day")

I have some helper functions that aren't necessarily relevant to the question, so I'll try to leave them out.

I start with the first date and make a call to this function:

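# Function for getting the weather table by airport code and date; returns a data frame
# Assumption: rvest supplies read_html(), html_nodes(), html_table() and %>%,
# and lubridate supplies month() and day()
library(rvest)
library(lubridate)

get_table <- function(code, date){
  # Build the URL for the given airport code and date
  adv <- sprintf(
    "https://www.wunderground.com/history/airport/K%s/2008/%s/%s/DailyHistory.html",
    code, month(date), day(date)
  )
  h <- adv %>% read_html()
  # Extract the observations table from the page
  t <- h %>%
    html_nodes(xpath = '//*[@id="obsTable"]') %>%
    html_table()
  df <- data.frame(t)
  return(df)
}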
atl_weather <- get_table("ATL", date_rng[1])

Now I iterate over the rest of the dates, creating a new df for each one, which I then try to append to the original:

# Loop through remaining dates and bind dfs
for(d in as.list(date_rng[2:4])){
  rbind(atl_weather, get_table("ATL", d), d)
}

But the binding doesn't happen and I'm left with the original dataframe for the first date in the range, created before the for loop.

This works though:

atl_weather <- get_table("ATL", date_rng[1])
new_df <- get_table("ATL", date_rng[2])
new_df <- scraped_data_formatter(new_df, date_rng[2])
rbind(atl_weather, new_df)

How can I get rbind() to work in the for loop (so that I iteratively build up the dataframe to include all the data from the full date range)?

There are 2 answers below

Best answer

It does work. The problem is you are throwing away the result because you don't assign the output from rbind() to anything.

Change

rbind(atl_weather, get_table("ATL", d), d)

to this

atl_weather <- rbind(atl_weather, get_table("ATL", d), d)

assuming atl_weather is the data frame you want to incrementally add to.
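
To make the difference concrete, here is a minimal sketch using small made-up data frames instead of scraped tables. The names base_df and make_row are hypothetical stand-ins for atl_weather and get_table(). It shows that rbind() returns a new data frame and leaves its arguments untouched, so the result has to be assigned:

```r
# Hypothetical stand-ins for atl_weather and get_table(); no scraping involved.
base_df <- data.frame(day = 1, temp = 50)
make_row <- function(i) data.frame(day = i, temp = 50 + i)

# Without assignment: rbind() returns a new data frame that is discarded.
unassigned <- base_df
for (i in 2:4) {
  rbind(unassigned, make_row(i))
}
nrow(unassigned)  # still 1

# With assignment: the result is kept on each iteration.
assigned <- base_df
for (i in 2:4) {
  assigned <- rbind(assigned, make_row(i))
}
nrow(assigned)  # 4
```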

That said, you don't want to do this in R: each time you add a row or column to an object, R has to copy a lot of data around. Incrementally growing objects this way carries a lot of overhead and is a sure-fire way to bog your code down.

Ideally, you'd allocate enough space first (i.e. enough rows) so that you can assign into the ith row as you go: new_atl_weather[i, ] <- c(....)
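
A rough sketch of that preallocation idea, under the assumption that the total number of rows is known up front and each iteration produces exactly one row. The row count n and the columns day and temp are made up for illustration and are not the real columns of the scraped table:

```r
# Hypothetical example: n and the column names are invented for illustration.
n <- 4
new_atl_weather <- data.frame(
  day  = integer(n),
  temp = numeric(n)
)

for (i in seq_len(n)) {
  # Stand-in for real per-date values; assign into the existing ith row
  # instead of growing the data frame with rbind().
  new_atl_weather[i, ] <- list(i, 50 + i)
}
```

When each call returns a multi-row data frame of unknown size, as a scraped table might, the total row count is not known in advance and this pattern needs adapting.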

Second answer

I'll risk going off-topic (as the question has been correctly answered already), by giving you my favourite programming pattern to use whenever I'm forced to build up a dataframe within a for-loop:

for (d in as.list(date_rng[2:4])){
    if (exists("atl_weather")) {
        atl_weather = rbind(atl_weather, get_table("ATL", d), d)
    } else {
        atl_weather = get_table("ATL", d)
    }
}

Of course, if I had the function encompassed in get_table, I'd use some kind of apply statement instead. But when real life gets in the way and the inside of the for-loop is too complicated, I would usually have some kind of temp.data.frame object which gets assigned to or rbinded to atl_weather using a similar pattern to the above:

if (exists("atl_weather")) rm(atl_weather) # in case I'm dynamically running code in chunks

for (d in as.list(date_rng[2:4])){
    temp.df = ... # complicated stuff

    if (exists("atl_weather")) {
        atl_weather = rbind(atl_weather, temp.df)
    } else {
        atl_weather = temp.df
    }
}
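
For the simpler case mentioned above, where the per-date work fits in a single function, an apply-style version might look like the following sketch. fake_get_table is a hypothetical stand-in for get_table() that returns made-up rows, so the example runs without any scraping:

```r
# Hypothetical stand-in for get_table(): returns a small multi-row
# data frame per date instead of scraping a web page.
fake_get_table <- function(code, date) {
  data.frame(code = code, date = date, hour = c(0, 12))
}

date_rng <- seq(as.Date("2008-01-01"), as.Date("2008-01-04"), by = "day")

# Build one data frame per date, then bind them all in a single call.
# Indexing with seq_along() keeps the Date class, which iterating over
# the Date vector directly in a for loop would drop.
tables <- lapply(seq_along(date_rng), function(i) {
  fake_get_table("ATL", date_rng[i])
})
atl_weather <- do.call(rbind, tables)
```

Collecting the pieces in a list and binding once avoids both the exists() check and the repeated copying that comes with growing atl_weather inside the loop.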