I have a monthly (Jan - Dec) data set for weather and crop yield. This data is collected for multiple years (2002 - 2019). My aim is to obtain bootstrapped slope coefficient of the affect of temperature in each month on yield gap. In bootstrapping, I want to block the year information in a way that the function should randomly sample data from a specific year in each bootstrap rather than choosing rows from mixed years.
I read some blogs and tried different methods but I am not confident about those. I tried to disect the bootstrapped splits to ensure if I am doing it correctly but I was not.
Here is the starting code:
# Load libraries
library(readxl)
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(reprex)
# data
ww_wt <- read_csv("https://raw.githubusercontent.com/MohsinRamay/yieldgap/main/ww_wt.csv")
#> New names:
#> * `` -> ...1
#> Rows: 1924 Columns: 20
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (3): ID, Location, Month
#> dbl (16): ...1, Year, Latitude, Longitude, YieldTrt, YieldUntrt, Mildew, Ye...
#> date (1): Date
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
ww_wt %>%
select(Year, Month, gap, temp) %>%
head()
#> # A tibble: 6 x 4
#> Year Month gap temp
#> <dbl> <chr> <dbl> <dbl>
#> 1 2002 September 0.282 13.6
#> 2 2002 October 0.282 13.3
#> 3 2002 November 0.282 7.07
#> 4 2002 December 0.282 3.44
#> 5 2002 January 0.282 5.61
#> 6 2002 February 0.282 6.93
# Bootstrapping
set.seed(123)
boots <- ww_wt %>%
ungroup() %>%
select(Year, Month, gap, temp) %>%
nest(data = -c(Month)) %>%
mutate(boots = map(data, ~bootstraps(.x, times = 100, apparent = FALSE))) %>%
unnest(boots) %>%
mutate(model = map(splits, ~lm(gap ~ temp, data = analysis(.))),
coefs = map(model, tidy))
Created on 2022-01-04 by the reprex package (v2.0.1)
I am nesting Months
because I want to get slopes for each month separately. Also, the data for each Year has different sample size n
because of different number of locations each year.
Thanks for providing these hints Julia. I think I have figured this out by just adding some extra lines of code.
The above plots clearly indicates that, within each bootstrap, data from each
Year
is being randomly sampled with replacement.Created on 2022-01-08 by the reprex package (v2.0.1)