"Error in checkNA.index(index) : NA in the individual index variable" when using a bootstrapped dataset as input for the plm function in R


I want to use a bootstrapped sample as input for plm(), but I get the error "Error in checkNA.index(index) : NA in the individual index variable".

Specifically, I use the dataset from Grossman, Pierskalla and Boswell Dean (2017) and want to estimate the second fixed-effects model in Table 1 (model B). Besides replicating the results, I would like to estimate the standard errors by bootstrapping complete entities with replacement, so that each entity's time series stays intact. To account for entities that are drawn more than once, I create a new column "bootstrapindex" that gives every draw its own ID, and I pass it to plm as the individual index. Yet I still get this error, which I suspect is caused by the duplicate rows in the sample.

Could someone please explain to me why this is happening and how it can be solved?

Here is the code I used. The dataset is publicly available.


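library(dplyr)
library(plm)      # pdata.frame(), plm(), vcovHC()
library(foreign)  # read.dta()
library(lmtest)   # coeftest()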
bootstrapEntities <- function(data, idColumn, timeColumn) {
  # Ensure idColumn and timeColumn exist in data
  stopifnot(idColumn %in% names(data), timeColumn %in% names(data))
  
  # Extract and clean unique entities
  uniqueEntities <- na.omit(unique(data[[idColumn]]))
  
  # Sample entities with replacement
  sampledEntities <- sample(uniqueEntities, size = length(uniqueEntities), replace = TRUE)
  
  # Initialize an empty list to store bootstrap samples
  bootstrapSamples <- list()
  
  # Initialize a counter for the new bootstrap index
  bootstrapIndexCounter <- 1
  
  # Loop through each sampled entity and extract its observations
  for (entity in sampledEntities) {
    entityData <- data[data[[idColumn]] == entity, ]
    # Ensure entityData is not empty or NA before adding
    if (nrow(entityData) > 0 && !all(is.na(entityData))) {
      # Assign a unique bootstrap index to all rows of this entity
      entityData$bootstrapindex <- bootstrapIndexCounter
      bootstrapSamples[[length(bootstrapSamples) + 1]] <- entityData
      # Increment the bootstrap index counter
      bootstrapIndexCounter <- bootstrapIndexCounter + 1
    }
  }
  
  # Combine all the sampled entities' data into a single dataframe
  bootstrapSample <- bind_rows(bootstrapSamples)
  
  bootstrapSample <- bootstrapSample %>% 
    filter(!is.na(.[[idColumn]]) & !is.na(.[[timeColumn]]))
  
  # Return the cleaned bootstrap sample with the new indexing
  return(bootstrapSample)
}



data <- read.dta("[fill in your own path] /services_admin_tscs.dta")
data <- data %>%
  filter(region_x == "Sub-Saharan Africa")
pdata <- pdata.frame(data, index = c("ccodecow", "year"))

controls <- c("lpop_l", "wdi_urban_l", "lgdppc_l", "conflict_l", "dpi_state_l", "p_polity2_l", "loilpc_l", "aid_pc_l")
formula_B <- paste("ServicesA ~ ladminpc_l5 + factor(year) +", paste(controls, collapse = " + "))
formula_B <- as.formula(formula_B)
fe_model_B <- plm(formula_B, data = pdata, model = "within")
clustered_se_B <- vcovHC(fe_model_B, type = "HC1", cluster = "group")
coeftest(fe_model_B, vcov = clustered_se_B)

variables_of_interestB <- c("lpop_l", "wdi_urban_l", "lgdppc_l", "conflict_l", 
                            "dpi_state_l", "p_polity2_l", "loilpc_l", "aid_pc_l", 
                            "ladminpc_l5")

#THIS GIVES THE ERROR:
coef(plm(formula_B,
         data = bootstrapEntities(pdata, "ccodecow", "year"),
         index = c("bootstrapindex", "year"),
         model = "within"))[variables_of_interestB]

I made sure that there are no NAs in either the country IDs or the years, and I created a new column for the new bootstrap indices.

I also tried to narrow down where the problem lies. The bootstrap sample works with lm but not with plm (although I don't know whether the lm estimates make sense). I also inspected the bootstrapped samples, and they look fine (no NAs in unexpected places, no odd rows or columns). The problem seems to be that plm cannot deal with duplicate rows, even with changed country IDs.

plm works fine for the original dataset, just not for bootstrapped datasets drawn from it.

Any other suggestions for bootstrapping are also welcome, e.g. resampling individual rows with replacement without taking the temporal structure into account.
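
For reference, here is a minimal base-R sketch of the entity-level resampling I am aiming for, written against a plain data.frame with made-up data rather than the study's dataset. The names `toy`, `block_bootstrap` and `boot_id` are hypothetical and only serve the illustration. Whether building the sample from a plain data.frame (instead of a pdata.frame) avoids the plm error is an assumption that I have not verified.

```r
# Toy panel: 4 entities observed over 3 years (made-up data, not the study's).
set.seed(1)
toy <- data.frame(
  id   = rep(c("A", "B", "C", "D"), each = 3),
  year = rep(2001:2003, times = 4),
  y    = rnorm(12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resample whole entities with replacement and give every
# draw a fresh ID, so an entity drawn twice becomes two distinct panels.
block_bootstrap <- function(df, id_col) {
  ids   <- unique(df[[id_col]])
  # Index-based sampling avoids the special case of sample() on a single number
  draws <- ids[sample.int(length(ids), size = length(ids), replace = TRUE)]
  pieces <- lapply(seq_along(draws), function(k) {
    block <- df[df[[id_col]] == draws[k], , drop = FALSE]
    block$boot_id <- k   # new individual index for this draw
    block
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

boot <- block_bootstrap(toy, "id")

# Every (boot_id, year) pair is unique, even if an entity was drawn repeatedly
stopifnot(!any(duplicated(boot[c("boot_id", "year")])))
```

The idea would be to pass such a sample to plm with `index = c("boot_id", "year")`, so that the new ID, not the original country code, identifies each panel.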
