Synth dataprep() error: "unit.variable not found as numeric variable in foo"

I am using the Synth package to demonstrate the divergence in development between Djibouti and a synthetic Djibouti that did not receive international intervention.

Despite reading several similar questions and trying the answers offered there, I am still struggling with the error:

unit.variable not found as numeric variable in foo

I have tried several different dataprep() strategies and still cannot run the code.

ddSMI <- as.data.frame(ddSMI) %>%   
   mutate(LifeYrs = as.numeric(LifeYrs),
          PedYrs = as.numeric(PedYrs),
          Health.Index.Total = as.numeric(Health.Index.Total),
          Income.Index.Total = as.numeric(Income.Index.Total),
          SchoolMean = as.numeric(SchoolMean),
          Cno = as.numeric(Cno))

I am trying to produce a synthetic control model and have been using different iterations of this code. Although I have successfully changed the class of these columns to numeric, I still get the same error. Here is the head of my data as a reprex:

head(ddSMI)
# A tibble: 6 x 8
   Year   Cno Country PedYrs LifeYrs          
  <dbl> <dbl> <chr>   <chr>  <chr>            
1  2000     1 Algeria 6.31   69.5999999999999…
2  2001     1 Algeria 6.23   69.2             
3  2002     1 Algeria 6.28   69.5             
4  2003     1 Algeria 6.32   71.0999999999999…
5  2004     1 Algeria 6.36   71.4000000000000…
6  2005     1 Algeria 6.39   71.7             
# … with 3 more variables: SchoolMean <chr>,
#   Health Index Total <chr>,
#   Income Index Total <chr>

Please see the code below.

dataprep.out <- dataprep(foo = ddSMI,
                         predictors = c("LifeYrs", "PedYrs", "Health.Index.Total", "Income.Index.Total", "SchoolMean"),
                         predictors.op = "mean", # the operator
                         time.predictors.prior = 2007:2008, # the entire time frame from the beginning to the end
                         special.predictors = list(
                           list("HDI Rank", 2000:2020, "mean"),
                           list("LifeYrs", seq(2007,2008,2), "mean"),
                           list("PedYrs", seq(2007,2008,2), "mean"),
                           list("Health Index Total", seq(2007, 2008, 2), "mean"),
                           list("Income Index Total", seq(2007,2008, 2), "mean"),
                           list("School Mean", seq(2007, 2008, 2), "mean")),
                         dependent = "HDI Rank", #dv
                         unit.variable = "Cno", #identifying unit numbers
                         unit.names.variable = "Country", #identifying unit names
                         time.variable = "Year", #time period
                         treatment.identifier = 5,#the treated case
                         controls.identifier = c(2:4, 6:15), # the control cases; all others except number 5
                         time.optimize.ssr = 2007:2008,#the time-period over which to optimize
                         time.plot = 2000:2020)#the entire time period before/after the treatment

Here is a helpful resource on the Synth package that I used to guide my troubleshooting: "Synth: An R Package for Synthetic Control Methods in Comparative Case Studies"

My data is in the same format and yet...can't get it to run! It would be immensely appreciated if anyone can crack this!

There is 1 best solution below

I had a similar error, and it turned out to have nothing to do with whether the unit variable was numeric (it is basically the first error message in the code: see here).

Make sure your object is a data frame, and only a data frame. I would recommend comparing your data structure with the "synth.data" example that is provided with the package. Since your output suggests your object is also a tibble (tbl_df), this might be the reason for the error.

is.data.frame(synth.data)
[1] TRUE
class(synth.data)
[1] "data.frame"
is.data.frame(DATA)
[1] TRUE
class(DATA)
[1] "tbl_df"     "tbl"        "data.frame"
DATA <- as.data.frame(DATA)
class(DATA)
[1] "data.frame"
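
To illustrate why the extra classes might matter, here is a minimal base-R sketch. It assumes, rather than confirms from the package source, that dataprep() inspects the result of foo[, unit.variable]. A plain data frame returns a vector from that expression, while a tibble returns a one-column table; drop = FALSE is used below to mimic the tibble behaviour without loading tibble. The toy data and the as_plain_foo() helper are hypothetical and are not part of Synth.

```r
# Toy data with made-up values, standing in for ddSMI
toy <- data.frame(Cno = c(1, 1, 2), Year = c(2000, 2001, 2000))

# A plain data.frame drops a single-column selection to a vector
plain_mode <- mode(toy[, "Cno"])

# A tibble keeps the selection as a one-column table; drop = FALSE mimics that
tibble_like_mode <- mode(toy[, "Cno", drop = FALSE])

print(plain_mode)        # "numeric"
print(tibble_like_mode)  # "list"

# Hypothetical helper (not a Synth function): coerce to a plain data.frame
# and stop early if the unit column would not look numeric
as_plain_foo <- function(foo, unit.variable) {
  foo <- as.data.frame(foo)  # strips any tbl_df / tbl classes
  stopifnot(identical(class(foo), "data.frame"),
            is.numeric(foo[, unit.variable]))
  foo
}

checked <- as_plain_foo(toy, "Cno")
print(class(checked))
```

If this assumption holds, a numeric Cno column inside a tibble would still trigger the error, and calling something like as_plain_foo(ddSMI, "Cno") right before dataprep() would rule that cause out.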