This is the piece of code I'm having trouble with:
pump_recipe <- recipe(status_group ~ ., data = data) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_impute_knn(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())
prepared_rec <- prep(pump_recipe)
The error:
Error:
! Column name `funder_W.D...I.` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `stop_vctrs()`:
! Names must be unique.
x These names are duplicated:
* "funder_W.D...I." at locations 1807 and 1808.
Backtrace:
1. recipes::prep(pump_recipe)
2. recipes:::prep.recipe(pump_recipe)
4. recipes:::bake.step_dummy(x$steps[[i]], new_data = training)
8. tibble:::as_tibble.data.frame(indicators)
9. tibble:::lst_to_tibble(unclass(x), .rows, .name_repair)
...
16. vctrs `<fn>`()
17. vctrs:::validate_unique(names = names, arg = arg)
18. vctrs:::stop_names_must_be_unique(names, arg)
19. vctrs:::stop_names(...)
20. vctrs:::stop_vctrs(class = c(class, "vctrs_error_names"), ...)
Error:
Caused by error in `stop_vctrs()`:
! Names must be unique.
x These names are duplicated:
* "funder_W.D...I." at locations 1807 and 1808.
So basically it seems like the step_dummy step is doing something strange and creating a duplicated column here. I don't know why this is happening. This is the data I'm working with:
You have levels in funder and installer that are so similar that step_dummy() creates labels of the same name. The error says that funder_W.D...I. appears twice.

If we do some filtering on the funder column, we see that there are 3 different names that match.

Neither "W.D.&.I." nor "W.D & I." is a valid name, so step_dummy() tries to fix them. This yields "funder_W.D...I." for both.

You can fix this by using textrecipes::step_clean_levels(), which makes sure that the levels of these variables stay valid and non-overlapping.

Note: As you say, I would imagine that "W.D.&.I.", "W.D & I." and "W.D &" all refer to the same entity. You should take a look to see if you can collapse these levels manually.
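
To see why the two spellings collide, here is a minimal sketch using base R's make.names(). It is an assumption that this matches the repair step_dummy() applies internally, but it reproduces the name from the error message. The vector funder_levels is made up for illustration and holds only the three spellings quoted above.

```r
# The three funder spellings quoted above (illustrative vector, not the real data).
funder_levels <- c("W.D.&.I.", "W.D & I.", "W.D &")

# make.names() replaces every character that is not valid in an R name with ".".
repaired <- make.names(funder_levels)
print(repaired)

# TRUE marks a repaired name that repeats an earlier one.
print(duplicated(repaired))
```

The first two spellings end up as the same string, which is the duplicate that prep() complains about once the "funder_" prefix is added.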
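
The step_clean_levels() fix could look roughly like the sketch below. It assumes the recipes and textrecipes packages (and janitor, which textrecipes relies on for cleaning) are installed. The toy data frame is invented for illustration and only mimics the problematic funder levels. In the original recipe the new step would go right before step_dummy().

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data that mimics the colliding funder levels.
toy <- data.frame(
  status_group = factor(c("functional", "non functional",
                          "functional", "non functional")),
  funder = factor(c("W.D.&.I.", "W.D & I.", "W.D &", "Other")),
  amount = c(1, 2, 3, 4)
)

rec <- recipe(status_group ~ ., data = toy) %>%
  # Clean the factor levels first so they are valid and distinct ...
  step_clean_levels(all_nominal_predictors()) %>%
  # ... then create the dummy columns from the cleaned levels.
  step_dummy(all_nominal_predictors())

baked <- bake(prep(rec), new_data = NULL)
print(names(baked))
```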
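
If the three spellings really are the same funder, one way to collapse them by hand is to assign a single label to all of them in base R. This is only a sketch: the vector below is made up, the label "WD_and_I" is a hypothetical choice, and whether these spellings should be merged is something to check against the data.

```r
# Illustrative factor, not the real funder column.
funder <- factor(c("W.D.&.I.", "W.D & I.", "W.D &", "Other"))

# Spellings assumed to refer to the same entity.
same_entity <- c("W.D.&.I.", "W.D & I.", "W.D &")

# Assigning one label to several levels merges them into a single level.
levels(funder)[levels(funder) %in% same_entity] <- "WD_and_I"

print(levels(funder))
print(table(funder))
```

Doing this before the recipe (or in a step_mutate()) means step_dummy() only ever sees one level for that funder.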