In tidymodels I want to create a workflow based on a recipe and a model specification. It works when I do NOT include step_pca(), but when I include step_pca() as a setting I get an error. Please see the reprex below.
(It works fine if I do not use workflow(), but then I lose functionality, including updating roles.)
# Attaches tibble, recipes, parsnip, workflows and the %>% pipe
library(tidymodels)
x1 <- c(1, 6, 4, 2, 3, 4, 5, 7, 8, 2)
x2 <- c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2)
id <- c(1:10)
y <- c(1, 4, 2, 5, 6, 2, 3, 6, 2, 4)
df1_train <- tibble(x1, x2, id, y)
# NA works with workflow
step_PCA_PREPROCESSING = NA
# Does not work with workflow
step_PCA_PREPROCESSING = 0.9
# My recipe
df1_train_recipe <- df1_train %>%
  recipes::recipe(y ~ .) %>%
  recipes::update_role(id, new_role = "id variable") %>%
  recipes::step_center(recipes::all_predictors()) %>%
  recipes::step_scale(recipes::all_predictors()) %>%
  # Optional step_pca
  {
    if (!is.na(step_PCA_PREPROCESSING)) {
      if (step_PCA_PREPROCESSING >= 1) {
        recipes::step_pca(., recipes::all_predictors(), num_comp = step_PCA_PREPROCESSING)
      } else if (step_PCA_PREPROCESSING < 1) {
        recipes::step_pca(., recipes::all_predictors(), threshold = step_PCA_PREPROCESSING)
      } else {
        .
      }
    } else {
      .
    }
  } %>%
  recipes::prep()
# Model specifications
model_spec <- parsnip::linear_reg() %>%
  parsnip::set_engine("glmnet")
# Create workflow (to know variable roles from recipes)
df1_workflow <- workflows::workflow() %>%
  workflows::add_recipe(df1_train_recipe) %>%
  workflows::add_model(model_spec)
# Fit model
mod <- parsnip::fit(df1_workflow, data = df1_train)
Thanks in advance
I think the best way to do what you're talking about is to use the ability of step_pca() to have num_comp set to zero, meaning no PCA decomposition. This is pretty convenient for your use case, because threshold will override num_comp.
Created on 2020-12-06 by the reprex package (v0.3.0.9001)
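A minimal sketch of that idea, under a few assumptions that are not from the original post: the helper name make_recipe() is hypothetical, the "lm" engine is swapped in for "glmnet" so that no penalty has to be chosen, and the recipe is handed to the workflow unprepped so the workflow can train it during fit(). Whenever a threshold is requested, num_comp is kept at a positive value (5, the step_pca() default) as a cautious choice, so that threshold is what decides the number of components.

```r
library(tidymodels)

df1_train <- tibble(
  x1 = c(1, 6, 4, 2, 3, 4, 5, 7, 8, 2),
  x2 = c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2),
  id = 1:10,
  y  = c(1, 4, 2, 5, 6, 2, 3, 6, 2, 4)
)

# Hypothetical helper: translate one setting into step_pca() arguments.
#   NA      -> num_comp = 0, i.e. no PCA decomposition
#   >= 1    -> that many components
#   in (0,1) -> variance threshold, which overrides num_comp
make_recipe <- function(data, pca_setting = NA) {
  if (is.na(pca_setting)) {
    num_comp <- 0
    threshold <- NA
  } else if (pca_setting >= 1) {
    num_comp <- pca_setting
    threshold <- NA
  } else {
    num_comp <- 5 # assumption: keep a positive value so the step is not skipped
    threshold <- pca_setting
  }

  recipe(y ~ ., data = data) %>%
    update_role(id, new_role = "id variable") %>%
    step_center(all_predictors()) %>%
    step_scale(all_predictors()) %>%
    step_pca(all_predictors(), num_comp = num_comp, threshold = threshold)
}

# Assumption: "lm" engine instead of "glmnet" to keep the sketch dependency-free
model_spec <- linear_reg() %>%
  set_engine("lm")

# The recipe is added without prep(); fit() trains it together with the model
df1_workflow <- workflow() %>%
  add_recipe(make_recipe(df1_train, pca_setting = 0.9)) %>%
  add_model(model_spec)

mod <- fit(df1_workflow, data = df1_train)
```

Because step_pca() is always present in the recipe, switching PCA on or off only changes its arguments, so the `{ ... }` block in the pipe is no longer needed.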