In R, selecting the first X PCA components in a recipe in tidymodels

I would like to keep only the first X PCA components after they have been computed within a recipe. I then want to add this recipe to a workflow.

Please see example data below.

library(tidymodels)
x1 <- c(1, 6, 4, 2, 3, 4, 5, 7, 8, 2)
x2 <- c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2)
x3 <- c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2)
x4 <- c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2)
id <- c(1:10)
y <- c(1, 4, 2, 5, 6, 2, 3, 6, 2, 4)
df1_train <- tibble(x1, x2,  x3,  x4, id, y)

step_PCA_PREPROCESSING = 4
selectXfirstPCA = 3

# My recipe
df1_train_recipe <- df1_train %>%
  recipes::recipe(y ~ .) %>%
  recipes::update_role(id, new_role = "id variable") %>%
  recipes::step_pca(., recipes::all_predictors(), num_comp = step_PCA_PREPROCESSING) %>% 
  recipes::update_role(tidyselect::num_range("PC", 1:selectXfirstPCA), new_role = "predictor")


# To then continue like below: 

# Model specifications
model_spec <- parsnip::linear_reg() %>% 
  parsnip::set_engine("glmnet") 

# Create workflow (to know variable roles from recipes)
df1_workflow <- workflows::workflow() %>%
  workflows::add_recipe(df1_train_recipe) %>%
  workflows::add_model(model_spec) 

# Fit model
mod <-  parsnip::fit(df1_workflow, data = df1_train)

There is 1 solution below

Use the num_comp argument of step_pca(). As per the documentation, num_comp is "The number of PCA components to retain as new predictors. If num_comp is greater than the number of columns or the number of possible components, a smaller value will be used."
Because the components are ordered by the variance they explain, setting num_comp = selectXfirstPCA keeps exactly the first X components, so there is no need to compute more and select among them afterwards. Your step could simply be
recipes::step_pca(., recipes::all_predictors(), num_comp = selectXfirstPCA)
with the last update_role() line removed.
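A minimal sketch of that change, reusing the data and variable names from the question. The prep() and bake() calls at the end are only there to inspect which columns the recipe produces; the name pca_cols is introduced here for illustration and is not part of the original code.

```r
library(tidymodels)

# Example data from the question
df1_train <- tibble(
  x1 = c(1, 6, 4, 2, 3, 4, 5, 7, 8, 2),
  x2 = c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2),
  x3 = c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2),
  x4 = c(1, 3, 4, 2, 3, 4, 5, 7, 8, 2),
  id = 1:10,
  y  = c(1, 4, 2, 5, 6, 2, 3, 6, 2, 4)
)

# Number of leading PCA components to keep
selectXfirstPCA <- 3

# Recipe: id is excluded from the predictors, and step_pca() retains
# only the first selectXfirstPCA components as new predictors
df1_train_recipe <- recipes::recipe(y ~ ., data = df1_train) %>%
  recipes::update_role(id, new_role = "id variable") %>%
  recipes::step_pca(recipes::all_predictors(), num_comp = selectXfirstPCA)

# Illustrative check: estimate the recipe on the training data and
# list the columns it produces
pca_cols <- df1_train_recipe %>%
  recipes::prep() %>%
  recipes::bake(new_data = NULL) %>%
  names()

print(pca_cols)
```

The recipe can then be passed to workflows::add_recipe() exactly as in the question; no extra update_role() call is needed, since the retained components are already created as predictors.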