step_BoxCox() with negative data

479 Views Asked by At

My understanding is that the step_BoxCox() requires a strictly positive variable. However, I tried to apply the step on data that has some negative values, I didn't get an error or a warning. The output had no NA values.
I don't know what is wrong, if my understanding flawed, or am I using a wrong syntax or something.

library(recipes)
library(skimr)

# create dummy data
set.seed(123)
n <- 2e3
x1 <- rpois(n, lambda = 5) # has some zero vals
x2 <- rnorm(n) # has some -ve vals
x3 <- x1 + 10 # is strictly positive
y <- x1 + x2
data <- tibble(x1, x2, x3, y)

# a BocCox recipe
rec <- recipe(y ~ ., data = data) %>% 
  step_BoxCox(all_predictors())
rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          3
#> 
#> Operations:
#> 
#> Box-Cox transformation on all_predictors()

# bake
processed <- rec %>% 
  prep() %>% 
  bake(new_data = NULL)

# check output
summary(data)
#>        x1               x2                  x3              y         
#>  Min.   : 0.000   Min.   :-3.047861   Min.   :10.00   Min.   :-2.048  
#>  1st Qu.: 3.000   1st Qu.:-0.654767   1st Qu.:13.00   1st Qu.: 3.349  
#>  Median : 5.000   Median :-0.007895   Median :15.00   Median : 4.843  
#>  Mean   : 4.981   Mean   : 0.011176   Mean   :14.98   Mean   : 4.993  
#>  3rd Qu.: 6.000   3rd Qu.: 0.688699   3rd Qu.:16.00   3rd Qu.: 6.486  
#>  Max.   :14.000   Max.   : 3.421095   Max.   :24.00   Max.   :15.225
summary(processed)
#>        x1               x2                  x3              y         
#>  Min.   : 0.000   Min.   :-3.047861   Min.   :2.076   Min.   :-2.048  
#>  1st Qu.: 3.000   1st Qu.:-0.654767   1st Qu.:2.285   1st Qu.: 3.349  
#>  Median : 5.000   Median :-0.007895   Median :2.398   Median : 4.843  
#>  Mean   : 4.981   Mean   : 0.011176   Mean   :2.388   Mean   : 4.993  
#>  3rd Qu.: 6.000   3rd Qu.: 0.688699   3rd Qu.:2.448   3rd Qu.: 6.486  
#>  Max.   :14.000   Max.   : 3.421095   Max.   :2.756   Max.   :15.225
sum(is.na(processed$x2))
#> [1] 0
skim(processed)

Created on 2021-04-29 by the reprex package (v0.3.0)

0

There are 0 best solutions below