This question refers to "Fit model using each predictor columns indiviually store results in dataframe", where a data frame consists of one response column and several predictor columns. The author wanted to fit a model for the response on each predictor separately and collect the model coefficients in a data frame. One answer to that question, https://stackoverflow.com/a/43959567/14435732, interests me (copied below).
require(tibble)
require(dplyr)
require(tidyr)
require(purrr)
require(broom)
df <- iris
response_var <- "Sepal.Length"
vars <- tibble(response=response_var,
               predictor=setdiff(names(df), response_var))
compose_formula <- function(x, y)
  as.formula(paste0("~lm(", y, "~", x, ", data=.)"))
models <- tibble(data=list(df)) %>%
  crossing(vars) %>%
  mutate(fmla = map2(predictor, response, compose_formula),
         model = map2(data, fmla, ~at_depth(.x, 0, .y)))
models %>% unnest(map(model, tidy))
My question is slightly different: I now have a data frame with several response columns (say Sepal.Length and Sepal.Width) and several predictor columns (say Petal.Length, Petal.Width and Species); see my first question, "Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column". I got a very helpful answer there, but it would be perfect if I could keep the names of the response and the predictor in the formula of the model object.
P.S.: I have tried modifying the code from https://stackoverflow.com/a/43959567/14435732 but ran into several issues:
- When I tried tibble() for vars (now that my response_var has several columns), this happens: 'Error: Tibble columns must have compatible sizes.'
- at_depth() is defunct.
Is there a way to get output like the following? (copied from https://stackoverflow.com/a/43959567/14435732)
# A tibble: 9 x 7
      response    predictor              term   estimate  std.error statistic
         <chr>        <chr>             <chr>      <dbl>      <dbl>     <dbl>
1 Sepal.Length  Sepal.Width       (Intercept)  6.5262226 0.47889634 13.627631
2 Sepal.Length  Sepal.Width       Sepal.Width -0.2233611 0.15508093 -1.440287
3 Sepal.Length Petal.Length       (Intercept)  4.3066034 0.07838896 54.938900
4 Sepal.Length Petal.Length      Petal.Length  0.4089223 0.01889134 21.646019
5 Sepal.Length  Petal.Width       (Intercept)  4.7776294 0.07293476 65.505517
6 Sepal.Length  Petal.Width       Petal.Width  0.8885803 0.05137355 17.296454
7 Sepal.Length      Species       (Intercept)  5.0060000 0.07280222 68.761639
8 Sepal.Length      Species Speciesversicolor  0.9300000 0.10295789  9.032819
9 Sepal.Length      Species  Speciesvirginica  1.5820000 0.10295789 15.365506
# ... with 1 more variables: p.value <dbl>
You can use a multi-response linear model, in which each response is regressed on the predictor separately, for example:
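A minimal sketch of such a model on iris, assuming Petal.Length as the single predictor (the choice of predictor is only for illustration):

```r
# Passing a matrix built with cbind() as the response makes lm() fit
# one regression per response column in a single call.
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

class(fit)  # an "mlm" object that also inherits from "lm"
coef(fit)   # coefficient matrix: one column per response
```

Each column of the coefficient matrix should match what a separate lm() call for that response would give.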
You get one set of coefficients per response, and tidy() handles this well:
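As a sketch, assuming an installed broom version that provides a tidy() method for multi-response ("mlm") fits, the result is one row per response and term, with the response name kept in its own column:

```r
library(broom)

fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

# One row per (response, term) pair; the response name is a column.
tidy(fit)
```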
Now we just need to repeat the above, changing the right-hand side (RHS) of the formula each time. We can do this with reformulate, for example:
We write a function to do this and add an additional column for the predictor:
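One way to sketch this: build the cbind(...) left-hand side as a call and pass it to reformulate() together with the predictor name. The variable names `responses`, `lhs` and `fmla` are illustrative and not taken from the original answer.

```r
responses <- c("Sepal.Length", "Sepal.Width")

# Left-hand side as an unevaluated call: cbind(Sepal.Length, Sepal.Width)
lhs <- as.call(lapply(c("cbind", responses), as.name))

# termlabels supplies the right-hand side; swap it to change the predictor.
fmla <- reformulate(termlabels = "Petal.Length", response = lhs)
fmla
```

Passing the response as a call rather than a string avoids depending on how a given R version parses a character `response` argument.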
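A possible version of such a function is sketched below. `fit_one_predictor()` is a hypothetical helper name, and the sketch assumes broom's tidy() supports multi-response fits.

```r
library(broom)
library(dplyr)

# Hypothetical helper: fit all responses against one predictor and
# return the tidied coefficients with the predictor name attached.
fit_one_predictor <- function(predictor, responses, data) {
  lhs  <- as.call(lapply(c("cbind", responses), as.name))
  fmla <- reformulate(termlabels = predictor, response = lhs)
  fit  <- lm(fmla, data = data)

  res <- tidy(fit)
  res$predictor <- predictor
  # Put the identifying columns first.
  select(res, response, predictor, everything())
}

fit_one_predictor("Species", c("Sepal.Length", "Sepal.Width"), iris)
```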
And use the amazing purrr (gone are the days of lapply..):
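A self-contained sketch of that step, using purrr::map() followed by dplyr::bind_rows(). The helper `fit_one_predictor()` is a hypothetical name, redefined here so the block runs on its own.

```r
library(broom)
library(dplyr)
library(purrr)

# Hypothetical helper: all responses against one predictor, tidied.
fit_one_predictor <- function(predictor, responses, data) {
  lhs  <- as.call(lapply(c("cbind", responses), as.name))
  fmla <- reformulate(termlabels = predictor, response = lhs)
  res  <- tidy(lm(fmla, data = data))
  res$predictor <- predictor
  select(res, response, predictor, everything())
}

responses  <- c("Sepal.Length", "Sepal.Width")
predictors <- setdiff(names(iris), responses)

# One multi-response fit per predictor, stacked into a single tibble.
results <- predictors %>%
  map(fit_one_predictor, responses = responses, data = iris) %>%
  bind_rows()

results
```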
If you have, say, 80 responses and 120 predictors:
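The same pattern can be tried on simulated data of that size. Everything in this sketch is an assumption made for illustration: the random data, the column names y1..y80 and x1..x120, and the helper name `fit_one_predictor()`.

```r
library(broom)
library(dplyr)
library(purrr)

# Hypothetical helper, as in the single-predictor case.
fit_one_predictor <- function(predictor, responses, data) {
  lhs  <- as.call(lapply(c("cbind", responses), as.name))
  fmla <- reformulate(termlabels = predictor, response = lhs)
  res  <- tidy(lm(fmla, data = data))
  res$predictor <- predictor
  select(res, response, predictor, everything())
}

# Simulated data: 200 rows, 80 response and 120 predictor columns.
set.seed(1)
sim <- as.data.frame(matrix(rnorm(200 * 200), nrow = 200))
responses  <- paste0("y", 1:80)
predictors <- paste0("x", 1:120)
names(sim) <- c(responses, predictors)

# 120 multi-response fits, each covering all 80 responses.
results <- predictors %>%
  map(fit_one_predictor, responses = responses, data = sim) %>%
  bind_rows()

dim(results)
```

Only 120 calls to lm() are needed, because each call fits all 80 responses at once.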
I hope this makes sense now.