This refers to the question "Fit model using each predictor columns indiviually store results in dataframe", where a dataframe consists of one column for a response variable and several columns of predictor variables. The author wanted to fit a model for the response variable using each predictor variable separately, and finally create a dataframe containing the coefficients of each model. There's an answer to the original question, https://stackoverflow.com/a/43959567/14435732, that interests me (copied below).
require(tibble)
require(dplyr)
require(tidyr)
require(purrr)
require(broom)
df <- iris
response_var <- "Sepal.Length"
vars <- tibble(response=response_var,
               predictor=setdiff(names(df), response_var))
compose_formula <- function(x, y)
  as.formula(paste0("~lm(", y, "~", x, ", data=.)"))
models <- tibble(data=list(df)) %>%
           crossing(vars) %>%
           mutate(fmla = map2(predictor, response, compose_formula),
                  model = map2(data, fmla, ~at_depth(.x, 0, .y)))
models %>% unnest(map(model, tidy))
My question is slightly different: I now have a dataframe with several columns of response variables (say Sepal.Length and Sepal.Width) and several columns of predictor variables (say Petal.Length, Petal.Width and Species) (see my first question, "Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column"). I got a very helpful answer there, but it would be perfect if I could keep the names of the response and predictor in the formula of the model object.
P.S.: I have tried modifying the code from https://stackoverflow.com/a/43959567/14435732 but ran into several issues:
- When I tried tibble() for vars (now that my response_var has several columns), this happens: 'Error: Tibble columns must have compatible sizes.'
- at_depth() is defunct.
Is there a way to get a desired output like below? (copied from https://stackoverflow.com/a/43959567/14435732)
# A tibble: 9 x 7
      response    predictor              term   estimate  std.error statistic
         <chr>        <chr>             <chr>      <dbl>      <dbl>     <dbl>
1 Sepal.Length  Sepal.Width       (Intercept)  6.5262226 0.47889634 13.627631
2 Sepal.Length  Sepal.Width       Sepal.Width -0.2233611 0.15508093 -1.440287
3 Sepal.Length Petal.Length       (Intercept)  4.3066034 0.07838896 54.938900
4 Sepal.Length Petal.Length      Petal.Length  0.4089223 0.01889134 21.646019
5 Sepal.Length  Petal.Width       (Intercept)  4.7776294 0.07293476 65.505517
6 Sepal.Length  Petal.Width       Petal.Width  0.8885803 0.05137355 17.296454
7 Sepal.Length      Species       (Intercept)  5.0060000 0.07280222 68.761639
8 Sepal.Length      Species Speciesversicolor  0.9300000 0.10295789  9.032819
9 Sepal.Length      Species  Speciesvirginica  1.5820000 0.10295789 15.365506
# ... with 1 more variables: p.value <dbl>
 
                        
You can use a multi-response linear model: with several responses bound together by cbind() on the left-hand side of the formula, lm() regresses each response separately on the same predictor.
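The example that went with this step is not shown here, so the following is only a minimal sketch of such a fit. It assumes the built-in iris data and the two responses and one predictor named in the question.

```r
# Two responses on the left-hand side, a single predictor on the right.
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

# The result is a multi-response ("mlm") fit rather than a plain "lm".
class(fit)

# One column of coefficients per response, one row per term.
coef(fit)
```

Each column of coef(fit) is the same as what you would get from fitting that response on Petal.Length on its own.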
You get two sets of coefficients, one per response, and tidy() from broom does a good job on this, returning them in a single table with a response column.
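As a rough illustration (assuming a reasonably recent broom, and again using iris as stand-in data), tidying such a fit might look like this:

```r
library(broom)

# Multi-response fit: both sepal measurements regressed on Petal.Length.
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)

# One row per response/term pair; the response name is kept in its own column.
tidy(fit)
```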
Now we just need to repeat the above while changing the right-hand side of the formula. We can build each formula from character strings with reformulate().
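A small sketch of building the formula this way follows. The variable names responses, lhs and fmla are illustrative, not from the original post, and the column names are assumed to be syntactically valid R names.

```r
# Response names pasted into a single cbind(...) string for the left-hand side.
responses <- c("Sepal.Length", "Sepal.Width")
lhs <- paste0("cbind(", paste(responses, collapse = ", "), ")")

# reformulate() parses the response string and combines it with the term label.
fmla <- reformulate(termlabels = "Petal.Length", response = lhs)
fmla

# The formula keeps the real variable names, so the fitted model does too.
fit <- lm(fmla, data = iris)
formula(fit)
```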
We write a function to do this, adding an additional column for the predictor, and then apply it to every predictor with purrr instead of lapply.
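One way this could be put together is sketched below. The helper name fit_one_predictor and its arguments are hypothetical, and iris with the question's column names stands in for the real data.

```r
library(broom)
library(dplyr)
library(purrr)

responses  <- c("Sepal.Length", "Sepal.Width")
predictors <- c("Petal.Length", "Petal.Width", "Species")

# Hypothetical helper: fit all responses against one predictor and tidy the result.
fit_one_predictor <- function(predictor, responses, data) {
  lhs  <- paste0("cbind(", paste(responses, collapse = ", "), ")")
  fmla <- reformulate(termlabels = predictor, response = lhs)
  fit  <- lm(fmla, data = data)
  tidy(fit) %>%
    mutate(predictor = predictor) %>%          # record which predictor was used
    select(response, predictor, everything())  # put the identifying columns first
}

# One tidy table per predictor, stacked into a single tibble.
results <- map(predictors, fit_one_predictor,
               responses = responses, data = iris) %>%
  bind_rows()

results
```

The result has the response, predictor, term, estimate, std.error, statistic and p.value columns asked for in the question, with one block of rows per response/predictor pair.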
The same approach carries over if you have, say, 80 responses and 120 predictors.
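To illustrate the larger case, here is a sketch on simulated data. The data frame sim, the column names y1..y80 and x1..x120, and the helper fit_one_predictor are all made up for this example; only the pattern (character vectors of names fed to reformulate()) matters.

```r
library(broom)
library(dplyr)
library(purrr)

# Simulated stand-in data: 80 response columns and 120 predictor columns.
set.seed(1)
n   <- 50
sim <- as.data.frame(matrix(rnorm(n * 200), nrow = n))
responses  <- paste0("y", 1:80)
predictors <- paste0("x", 1:120)
names(sim) <- c(responses, predictors)

# Hypothetical helper: all responses against a single predictor, tidied.
fit_one_predictor <- function(predictor, responses, data) {
  lhs <- paste0("cbind(", paste(responses, collapse = ", "), ")")
  fit <- lm(reformulate(predictor, response = lhs), data = data)
  tidy(fit) %>%
    mutate(predictor = predictor) %>%
    select(response, predictor, everything())
}

# 120 multi-response fits, each covering all 80 responses.
results <- map(predictors, fit_one_predictor,
               responses = responses, data = sim) %>%
  bind_rows()

dim(results)
```

Only the two character vectors change between the small and the large case, so in practice they could come from something like names(df) split by whatever rule identifies responses and predictors in your data.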
I hope this makes sense now.