I have a dataset with metabolic data for two patient groups (with and without a disease), with 40 patient IDs in rows. Metabolic data is usually heavily influenced by age, sex and BMI. Unfortunately, one of the patient groups is heavier, older and more likely to be male.
I have several columns of outcome variables that I want to compare between the disease groups, adjusted for differences that are simply due to age, sex and BMI. I know how to handle that for one column, but for more than 20 columns it would be highly inconvenient. How would you handle this in R? Is there a specific library that takes the outcome variable names and the confounding variable names as input and returns a new data frame with outcome1_adjusted, outcome2_adjusted and so on?
Let's stipulate the following data set:
library(dplyr)
set.seed(42)
patient_id <- 1:40
sex <- sample(1:2, 40, replace = TRUE)
age <- rnorm(40, 30, 5)
bmi <- rnorm(40, 25, 3)
group <- rep(1:2, each = 20)
outcome1 <- 20 + 2 * sex + 0.5 * age + 1.5 * bmi + 5 * rnorm(40)
outcome2 <- 15 + 1.5 * sex + 0.8 * age + 1 * bmi + 3 * rnorm(40)
outcome3 <- 25 + 2.5 * sex + 0.7 * age + 2 * bmi + 4 * rnorm(40)
metabolic_data <- data.frame(patient_id, sex, age, BMI = bmi, group, outcome1, outcome2, outcome3)
Please note that I don't simply want the residuals, but the outcome values corrected for the confounding variables according to the regression (which makes the results easier to understand).
To adjust one column for one confounder, I would do:
# Example: adjust outcome1 for age by shifting every patient to the mean age
model <- lm(outcome1 ~ age, data = metabolic_data)
metabolic_data %>%
  mutate(dif = coef(model)["age"] * (mean(age) - age),
         outcome1_adjusted = outcome1 + dif)
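For reference, this is roughly what I imagine a hand-rolled generalisation of the adjustment above would look like for several confounders and several outcome columns. It is only a sketch in base R: the helper name adjust_outcomes is made up (it is not from any package), it assumes all confounders are numeric columns, and it assumes group belongs in each model so that the group difference is not absorbed by the confounders. The block rebuilds the example data so it runs on its own. I am hoping a package already does something like this more robustly.

```r
# Rebuild the example data so this sketch is self-contained
set.seed(42)
n <- 40
metabolic_data <- data.frame(
  patient_id = 1:n,
  sex   = sample(1:2, n, replace = TRUE),
  age   = rnorm(n, 30, 5),
  BMI   = rnorm(n, 25, 3),
  group = rep(1:2, each = n / 2)
)
metabolic_data$outcome1 <- with(metabolic_data, 20 + 2.0 * sex + 0.5 * age + 1.5 * BMI + 5 * rnorm(n))
metabolic_data$outcome2 <- with(metabolic_data, 15 + 1.5 * sex + 0.8 * age + 1.0 * BMI + 3 * rnorm(n))
metabolic_data$outcome3 <- with(metabolic_data, 25 + 2.5 * sex + 0.7 * age + 2.0 * BMI + 4 * rnorm(n))

# Hypothetical helper (my own name, not a library function).
# Assumes numeric confounders, so coefficient names equal column names.
adjust_outcomes <- function(data, outcomes, confounders, group = "group") {
  # Confounders centred at their sample means (one column per confounder)
  centred <- sweep(as.matrix(data[confounders]), 2, colMeans(data[confounders]))
  for (y in outcomes) {
    # Assumption: group is included so its effect is estimated jointly;
    # pass group = NULL to fit outcome ~ confounders only
    fit  <- lm(reformulate(c(group, confounders), response = y), data = data)
    beta <- coef(fit)[confounders]
    # Same idea as outcome + beta * (mean(confounder) - confounder),
    # summed over all confounders
    data[[paste0(y, "_adjusted")]] <- data[[y]] - as.vector(centred %*% beta)
  }
  data
}

adjusted <- adjust_outcomes(
  metabolic_data,
  outcomes    = c("outcome1", "outcome2", "outcome3"),
  confounders = c("sex", "age", "BMI")
)
head(adjusted)
```

Because the confounders are centred at their sample means, each adjusted column keeps the same overall mean as the original column; only the part explained by sex, age and BMI is removed.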
FYI: In the end I need to display the comparison of the adjusted variables in one table with p-values (for which I wanted to use tbl_summary from the gtsummary package, which I already know how to use).