When is it necessary to use quotation marks to refer to a column in my dataframe and when can I do it without?


I'm fairly new to working with R, so I might not be naming some things right ;-) I would like to plot the results of three different linear models. To do so, I loop over my column names, which I saved as strings in a character vector.

lambdas <- c("lambda_of_antibody1", "lambda2", "lambda3")

Within my loop I fit the linear model and have to use get() so that I can access the column by its name. Then I plot the regression line using sjPlot::plot_model(). However, I get the error message "Some of the specified terms were not found in the model. Maybe misspelled?".

    for (i in seq_along(lambdas)) {
        fit <- lm(sd_cplx ~ get(lambdas[i]), data = df_dpcrsum)
        plt <- sjPlot::plot_model(
            fit,
            type = "pred",
            terms = lambdas[i]
        ) +
            geom_point(
                data = df_dpcrsum,
                aes(
                    x = get(lambdas[i]),
                    y = sd_cplx
                )
            )

        plots[[i]] <- plt
    }

My question is, how do I correctly access the columns of my dataframe? It's probably quite simple but I couldn't figure out a solution.

I realised that the name of the term in the model is get(lambdas[i]) and not the actual column name. So I tried to rename the coefficients using names(fit$coefficients), but I get the same error message. There is probably a way to use enquo(), but I'm afraid I lack a deeper understanding of these functions. I would be very grateful for any help! Please let me know if I provided enough information or if I missed some important details.

Here is an example of my dataframe:

> head(df)
# A tibble: 6 × 5
  lambda_of_antibody1 mean_cplx sd_cplx lambda2 lambda3
                <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
1                0.05    0.538     8.14  0.0525  0.0526
2                0.1     0.442    16.7   0.11    0.111
3                0.15    0.696    25.4   0.173   0.176
4                0.2     0.0541   35.1   0.24    0.248
5                0.25    0.479    45.5   0.312   0.328
6                0.3     0.358    55.9   0.39    0.417

There are 2 best solutions below

As you did not provide any sample data, I could not verify this. Try the following (assuming your dataframe is named df):

library(ggplot2)  # provides geom_point() and aes()

plots <- list()

for (i in seq_along(lambdas)) {
  # Build the formula from the string so the model term carries the column name
  fit <- lm(reformulate(lambdas[i], response = "sd_cplx"), data = df)
  plt <- sjPlot::plot_model(
    fit,
    type = "pred",
    terms = lambdas[i]  # must match the term name stored in the model
  )   +
    geom_point(
      data = df,
      aes(
        x = .data[[lambdas[i]]],
        y = sd_cplx
      )
    )
  
  plots[[i]] <- plt
}
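
To see which name the model actually stores for the predictor, it can help to inspect the term labels of the fit. This is a minimal base-R sketch, assuming a small data frame rebuilt from the first rows printed in the question; the name `df_demo` and the single-element `lambdas` vector are made up for illustration.

```r
# Hypothetical demo data, copied from the rows printed in the question
df_demo <- data.frame(
  lambda2 = c(0.0525, 0.11, 0.173, 0.24, 0.312, 0.39),
  sd_cplx = c(8.14, 16.7, 25.4, 35.1, 45.5, 55.9)
)
lambdas <- c("lambda2")
i <- 1

# Formula written with get(): the term is stored as the literal expression
fit_get <- lm(sd_cplx ~ get(lambdas[i]), data = df_demo)
labels_get <- attr(terms(fit_get), "term.labels")

# Formula built from the string: the term is stored under the column name
fit_str <- lm(as.formula(paste("sd_cplx ~", lambdas[i])), data = df_demo)
labels_str <- attr(terms(fit_str), "term.labels")

print(labels_get)  # "get(lambdas[i])"
print(labels_str)  # "lambda2"
```

The term labels live in the model's terms object rather than in the coefficient names, which would be consistent with the renaming attempt in the question having no effect on the error.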

Found the solution:

lambdas <- c("lambda_of_antibody1", "lambda2", "lambda3")
plots <- list()

for (i in seq_along(lambdas)) {
    formula <- as.formula(paste("sd_cplx ~", lambdas[i]))
    fit <- lm(formula, data = df_dpcrsum)
    plt <- sjPlot::plot_model(
        fit,
        type = "pred",
        terms = lambdas[i]
    ) +
        geom_point(
            data = df_dpcrsum,
            aes(
                x = .data[[lambdas[i]]],
                y = sd_cplx
            )
        )

    plots[[i]] <- plt
}
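
On the original question of when quotation marks are needed, a rough rule of thumb is that a bare name works where you type the column name literally (`$`, formulas, `aes()`), while a name held in a character vector needs an interface that accepts strings (`[[ ]]`, `.data[[ ]]`, a formula built from text). Below is a base-R sketch of that distinction; `df_demo` and `col` are made-up names, and the values are taken from the table in the question.

```r
# Hypothetical demo data, copied from the rows printed in the question
df_demo <- data.frame(
  lambda2 = c(0.0525, 0.11, 0.173),
  sd_cplx = c(8.14, 16.7, 25.4)
)
col <- "lambda2"  # column name stored as a string

by_dollar <- df_demo$lambda2       # bare name, typed literally
by_string <- df_demo[["lambda2"]]  # the same column via a quoted name
by_var    <- df_demo[[col]]        # variable holding the name, no quotes around col

# `$` does not evaluate its argument: this looks for a column called "col"
missing_col <- df_demo$col         # NULL, because no such column exists

# reformulate() (from the stats package) builds a formula from strings
f <- reformulate(col, response = "sd_cplx")
print(f)  # sd_cplx ~ lambda2
```

`reformulate()` is an alternative to `as.formula(paste(...))` used above; both produce a formula whose term is named after the column.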

Thank you for your input, YBS :-)