long format linear regression performed over each unique two column pairing in r (when duplicates exist in another column). Determine when response=0

84 Views Asked by user11057680 At 10 October 2023 at 20:23

have a df in quasi-long format:

df <- data.frame(X= c(1,2,1,2,1,2,1,2),
                 Y= c(1,2,1,2,1,2,1,2),
                 variable = c("A","A","B","B","A","A","B","B"),
                 value = c(.5,.6,1000,1000,.7,.8,2000,2000))

the goal is, for each unique X,Y pairing (there are 2) , 1.) perform an lm with variable A (its associated value) serving as the response variable and variable B serving as the predictor variable (its associated value), then 2.) determine (bind if possible) the predictor variable value that generates a response value of 0.

I have converted the variable column to factor and then tried pivot_wider, pivot_longer, dcast, etc. to no avail obviously because the variable has duplicate names (factor levels if converting to factor). Without further pivoting, it is likely possible to run the lm's and to determine the 0 point, then bind it to the original df for each unique X,Y pairing.

this is an example of the desired output with the final column to be populated:

desired <-  data.frame(X= c(1,2),
                   Y= c(1,2),
                   pred_value_B_when_response_A_equals_0 = c(NA,NA))

per request, here is the original wide format

df_wide <- data.frame(X=c(1,2),
                      Y=c(1,2),
                      A=c(.5,.6),
                      B=c(1000,1000),
                      A1=c(.7,.8),
                      B1=c(2000,2000))

and then I performed

df <- reshape2::melt(df_wide,
                      id.vars = c("X","Y"))

Original Q&A

There are 2 best solutions below

Onyambu On 10 October 2023 at 22:28

Using the wide data, rearange it to look as shown below:

new_dat <- df_wide %>%
  rename_with(~str_replace(.x,'(A|B)$', '\\10')) %>%
  pivot_longer(A0:B1, names_to = c('.value', 'grp'),
               names_pattern = '(.)(.)')

     X     Y grp       A     B
  <dbl> <dbl> <chr> <dbl> <dbl>
1     1     1 0       0.5  1000
2     1     1 1       0.7  2000
3     2     2 0       0.6  1000
4     2     2 1       0.8  2000

Now in the data above, you can easily groupby X and Y and run linear model of A against B:

new_dat %>% 
  summarise(model=list(lm(A~B)), .by = c(X,Y))

# A tibble: 2 × 3
      X     Y model 
  <dbl> <dbl> <list>
1     1     1 <lm>  
2     2     2 <lm>

You can now use the linear models per group fitted to do predictions, etc

user11057680 On 11 October 2023 at 01:22

new_dat <- pivot_longer(df_wide,cols = -c(X,Y), 
                                  names_to = c('.value', 'grp'), 
                                  names_pattern = '([a-zA-Z]+)(\\d+)')

solves the issue.

long format linear regression performed over each unique two column pairing in r (when duplicates exist in another column). Determine when response=0

There are 2 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in PIVOT

Related Questions in LM

Related Questions in MELT

Trending Questions

Popular # Hahtags

Popular Questions