How do I assign a group-level value, based on row-level values, to a data frame using dplyr?


I have the following decision rules:

RELIABILITY LEVEL     DESCRIPTION
LEVEL I               Multiple regression
LEVEL II              Multiple regression + mechanisms specified (all interest variables)
LEVEL III             Multiple regression + mechanisms specified (all interest + control vars)

The first three columns are the data upon which the 4th column should be reproduced using dplyr.

The reliability level should be the same for the whole table (model)... I want to code it using dplyr.

Here is my attempt so far. As you can see, I can't get the reliability level to be the same for the whole model:

library(tidyverse)
library(readxl)
library(effectsize)

df <- read_excel("https://github.com/timverlaan/relia/blob/59d2cbc5d7830c41542c5f65449d5f324d6013ad/relia.xlsx")

df1 <- df %>%
  group_by(study, table, function_var) %>%
  mutate(count_vars = n()) %>%
  ungroup %>%
  group_by(study, table, function_var, mechanism_described) %>%
  mutate(count_int = case_when(
    function_var == 'interest' & mechanism_described == 'yes' ~ n()
    )) %>%
  mutate(count_con = case_when(
    function_var == 'control' & mechanism_described == 'yes' ~ n()
    )) %>% 
  mutate(reliable_int = case_when(
    function_var == 'interest' & count_vars/count_int == 1 ~ 1)) %>%
  mutate(reliable_con = case_when(
    function_var == 'control' & count_vars/count_con == 1 ~ 1)) %>%
  # group_by(study, source) %>%
  mutate(reliable = case_when(
    reliable_int != 1 ~ 1,
    reliable_int == 1 ~ 2,
    reliable_int + reliable_con == 2 ~ 3)) %>%
  ungroup()

There is 1 solution below


The code settled on is:

library(tidyverse)
library(readxl)

df <- read_excel("C:/Users/relia.xlsx")
df <- df %>% select(-reliability_score)

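# Per study/model/variable type: share of variables whose mechanism is described
test <- df %>%
  group_by(study, model, function_var) %>%
  summarise(count_yes = sum(mechanism_described == "yes"),
            n = n(),
            frac = count_yes / n) %>%
  # summarise() drops the last grouping level, so the data are now grouped by study and model
  mutate(frac_control = frac[function_var == "control"],
         frac_interest = frac[function_var == "interest"]) %>%
  mutate(reliability = case_when(
    frac_control == 1 & frac_interest != 1 ~ -99,
    frac_control != 1 & frac_interest != 1 ~ 2,
    frac_interest == 1 & frac_control != 1 ~ 3,
    frac_interest == 1 & frac_control == 1 ~ 4)) %>%
  # Collapse to one reliability value per model
  group_by(study, model) %>%
  summarise(reliability = mean(reliability))

# Attach the model-level score to every row of the original data
df_reliability <- left_join(df, test)
View(df_reliability)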

However, I would prefer to do this all within one dplyr pipe. If anyone has a solution I would love to hear it...
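
One possible way to get this into a single pipe is to skip summarise() and the join altogether: a grouped mutate() can compute the per-model fractions and recycle the resulting score to every row of the group. The sketch below is untested against the real relia.xlsx. It uses a small made-up tibble with the column names from the code above (study, model, function_var, mechanism_described), so the data values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data standing in for relia.xlsx (values invented for illustration)
df <- tibble(
  study = c(1, 1, 1, 1, 2, 2, 2),
  model = c(1, 1, 1, 1, 1, 1, 1),
  function_var = c("interest", "interest", "control", "control",
                   "interest", "control", "control"),
  mechanism_described = c("yes", "yes", "yes", "no",
                          "yes", "yes", "yes")
)

df_reliability <- df %>%
  group_by(study, model) %>%
  mutate(
    # Share of interest / control variables in this model with a described mechanism
    frac_interest = mean(mechanism_described[function_var == "interest"] == "yes"),
    frac_control  = mean(mechanism_described[function_var == "control"] == "yes"),
    # Same decision rules as in the two-step version; each condition is a single
    # value per group, so the result is recycled to all rows of the model
    reliability = case_when(
      frac_control == 1 & frac_interest != 1 ~ -99,
      frac_control != 1 & frac_interest != 1 ~ 2,
      frac_interest == 1 & frac_control != 1 ~ 3,
      frac_interest == 1 & frac_control == 1 ~ 4
    )
  ) %>%
  ungroup() %>%
  select(-frac_interest, -frac_control)
```

With this toy data every row of study 1 gets reliability 3 (all interest mechanisms described, one control mechanism missing) and every row of study 2 gets 4. One difference from the two-step version: if a model has no control or no interest variables at all, the corresponding fraction is NaN and reliability comes out as NA instead of raising an error.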