Wrangling dataframes with Dplyr

59 Views Asked by At

I'm trying to find a neat Dplyr solution to convert this dataframe;

Rule <- c('Rule 1', 'Rule 1', 'Rule 1', 'Rule 1', 'Rule 2', 'Rule 2', 'Rule 2')
Condition <- c('1 of 4', '2 of 4', '3 of 4', '4 of 4', '1 of 3', '2 of 3', '3 of 3')
Clause <- c('Temperature > 60', 'Temperature < 90', 'Rain = 0', 'Wind < 20', 'Temperature > 55', 'Temperature < 85', 'Rain <= 2')
Lift <- c('1.30', '1.30', '1.30', '1.30', '1.60', '1.60', '1.60')
Coverage <- c('20%','20%','20%','20%','35%','35%','35%')
DF <- data.frame(Rule, Condition, Clause, Lift, Coverage)

Into this dataframe;

Rule <- c('Rule 1', 'Rule 1', 'Rule 1', 'Rule 1','', 'Rule 2', 'Rule 2', 'Rule 2')
Condition <- c('1 of 4', '2 of 4', '3 of 4', '4 of 4','', '1 of 3', '2 of 3', '3 of 3')
Clause <- c('Temperature > 60', 'Temperature < 90', 'Rain = 0', 'Wind < 20','', 'Temperature > 55', 'Temperature < 85', 'Rain <= 2')
Lift <- c('', '', '', '1.30', '','', '', '1.60')
Coverage <- c('','','','20%','','','','35%')
Result <- data.frame(Rule, Condition, Clause, Lift, Coverage)

Notice new blank rows which separates rules and repetitive Lift and Coverage metrics have been removed. Only retaining the Lift and Coverage from the final row of each rule.

1

There are 1 best solutions below

0
On BEST ANSWER

You can create a blank row to insert in every Rule :

empty_df <- data.frame(matrix('', nrow = 1, ncol = ncol(DF), 
                       dimnames = list(NULL, names(DF))))

Split the data for each unique Rule, replace the repeating values in Lift and Coverage column with blank add empty_df and combine the result.

library(dplyr)

DF %>%
  group_split(Rule) %>%
  purrr::map_df(~.x %>% 
        mutate(across(c(Lift, Coverage), 
              ~replace(., duplicated(., fromLast = TRUE), ''))) %>%
        bind_rows(empty_df)
        ) %>%
  #Remove the blank row from last `Rule`. 
  slice(-n())

#    Rule     Condition Clause             Lift   Coverage
#  <chr>    <chr>     <chr>              <chr>  <chr>   
#1 "Rule 1" "1 of 4"  "Temperature > 60" ""     ""      
#2 "Rule 1" "2 of 4"  "Temperature < 90" ""     ""      
#3 "Rule 1" "3 of 4"  "Rain = 0"         ""     ""      
#4 "Rule 1" "4 of 4"  "Wind < 20"        "1.30" "20%"   
#5 ""       ""        ""                 ""     ""      
#6 "Rule 2" "1 of 3"  "Temperature > 55" ""     ""      
#7 "Rule 2" "2 of 3"  "Temperature < 85" ""     ""      
#8 "Rule 2" "3 of 3"  "Rain <= 2"        "1.60" "35%"