R add columns indicating start and end for a sequence within columns

522 Views Asked by At

I have data as below.

names=c(rep("a",4),rep("b",5),rep("c",2))
time=c(1,2,3,4,1,2,3,4,5,1,2)
dd=data.frame(names,time)


dd <- group_by(dd, names)
dd <- mutate(dd, seq=seq_along(names))
extr <- summarise(dd, minw=min(time), maxw=max(time))


> dd
Source: local data frame [11 x 3]
Groups: names

   names time seq
1      a    1   1
2      a    2   2
3      a    3   3
4      a    4   4
5      b    1   1
6      b    2   2
7      b    3   3
8      b    4   4
9      b    5   5
10     c    1   1
11     c    2   2
> extr
Source: local data frame [3 x 3]

  names minw maxw
1     a    1    4
2     b    1    5
3     c    1    2

The final output that I need is as below. I want to add two columns - first_indicator and last_indicator which will have value "yes" if combination of name and sequence have the first and last value respectively. How could i do it using dd and extr dataframes that i am generating above?

   names time seq first_indicator last_indicator
1      a    1   1             yes               
2      a    2   2                               
3      a    3   3                               
4      a    4   4                            yes
5      b    1   1             yes               
6      b    2   2                               
7      b    3   3                               
8      b    4   4                               
9      b    5   5                            yes
10     c    1   1             yes               
11     c    2   2                            yes
3

There are 3 best solutions below

0
On BEST ANSWER

In base R, use ave:

dd$first <- dd$time==1                              #1 is always the start of a group
dd$last  <- dd$time==ave(dd$time,dd$names,FUN=max)  #check against max group value

#   names time first  last
#1      a    1  TRUE FALSE
#2      a    2 FALSE FALSE
#3      a    3 FALSE FALSE
#4      a    4 FALSE  TRUE
#5      b    1  TRUE FALSE
#6      b    2 FALSE FALSE
#7      b    3 FALSE FALSE
#8      b    4 FALSE FALSE
#9      b    5 FALSE  TRUE
#10     c    1  TRUE FALSE
#11     c    2 FALSE  TRUE

Using data.table you could do something like:

library(data.table)
setDT(dd)[,c("first","last") := list(time==1,time==.N), by=names]
0
On

Perhaps not the most elegant answer, but...

extr$first_indicator <- rep("yes", nrow(extr))
extr$last_indicator <- rep("yes", nrow(extr))

dd <- merge(dd, extr[c(1,2,4)], by.x = 1:2, by.y = 1:2, all = TRUE)
dd <- merge(dd, extr[c(1,3,5)], by.x = c(1,3), by.y = 1:2, all = TRUE)

Should work. You could, of course, wrap it all in a function if you needed to do this multiple times.

0
On

You could do:

dd %>% 
  group_by(names) %>% 
  mutate(first = ifelse(first(time) == time, "yes", ""), 
         last = ifelse(last(time) == time, "yes", ""))

Which gives:

#Source: local data frame [11 x 4]
#Groups: names
#
#   names time first last
#1      a    1   yes     
#2      a    2           
#3      a    3           
#4      a    4        yes
#5      b    1   yes     
#6      b    2           
#7      b    3           
#8      b    4           
#9      b    5        yes
#10     c    1   yes     
#11     c    2        yes