Select first observed data and utilize mutate

Question

Select first observed data and utilize mutate

310 Views Asked by b222 At 11 June 2015 at 17:02

I am running into an issue with my data where I want to take the first observed ob score score for each individual id and subtract that from that last observed score.

The problem with asking for the first observation minus the last observation is that sometimes the first observation data is missing.

Is there anyway to ask for the first observed score for each individual, thus skipping any missing data?

I built the below df to illustrate my problem.

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

   id ob score
1   5  1    NA
2   5  2     2
3   5  3     3
4   5  4     4
5   5  5     3
6  12  1     7
7  12  2     3
8  12  3     4
9  17  1     3
10 17  2     4
11 20  1    NA
12 20  2     1
13 20  3     4

And what I am hoping to run is code that will give me...

   id ob score  es
1   5  1    NA  -1
2   5  2     2  -1
3   5  3     3  -1
4   5  4     4  -1
5   5  5     3  -1
6  12  1     7   3
7  12  2     3   3
8  12  3     4   3
9  17  1     3  -1
10 17  2     4  -1
11 20  1    NA  -3
12 20  2     1  -3
13 20  3     4  -3

I am attempting to work out of dplyr and I understand the use of the 'group_by' command, however, not sure how to 'select' only first observed scores and then mutate to create es.

Original Q&A

There are 3 best solutions below

Jthorpe On 11 June 2015 at 17:52

This solution is a little verbose, only b/c it relies on a couple of helper functions FIRST and LAST:

# The position (indicator) of the first value that evaluates to TRUE.
LAST  <-  function (x, none = NA) {
    out <- FIRST(reverse(x), none = none)
    if (identical(none, out)) {
        return(none)
    }
    else {
        return(length(x) - out + 1)
    }
}
# The position (indicator) of the last value that evaluates to TRUE.
FIRST  <-  function (x, none = NA) 
{
    x[is.na(x)] <- FALSE
    if (any(x)) 
        return(which.max(x))
    else return(none)
}

# returns the difference between the first and last non-missing values
diff2  <-  function(x)
    x[LAST(!is.na(x))] - x[FIRST(!is.na(x))]


library(dplyr)
help %>% 
    group_by(id) %>% 
    arrange(ob) %>% 
        summarise(diff = diff2(score))

Yifei On 12 June 2015 at 07:08

library(dplyr)

temp <- help %>% group_by(id) %>% 
     arrange(ob) %>%
     filter(!is.na(score)) %>% 
     mutate(es = first(score) - last(score)) %>%
     select(id, es) %>%
     distinct()

help %>% left_join(temp)

**MrFlick** · Accepted Answer · 2015-06-11T17:40:05.530000

I would use first() and last() (both dplyr function) and na.omit() (from the default stats package.

First, I would make sure your score column was a numberic column with proper NA values (not strings as in your example)

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
       ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
       score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

then you can do

library(dplyr)
help %>% group_by(id) %>% arrange(ob) %>% 
    mutate(es=first(na.omit(score)-last(na.omit(score))))

Select first observed data and utilize mutate

There are 3 best solutions below

Related Questions in R

Related Questions in DPLYR

Trending Questions

Popular # Hahtags

Popular Questions