Best way to calculate growth rate with different timepoints


I have some data that looks like this (the table was posted as an image; a reproducible version of the data is given in the Data section at the end):

I want to calculate the growth rate of each tumor (cm/month) and then, eventually, an average growth rate for each tumor.

I have previously done this manually for each time interval and then averaged all the growth rates per tumor. For example, the rate for tumor 1 would be (-0.2/2 + 0.7/10 - 0.1/11)/3.
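The manual calculation is the mean of the slopes between consecutive scans, which base R's `diff()` computes in one step. A minimal sketch for tumor 1 only, with the sizes and times typed in from the data (the variable names are illustrative):

```r
# Tumor 1: sizes (cm) at scan times (months since the first scan)
size <- c(2.2, 2.0, 2.7, 2.6)
time <- c(0, 2, 12, 23)

# Slope of each interval between consecutive scans: -0.2/2, 0.7/10, -0.1/11
interval_rates <- diff(size) / diff(time)

# Unweighted mean of the interval slopes, as in the manual calculation
avg_rate <- mean(interval_rates)
avg_rate  # about -0.0130 cm/month
```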

Is there a better/more elegant way of doing this in R?

Answer by Onyambu
```r
library(tidyverse)
# Slope of each interval between consecutive scans (cm/month); the first scan
# of each tumor has no previous scan, hence NA. Rows must be sorted by time.
# Column names follow the asker's data (the dput() data below uses `id`).
df %>%
   group_by(ID) %>%
   mutate(growth_rate = c(NA, diff(size)/diff(time)))
```

```
# A tibble: 15 × 6
# Groups:   ID [3]
      ID  size Biopsy_Date Date_of_Scan  time growth_rate
   <int> <dbl> <chr>       <chr>        <int>       <dbl>
 1     1   2.2 10/29/2020  10/29/2020       0    NA      
 2     1   2   2/11/2020   1/15/2021        2    -0.100  
 3     1   2.7 2/26/2019   11/10/2021      12     0.07   
 4     1   2.6 6/8/2017    10/24/2022      23    -0.00909
 5    10   5.4 11/19/2015  11/19/2015       0    NA      
 6    10   5.8 6/29/2018   12/18/2019       8     0.0500 
 7    10   5.4 9/27/2018   5/27/2020       13    -0.0800 
 8    10   5.8 6/8/2017    12/15/2020      20     0.0571 
 9    10   5.9 11/15/2019  12/29/2021      32     0.00833
10    10   5.9 4/23/2018   12/22/2022      44     0      
11    17   3.3 11/19/2015  11/19/2015       0    NA      
12    17   3.2 6/29/2018   7/28/2021       12    -0.00833
13    17   3.6 9/27/2018   8/8/2022        24     0.0333 
14    17   4   6/8/2017    7/28/2023       36     0.0333 
15    17   3.1 10/21/2016  8/4/2023        36  -Inf      
```

The -Inf in the last row appears because the final two scans of ID 17 are both at time 36, so `diff(time)` is 0 for that interval.
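The question ultimately asks for one average per tumor, which this answer stops short of. Below is a minimal base-R sketch (no packages needed) that averages the interval rates within each tumor, as in the manual calculation. It is not part of the answer above: `mean_interval_rate()` and `net_rate()` are hypothetical helpers, and dropping zero-length intervals (the source of the -Inf row) is an assumption. For comparison it also computes the time-weighted average, which reduces to net change divided by elapsed time.

```r
# Sizes (cm) and scan times (months) copied from the dput() data at the end
df <- data.frame(
  id   = c(1, 1, 1, 1, 10, 10, 10, 10, 10, 10, 17, 17, 17, 17, 17),
  size = c(2.2, 2, 2.7, 2.6, 5.4, 5.8, 5.4, 5.8, 5.9, 5.9, 3.3, 3.2, 3.6, 4, 3.1),
  time = c(0, 2, 12, 23, 0, 8, 13, 20, 32, 44, 0, 12, 24, 36, 36)
)

# Unweighted mean of the interval rates (cm/month), as in the question.
# Assumption: intervals with zero elapsed time (tumor 17 has two scans at
# month 36) give -Inf/Inf/NaN and are simply dropped.
mean_interval_rate <- function(size, time) {
  rates <- diff(size) / diff(time)
  mean(rates[is.finite(rates)])
}

# Time-weighted alternative: weighting each interval rate by its length makes
# the sum telescope to (last size - first size) / (last time - first time).
net_rate <- function(size, time) {
  (size[length(size)] - size[1]) / (time[length(time)] - time[1])
}

# Assumes rows are sorted by time within each tumor
by_tumor <- split(df, df$id)
avg_rate <- sapply(by_tumor, function(d) mean_interval_rate(d$size, d$time))
net      <- sapply(by_tumor, function(d) net_rate(d$size, d$time))
avg_rate
net
```

For tumor 1 the two summaries disagree even in sign (about -0.013 vs 0.017 cm/month), because the unweighted mean gives the 2-month interval the same weight as the 10- and 11-month ones. Which summary is appropriate depends on the question being asked.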

Answer by NicChr

We can also calculate the average growth rates directly.

Note that this method accounts for the gaps in time: it gives the average relative change per unit of time (here, per month), i.e. a compound monthly growth factor rather than a rate in cm/month.
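In symbols (the notation is introduced here for illustration): for a tumor whose first scan has size $s_0$ at time $t_0$, a later scan with size $s_i$ at time $t_i$ gets the compound monthly factor $g_i$ below. Inverting the definition gives back $s_i$, and $g_i - 1$ is the average fractional change per month.

$$
g_i = \left(\frac{s_i}{s_0}\right)^{1/(t_i - t_0)},
\qquad
s_i = s_0 \, g_i^{\,t_i - t_0},
\qquad
\text{monthly change (\%)} = 100\,(g_i - 1)
$$

For id 1 at time 2, $g = (2.0/2.2)^{1/2} \approx 0.9535$, i.e. a change of about -4.65% per month.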

```r
library(dplyr)
# Compound monthly growth factor relative to each tumor's first scan:
# (size / first size)^(1 / months elapsed); > 1 is growth, < 1 is shrinkage.
# `.by` needs dplyr >= 1.1.0 (with older versions, use group_by(id) first).
df %>%
  mutate(growth_rate = (size / first(size) ) ^ (1 / (time - first(time))),
         .by = id)
```

```
   id size biopsy_date date_of_scan time growth_rate
1   1  2.2  2020-10-29   2020-10-29    0   1.0000000
2   1  2.0  2020-02-11   2021-01-15    2   0.9534626
3   1  2.7  2019-02-26   2021-11-10   12   1.0172127
4   1  2.6  2017-06-08   2022-10-24   23   1.0072897
5  10  5.4  2015-11-19   2015-11-19    0   1.0000000
6  10  5.8  2018-06-29   2019-12-18    8   1.0089724
7  10  5.4  2018-09-27   2020-05-27   13   1.0000000
8  10  5.8  2017-06-08   2020-12-15   20   1.0035793
9  10  5.9  2019-11-15   2021-12-29   32   1.0027711
10 10  5.9  2018-04-23   2022-12-22   44   1.0020146
11 17  3.3  2015-11-19   2015-11-19    0   1.0000000
12 17  3.2  2018-06-29   2021-07-28   12   0.9974390
13 17  3.6  2018-09-27   2022-08-08   24   1.0036321
14 17  4.0  2017-06-08   2023-07-28   36   1.0053580
15 17  3.1  2016-10-21   2023-08-04   36   0.9982648
```
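If dplyr is not available, the same per-row factor can be computed in base R. A minimal sketch, assuming rows are sorted by time within each tumor; `compound_factor()` and the `pct_per_month` column are hypothetical names, not part of the answer above:

```r
# Sizes (cm) and scan times (months) copied from the dput() data at the end
df <- data.frame(
  id   = c(1, 1, 1, 1, 10, 10, 10, 10, 10, 10, 17, 17, 17, 17, 17),
  size = c(2.2, 2, 2.7, 2.6, 5.4, 5.8, 5.4, 5.8, 5.9, 5.9, 3.3, 3.2, 3.6, 4, 3.1),
  time = c(0, 2, 12, 23, 0, 8, 13, 20, 32, 44, 0, 12, 24, 36, 36)
)

# Compound monthly growth factor of each scan relative to the first scan of
# the same tumor. On the first row 1 / 0 is Inf and R evaluates 1^Inf as 1.
compound_factor <- function(size, time) {
  (size / size[1])^(1 / (time - time[1]))
}

# Apply per tumor, then put the results back in the original row order
df$growth_rate <- unsplit(
  lapply(split(df, df$id), function(d) compound_factor(d$size, d$time)),
  df$id
)

# Average percent change per month implied by each factor
df$pct_per_month <- 100 * (df$growth_rate - 1)
df
```

Row 2 gives about 0.9535 (roughly -4.65% per month), matching the dplyr output.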

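For example, for id 1 at time 2 the factor is about 0.953, so the size shrank by roughly 5% per month over the first 2 months. For id 1 at time 12 the factor is about 1.017, so the size grew by roughly 1.7% per month, on average, over the 12 months since time 0.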

We can verify these rates by multiplying each tumor's initial size by its last rate raised to the number of months elapsed (the time variable, which starts at 0 for every tumor). This should reproduce the last observed size.

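```r
# Check: compounding the last factor over the elapsed months (time starts at 0
# for every tumor) from the first size should give back the last size.
df %>%
  group_by(id) %>%
  mutate(growth_rate = (size / first(size) ) ^ (1 / (time - first(time)))) %>%
  summarise(actual_last_size = last(size),
            estimated_last_size = first(size) * (last(growth_rate)^(last(time))))
```

```
# A tibble: 3 × 3
     id actual_last_size estimated_last_size
  <int>            <dbl>               <dbl>
1     1              2.6                2.6 
2    10              5.9                5.90
3    17              3.1                3.10
```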

Data

```r
df <- structure(list(id = c(1L, 1L, 1L, 1L, 10L, 10L, 10L, 10L, 10L, 
10L, 17L, 17L, 17L, 17L, 17L), size = c(2.2, 2, 2.7, 2.6, 5.4, 
5.8, 5.4, 5.8, 5.9, 5.9, 3.3, 3.2, 3.6, 4, 3.1), biopsy_date = structure(c(18564, 
18303, 17953, 17325, 16758, 17711, 17801, 17325, 18215, 17644, 
16758, 17711, 17801, 17325, 17095), class = "Date"), date_of_scan = structure(c(18564, 
18642, 18941, 19289, 16758, 18248, 18409, 18611, 18990, 19348, 
16758, 18836, 19212, 19566, 19573), class = "Date"), time = c(0L, 
2L, 12L, 23L, 0L, 8L, 13L, 20L, 32L, 44L, 0L, 12L, 24L, 36L, 
36L)), row.names = c(NA, -15L), class = "data.frame")
```