How to reference "cells" within a column in R?


I'm trying to calculate numeric ranges based on the moving average of a column of data. I have found a way to use caTools::runmean to produce a column of moving averages, and I know how to work with this in Excel to produce the columns I want, but I would love to know a way to do all of this in one R script.

Here is my simplified reproducible example for R.

library(tidyverse)
library(caTools)
data <- as_tibble(data.frame(
  Index = as.integer(c(18,19,21,22,23,25,26,29)),
  mydbl = c(8.905,13.31,15.739,17.544,19.054,20.393,21.623,22.764)))

data <- data %>% 
  mutate(avg = runmean(mydbl,
                       k = 2,
                       alg = "exact",
                       endrule = "NA"))
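With a window of two, each moving-average value is simply the mean of a value and the one before it, so the same column can also be produced in base R without caTools. Below is a minimal sketch that assumes the `mydbl` values from the example above, re-typed as a plain vector so the snippet runs on its own. `trailing_mean2` is a hypothetical helper name and is not part of any package.

```r
# mydbl values from the example above
mydbl <- c(8.905, 13.31, 15.739, 17.544, 19.054, 20.393, 21.623, 22.764)

# Hypothetical helper: mean of each element and the element before it.
# The first element has no predecessor, so it gets NA (like endrule = "NA").
trailing_mean2 <- function(x) {
  c(NA, (x[-1] + x[-length(x)]) / 2)
}

avg <- trailing_mean2(mydbl)
avg  # same values as the avg column shown below
```

This only covers k = 2. For other window sizes, `runmean` (or `zoo::rollmean`) is probably still the simpler choice.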

This tibble will look like this:

> data
# A tibble: 8 x 3
  Index mydbl   avg
  <int> <dbl> <dbl>
1    18  8.90  NA  
2    19 13.3   11.1
3    21 15.7   14.5
4    22 17.5   16.6
5    23 19.1   18.3
6    25 20.4   19.7
7    26 21.6   21.0
8    29 22.8   22.2

To produce the remaining data I want, I exported this to Excel with write_csv(data,...) and the final table is shown below. The first value in dbl_i is the formula =B2-ABS(C3-B2) (the difference between mydbl and the next avg subtracted from mydbl to create an equidistant lower limit). The last value in dbl_f is the formula =B9+ABS(C9-B9) (the difference between mydbl and the avg added to mydbl to create an equidistant upper limit). The other values in the two columns are just direct references to the avg column.

Index   mydbl   avg     dbl_i   dbl_f
18      8.905   NA      6.7025  11.1075
19      13.31   11.1075 11.1075 14.5245
21      15.739  14.5245 14.5245 16.6415
22      17.544  16.6415 16.6415 18.299
23      19.054  18.299  18.299  19.7235
25      20.393  19.7235 19.7235 21.008
26      21.623  21.008  21.008  22.1935
29      22.764  22.1935 22.1935 23.3345

So dbl_i is just the avg column, except that its first value is =B2-ABS(C3-B2). And dbl_f is the avg column shifted up one row, except that its final value is =B9+ABS(C9-B9). Ultimately, the real problem seems to be reproducing the Excel calculations D2=B2-ABS(C3-B2) and E9=B9+ABS(C9-B9) in R.
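In R, a data-frame column is a vector, and a single "cell" is one element of that vector, picked out by position with `[`. The spreadsheet's row 1 holds the headers, so data row i sits in spreadsheet row i + 1. That makes B2 correspond to `mydbl[1]`, C3 to `avg[2]`, and B9/C9 to the last elements. Here is a minimal sketch, assuming the mydbl and avg values from the table above are re-typed as plain vectors:

```r
# mydbl and avg columns from the table above (avg[1] is NA)
mydbl <- c(8.905, 13.31, 15.739, 17.544, 19.054, 20.393, 21.623, 22.764)
avg   <- c(NA, 11.1075, 14.5245, 16.6415, 18.299, 19.7235, 21.008, 22.1935)
n <- length(mydbl)  # index of the last data row (spreadsheet row 9)

# D2 = B2 - ABS(C3 - B2): first mydbl minus its distance to the next avg
lower_first <- mydbl[1] - abs(avg[2] - mydbl[1])

# E9 = B9 + ABS(C9 - B9): last mydbl plus its distance to its own avg
upper_last <- mydbl[n] + abs(avg[n] - mydbl[n])

c(lower_first, upper_last)
```

Using `length()` instead of a hard-coded 8 means the last-row formula still points at the final element if the data grows.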

Does anyone know how to reproduce these calculations in R? I was looking for a way to write an R formula equivalent to B2-ABS(C3-B2), but could not find one short of converting the data to a matrix. Do I have to create a matrix?
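A matrix does not appear to be necessary. Because columns are plain vectors, both dbl_i and dbl_f can be assembled from avg with element assignment, `c()` and negative indices. For example, `avg[-1]` is avg without its first element, which is avg shifted up one row. The base-R sketch below re-types the example values and builds a plain data.frame rather than a tibble.

```r
Index <- c(18L, 19L, 21L, 22L, 23L, 25L, 26L, 29L)
mydbl <- c(8.905, 13.31, 15.739, 17.544, 19.054, 20.393, 21.623, 22.764)
# Right-aligned 2-point moving average, NA in the first row
avg <- c(NA, (mydbl[-1] + mydbl[-length(mydbl)]) / 2)
n <- length(mydbl)

# dbl_i: a copy of avg whose first element is replaced by the D2 formula
dbl_i <- avg
dbl_i[1] <- mydbl[1] - abs(avg[2] - mydbl[1])

# dbl_f: avg shifted up one row, with the E9 formula appended as the last element
dbl_f <- c(avg[-1], mydbl[n] + abs(avg[n] - mydbl[n]))

result <- data.frame(Index, mydbl, avg, dbl_i, dbl_f)
result
```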

Thanks for your time.

Best answer:
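data %>% 
  mutate(
    # right-aligned 2-point moving average; reproduces the question's avg column
    avg = zoo::rollmean(mydbl, 2, align="right", fill=NA),
    # first row: Excel's D2 = B2-ABS(C3-B2); other rows copy avg
    dbl_i = if_else(row_number() == 1L, mydbl - abs(lead(avg) - mydbl), avg),
    # last row: Excel's E9 = B9+ABS(C9-B9); other rows take the next row's avg
    dbl_f = if_else(row_number() == n(), mydbl + abs(avg - mydbl), lead(avg))
  )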
# # A tibble: 8 x 5
#   Index mydbl   avg dbl_i dbl_f
#   <int> <dbl> <dbl> <dbl> <dbl>
# 1    18  8.90  NA    6.70  11.1
# 2    19 13.3   11.1 11.1   14.5
# 3    21 15.7   14.5 14.5   16.6
# 4    22 17.5   16.6 16.6   18.3
# 5    23 19.1   18.3 18.3   19.7
# 6    25 20.4   19.7 19.7   21.0
# 7    26 21.6   21.0 21.0   22.2
# 8    29 22.8   22.2 22.2   23.3

Honestly it's not the most elegant, but it gets the job done.

(BTW: I'm using zoo::rollmean because I don't have caTools installed. With align="right" and fill=NA it gives the same avg column as the runmean call in the question, as the output above shows.)