Function for identifying outliers


I have a dataset that looks like this:

data <- structure(list(Date = structure(c(-2208988800, -2208902400, -2208816000, 
-2208729600, -2208643200, -2208556800, -2208470400, -2208384000, 
-2208297600, -2208211200, -2208124800, -2208038400, -2207952000
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), count = c(4668.8, 
4476.9, 4945, 5275.7, 15013.1, 14418, 14059.1, 14043.5, 14142.2, 
14904.2, 13849.9, 14712.1, 8793.9)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -13L))
Date count
01-01-1900 4,668.80
02-01-1900 4,476.90
03-01-1900 4,945.00
04-01-1900 5,275.70
05-01-1900 15,013.10
06-01-1900 14,418.00
07-01-1900 14,059.10
08-01-1900 14,043.50
09-01-1900 14,142.20
10-01-1900 14,904.20
11-01-1900 13,849.90
12-01-1900 14,712.10
13-01-1900 8,793.90

I am trying to write a function that adds columns based on whether the previous cell is an outlier. I am hoping for a dataset that looks like this:

Date count Outlier_T1 Outlier_T2 Outlier_T3 Outlier_T4 Outlier_T5 Outlier_T6 Outlier_T7 Outlier_T8 Outlier_T9 Outlier_T10 Outlier_T11 Outlier_T12 Outlier_T13
01-01-1900 4,668.80 0 0 0 0 0 0 0 0 0 0 0 0 0
02-01-1900 4,476.90 0 0 0 0 0 0 0 0 0 0 0 0 0
03-01-1900 4,945.00 0 0 0 0 0 0 0 0 0 0 0 0 0
04-01-1900 5,275.70 0 0 0 0 0 0 0 0 0 0 0 0 0
05-01-1900 15,013.10 1
06-01-1900 14,418.00 1
07-01-1900 14,059.10 1
08-01-1900 14,043.50 1
09-01-1900 14,142.20 1
10-01-1900 14,904.20 1
11-01-1900 13,849.90 1
12-01-1900 14,712.10 1
13-01-1900 8,793.90 1

The first four rows contain no outliers. The fifth row is an outlier, so Outlier_T5 = 1. Once the fifth row has been flagged, it is excluded from the rest of the analysis (it becomes NA in the later steps), and the sixth row is tested against only the first four rows plus itself, which gives Outlier_T6 = 1, and so on.
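To make the rule concrete, here is a minimal base R sketch of the sequential check as I understand it. It rests on several assumptions that are not fixed above: "outlier" is taken to mean outside the 1.5 * IQR fences (the boxplot rule), `flag_sequential_outliers` and its parameter `k` are hypothetical names, and the wide `Outlier_T*` layout (0 for ordinary rows, 1 on the diagonal and NA elsewhere for flagged rows) is only one possible reading of the desired table.

```r
# The count column from the dput() output above
counts <- c(4668.8, 4476.9, 4945, 5275.7, 15013.1, 14418, 14059.1,
            14043.5, 14142.2, 14904.2, 13849.9, 14712.1, 8793.9)

# Hypothetical helper: test each value against the earlier non-flagged
# values plus itself. Assumption: the 1.5 * IQR (boxplot) rule.
flag_sequential_outliers <- function(x, k = 1.5) {
  flags <- integer(length(x))
  for (i in seq_along(x)) {
    prev <- seq_len(i - 1)
    # Reference set: earlier rows that were not flagged, plus the current row
    ref <- c(x[prev][flags[prev] == 0L], x[i])
    q <- quantile(ref, c(0.25, 0.75), names = FALSE)
    fence <- k * (q[2] - q[1])
    flags[i] <- as.integer(x[i] < q[1] - fence || x[i] > q[2] + fence)
  }
  flags
}

flags <- flag_sequential_outliers(counts)

# Assumed wide layout: one Outlier_T column per row of the data
n <- length(flags)
wide <- matrix(0L, n, n,
               dimnames = list(NULL, paste0("Outlier_T", seq_len(n))))
out <- which(flags == 1L)
wide[out, ] <- NA_integer_     # flagged rows are blank ...
wide[cbind(out, out)] <- 1L    # ... except in their own column
result <- cbind(data.frame(count = counts), wide)
```

With this rule, rows 1 to 4 come back as 0 and rows 5 to 13 as 1, because every later value is compared only against the first four rows. A different outlier definition (for example a z-score cutoff) would only change the test inside the loop.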

I would really appreciate some help here.
