Check which range a number falls in and return a new column with the corresponding shift

I have two data frames, ch and shift. The data frame ch has a column named pos that contains numbers, as shown below. The data frame shift has three columns named shft, start and end.

ch <- structure(list(pos = c(3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 
6.75, 7.25, 7.75, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 
4.25, 4.75, 5.25, 5.75, 6.25, 6.75, 7.25, 7.75, 0.25, 0.75, 1.25, 
1.75, 2.25)), .Names = "pos", row.names = c(NA, -31L), class = "data.frame")

head(ch)
   pos
1 3.25
2 3.75
3 4.25
4 4.75
5 5.25
6 5.75

shift <- structure(list(shift = structure(c(2L, 3L, 2L, 4L, 3L, 4L, 3L, 
1L, 4L, 1L, 4L, 2L, 1L, 2L, 1L, 3L, 2L), .Label = c("A", "B", 
"C", "D"), class = "factor"), start = c(0, 0.25, 0.75, 1.25, 
1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 6.75, 
7.25, 7.75), end = c(0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 
3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 6.75, 7.25, 7.75, 8)), .Names = c("shft", 
"start", "end"), class = "data.frame", row.names = c(NA, -17L
))

head(shift)

   shft  start end
1     B  0.00 0.25
2     C  0.25 0.75
3     B  0.75 1.25
4     D  1.25 1.75
5     C  1.75 2.25
6     D  2.25 2.75

I want to check each number in the pos column of ch against the ranges given by the start and end columns of shift, and assign the matching shft value (A, B, C or D) to a new column named shift.

The comparison has to be pos >= start and pos < end.

Looking at another question on Stack Overflow I found the solution below, but it matches every range the value falls in, boundaries included:

library(data.table)
T1 <- data.table(ch)
T2 <- data.table(shift)
setkey(T2, start, end)
T1[, c("start", "end") := pos] 
foverlaps(T1, T2)

With the code above, for 0.25 I get one row with shift B and one row with shift C. My data frame ch has 31 rows, and after executing the code above the result has 62 rows.
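
The doubling is consistent with every pos value sitting exactly on a boundary: if both ends of each range count as inside, a boundary value belongs to two adjacent ranges. A minimal base R sketch of that counting, using a hypothetical cut-down table shift_demo and vector pos_demo (illustrative stand-ins, not the full data above):

```r
# Hypothetical cut-down version of the shift table (first four rows)
shift_demo <- data.frame(
  shft  = c("B", "C", "B", "D"),
  start = c(0, 0.25, 0.75, 1.25),
  end   = c(0.25, 0.75, 1.25, 1.75),
  stringsAsFactors = FALSE
)
pos_demo <- c(0.25, 0.75, 1.25)  # all on a boundary, like the values in ch

# Number of ranges each value matches when both ends are inclusive
closed_hits <- sapply(pos_demo, function(p)
  sum(p >= shift_demo$start & p <= shift_demo$end))

# Number of ranges matched under the wanted rule: >= start and < end
half_open_hits <- sapply(pos_demo, function(p)
  sum(p >= shift_demo$start & p < shift_demo$end))

closed_hits     # 2 2 2
half_open_hits  # 1 1 1
```

With inclusive ends each boundary value is counted twice, which would turn 31 rows into 62; with the half-open rule each value matches exactly one range.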

Can someone tell me how to perform the comparison (>= start and < end) rather than just checking whether the value falls in the range? In reality, ch will contain not only numbers like the ones shown (0.25, 3.25, 7.25) but also values such as 3.14, 0.89, 7.25, 6.93, 5.46.

There is 1 best solution below

Given that your partitions are contiguous (each end is the next start, so they only meet at their endpoints), cut does this directly. EDIT: I realize now you wanted >= start rather than <= end, which is an easy fix: cut takes right = FALSE.

base R:

# right = FALSE makes every bin left-closed: start <= pos < end
ch$shift <- cut(ch$pos, breaks = c(0, shift$end), labels = shift$shft, right = FALSE)
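
A small check of the same idea on values that are not all on a boundary. This is only a sketch: shift_demo and pos_demo are hypothetical stand-ins for the real data, and the bin number is used as a row index (labels = FALSE) instead of passing the repeated labels to cut, on the assumption that duplicated labels may not be accepted by older R versions.

```r
# Hypothetical cut-down version of the shift table (first four rows)
shift_demo <- data.frame(
  shft  = c("B", "C", "B", "D"),
  start = c(0, 0.25, 0.75, 1.25),
  end   = c(0.25, 0.75, 1.25, 1.75),
  stringsAsFactors = FALSE
)
pos_demo <- c(0, 0.25, 0.89, 1.25, 1.7)

# labels = FALSE returns the bin number; right = FALSE makes bins [start, end)
bin <- cut(pos_demo, breaks = c(0, shift_demo$end), labels = FALSE, right = FALSE)

# Use the bin number to look up the shift in the matching row
shift_found <- shift_demo$shft[bin]
shift_found  # "B" "C" "B" "D" "D"
```

Note that 0.25 and 1.25 land in the range that starts at them, not the one that ends at them, which is the >= start and < end behaviour asked for.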

with dplyr:

library(dplyr)
ch <- ch %>% mutate(shift = cut(pos, breaks = c(0, shift$end), labels = shift$shft, right = FALSE))
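
If the ranges ever had gaps, or did not start at 0, a variant based on findInterval might be easier to adapt. The following is a sketch under stated assumptions, not part of the original answer: lookup_shift is a hypothetical helper, the start column is assumed to be sorted in increasing order, and the ranges are assumed not to overlap.

```r
# Hypothetical helper: returns the shft whose range satisfies start <= pos < end,
# or NA when pos falls outside every range.
lookup_shift <- function(pos, shift) {
  # findInterval gives i such that start[i] <= pos < start[i + 1] (0 if below all)
  idx <- findInterval(pos, shift$start)
  idx[idx == 0] <- NA                        # below the first start
  too_high <- !is.na(idx) & pos >= shift$end[idx]
  idx[too_high] <- NA                        # at or past the end of its range
  as.character(shift$shft)[idx]
}

# Hypothetical cut-down version of the shift table (first four rows)
shift_demo <- data.frame(
  shft  = c("B", "C", "B", "D"),
  start = c(0, 0.25, 0.75, 1.25),
  end   = c(0.25, 0.75, 1.25, 1.75),
  stringsAsFactors = FALSE
)

result <- lookup_shift(c(-1, 0, 0.25, 0.89, 1.25, 1.75), shift_demo)
result  # NA "B" "C" "B" "D" NA
```

On the real data this would be called as ch$shift <- lookup_shift(ch$pos, shift).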