About filter() in R: summing points over a 2-year period per ID


I have a data frame (df) with the columns (ID), (Adm_Date), (ICD_10) and (points); it has 1,000,000 rows. (points) is the value of each (ICD_10) code. Each (ID) has many rows, with (Adm_Date) ranging from 2010-01-01 to 2018-01-01. For each (ID), I want the sum of (points), counting each (ICD_10) only once, over the rows whose (Adm_Date) falls within the 2 years up to and including that (Adm_Date).
The periods look like this: 01-01-2010 to 31-01-2012, 01-02-2010 to 29-02-2012, 01-03-2010 to 31-03-2012, ... and so on up to the last period, 01-12-2016 to 31-12-2018. My problem is filtering by date: the filter does not restrict the rows to each period. Instead of summing (points) per period for each (ID), it sums them, without duplicates, over all the data from 2010 to 2018.
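As a sketch, the monthly periods listed above can be generated in base R. The names `starts`, `ends` and `periods` are illustrative, not from the question. Each period starts on the first day of a month from January 2010 to December 2016 and ends on the last day of the same month two years later; the dates are written in R's ISO format (yyyy-mm-dd) rather than dd-mm-yyyy.

```r
# Period starts: first day of each month, 2010-01 ... 2016-12 (84 months)
starts <- seq(as.Date("2010-01-01"), as.Date("2016-12-01"), by = "month")

# Period ends: last day of the same month two years later, 2012-01 ... 2018-12.
# The last day of a month is the first day of the next month minus one day.
ends <- seq(as.Date("2012-02-01"), as.Date("2019-01-01"), by = "month") - 1

# Illustrative table of all look-back periods
periods <- data.frame(start = starts, end = ends)
```

Taking "first day of next month minus one" avoids having to know month lengths, so leap years (29-02-2012, 29-02-2016) come out right automatically.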

I used this code:

library(dplyr)
library(lubridate)  # years()

start.date <- seq(as.Date(df$Adm_Date))
end.date   <- seq(as.Date(df$Adm_Date + years(-2)))

Sum_df <- df %>%
  dplyr::filter(Adm_Date >= start.date & Adm_Date <= end.date) %>%
  group_by(ID) %>%
  mutate(sum_points = sum(points * !duplicated(ICD_10)))

but the filter did not work: it still sums (points) for each (ID) over all dates from 2010 to 2018 instead of per period.
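`dplyr::filter()` evaluates its condition row by row, so one call keeps one set of rows; it cannot create a separate window for each period, and after `group_by(ID)` the `sum()` runs over every row of that ID that survived the filter. One way to get per-period sums is to filter with a single, scalar start and end date at a time. The sketch below does this in base R; `window_points()` and the toy data are hypothetical, and it assumes `Adm_Date` is already of class `Date`.

```r
# Hypothetical helper (not from the question): per-ID sum of `points` inside
# ONE fixed window [start, end], counting each (ID, ICD_10) pair only once.
window_points <- function(df, start, end) {
  # scalar bounds: every row is compared with the same start and end date
  sub <- df[df$Adm_Date >= start & df$Adm_Date <= end, ]
  if (nrow(sub) == 0) return(NULL)
  # drop repeated ICD_10 codes within each ID (first occurrence is kept)
  sub <- sub[!duplicated(sub[c("ID", "ICD_10")]), ]
  # add up the remaining points per ID
  sums <- tapply(sub$points, sub$ID, sum)
  data.frame(ID = names(sums), sum_points = as.vector(sums),
             start = start, end = end)
}

# Hypothetical toy data (not the asker's data)
toy_w <- data.frame(
  ID       = c(11, 11, 11, 12),
  Adm_Date = as.Date(c("2010-01-15", "2011-03-02", "2012-06-01", "2011-12-31")),
  ICD_10   = c("G81", "G81", "I69", "I10"),
  points   = c(2, 2, 1.5, 1)
)

res_w <- window_points(toy_w, as.Date("2010-01-01"), as.Date("2012-01-31"))
```

To cover every period, the helper can be called once per period (for example with `lapply()` over vectors of period start and end dates) and the results combined with `do.call(rbind, ...)`; periods with no rows return `NULL`, which `rbind` simply skips.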

sum_points should start from 01-01-2012, i.e. I need the sum for every Adm_Date >= 01-01-2012. For example, for the patient with ID = 11, I would sum points from row 3 to row 23, ignoring repeated ICD_10 codes (e.g. G81 and I69 are repeated in this period), so the result would be ID (11), Adm_Date (07-05-2012), sum_points (17). For the same patient at Adm_Date (13-06-2013), I would sum from row 11 to row 27, because the look-back is 2 years from that Adm_Date, so the result would be ID (11), Adm_Date (13-06-2013), sum_points (14.9). I have about half a million IDs and more than a million rows.
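The per-admission version described above (a 2-year look-back from each Adm_Date, repeats of an ICD_10 ignored, results from 01-01-2012 onwards) could be sketched in base R as follows. This is only one possible approach: `rolling_points()` and the toy data are hypothetical, and it assumes `Adm_Date` is of class `Date` and that a repeated ICD_10 carries the same points each time, so keeping its first occurrence is enough. It loops over IDs, which is simple but may be slow for half a million IDs; a data.table non-equi join is a common faster alternative.

```r
# Hypothetical helper (not from the question): for every row, sum `points` over
# the same ID's rows whose Adm_Date lies in [Adm_Date - years_back years, Adm_Date],
# counting each ICD_10 code only once inside that window.
rolling_points <- function(df, years_back = 2) {
  df  <- df[order(df$ID, df$Adm_Date), ]
  out <- numeric(nrow(df))                     # one sum per (sorted) row
  by_str <- paste0("-", years_back, " years")  # e.g. "-2 years" for seq.Date()
  # row positions of each patient in the sorted data
  for (idx in split(seq_len(nrow(df)), df$ID)) {
    dates <- df$Adm_Date[idx]
    codes <- df$ICD_10[idx]
    pts   <- df$points[idx]
    for (k in seq_along(idx)) {
      # same calendar day, `years_back` years before this admission
      lower <- seq(dates[k], by = by_str, length.out = 2)[2]
      w <- which(dates >= lower & dates <= dates[k])
      # keep the first occurrence of each ICD_10 in the window, drop repeats
      out[idx[k]] <- sum(pts[w][!duplicated(codes[w])])
    }
  }
  df$sum_points <- out
  df
}

# Hypothetical toy data (not the asker's data)
toy_r <- data.frame(
  ID       = c(12, 11, 11, 11, 11),
  Adm_Date = as.Date(c("2012-02-01", "2010-06-01", "2011-06-15",
                       "2012-05-07", "2013-06-13")),
  ICD_10   = c("I10", "G81", "I69", "G81", "E11"),
  points   = c(1, 2, 1.5, 2, 3)
)

res <- rolling_points(toy_r)
# keep admissions from 2012-01-01 onwards, as in the question
res <- res[res$Adm_Date >= as.Date("2012-01-01"), ]
# rows sharing ID and Adm_Date get the same sum; keep one of each
res <- unique(res[c("ID", "Adm_Date", "sum_points")])
```

In the toy data, the 2012-05-07 admission of ID 11 looks back to 2010-05-07, so the earlier G81 is inside the window and the second G81 is not counted again; the 2013-06-13 admission no longer reaches the 2010-06-01 row.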

I hope I explained it well. Thank you.

[Image: the example rows referred to above; not available]

