So basically the data looks like this
The Unique ID repeats, the integer is always different, and the year_month goes from 2001_1 (Jan 2001) to 2001_12 (Dec 2001), then repeats for more years: 2002_1 to 2002_12, 2003_1 to 2003_12.
Each Unique ID is an individual, and the integer is the likelihood of finding that individual during that particular year_month.
I need to calculate the mean likelihood of finding each individual for each calendar month, averaged across all years.
That way I can say that for individual 1, the probability of finding them in January is X, in February is Y, and so on.
So my first thought was to aggregate by Unique ID and then average the probability for each month.
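To make that idea concrete, here is a minimal sketch in pandas. The column names `unique_id`, `likelihood`, and `year_month` are assumptions (the real headers may differ), and the small frame is made-up toy data standing in for the merged sheets. It splits `year_month` on the underscore, then groups by ID and month so that the same month from different years is averaged together.

```python
import pandas as pd

# Toy data standing in for the merged sheets.
# Column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "unique_id": [1, 1, 1, 1, 2, 2],
    "likelihood": [10, 30, 20, 40, 5, 15],
    "year_month": ["2001_1", "2002_1", "2001_2", "2002_2", "2001_1", "2002_1"],
})

# Split "2001_1" into a year part and a month part.
parts = df["year_month"].str.split("_", expand=True)
df["year"] = parts[0].astype(int)
df["month"] = parts[1].astype(int)

# Average the likelihood per individual and calendar month, across all years.
monthly_mean = (
    df.groupby(["unique_id", "month"], as_index=False)["likelihood"]
      .mean()
      .rename(columns={"likelihood": "mean_likelihood"})
)

# Optional: one row per individual, one column per month.
wide = monthly_mean.pivot(index="unique_id", columns="month", values="mean_likelihood")
print(wide)
```

In the toy data, individual 1 has 10 and 30 for January, so the January mean is 20. If this matches the real layout, a single grouped mean like this should not have trouble with ~1.6 million rows.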
There are ~3.5 thousand unique IDs in each Excel sheet. Each row has a Unique ID, an integer, and a year_month. I merged all the Excel sheets and now have ~1.6 million rows.
I don't know if it's because the data is so big, but I can't seem to figure this out.