This is my first post, so hopefully I explain what I need to do properly. I am still quite new to R and I may have read posts that answer this, but I just can't for the life of me understand what they mean. So apologies in advance if this has already been answered.
I have a very large data set of GPS locations from radio collars, and the number of locations varies from day to day. I want to go through the data set and select a single point for each day, based on the accuracy level of the GPS signal.
The data look essentially like this:
Accuracy  Month  Day  Easting  Northing  Etc
5         6      1    #######  ########  #
3.2       6      1    #######  ########  #
3.8       6      1    #######  ########  #
1.6       6      2    #######  ########  #
4         6      3    #######  ########  #
3.2       6      3    #######  ########  #
For each day, I want to pull out the most accurate point (the one with the lowest Accuracy value) while keeping all the other data in that row.
So far I have been using the tapply function:
datasub1 <- subset(data, MONTH == 6)          # keep only June (month 6)
tapply(datasub1$accuracy, datasub1$day, min)  # smallest accuracy value per day
This successfully returns the minimum value for each day, but it does not bring along the coordinates, the timing, or any of the other important columns from the same row. The data set has nearly 300,000 rows, so matching them up by hand is not an option.
So essentially, I need the same result as tapply, but instead of the individual minimum values, I need the entire row in which each minimum is found.
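One way to keep the whole row while staying close to the tapply call above is to run tapply over row positions instead of over the accuracy values, and have the function return the position of the minimum within each day. This is a minimal sketch, assuming the columns are named Accuracy, Month and Day as in the sample table (the code in the question uses accuracy, MONTH and day, so adjust to the real names). The Easting/Northing numbers below are made up for illustration only.

```r
# Made-up stand-in for the real GPS data (Easting/Northing values are hypothetical)
gps <- data.frame(
  Accuracy = c(5, 3.2, 3.8, 1.6, 4, 3.2),
  Month    = c(6, 6, 6, 6, 6, 6),
  Day      = c(1, 1, 1, 2, 3, 3),
  Easting  = c(101, 102, 103, 104, 105, 106),
  Northing = c(201, 202, 203, 204, 205, 206)
)

# Same subset as in the question: one month at a time
june <- subset(gps, Month == 6)

# For each day, tapply over row positions and return the position of the
# smallest Accuracy value within that day (which.min picks the first on ties)
best_idx <- tapply(seq_len(nrow(june)), june$Day,
                   function(rows) rows[which.min(june$Accuracy[rows])])

# Use those positions to pull out the complete rows, all columns included
best <- june[as.vector(best_idx), ]
print(best)
```

Because only row positions pass through tapply, every other column (coordinates, time, and so on) comes along automatically when the rows are selected.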
Thanks in advance to anyone who can lend a hand. If you need any more information, let me know and I'll do my best to provide it.
You can use ddply (from the plyr package): it cuts a data.frame into pieces (one per day), applies a function to each piece, and combines the results back into a single data.frame.
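A minimal sketch of the ddply approach, assuming the plyr package is installed and the columns are named Accuracy, Month and Day as in the sample table. The data frame is a made-up stand-in for the real GPS data. Grouping by both Month and Day means the whole data set can be processed in one call, without subsetting to a single month first.

```r
library(plyr)

# Made-up stand-in for the real GPS data (Easting/Northing values are hypothetical)
gps <- data.frame(
  Accuracy = c(5, 3.2, 3.8, 1.6, 4, 3.2),
  Month    = c(6, 6, 6, 6, 6, 6),
  Day      = c(1, 1, 1, 2, 3, 3),
  Easting  = c(101, 102, 103, 104, 105, 106),
  Northing = c(201, 202, 203, 204, 205, 206)
)

# Split by Month and Day; for each piece, return the full row with the
# smallest Accuracy value. ddply then stacks those rows into one data.frame.
best <- ddply(gps, c("Month", "Day"),
              function(piece) piece[which.min(piece$Accuracy), ])
print(best)
```

which.min returns the first minimum, so if two fixes on the same day share the lowest Accuracy value, the earlier row is kept.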