Spatial data with duplicates and missing points

I am analysing data from an egg survey. The data come from different points in the North Sea, and some stations were recorded twice on different dates. The sea should be covered by 0.5 x 0.5 degree squares. I have two questions that I haven't found a solution for yet:

  1. How do I replace points that share a location but have different dates with their mean value? I know how to remove duplicates, or how to replace them with the max or min, but couldn't find a way to calculate a mean.

  2. How do I calculate interpolated values for the missing points, based on neighbouring cells? A value should be interpolated if, and only if, at least two neighbouring cells have recorded points.

I tried setting up a grid, but did not get very far, as I couldn't find a way to tell R when to interpolate and when not to.

Sample data:

egg_data <- structure(list(Latitude = c(54.25, 54.25, 54.25, 54.25, 54.25, 
54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 54.25, 54.25, 54.25, 53.25, 58.25, 57.75, 
57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 56.75, 
56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 
56.75, 56.75, 56.75, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.75, 56.75, 56.75), Longitude = c(6.25, 5.25, 5.25, 
4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 2.25, 2.25, 3.25, 3.25, 4.25, 
4.25, 5.25, 5.25, 5.25, 5.25, 4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 
1.25, 1.25, 0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 3.25, 3.25, 3.25, 2.75, 2.25, 1.75, 1.25, 0.75, 0.25, 0.25, 
0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75, 
5.25, 5.75, 6.25, 5.75, 5.25, 4.75, 4.25, 3.75, 3.25, 2.25, 1.75, 
1.25, 0.75, 0.25, 0.25, 0.75, 1.25, 1.75, 1.75, 1.25, 0.75), 
    Eggs = c(9L, 6L, 4L, 20L, 57L, 14L, 35L, 18L, 4L, 1L, 3L, 
    100L, 1L, 201L, 0L, 51L, 52L, 23L, 19L, 4L, 5L, 23L, 11L, 
    18L, 7L, 7L, 14L, 6L, 3L, 4L, 20L, 13L, 19L, 5L, 16L, 23L, 
    28L, 11L, 9L, 12L, 19L, 62L, 6L, 3L, 15L, 110L, 57L, 0L, 
    14L, 3L, 3L, 8L, 94L, 62L, 7L, 19L, 511L, 59L, 283L, 308L, 
    20L, 44L, 61L, 24L, 10L, 10L, 15L, 6L, 8L, 12L, 32L, 2L, 
    5L, 10L, 21L, 4L, 1L, 19L, 3L, 4L, 4L, 17L, 51L, 108L, 1213L, 
    132L, 4L, 0L, 0L, 0L)), .Names = c("Latitude", "Longitude", 
"Eggs"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84", "85", "86", "87", "88", "89", "90"))

Thank you very much!!

There is 1 best solution below

Add a column that identifies each location

egg_data <- within(egg_data, Location <- paste("(", Latitude, ", ", Longitude, ")", sep = "") )

EDIT: There's no point in being fancy about the label, since we want to split it back into latitude and longitude shortly. A plain comma separator is enough.

egg_data <- within(egg_data, 
  Location <- paste(Latitude, Longitude, sep = ",")
)

Then there are loads of ways of getting the mean.

means_by_location <- with(egg_data, tapply(Eggs, Location, mean))

or

library(plyr)
means_by_location2 <- ddply(egg_data, .(Location), summarise, Mean.eggs = mean(Eggs))

or

means_by_location3 <- aggregate(Eggs ~ Location, egg_data, mean)

or

means_by_location4 <- with(egg_data, by(Eggs, Location, mean))

EDIT: For the next bit, you want to have the result in a data frame, so use method 2 or 3.

Add the latitude and longitude back in to your new dataset. (Lots of ways of doing this.)

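# as.character() guards against Location having been stored as a factor
lat_long <- strsplit(as.character(means_by_location2$Location), ",")
# Convert the pieces back to numbers so they plot on continuous axes
means_by_location2$Latitude <- as.numeric(sapply(lat_long, function(x) x[1]))
means_by_location2$Longitude <- as.numeric(sapply(lat_long, function(x) x[2]))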

This is your first question answered.
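
As an alternative to building and then splitting a text key, the formula interface of aggregate can group on both coordinate columns directly. This is only a sketch on made-up toy data; toy_eggs and means_by_coord are names introduced here for illustration, not part of the original data.

```r
# Toy data (illustrative only): the first two rows share a location
toy_eggs <- data.frame(
  Latitude  = c(54.25, 54.25, 55.25),
  Longitude = c(5.25, 5.25, 0.25),
  Eggs      = c(6L, 4L, 0L)
)

# Group by both coordinates at once; no string key to split afterwards
means_by_coord <- aggregate(Eggs ~ Latitude + Longitude, data = toy_eggs, FUN = mean)

# Rename the value column to match the plyr result used elsewhere
names(means_by_coord)[names(means_by_coord) == "Eggs"] <- "Mean.eggs"
```

Latitude and Longitude stay numeric throughout, so nothing needs converting back before plotting.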


For the second question, you need to think a bit more. Take a look at a plot of eggs by location.

library(ggplot2)
(p <- ggplot(means_by_location2, aes(Longitude, Latitude, colour = log10(Mean.eggs + 1))) +
  geom_point() +
  scale_colour_gradient(low = "#FFFFFF", high = "#0000FF", space = "Lab")
)

Are you interpolating north to south, or east to west, or with all neighbouring points? There are lots of different possibilities and they may have different answers. It's a nontrivial task to say which interpolation is best.
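
To make one of those possibilities concrete, here is a minimal sketch that treats "neighbouring" as the eight cells surrounding an empty 0.5 degree cell. It fills the cell with the unweighted mean of the recorded neighbours, but only when at least two exist. Those choices are assumptions, not something the question settles, and interpolate_cells is a hypothetical helper written for this illustration. It expects a data frame with numeric Latitude, Longitude and Mean.eggs columns.

```r
# Hypothetical helper: fill empty grid cells from adjacent recorded cells.
# Assumptions: 8-cell neighbourhood, unweighted mean, at least `min_n` neighbours.
interpolate_cells <- function(d, step = 0.5, min_n = 2) {
  # Full grid spanning the observed coordinate range
  grid <- expand.grid(
    Latitude  = seq(min(d$Latitude),  max(d$Latitude),  by = step),
    Longitude = seq(min(d$Longitude), max(d$Longitude), by = step)
  )
  # Recorded cells keep their value; cells without data become NA
  grid <- merge(grid, d[, c("Latitude", "Longitude", "Mean.eggs")], all.x = TRUE)
  grid$Interpolated <- is.na(grid$Mean.eggs)

  # Only original observations are used, never previously interpolated cells
  recorded <- grid[!grid$Interpolated, ]
  for (i in which(grid$Interpolated)) {
    near <- abs(recorded$Latitude  - grid$Latitude[i])  <= step + 1e-9 &
            abs(recorded$Longitude - grid$Longitude[i]) <= step + 1e-9
    if (sum(near) >= min_n) {
      grid$Mean.eggs[i] <- mean(recorded$Mean.eggs[near])
    }
  }
  grid
}

# Toy input (illustrative only): three recorded cells on a 3 x 3 grid
toy <- data.frame(
  Latitude  = c(54.25, 54.25, 55.25),
  Longitude = c(0.25, 1.25, 0.25),
  Mean.eggs = c(10, 20, 30)
)
filled <- interpolate_cells(toy)
```

Cells with fewer than two recorded neighbours stay NA, and the Interpolated column flags every cell that had no observation of its own. A distance-weighted mean, or a north-south-only rule, would need a different `near` condition.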