I am trying to model species distributions and I am using species presence data only. I would like to correct my models for sampling bias in the region. To achieve this, I fit a model using gstat and wanted to use terra::interpolate, to be able to weight down at the end my self-generated species absences in the places where there are few species surveys.
I expected my output at the end to display somehow the distribution of the amount of sampled species in a grid cell (sum), and I was expecting to use this factor to weight down my absences, as explained above. My output is a linear gradient and I do not think that I am getting the output I need.
Here is a snippet of the data
df <- data.frame(x = c(8.875, 8.875, 9.375, 9.625, 9.875, 10.125,
6.125, 6.375, 6.625, 7.125, 7.375, 7.625, 8.875, 9.125, 9.375,
9.625, 9.875, 10.125, 10.375, 10.875, 11.125, 2.875, 3.125, 3.375,
3.625, 3.875, 4.125, 4.375, 4.625, 4.875),
y = c(37.625, 37.375,
37.375, 37.375, 37.375, 37.375, 37.125, 37.125, 37.125, 37.125,
37.125, 37.125, 37.125, 37.125, 37.125, 37.125, 37.125, 37.125,
37.125, 37.125, 37.125, 36.875, 36.875, 36.875, 36.875, 36.875,
36.875, 36.875, 36.875, 36.875),
sum = c(0, 0, 0, 4, 10, 0, 2,
10, 10, 10, 10, 10, 1, 10, 5, 10, 10, 10, 10, 10, 10, 10, 10,
10, 7, 10, 10, 10, 10, 10))
Here is my code:
#Define the extent and resolution of the blank raster for interpolate
extent <- c(-25.36097, 51.4157, -46.9825, 37.55986)
cell_size <- c(0.25, 0.25)
#Create the blank raster with the specified extent and resolution
blankRaster <- rast(ext(extent), res=.25)
res(blankRaster) <- cell_size
# Fit the model. The "sum" of species in a cell is dependant of the coordinates, therefore sum~x+y
model <- gstat(id = "sum", formula = sum~x+y, locations = ~x+y, data =df)
#Interpolate results
interp <- terra::interpolate(blankRaster, model, xyNames = c("x", "y"))
plot(interp[[1]])
I tried fitting the model in different ways and using different maps when using terra::interpolate, but the output is always pretty similar.
Any help would be much appreciated!

The results seem reasonable given your data and choice of extent. That is easier to see with a smaller extent; and considering that all southern values are high, and the most northern values are low.
Perhaps least squares interpolation is not what you are after. You could try