How do I know how well my clustering of geospatial data has worked?


I have a number of coordinate points, each associated with a particular landmark; however, they have varying and unknown degrees of accuracy. For each of these landmarks I also have the coordinates recorded when a visitor says they are 'at the landmark'.

I would like to use the 'at landmark' coordinates to improve the accuracy of the landmark locations for future visitors. However, as I change the parameters of the clustering algorithm, I have no way of knowing whether, on average, the result actually improves on the existing locations.

I would like to create an objective function which I could use as a proxy for this - any thoughts?

Note that Google Maps API calls will likely be unreliable because the landmarks' addresses are imperfect.


There are 2 best solutions below

1
On

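One option for such an objective is the posterior of a Gaussian mixture model (GMM), i.e. the probability that a given point belongs to each fitted component. You can find some examples here: https://ch.mathworks.com/help/stats/clustering-using-gaussian-mixture-models.html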
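To make the idea concrete, here is a minimal sketch of how the posterior could be computed once a mixture has been fitted. It is not a full GMM fit: the weights, means and spreads are assumed to come from whatever fitting routine you use, the components are assumed isotropic, and the points are assumed to be in a planar projection (e.g. metres) rather than raw latitude/longitude. The function names are illustrative only.

```python
import math


def isotropic_gaussian_pdf(point, mean, sigma):
    """Density at `point` of a 2-D Gaussian with covariance sigma^2 * I."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm


def gmm_posteriors(point, weights, means, sigmas):
    """Posterior probability of each component given `point` (Bayes' rule)."""
    joint = [w * isotropic_gaussian_pdf(point, m, s)
             for w, m, s in zip(weights, means, sigmas)]
    total = sum(joint)
    if total == 0.0:
        # The point is so far from every component that the densities underflow.
        raise ValueError("point has zero density under the mixture")
    return [j / total for j in joint]


def mean_max_posterior(points, weights, means, sigmas):
    """Average confidence of the most likely assignment; closer to 1 is crisper."""
    return sum(max(gmm_posteriors(p, weights, means, sigmas))
               for p in points) / len(points)


# Hypothetical fitted parameters: two landmarks 10 units apart.
weights = [0.5, 0.5]
means = [(0.0, 0.0), (10.0, 0.0)]
sigmas = [1.0, 1.0]

near_first = gmm_posteriors((0.0, 0.0), weights, means, sigmas)
halfway = gmm_posteriors((5.0, 0.0), weights, means, sigmas)
```

A point on top of the first mean is assigned to it almost with certainty, while a point exactly halfway between the two gets a posterior of 0.5 for each. Averaging the largest posterior over all points gives one possible score to compare parameter settings, though it only measures how crisp the assignments are, not how close the estimates are to the true locations.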

There are of course other clustering algorithms. Which one are you using?

1
On

If you want to reduce all these user tags to a single coordinate, I would suggest simply using the median (except near the dateline, where longitudes wrap around).

The reason is that the median has a very high breakdown point (50%), i.e., it is robust to outliers.
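As a rough illustration, the sketch below takes the median of the latitudes and of the longitudes separately. The coordinate-wise reading of "the median" is an assumption here, the sample tags are made up, and the code does not handle the dateline case mentioned above.

```python
from statistics import mean, median


def median_location(tags):
    """Coordinate-wise median of (lat, lon) tags given in degrees.

    Assumes the tags do not straddle the dateline (longitude wrap-around).
    """
    lats = [lat for lat, _ in tags]
    lons = [lon for _, lon in tags]
    return median(lats), median(lons)


# Made-up visitor tags for one landmark; the last one is a gross outlier.
tags = [
    (48.8584, 2.2945),
    (48.8583, 2.2946),
    (48.8585, 2.2944),
    (48.8586, 2.2943),
    (48.9000, 2.5000),
]

robust = median_location(tags)  # (48.8585, 2.2945)
naive = (mean(lat for lat, _ in tags), mean(lon for _, lon in tags))
```

Here the outlier drags the mean latitude away by roughly 0.008 degrees, while the median stays on one of the tightly grouped tags.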