Clustering origin/destination points

I have 1000 geo-points (lat, long) that serve as origin/destination (O-D) points. There is also historical data showing the cost of traveling between some of the O-D pairs. Some O-D pairs have no record in the dataset, and some have multiple records with different costs (e.g. because of seasonality).

I want to group these 1000 points into a few clusters (e.g. 20), based not only on their location (lat, long) but also on the average cost of travel and on shared destination points.

I would appreciate any suggestions on how to cluster this data.

There is 1 solution below

You first have to deal with the missing values: either assign them a dedicated label or fill them in with a mean or median value. After that you can use any clustering algorithm you want; different types of features (location, cost, and so on) can be used together as input to the algorithm.
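As a rough illustration of the missing-value step, here is a minimal sketch in plain Python. The record layout `(origin, destination, cost)`, the sample values, and the function name `mean_outgoing_cost` are all assumptions made up for this example, and "mean outgoing cost per origin" is just one possible cost feature. Repeated records for a pair are averaged first, and points with no records get the median of the known values.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample records: (origin_id, destination_id, cost).
records = [
    ("A", "B", 10.0),
    ("A", "B", 14.0),  # same pair recorded twice, e.g. in different seasons
    ("A", "C", 30.0),
    ("B", "C", 20.0),
]
# "C" and "D" never appear as an origin, so their cost feature is missing.
points = ["A", "B", "C", "D"]


def mean_outgoing_cost(records, points):
    """Return one cost feature per point, imputing missing ones with the median."""
    # Collapse repeated records for the same O-D pair into a single average.
    pair_costs = defaultdict(list)
    for origin, dest, cost in records:
        pair_costs[(origin, dest)].append(cost)
    pair_mean = {pair: mean(costs) for pair, costs in pair_costs.items()}

    # Average the per-pair costs over all destinations of each origin.
    per_origin = defaultdict(list)
    for (origin, _dest), cost in pair_mean.items():
        per_origin[origin].append(cost)
    known = {origin: mean(costs) for origin, costs in per_origin.items()}

    # Points without any record receive the median of the known values.
    fill = median(known.values())
    return {p: known.get(p, fill) for p in points}


features = mean_outgoing_cost(records, points)
print(features)
```

The resulting value can be appended to each point's (lat, long) to form one feature row per point. Whether median imputation is appropriate depends on how many points lack records; with many gaps, a separate "no data" indicator feature may be the safer choice.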

If the data does not have too many dimensions and you know roughly how many clusters there should be, the k-means algorithm should work well.
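To make the k-means suggestion concrete, the sketch below implements plain Lloyd's algorithm on hypothetical rows of `(lat, long, mean_cost)`. Two things here are assumptions of this example rather than part of the suggestion above: the features are standardized first so that cost does not dominate the coordinates, and the initial centroids are picked with a deterministic farthest-first rule instead of the usual random start. In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row, centroids):
    """Index of the centroid closest to the given row."""
    return min(range(len(centroids)), key=lambda c: sqdist(row, centroids[c]))


def kmeans(rows, k, iterations=100):
    """Lloyd's algorithm with farthest-first initialization; returns (labels, centroids)."""
    # Initialization (an assumption of this sketch): start from the first row,
    # then repeatedly add the row farthest from all centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sqdist(r, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [nearest(row, centroids) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature rows: (lat, long, mean_cost).
rows = [
    (40.0, -74.0, 10.0), (40.1, -74.1, 12.0), (40.2, -73.9, 11.0),
    (34.0, -118.0, 30.0), (34.1, -118.2, 32.0), (33.9, -118.1, 31.0),
]
labels, centroids = kmeans(standardize(rows), k=2)
print(labels)
```

For the real data, `k` would be set to the desired number of clusters (e.g. 20), and the relative weight of location versus cost can be tuned by scaling the corresponding columns after standardization.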

If you want to visualize your data and clusters in 2D or 3D and you have more features than that, you will have to apply dimensionality reduction (e.g. PCA or t-SNE).
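The sketch below shows what the PCA option does, using only the standard library: it centers the data, builds the covariance matrix, and finds the top two principal components by power iteration with deflation. The function name `pca_2d`, the sample rows, and the fixed start vector are assumptions of this example; a library routine such as scikit-learn's `PCA` is the more usual choice, and t-SNE is not covered here.

```python
def pca_2d(rows, iterations=200):
    """Project rows onto their top two principal components (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered data (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def matvec(matrix, vec):
        return [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]

    components = []
    for _component in range(2):
        # Fixed start vector keeps the result deterministic. A start that is
        # exactly orthogonal to the leading component would not converge to it.
        v = [1.0 / d ** 0.5] * d
        for _step in range(iterations):
            w = matvec(cov, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflation: remove the found component so the next pass finds the second one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    # Coordinates of each centered row along the two components.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Hypothetical rows with four features per point.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [3.0, 6.0, 9.2, 12.0],
    [4.0, 8.1, 12.0, 16.0],
    [5.0, 10.0, 15.0, 20.3],
]
projected = pca_2d(rows)
for point in projected:
    print(point)
```

Each projected pair can be plotted as an (x, y) point and colored by its cluster label. The projection is only for inspection; the clustering itself can still run on the full feature set.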