Cluster a list of geographic points by distance and constraints

Question

Cluster a list of geographic points by distance and constraints

2.4k Views Asked by Alex At 04 April 2025 at 22:04

I have a delivery app, and I want to group orders (each order has a lat and lng coordinates) by location proximity (linear distance) and constraints like max orders and max total products (each order has an amount of products) inside a group.

For proximity grouping I used DBSCAN

coordinates = [[lat,lng],[lat,lng]],[lat,lng]],[lat,lng]],[lat,lng]]]
distance_matrix = squareform(pdist(coordinates, (lambda u,v: haversine(u,v))))

#eps=0.1 => 100m radius, 50m linear
db = DBSCAN(eps=0.1, min_samples=2, metric='precomputed')
results = db.fit(distance_matrix)

How can I add constraints in this functionality?

Is there anyway of doing this by using something else than DBSCAN or HDBSCAN ?

Original Q&A

There are 2 best solutions below

**Gilseung Ahn** · Answer 1

Unfortunately, I think the model you want should be developed from the scratch.

Your question can be modeled as following optimization model.

Objective function: minimize the number of clusters, K Constraint (1) The size of every cluster is equal or smaller than S (parameter) (2) The number of orders in every cluster is equal or smaller than O (parameter) (3) For a sample x and its cluster Ck, dist(x, Ck) be the minimum among dist(x, C1), dist(x, C2), ..., dist(x, CK).

Solving this problem would require a lot of efforts...

**ASH** · Answer 2

This is an interesting question. I suppose it can be done a bunch of different ways. Here is one solution for you to condsider.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import seaborn as sns; sns.set()
import csv


df = pd.read_csv('C:\\your_path\\properties_2017.csv')
# df.head(10)
df = df.head(10000)

df.shape


df.dropna(axis=0,how='any',subset=['latitude','longitude'],inplace=True)

# Variable with the Longitude and Latitude
X=df.loc[:,['parcelid','latitude','longitude']]
X.head(10)

K_clusters = range(1,10)
kmeans = [KMeans(n_clusters=i) 

for i in K_clusters]
Y_axis = df[['latitude']]
X_axis = df[['longitude']]
score = [kmeans[i].fit(Y_axis).score(Y_axis)

for i in range(len(kmeans))] # Visualize
plt.plot(K_clusters, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()

kmeans = KMeans(n_clusters = 10, init ='k-means++')
kmeans.fit(X[X.columns[1:3]]) # Compute k-means clustering.X['cluster_label'] = kmeans.fit_predict(X[X.columns[1:3]])centers = kmeans.cluster_centers_ # Coordinates of cluster centers.labels = kmeans.predict(X[X.columns[1:3]]) # Labels of each pointX.head(10)

X['cluster_label'] = kmeans.fit_predict(X[X.columns[1:3]])
centers = kmeans.cluster_centers_ # Coordinates of cluster centers.
labels = kmeans.predict(X[X.columns[1:3]]) # Labels of each pointX.head(10)

X.head(5)

X = X[['parcelid','cluster_label']]
X.head(5)


clustered_data = df.merge(X, left_on='parcelid', right_on='parcelid')
clustered_data.head(5)

centers = kmeans.cluster_centers_
print(centers)


X=df.loc[:,['parcelid','latitude','longitude']]
X.plot.scatter(x = 'latitude', y = 'longitude', c=labels, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.5)

data = X labels = kmeans.labels_

plt.subplots_adjust(bottom = 0.1)
plt.scatter(data.iloc[:, 1], data.iloc[:, 2], c=kmeans.labels_, cmap='rainbow') 

for label, x, y in zip(labels, data.iloc[:, 1], data.iloc[:, 2]):
    plt.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='red', alpha=0.5),
        arrowprops=dict(arrowstyle = '->', connectionstyle='arc3,rad=0'))

plt.show()

# labels pointing to each data point (this is a big jumbled together; you should probably select fewer data points to analyze).

Reference:

https://levelup.gitconnected.com/clustering-gps-co-ordinates-forming-regions-4f50caa7e4a1

Data source:

https://www.kaggle.com/c/zillow-prize-1/data

Cluster a list of geographic points by distance and constraints

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in CLUSTER-ANALYSIS

Related Questions in LATITUDE-LONGITUDE

Related Questions in HDBSCAN

Trending Questions

Popular # Hahtags

Popular Questions