Clustering algorithms usually have to cope with the fact that what a human perceives as a reasonable cluster is ambiguous, and the computed solution is supposed to generalize and predict well.
This is why I am hesitant to just use the tried-and-tested algorithms for my specific situation. That does not mean I am sure they won't work; they might even be optimal. I would just like to verify that.
So let's look at the following example.
Essentially, the clusters are obvious and, apart from a few exceptions, unambiguous because they are linearly separable. The data I refer to is two-dimensional. The clusters follow an unknown distribution with a mode and are independent.
Which algorithm performs well (in terms of speed, robustness, and simplicity) on this specific kind of cluster pattern?
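# Rotate the point xy around the centre cen.
# Note: despite its name, deg is an angle in radians (it is drawn from [0, pi] below).
rotate <- function(xy, deg, cen) {
  xy <- xy - cen
  return(c(
    xy[1] * cos(deg) - xy[2] * sin(deg),
    xy[2] * cos(deg) + xy[1] * sin(deg)
  ) + cen)
}

G <- expand.grid(1:2, 1:2)  # cluster offsets on a 2 x 2 grid
S <- list()
N <- 100                    # points per cluster
for (i in 1:nrow(G)) {
  # Gamma-distributed cloud, stretched more along x than along y
  set <- data.frame(x = rgamma(N, 3, 2) * 0.2 + G[i, 1],
                    y = rgamma(N, 3, 2) * 0.1 + G[i, 2])
  # Rotate the whole cloud by one random angle around its centroid
  S[[i]] <- t(apply(set, 1, rotate, runif(1, 0, pi), c(mean(set[, 1]), mean(set[, 2]))))
}
S <- do.call(rbind, S)
plot(S)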
Standard k-means clustering would work well and fast for the picture you gave. In general, k-means works well for data like yours, except when some of your clusters are thin, elongated ellipsoids and the center of one ellipsoid lies close to the far-out points of another. In that case you are probably better off with a hierarchical approach: greedily group the points that are closest to one another, then keep merging nearby groups of points until a threshold on the distance between groups is reached (or until you reach the desired number of clusters, if you know it ahead of time).
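As a rough sketch of both options in base R: the data below is a stand-in (four Gaussian blobs rather than the rotated gamma clouds from the question), and the seed, `nstart = 20`, `k = 4`, and the choice of single linkage are my assumptions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Stand-in data (assumption): four compact blobs on a 2 x 2 grid.
centers <- expand.grid(x = 1:2, y = 1:2)
S <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(x = rnorm(100, centers$x[i], 0.08),
        y = rnorm(100, centers$y[i], 0.08))
}))

k <- 4  # assumption: the number of clusters is known in advance

# Option 1: k-means, restarted from several random initialisations.
km <- kmeans(S, centers = k, nstart = 20)

# Option 2: agglomerative clustering. Single linkage repeatedly merges the
# two groups whose closest points are nearest; the tree is then cut into k groups.
hc <- hclust(dist(S), method = "single")
hc_labels <- cutree(hc, k = k)

# Cross-tabulate the two labelings to see whether they agree.
print(table(kmeans = km$cluster, hclust = hc_labels))
```

On well-separated blobs like these, both labelings should agree up to a renaming of the cluster ids. Single linkage is only one possible choice; it tends to split off stray points as their own clusters, so on long-tailed data another `method` for `hclust` may behave better.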
The one caveat with k-means clustering is that, if you use it off the shelf, you need to know how many clusters you want to have. There are ways to choose the number of clusters based on the data if you don't know it in advance; take a look online if you're interested.
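One common way to pick the number of clusters from the data is to run k-means for a range of k and look at how the total within-cluster sum of squares falls. The sketch below is only an illustration: the stand-in data, the range 1 to 8, and the "largest relative drop" rule are my assumptions, and in practice people often just inspect the curve by eye.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Stand-in data (assumption): four compact blobs on a 2 x 2 grid.
centers <- expand.grid(x = 1:2, y = 1:2)
S <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(x = rnorm(100, centers$x[i], 0.08),
        y = rnorm(100, centers$y[i], 0.08))
}))

ks <- 1:8
wss <- numeric(length(ks))

# For k = 1 the within-cluster sum of squares is the total sum of squares.
wss[1] <- sum(scale(S, center = TRUE, scale = FALSE)^2)
for (k in ks[-1]) {
  wss[k] <- kmeans(S, centers = k, nstart = 20)$tot.withinss
}

# Crude rule (assumption): pick the k whose step from k - 1 shrinks the
# within-cluster sum of squares by the largest factor.
ratio <- wss[-1] / wss[-length(wss)]
best_k <- ks[which.min(ratio) + 1]

print(data.frame(k = ks, wss = round(wss, 2)))
cat("suggested k:", best_k, "\n")
```

Plotting `wss` against `ks` shows the same information as an "elbow" curve; the automatic rule here is just a convenience and can be fooled when clusters overlap or differ a lot in size.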