SMILE xmeans gave wrong clustering


I am testing xmeans clustering with this 2 dimensional array and smile-core 2.4.0.

import smile.clustering.xmeans

val arrData = Array(Array(0.0,0,0),
  Array(0.0,0,0),
  Array(0,0.0,0),
  Array(0,0,0.0),
  Array(0,0,0.0),
  Array(100,100.0,100),
  Array(100,100,100.0),
  Array(100.0,100,100),
  Array(100,100.0,100),
  Array(100,100,100.0),
  Array(1000,1000.0,1000),
  Array(1000,1000,1000.0),
  Array(1000,1000.0,1000),
  Array(1000.0,1000,1000),
  Array(1000,1000.0,1000),
  Array(1000,1000,1000.0))

val fitX = xmeans(arrData, 10)

println("k: " + fitX.k)
println("size: " + fitX.centroids.size)
println("centroids: " + fitX.centroids(0)(0)+"-"+fitX.centroids(0)(1)+"-"+fitX.centroids(0)(2))
println("distortion: " + fitX.distortion)
for (a <- 0 until fitX.y.length) println("y: " + a + " " + fitX.y(a))

I don't understand why it gave the following output, since the rows clearly fall into three groups: 0, 100 and 1000. There should be more than one cluster, yet the single centroid is just the average of each of the 3 features. Did I do anything wrong?

k: 1
size: 1
centroids: 406.25-406.25-406.25
distortion: 1.0228125E7
y: 0 0
y: 1 0
y: 2 0
y: 3 0
....
y: 15 0
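
For reference, the k = 1 numbers above can be reproduced by hand without SMILE. The sketch below is plain Scala; the object and method names are made up for illustration and are not part of the library. It computes the column-wise mean of the 16 rows and the sum of squared distances to that mean, which come out to the centroid and distortion printed above. The output is therefore consistent with one cluster holding every row, not with a miscalculated centroid.

```scala
object SingleClusterCheck {
  // Same 16 rows as in the question: 5 at 0, 5 at 100, 6 at 1000.
  val data: Array[Array[Double]] =
    Array.fill(5)(Array(0.0, 0.0, 0.0)) ++
      Array.fill(5)(Array(100.0, 100.0, 100.0)) ++
      Array.fill(6)(Array(1000.0, 1000.0, 1000.0))

  // Column-wise mean of all rows, i.e. the centroid of a single cluster.
  def centroid(rows: Array[Array[Double]]): Array[Double] =
    rows.transpose.map(col => col.sum / col.length)

  // Sum of squared Euclidean distances from every row to the point c.
  def sumSquaredDistance(rows: Array[Array[Double]], c: Array[Double]): Double =
    rows.map(r => r.zip(c).map { case (x, m) => (x - m) * (x - m) }.sum).sum

  def main(args: Array[String]): Unit = {
    val c = centroid(data)
    println("centroid: " + c.mkString("-"))
    println("sum of squared distances: " + sumSquaredDistance(data, c))
  }
}
```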

There is 1 best solution below

On BEST ANSWER

I just tried another, much longer array (233 rows):

Array(Array(1.2,2.2,3.2)
....
Array(33.4,43.4,53.4)
...
Array(121.1,171.1,221.1))

It gave two centroids, so it seems xmeans requires a minimum number of data rows before it will split a cluster.
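
One way to check this guess against the data from the question is to keep the same three groups (0, 100, 1000) but generate many more rows per group. The sketch below is only a data generator under assumed settings: `makeGroups`, the group size of 80, the jitter of 1.0 and the seed are all made up for illustration. The xmeans call is left as a comment because it needs smile-core on the classpath, and the sketch makes no claim about what k it will report.

```scala
import scala.util.Random

object BiggerSample {
  // Hypothetical helper: perGroup rows of 3 features around each center,
  // each feature offset by uniform noise in [-jitter, jitter].
  def makeGroups(centers: Seq[Double], perGroup: Int, jitter: Double, seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    (for {
      c <- centers
      _ <- 0 until perGroup
    } yield Array.fill(3)(c + (rng.nextDouble() * 2 - 1) * jitter)).toArray
  }

  def main(args: Array[String]): Unit = {
    val data = makeGroups(Seq(0.0, 100.0, 1000.0), perGroup = 80, jitter = 1.0, seed = 42L)
    println("rows: " + data.length)
    // With smile-core available, repeat the call from the question on this data:
    // val fit = smile.clustering.xmeans(data, 10)
    // println("k: " + fit.k)
  }
}
```

If k still comes back as 1 with this many rows, the row count is probably not the cause.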