Multi-dimensional scaling for a very large dataset in R


I'm fairly new to R and would like to visualise my k-means segmentation of 368k customers (36 variables) into 6 segments.

To do this I believe I need to run MDS on the dataset before I can plot it, but I always hit the same problem: R cannot allocate a vector of that size.

I have looked at other methods such as NMDS but run into similar issues. Is it possible to store the scaling values as something other than a vector as a workaround?

I'm aware that this may simply be too big to visualise, and perhaps not worthwhile anyway.

Any tips or guidance welcome.

The code I am trying to use is:

d <- dist(MyData, method = "euclidean")

which gives me the 'Error: cannot allocate vector of size 1.4 Mb' message
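For scale, a rough sketch of why the full distance matrix is a problem, and one possible workaround. `dist()` stores n * (n - 1) / 2 doubles, so the arithmetic below estimates the size for 368k rows. The subsampling step is only an illustration: `MyData_demo` is a hypothetical random stand-in for `MyData`, and the sample size of 1000 is an arbitrary assumption.

```r
# Memory needed by dist(): n * (n - 1) / 2 doubles at 8 bytes each
n <- 368000
dist_bytes <- n * (n - 1) / 2 * 8
dist_gib <- dist_bytes / 1024^3  # hundreds of GiB, far beyond typical RAM

# Hypothetical stand-in for MyData: random numeric matrix with 36 columns
set.seed(1)
MyData_demo <- matrix(rnorm(5000 * 36), ncol = 36)

# Classical MDS on a random subsample of rows keeps dist() manageable
idx <- sample(nrow(MyData_demo), 1000)
d_sub <- dist(MyData_demo[idx, ], method = "euclidean")
fit_sub <- cmdscale(d_sub, eig = TRUE, k = 2)
dim(fit_sub$points)
```

With real data, the same `idx` would also be used to subset `Kmeans$cluster` so that the colours line up with the sampled points.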

I then intended to use the following code to fit the MDS and plot the result:

library(ggplot2)

fit <- cmdscale(d, eig = TRUE, k = 2)
# Collect the MDS coordinates and cluster labels in one data frame for plotting
plot_df <- data.frame(dim1 = fit$points[, 1], dim2 = fit$points[, 2],
                      cluster = factor(Kmeans$cluster))
p <- ggplot(plot_df, aes(dim1, dim2, color = cluster)) +
  geom_point(size = 4) +
  theme(axis.title.y = element_text(size = rel(1.5), angle = 90),
        axis.title.x = element_text(size = rel(1.5), angle = 0),
        axis.text = element_text(size = 16, angle = 90),
        axis.title = element_text(size = 20, face = "bold"),
        legend.text = element_text(size = 14, colour = "black"),
        legend.title = element_text(size = 18, colour = "black"),
        legend.key.size = grid::unit(1.5, "cm"))
p
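
A possible way around the distance matrix altogether: classical MDS on Euclidean distances should give the same 2-D coordinates as the first two principal components, up to a sign flip per axis, and `prcomp()` never builds the n-by-n matrix. The sketch below checks this on a small random matrix `X`, which is a hypothetical stand-in for `MyData`; it assumes unscaled variables, matching the plain `dist()` call in the question.

```r
set.seed(42)
X <- matrix(rnorm(200 * 36), ncol = 36)  # hypothetical stand-in for MyData

# Classical MDS on Euclidean distances (only feasible here because n is small)
mds <- cmdscale(dist(X), k = 2)

# PCA on the centred, unscaled data; memory use grows with n, not n^2
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# The two sets of coordinates agree up to the sign of each axis
max_diff <- max(abs(abs(mds) - abs(scores)))
max_diff
```

If the check holds on your own data, `pca$x[, 1]` and `pca$x[, 2]` could be plotted in place of `fit$points[, 1]` and `fit$points[, 2]`.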