Plotting clusters of nominal data in R


Imagine we have 7 categories (e.g. religion) and would like to plot them not along a line, but as clusters whose positions are chosen automatically so that they are nicely arranged. All individuals within a group have the same response, yet they should not be plotted on a single line (which is what happens when plotting ordinal data).

So to sum it up:

  • automatically use the available graph space

  • groups have no order and are spread around the canvas

  • individuals remain visible, with no overlapping

  • ideally, the individuals in each group are bounded by some (invisible) circle

Are there any packages designed for this purpose? What keywords should I search for?

Example data:

```r
religion <- sample(1:7, 100, TRUE)
# No overlap here, but I would like the group structure to stand out more.
plot(religion)
```
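One base-R way to meet all four requirements is sketched below; it is an illustration under stated assumptions, not a package solution. It places the 7 group centres evenly on a unit circle and arranges each group's members in a sunflower (Vogel) spiral inside a circle of hand-picked radius r_max, so points never overlap and each group stays inside its own invisible circle. The radius value and the circle-of-centres arrangement are assumptions chosen for illustration.

```r
set.seed(1)
k <- 7
religion <- sample(1:k, 100, TRUE)

# Group centres evenly spaced on a unit circle (assumed arrangement)
theta <- 2 * pi * (seq_len(k) - 1) / k
cx <- cos(theta)
cy <- sin(theta)

# Radius of each group's invisible circle; hand-picked assumption.
# Neighbouring centres are 2 * sin(pi / 7) ~ 0.87 apart, so circles of
# radius 0.35 do not touch.
r_max <- 0.35
golden <- pi * (3 - sqrt(5))  # golden angle in radians (~137.5 degrees)

x <- numeric(length(religion))
y <- numeric(length(religion))
for (g in seq_len(k)) {
  idx <- which(religion == g)
  m <- length(idx)
  if (m == 0) next
  i <- seq_len(m)
  # Sunflower spiral: radius grows with sqrt(i), angle steps by the golden angle
  rad <- r_max * sqrt((i - 0.5) / m)
  ang <- i * golden
  x[idx] <- cx[g] + rad * cos(ang)
  y[idx] <- cy[g] + rad * sin(ang)
}

plot(x, y, col = religion, pch = 16, asp = 1,
     axes = FALSE, xlab = "", ylab = "")
```

The sunflower pattern is deterministic and spreads points evenly, which is why it avoids overlap without any random jitter; for much larger groups you would need to increase r_max or shrink the points.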

There are 2 answers below.

Best answer

After assigning coordinates to the center of each group, you can use wordcloud::textplot to avoid overlapping labels.

```r
# Data
n <- 100
k <- 7
religion <- sample(1:k, n, TRUE)
names(religion) <- outer(LETTERS, LETTERS, paste0)[1:n]
# Position of the groups
x <- runif(k)
y <- runif(k)
# Plot
library(wordcloud)
textplot(
  x[religion], y[religion], names(religion),
  xlim=c(0,1), ylim=c(0,1), axes=FALSE, xlab="", ylab=""
)
```

[Image: textplot output (wordcloud)]

Alternatively, you can build a graph with a clique (or a tree) for each group, and use one of the many graph-layout algorithms in igraph.

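```r
library(igraph)
# 1 if two individuals share the same religion, 0 otherwise
A <- 1 * outer(religion, religion, `==`)
# Undirected graph without self-loops: each group becomes a clique
g <- graph_from_adjacency_matrix(A, mode = "undirected", diag = FALSE)
plot(g)
# A spanning tree keeps each group connected with far fewer edges
plot(mst(g))
```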

[Image: igraph layouts]
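If you want the layout coordinates themselves (for example, to draw an ordinary scatter plot coloured by group instead of a network), a minimal sketch is to compute a force-directed layout of the clique graph and plot its rows. This assumes a recent igraph where graph_from_adjacency_matrix() and layout_with_fr() are available; the seed and data are illustrative only.

```r
library(igraph)

set.seed(42)
n <- 100
k <- 7
religion <- sample(1:k, n, TRUE)

# Same-group adjacency (1/0) without self-loops: each group is a clique
A <- 1 * outer(religion, religion, `==`)
g <- graph_from_adjacency_matrix(A, mode = "undirected", diag = FALSE)

# Fruchterman-Reingold layout: edges pull clique members together,
# while vertex repulsion keeps individual points apart
coords <- layout_with_fr(g)

# Scatter plot of the layout, one colour per group
plot(coords, col = religion, pch = 16,
     axes = FALSE, xlab = "", ylab = "")
```

Because the groups are disconnected components, the layout separates them on its own, so there is no need to choose group centres by hand.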

Second answer

In the image you linked, each point has three values associated with it: x and y coordinates and a group (colour). If you only have one piece of information for each individual, you can do something like this:

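```r
set.seed(1)

# One random centre per religion
centers <- data.frame(religion=1:7, cx=runif(7), cy=runif(7))

# Spread of each point around its group centre
eps <- 0.04

# Attach each individual's group centre, then add normal noise
data <- within(merge(data.frame(religion=sample(1:7, 100, TRUE)), centers),
{
    x <- cx + rnorm(length(cx), sd=eps)
    y <- cy + rnorm(length(cy), sd=eps)
})

with(data, plot(x, y, col=religion, pch=16))
```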

Note that I'm creating random centers for each group and small random displacements around these centers for each observation. You'll have to experiment with the parameter eps, and perhaps set the centers manually, if you want to pursue this approach.
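Setting the centers manually could look like the sketch below, which is an assumption-laden variation rather than part of the answer. Centers are placed evenly on a unit circle so groups cannot collide, and the normal noise is replaced by uniform sampling inside a disc of hand-picked radius r, so every group is bounded by an invisible circle. Points within a group can still land close to each other, since the displacement is random.

```r
set.seed(1)
k <- 7
# Centres set by hand on a unit circle instead of runif()
centers <- data.frame(religion = 1:k,
                      cx = cos(2 * pi * (1:k) / k),
                      cy = sin(2 * pi * (1:k) / k))

r <- 0.3  # radius of each group's invisible circle (hand-picked assumption)

data <- merge(data.frame(religion = sample(1:k, 100, TRUE)), centers)
n <- nrow(data)

# Uniform sampling inside a disc: taking sqrt() of the uniform radius
# avoids crowding points near the centre
rho <- r * sqrt(runif(n))
phi <- runif(n, 0, 2 * pi)
data$x <- data$cx + rho * cos(phi)
data$y <- data$cy + rho * sin(phi)

with(data, plot(x, y, col = religion, pch = 16, asp = 1))
```

With 7 centres on a unit circle, neighbouring centres are about 0.87 apart, so discs of radius 0.3 stay well separated.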