I've got a two-column dataset with about 30000 clusters and 10 factors like this:
cluster-1 Factor1
cluster-1 Factor2
...
cluster-2 Factor2
cluster-2 Factor3
...
And I would like to represent the co-occurrence of factors across the clusters, something like "Factor1+Factor3+Factor5 in 1234 clusters", and so on for the different combinations. I thought I could do something like a pie chart, but with 10 factors, I take it there can be too many combinations.
What would be a good way of representing this?
There is one good programming question in here that should be addressed:
How do I count the number of co-occurrences of factors in the different clusters?
First simulate some data:
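As a minimal sketch in Python (the language, the seed, the inclusion probability `p`, and the names `simulate` and `rows` are all assumptions made for illustration), each of 30000 clusters can be given a random non-empty subset of 10 factors, stored in the same two-column layout as in the question:

```python
import random

def simulate(n_clusters=30000, n_factors=10, p=0.3, seed=42):
    """Return (cluster, factor) rows; each factor joins a cluster with probability p."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    factors = [f"Factor{i}" for i in range(1, n_factors + 1)]
    rows = []
    for c in range(1, n_clusters + 1):
        chosen = [f for f in factors if rng.random() < p]
        if not chosen:
            # Guarantee that every cluster has at least one factor.
            chosen = [rng.choice(factors)]
        rows.extend((f"cluster-{c}", f) for f in chosen)
    return rows

rows = simulate()
print(rows[:5])  # first few (cluster, factor) pairs
```

Independent inclusion with a fixed probability is only one possible model; real data will usually show some factors appearing together far more often than chance.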
Then here is the code that could be used to tabulate the number of times each combination of factors occurs in the clusters:
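One way to do the tabulation, sketched in Python under the assumption that the data is a list of (cluster, factor) pairs: collect the set of factors for each cluster, turn each set into a canonical label, and count the labels. The toy `rows` and the function name `count_combinations` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy input in the same two-column layout as the question.
rows = [
    ("cluster-1", "Factor1"),
    ("cluster-1", "Factor2"),
    ("cluster-2", "Factor2"),
    ("cluster-2", "Factor3"),
    ("cluster-3", "Factor2"),
    ("cluster-3", "Factor1"),
]

def count_combinations(rows):
    """Count how many clusters carry each exact combination of factors."""
    by_cluster = defaultdict(set)
    for cluster, factor in rows:
        by_cluster[cluster].add(factor)
    # Sorting makes the label independent of row order. The sort is
    # lexicographic, so "Factor10" would come before "Factor2".
    return Counter("+".join(sorted(fs)) for fs in by_cluster.values())

combo_counts = count_combinations(rows)
print(combo_counts)
```

Here cluster-1 and cluster-3 both map to `Factor1+Factor2` even though their rows are in a different order, so that combination is counted twice.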
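This can be represented as a simple pie chart, for example, but simple counts like this are often most efficiently displayed as a sorted table, especially since 10 factors allow up to 2^10 - 1 = 1023 distinct non-empty combinations. For more on this, check out Edward Tufte's work on presenting quantitative data.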
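A sorted table of this kind could be produced roughly as follows. This is a sketch in Python, and the counts are made up for illustration (only the "Factor1+Factor3+Factor5 in 1234 clusters" pattern comes from the question); `format_table` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical counts, for illustration only.
combo_counts = Counter({
    "Factor1+Factor2": 410,
    "Factor1+Factor3+Factor5": 1234,
    "Factor2": 980,
})

def format_table(counts):
    """Return a plain-text table of combinations, most frequent first."""
    total = sum(counts.values())
    width = max(len("combination"), *(len(c) for c in counts))
    lines = [f"{'combination':<{width}}  clusters  share"]
    for combo, n in counts.most_common():  # sorted by descending count
        lines.append(f"{combo:<{width}}  {n:>8}  {n / total:>5.1%}")
    return "\n".join(lines)

print(format_table(combo_counts))
```

With many combinations, it may help to print only the top rows and lump the rest into a single "other" line.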