Using Certain Values in Correspondence Analysis/MCA in R

I am pretty new to R and use it for certain data science problems, especially more complex methods like MCA.

Right now I want to run an MCA with 8 variables that are coded 0/1. I only want to include the 1 values of these 8 variables in the analysis. That does not mean that I want to exclude any case that has a 0. I want all cases and all variables in the plot, but I don't want the missing values to count towards the MCA. I only want the 1 categories to appear in the final MCA plot.

The data looks like this:

V1 V2 V3 ...
1  0  0
0  1  1
1  1  1
1  0  1

I work with the package soc.ca.

My code so far:

attach(data_spss)

desired_levels <- c("0", "1")

for (col in colnames(data_spss)) {
  data_spss[[col]] <- factor(data_spss[[col]], levels = desired_levels)
}

if (!require(soc.ca)) {
  install.packages("soc.ca")
}
library(soc.ca)

active <- data_spss
options(passive = 0)

result <- soc.mca(active, sup = NULL, identifier = NULL,
                  passive = getOption("passive", default = "Missing"),
                  weight = NULL, Moschidis = FALSE, detailed.results = FALSE)

The error message I get is:

Error in rowSums(ind.reduced[, varlist.long.red == unique(varlist)[i]]) : 'x' must be an array of at least two dimensions

There are 1 best solutions below

I don't know the soc.ca package, but from my experience with the ca package by Michael Greenacre and Oleg Nenadic I can say the following:

  1. If you want level 0 to be used in the MCA calculation but not plotted, you could manipulate the output of the soc.ca procedure, i.e. delete the coordinates of level 0 and plot only the remaining coordinates.

  2. Another possibility is to build an indicator matrix from your data, collapse all the columns indicating 0 into one single column, and apply an ordinary CA (not an MCA) to this table. This is largely equivalent to the MCA solution, but this way the plot doesn't get cluttered by the 0 values. I read about this idea in Fionn Murtagh's book on correspondence analysis.

  3. The error message seems to indicate a problem unrelated to your question. Maybe your data contains columns with constant values?

  4. Regarding the treatment of missing values within the CA: if you have 0, 1 and NA in your data and want to factor out the variance contributed by the NA values, you could treat NA as a third level. If these values are not random, they will get their own dimension in the CA solution. Then omit that NA dimension from the plot. Most of the time the NAs will show up on the first or second dimension of the CA solution. If it's the first dimension, plot only the second and third dimensions; if it's the second, plot the first and the third.
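
Points 2 and 3 can be sketched in a few lines. This is only a minimal sketch, assuming the data are a plain 0/1 data frame like the four rows shown in the question. The names `dat`, `ones` and `collapsed` are made up for illustration, and `ca::ca()` is the ordinary CA function from the ca package (the CA step is skipped if that package is not installed).

```r
# Toy data: the four rows shown in the question (illustrative only)
dat <- data.frame(
  V1 = c(1, 0, 1, 1),
  V2 = c(0, 1, 1, 0),
  V3 = c(0, 1, 1, 1)
)

# Point 3: flag columns that take only one distinct value
is_constant <- vapply(dat, function(x) length(unique(x)) < 2, logical(1))

# Point 2: for 0/1 data the 1-columns of the indicator matrix are the data
# themselves and the 0-columns are 1 - data. Sum the 0-columns into one column.
ones <- as.matrix(dat)
collapsed <- cbind(ones, zero = rowSums(1 - ones))

# Sanity check: every row still sums to the number of variables
stopifnot(all(rowSums(collapsed) == ncol(dat)))

# Ordinary CA (not MCA) on the collapsed table
if (requireNamespace("ca", quietly = TRUE)) {
  fit <- ca::ca(collapsed)
  print(fit$colcoord)  # standard coordinates of V1, V2, V3 and "zero"
}
```

If `is_constant` is `TRUE` for any column, that variable is worth inspecting before running the analysis. Calling `plot(fit)` on the result should then show one point per variable (its 1 level) plus a single point for the collapsed 0 column.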

PS: With a minimal reproducible example we could elaborate further on this.
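
In the meantime, point 1 can be illustrated on simulated data. The sketch below does not use soc.ca or ca at all: it is a hand-rolled CA of the indicator matrix in base R via `svd()`, so all object names (`dat`, `Z`, `coords`, `coords_1`) and the `"V1:0"` / `"V1:1"` label format are my own assumptions, not the output format of either package. The idea is simply that both levels enter the calculation, but only the rows for level 1 are kept for plotting.

```r
set.seed(1)  # simulated 0/1 data, purely illustrative
n <- 60
dat <- as.data.frame(matrix(rbinom(n * 4, 1, 0.5), ncol = 4,
                            dimnames = list(NULL, paste0("V", 1:4))))

# Full indicator matrix: one column per variable and level ("V1:0", "V1:1", ...)
Z <- do.call(cbind, lapply(names(dat), function(v) {
  f <- factor(dat[[v]], levels = c(0, 1))
  m <- sapply(levels(f), function(l) as.numeric(f == l))
  colnames(m) <- paste(v, levels(f), sep = ":")
  m
}))

# CA of the indicator matrix via SVD of the standardised residuals
P  <- Z / sum(Z)
r  <- rowSums(P)   # row masses
cm <- colSums(P)   # column (category) masses
S  <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec <- svd(S)

# Principal coordinates of the categories on the first two dimensions
coords <- (dec$v[, 1:2] / sqrt(cm)) %*% diag(dec$d[1:2])
rownames(coords) <- colnames(Z)

# All levels were used in the calculation; keep only the 1-levels for the plot
coords_1 <- coords[grepl(":1$", rownames(coords)), ]
plot(coords_1, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(coords_1, labels = rownames(coords_1))
```

With a fitted object from a real package the same filtering step applies: find the rows of the category coordinate table that belong to level 0 and drop them before plotting.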