How to change variables into quantitative?


I have a data matrix (900 columns and 5000 rows) that I would like to run a PCA on.

The matrix looks fine in Excel (all the values appear to be quantitative), but after I read the file into R and try to run the PCA code, I get an error saying "The following variables are not quantitative", followed by a list of the non-quantitative variables.

So in general, some variables are quantitative and some are not, seemingly at random. See the example below. When I check variable 1, it is correct and quantitative. When I check variable 2, it is incorrect and non-quantitative.

> data$variable1[1:5]
[1] -0.7617504 -0.9740939 -0.5089303 -0.1032487 -0.1245882

> data$variable2[1:5]
[1] -0.183546332959017 -0.179283451229594 -0.191165669598284 -0.187060515423038
[5] -0.184409474669824
731 Levels: -0.001841783473108 -0.001855956210119 ... -1,97E+05

So my question is: how can I change all the non-quantitative variables into quantitative ones?

Shortening the file does not help, because the values then become quantitative on their own. I do not know what is happening. Here is the link to my original file: https://docs.google.com/file/d/0BzP-YLnUNCdwakc4dnhYdEpudjQ/edit

I also tried the answers given below, but they still do not help.

So let me show exactly what I did:

> data <- read.delim("file.txt", header=T)
> res.pca = PCA(data, quali.sup=1, graph=T)
Error in PCA(data, quali.sup = 1, graph = T) :
The following variables are not quantitative:  batch
The following variables are not quantitative:  target79
The following variables are not quantitative:  target148
The following variables are not quantitative:  target151
The following variables are not quantitative:  target217
The following variables are not quantitative:  target266
The following variables are not quantitative:  target515
The following variables are not quantitative:  target530
The following variables are not quantitative:  target587
The following variables are not quantitative:  target620
The following variables are not quantitative:  target730
The following variables are not quantitative:  target739
The following variables are not quantitative:  target801
The following variables are not quantitative:  target803
The following variables are not quantitative:  target809
The following variables are not quantitative:  target819
The following variables are not quantitative:  target868
The following variables a
In addition: There were 50 or more warnings (use warnings() to see the first 50)

There are 3 best solutions below

1
On

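R treats your variables as factors, as mentioned by Arun. read.delim returns a data.frame (which is in fact a list), and a column containing any entry that cannot be parsed as a number is read as text rather than as numbers, which here ends up as a factor. There are numerous ways to solve this problem; one is to convert the data into a numeric matrix in the following way: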

matrix <- as.numeric(as.matrix(data))
dim(matrix) <- dim(data)

Now you can run your PCA on the matrix.
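One thing to watch for with this conversion: the factor levels printed in the question include -1,97E+05, which uses a comma where the other values use a point. as.numeric turns such a string into NA (with a warning). The following is only a sketch on a small made-up data frame, and it assumes the comma is meant as a decimal mark, which the question does not confirm:

```r
# Made-up stand-in for the real data; variable2 mixes "." and "," decimals
data <- data.frame(
  variable1 = c("-0.76", "-0.97", "-0.51"),
  variable2 = c("-0.18", "-1,97E+05", "-0.19"),
  stringsAsFactors = TRUE
)

chr <- as.matrix(data)                    # character matrix of the factor labels
chr <- sub(",", ".", chr, fixed = TRUE)   # ASSUMPTION: "," is a decimal mark
m <- as.numeric(chr)                      # as.numeric drops the dimensions
dim(m) <- dim(data)
colnames(m) <- names(data)

n_bad <- sum(is.na(m))                    # cells that still failed to parse
print(n_bad)
```

If n_bad is not zero, which(is.na(m), arr.ind = TRUE) gives the row and column of each remaining cell, so they can be inspected in the original file.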

Edit:

Extending the example a bit: the second part of charlie's suggestion won't work. Copy the following session and see how it works:

d <- data.frame(
 a = factor(runif(2000)),
 b = factor(runif(2000)),
 c = factor(runif(2000)))

try(as.numeric(d))  # raises an error: a data frame is a list, and a list cannot be coerced

as.numeric(d$a)     # does work, because d$a is a vector, but this is not what you are
                    # after: R returns the internal integer level codes, not the actual values

(m <- as.numeric(as.matrix(d)))  # this does the right thing
dim(m)                           # but m loses its dimensions and is now a vector

dim(m) <- dim(d)                 # assign the dimensions of d to m

svd(m)                           # you can run the PCA function of your liking on m
3
On

By default, R versions before 4.0.0 coerce strings to factors when reading data. This can result in unexpected behavior. Turn off this behavior with:

      read.csv(x, stringsAsFactors = FALSE)

You can, alternatively, coerce factors to numeric with

      newVar<-as.numeric(oldVar)
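A caveat on the second form, shown as a minimal sketch with made-up values: when oldVar is a factor, as.numeric returns its internal integer level codes rather than the numbers printed as labels. Going through as.character first recovers the values.

```r
# Hypothetical factor whose labels are numbers (values are made up)
oldVar <- factor(c("7", "10.5", "2.25"))

codes  <- as.numeric(oldVar)                # internal level codes
values <- as.numeric(as.character(oldVar))  # the numbers themselves

print(levels(oldVar))  # levels are sorted as strings: "10.5" "2.25" "7"
print(codes)           # 3 1 2
print(values)          # 7.00 10.50 2.25
```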
0
On

Use as.numeric(as.character(data$variable2[1:5])): as.character first gets the string representation of the factor's labels, and as.numeric then converts those strings to numbers.
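The same idea can be applied to every factor column at once. This is only a sketch on a made-up data frame; the column names mimic the ones in the error message, and it assumes batch is the one column that should stay qualitative:

```r
# Made-up stand-in for the real data
data <- data.frame(
  batch   = factor(c("a", "b", "a")),
  target1 = factor(c("0.1", "0.2", "0.3")),  # numbers stored as factor labels
  target2 = c(1.5, 2.5, 3.5)                 # already numeric
)

is_fac <- sapply(data, is.factor)  # which columns are factors
is_fac["batch"] <- FALSE           # ASSUMPTION: batch stays qualitative

# Convert labels to numbers, column by column
data[is_fac] <- lapply(data[is_fac], function(f) as.numeric(as.character(f)))

str(data)
```

Any label that is not a valid number becomes NA with a warning, so it is worth checking colSums(is.na(data)) afterwards.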