I have a labelled data matrix for PCA. How can I colour the scores according to each label in a PCA score plot using R?

My data matrix has 100 rows and 900 columns. Each row represents an IR spectrum and each column a wavenumber. The first 23 rows are IR spectra from the same sample (i.e. spectra from 23 different positions in the sample). In the same way I have measured 5 samples, each with a certain number of observations. For example, rows 1-23 belong to sample 1 and rows 24-40 belong to sample 2. Now I want to colour the scores in my PCA score plot by sample and label each colour with the sample name, e.g. 23 scores in blue with a legend entry reading Sample 1.

I have added an extra column named label to my data matrix, containing the sample names, but I do not know how to proceed further.

There are 1 best solutions below

I used the packages "factoextra" and "sf" for this. Here df is the data frame that contains the data for PCA, to which I added another column, lab.id, holding the label of each row. In the code, passing df$lab.id to col.ind tells fviz_pca_ind to use the labels as the colour index. As a result, the scores in the PCA score plot are colour coded according to their labels, and the legend lists one entry per label.

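library(factoextra)

# PCA (the PCA result) and Variance_xplained (proportion of variance per
# component) are assumed to have been computed beforehand.
fviz_pca_ind(PCA, axes = c(1, 2), title = "PC1 vs PC2",
             label = "none", geom.ind = "point", pointsize = 2,
             col.ind = as.factor(df$lab.id),  # colour each score by its label
             palette = "lancet", legend.title = "Disease", mean.point = FALSE,
             addEllipses = FALSE, ellipse.level = 0.95,
             repel = TRUE,  # avoid text overlapping (no effect while label = "none")
             xlab = paste0("PC1: ", round(Variance_xplained[1] * 100, 1), "%"),
             ylab = paste0("PC2: ", round(Variance_xplained[2] * 100, 1), "%"))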
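
For anyone who wants to see the whole idea end to end without extra packages, here is a minimal base-R sketch. It is not the factoextra approach above but a plain prcomp/plot/legend alternative. The data are simulated random numbers standing in for the 100 x 900 spectra matrix. The row counts for samples 3 to 5, the colours, and the names spectra, lab_id, pca, var_explained and cols are all assumptions made up for illustration.

```r
# Simulated stand-in for the 100 x 900 spectra matrix (random data, assumption).
set.seed(1)
spectra <- matrix(rnorm(100 * 900), nrow = 100, ncol = 900)

# Rows 1-23 = sample 1 and rows 24-40 = sample 2, as in the question;
# the sizes of samples 3-5 are assumed so that the total is 100 rows.
n_per_sample <- c(23, 17, 20, 20, 20)

# One label per row, stored as a factor so each sample maps to one colour.
lab_id <- factor(rep(paste("Sample", 1:5), times = n_per_sample))

# PCA on the numeric data only; the labels stay out of the decomposition.
pca <- prcomp(spectra, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# One colour per factor level, picked via the integer codes of the factor.
cols <- c("blue", "red", "darkgreen", "orange", "purple")
plot(pca$x[, 1], pca$x[, 2],
     col = cols[as.integer(lab_id)], pch = 19,
     main = "PC1 vs PC2",
     xlab = paste0("PC1: ", round(var_explained[1] * 100, 1), "%"),
     ylab = paste0("PC2: ", round(var_explained[2] * 100, 1), "%"))

# Legend that ties each colour to its sample name.
legend("topright", legend = levels(lab_id), col = cols, pch = 19,
       title = "Sample")
```

With real data, the label column would be split off from the data frame before calling prcomp, since PCA only works on the numeric columns. The same factor can then be passed to col.ind in fviz_pca_ind.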