How to select certain rows of a data set in R to then use in a function?


I am trying to find the Mahalanobis distance between the different species in the iris dataset in R. I was able to find the distance between setosa and versicolor with the following code:

library(HDMD)

# Mahalanobis distance between setosa and versicolor
set.vers <- pairwise.mahalanobis(x = iris[1:100, 1:4], grouping = iris[1:100, ]$Species)
md <- sqrt(set.vers$distance)

However, I am struggling to do the same for setosa and virginica. I am not sure how to select only the first 50 rows and the last 50 rows of the data set (i.e. exclude all versicolor data).


There are 2 best solutions below

0
On BEST ANSWER

Here is a way to get all combinations of levels in iris$Species with combn and compute the Mahalanobis distances.

library(HDMD)

inx <- sapply(levels(iris$Species), function(l) which(iris$Species == l), simplify = FALSE)
inx <- combn(inx, 2, function(x) unlist(x), simplify = FALSE)
set.vers_all <- lapply(inx, function(i) {
  pairwise.mahalanobis(x = iris[i, 1:4], grouping = droplevels(iris$Species[i]))
})
set.vers_all
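The list returned above has no names, so it is not obvious which element belongs to which species pair. A minimal base R sketch of one way to label the pairs follows; the names `pairs` and `rows_by_pair` and the `_vs_` separator are illustrative choices, not part of the original answer.

```r
# Build every pair of species names, then label each pair for later lookup.
pairs <- combn(levels(iris$Species), 2, simplify = FALSE)
names(pairs) <- vapply(pairs, paste, character(1), collapse = "_vs_")

# Row numbers for each pair (the same row numbers that `inx` holds above).
rows_by_pair <- lapply(pairs, function(p) which(iris$Species %in% p))

names(rows_by_pair)    # one label per species pair
lengths(rows_by_pair)  # number of rows selected for each pair
```

Because combn enumerates the pairs in the same order in both cases, the labels should line up with the result list, so something like names(set.vers_all) <- names(pairs) would let you pull out a single comparison by name.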
3
On

This is a basic subsetting question. You want to subset based on Species, something along the lines of (not tested)

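# The species levels in iris are lowercase
ss <- iris[iris$Species %in% c("setosa", "virginica"), ]
# Pass only the four numeric columns and drop the unused "versicolor" level
pairwise.mahalanobis(x = ss[, 1:4], grouping = droplevels(ss$Species))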

To compare a different pair of species, change the names in the vector passed to %in%.
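For the specific case in the question (first 50 and last 50 rows), here is a small base R sketch comparing selection by row number with selection by species name. It assumes the built-in iris ordering of 50 setosa, 50 versicolor, then 50 virginica rows; `by_index` and `by_name` are illustrative names.

```r
# Select by position: rows 1-50 (setosa) and 101-150 (virginica).
# This relies on the row order of the built-in iris data.
by_index <- iris[c(1:50, 101:150), ]

# Select by species name: does not depend on row order.
by_name <- iris[iris$Species %in% c("setosa", "virginica"), ]

# Both approaches pick the same rows.
identical(rownames(by_index), rownames(by_name))

# Subsetting keeps the empty "versicolor" level; droplevels() removes it.
levels(by_name$Species)
levels(droplevels(by_name$Species))
```

Either data frame can then be used in the pairwise.mahalanobis call from the question, with columns 1:4 as x and the dropped-level Species as grouping. Selecting by name is usually the safer choice, since it keeps working if the rows are reordered.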