Mahalanobis distance with multiple observations, variables and groups


For the iris data set, I am trying to find the Mahalanobis distances between each pair of species. I tried the following, but had no luck:

group <- matrix(iris$Species) 
group <- t(group[,-5])

variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[,variables])

mahala_sq <- pairwise.mahalanobis(x=variables, grouping=group)

But I get the error message:

Error in pairwise.mahalanobis(x = variables, grouping = group) : nrow(x) and length(grouping) are different

This works:

HDMD::pairwise.mahalanobis(x=iris[,1:4], grouping=iris$Species)
  • x should be a numeric matrix of observations (columns=variables, rows=observations)
  • grouping should be a "vector of characters or values designating group classification for observations" with length equal to nrow(x)
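To make the expected input shapes concrete, here is a minimal base-R sketch that computes squared Mahalanobis distances between the species means with stats::mahalanobis(). It assumes a pooled within-group covariance, which is a choice made for this illustration and may not match what HDMD::pairwise.mahalanobis uses by default, so the numbers need not agree. The names means, S and d2 are introduced here and are not part of HDMD.

```r
# Inputs in the shape described above: numeric matrix plus a grouping vector
x <- as.matrix(iris[, 1:4])
g <- iris$Species
stopifnot(is.numeric(x), nrow(x) == length(g))

# Group means: one row per species, one column per variable
means <- rowsum(x, g) / as.vector(table(g))

# Pooled within-group covariance (assumption made for this sketch only)
resid <- x - means[as.character(g), ]
S <- crossprod(resid) / (nrow(x) - nlevels(g))

# Squared Mahalanobis distance between every pair of group means
d2 <- apply(means, 1, function(m) mahalanobis(means, center = m, cov = S))
dimnames(d2) <- list(rownames(means), rownames(means))
d2
```

The result is a symmetric 3 x 3 matrix with zeros on the diagonal; take sqrt(d2) if you want distances rather than squared distances.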

I realized in editing your question that the problem stems from a typo: you assigned the matrix to varibles instead of variables, so x=variables passes the character vector of four column names rather than the 150-row data matrix. If you fix that typo, your code seems to work (at least it doesn't throw an error). (I still claim that my solution is simpler ...)
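A quick way to see the effect of the typo, using only base R and the object names from the question:

```r
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[, variables])  # typo: the matrix lands in 'varibles'

# 'variables' is still the character vector of four column names
is.character(variables)
length(variables)

# the 150 x 4 numeric matrix is in the misspelled object
dim(varibles)
length(iris$Species)
```

Since variables holds four names and the grouping has 150 entries, the row-count check in the function cannot pass.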

If you wanted to be a little more careful, you could use x <- iris[colnames(iris) != "Species"] (or a subset(select=) or dplyr::select() analog) to refer to the omitted column by name rather than by position.
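As a small sketch of the by-name selection (base R only; x_by_name and x_subset are names made up for this example):

```r
# Drop the grouping column by name instead of relying on columns 1:4
x_by_name <- iris[colnames(iris) != "Species"]

# Equivalent selection with subset()
x_subset <- subset(iris, select = -Species)

names(x_by_name)
all.equal(x_by_name, x_subset)
```

Either object can then be passed as x, with grouping=iris$Species as before.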

If you want (for some reason) to run this analysis with a single response variable, you need to use drop=FALSE to prevent a one-column matrix from being collapsed to a vector, i.e. use x=iris[,1,drop=FALSE]
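The difference is easy to check in base R (v and m are just illustrative names):

```r
v <- iris[, 1]                # dimensions dropped: plain numeric vector
m <- iris[, 1, drop = FALSE]  # dimensions kept: 150 rows, 1 column

is.null(dim(v))  # no rows/columns to speak of
dim(m)           # row count still matches length(iris$Species)
```

With the dropped version there is no row dimension left to compare against the length of the grouping vector, which is why drop=FALSE matters here.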