Handling singular matrix with Mahalanobis

439 Views Asked by At

I have a data frame of groups for which I would like to calculate the Mahalanobis distance for each group.

I am trying to apply the Mahalanobis function to hundreds of groups and one particular group is causing an issue due to the small sample size (only two rows).

My data looks as follows:

foo <- data.frame(GRP = c("a","a","b","b","b"),X = c(1,1,15,12,50),
                      Y = c(2.17,12.44,50,70,100))

I have borrowed a function idea from here and it looks as follows:

auto.mahalanobis <- function(temp) {
 mahalanobis(temp, 
             center = colMeans(temp, na.rm=T), 
             cov = cov(temp, use="pairwise.complete.obs"),
             tol=1e-20,
             inverted = FALSE
             )
}

Based on a suggestion here I added the tol argument to the auto.mahalanobis function to avoid issues when calculating the covariance matrix with small numbers.

I then tried using this function with my data set and am getting the following error about singular matrices:

 z <- foo %>% group_by(GRP) %>% mutate(mahal = auto.mahalanobis(data.frame(X,Y)))

Error: Problem with `mutate()` input `mahal`.
x Lapack routine dgesv: system is exactly singular: U[1,1] = 0
i Input `mahal` is `auto.mahalanobis(data.frame(X, Y))`.
i The error occurred in group 1: GRP = "a".

The same function works well with other groups that have a larger sample sizes, is there a suggested way to fix this issue or to skip such groups when the sample is too small?

1

There are 1 best solutions below

0
On

The simplest way to do this would probably be:

auto.mahalanobis <- function(temp) {
 m <- try(silent=TRUE,
           mahalanobis(temp, 
             center = colMeans(temp, na.rm=TRUE), 
             cov = cov(temp, use="pairwise.complete.obs"),
             tol=1e-20,
             inverted = FALSE
             ))
 if (!inherits(m,"try-error")) return(m)
 return(rep(NA_real_, length(temp))
}

(untested: a Real Programmer would probably use tryCatch() instead)

If you think the problem is only going to occur when n==2 you could use an if clause, e.g. if (length(temp)<min_length) return(rep(NA_real_,length(temp))).

Alternatively you could make a hacked version of mahalanobis() that used a generalized inverse (MASS::ginv) instead of regular matrix inversion (solve); I think that might (?) work as a drop-in replacement, but haven't checked the math.