Computing the Tukey median


I am trying to compute the data depth of two variables with the following function:

library(depth)
x <- data.frame(data$`math score`, data$`reading score`)


depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-8, ndir = 1000)

The first argument to depth is u, which the documentation describes as the "Numerical vector whose depth is to be calculated. Dimension has to be the same as that of the observations." I have 1000 observations; however, I get the following error message:

Error in depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-08,  : 
  Dimension mismatch between the data and the point u.

Does someone know how to solve this issue? Thank you in advance!

There is 1 best solution below


If you look at the documentation for the function depth, it says:

u    Numerical vector whose depth is to be calculated. Dimension has to be the same as that of the observations.

So u has to be a point in n-dimensional space, represented by a vector with n components, whereas x has to be a matrix or data frame with m rows and n columns (one row for each of the m points). The function compares u with all the points in x to find the smallest proportion of those points that can lie in a half-space whose boundary passes through u.

Let's create a very simple example in two-dimensional space:

library(depth)

set.seed(100)

x <- data.frame(x = c(rnorm(10, -5, 2), rnorm(10, 5, 2)), y = rnorm(20, 0, 2))

plot(x)


The depth function calculates the depth of a particular point relative to the data. So let's use the origin:

u <- data.frame(x = 0, y = 0)
points(u, col = "red", pch = 16)


Naively, we might think that the origin here has a depth of 10/20 points (i.e. the most obvious way to partition this dataset is a vertical line through the origin, with 10 points on each side), but instead we find:

depth(u, x) 
#> [1] 0.35

This indicates that there is a half-space including the origin that only contains 0.35 of the points, i.e. 7 points out of 20:

depth(u, x) * nrow(x)
#> [1] 7

And we can see that visually like this:

abline(0, -0.07)
points(x[x$y < (-0.07 * x$x),], col = "blue", pch = 16)


Where we have coloured these 7 points blue.
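
To make the half-space idea concrete, here is a minimal base-R sketch that approximates the depth of a point in two dimensions by scanning many directions and counting the points on the closed side of each line through u. The function name tukey_depth_2d and its ndir argument are my own for illustration. This is not how the depth package computes its result, and it is only approximate because it checks a finite grid of directions.

```r
# Approximate Tukey (half-space) depth of a point u within a 2-D data set x.
# Hypothetical helper for illustration only; not part of the depth package.
tukey_depth_2d <- function(u, x, ndir = 3600) {
  x <- as.matrix(x)
  u <- as.numeric(unlist(u))
  stopifnot(ncol(x) == 2, length(u) == 2)

  # Shift the data so that u sits at the origin.
  centred <- sweep(x, 2, u)

  # Unit direction vectors on an evenly spaced grid of angles (2 x ndir).
  theta <- seq(0, 2 * pi, length.out = ndir + 1)[-1]
  dirs <- rbind(cos(theta), sin(theta))

  # Projection of every point onto every direction (m x ndir). A point lies
  # in the closed half-space for a direction when its projection is >= 0.
  proj <- centred %*% dirs
  counts <- colSums(proj >= 0)

  # Depth is the smallest proportion found over all scanned directions.
  min(counts) / nrow(x)
}

# Four corners of a square: the centre is as deep as possible, and a point
# far outside the square has depth zero.
sq <- cbind(c(1, 1, -1, -1), c(1, -1, 1, -1))
tukey_depth_2d(c(0, 0), sq)
#> [1] 0.5
tukey_depth_2d(c(10, 10), sq)
#> [1] 0
```

Because the scan only visits a grid of directions, it can slightly overestimate the depth when the minimising half-space is only reached over a very narrow range of angles; increasing ndir reduces that risk.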

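So it's not clear what result you expect from the depth function, but you will need to give it a value of c(math_score, reading_score), where math_score and reading_score are test values for which you want to know the depth. The number of observations (1000) is not a valid value for u, because it has one component whereas your data have two columns.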
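
As a rough sketch of what that could look like, the example below uses simulated scores, since I don't have your data. The scores data frame and the test point c(70, 72) are made up. It also computes the depth of every observed point and picks the deepest one, which is only a crude stand-in for the Tukey median, because the true median need not coincide with an observation.

```r
library(depth)

# Hypothetical stand-in for the asker's data: 200 students, two scores each.
set.seed(1)
scores <- data.frame(math    = rnorm(200, mean = 66, sd = 15),
                     reading = rnorm(200, mean = 69, sd = 14))

# u needs one component per column of the data, not the number of rows.
u <- c(70, 72)  # hypothetical test point: (math score, reading score)
stopifnot(length(u) == ncol(scores))
depth(u, scores, method = "Tukey")

# Depth of each observed point relative to the whole data set.
d <- apply(as.matrix(scores), 1,
           function(p) depth(p, scores, method = "Tukey"))

# The deepest observation, as a rough approximation to the Tukey median.
scores[which.max(d), ]
```

With your own data you would replace scores with x and choose u as the pair of scores you are interested in.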