I am trying to compute the data depth of two variables with the following function:
library(depth)
x <- data.frame(data$`math score`, data$`reading score`)
depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-8, ndir = 1000)
The first argument of depth is u, which the documentation describes as: "Numerical vector whose depth is to be calculated. Dimension has to be the same as that of the observations." I have 1000 observations, yet I get the following error message:
Error in depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-08, :
Dimension mismatch between the data and the point u.
Does someone know how to solve this issue? Thank you in advance!
If you look at the documentation for the function depth, it says that u is a numerical vector whose dimension has to be the same as that of the observations. So u has to be a point in n-dimensional space, represented by a vector with n components, whereas x has to be a matrix or data frame with m rows and n columns (one row for each of the m points). You are comparing u to all the points in the set x to find the minimum number of points that could share a half-space with u.
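As a quick check of that dimension rule, here is a minimal sketch in base R. The data frame is a made-up stand-in for your scores (the column names and values are assumptions); the only point is that length(u) has to equal ncol(x).

```r
# Made-up stand-in for the real data: 5 observations of 2 variables.
x <- data.frame(math_score    = c(72, 69, 90, 47, 76),
                reading_score = c(72, 90, 95, 57, 78))

u_bad  <- 1000        # a single number: a point in 1-dimensional space
u_good <- c(70, 75)   # one value per column: a point in 2-dimensional space

length(u_bad)  == ncol(x)   # FALSE: this is the mismatch the error complains about
length(u_good) == ncol(x)   # TRUE: the dimensions agree
```

The number of rows in x (your 1000 observations) plays no part in this check; only the number of columns does.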
Let's create a very simple example in two-dimensional space:
The depth function calculates the depth of a particular point relative to the data. So let's use the origin:
Naively, we might think that the origin here has a depth of 10/20 (i.e. the most obvious way to partition this dataset is a vertical line through the origin with 10 points on each side), but instead we find:
This indicates that there is a half-space including the origin that only contains 0.35 of the points, i.e. 7 points out of 20:
We can see that visually by plotting the data with these 7 points coloured blue.
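To make the half-space idea concrete, here is a rough base R sketch that approximates the Tukey depth of a 2-D point by scanning many directions and counting the points in each closed half-plane. halfspace_depth_2d is a hypothetical helper written for illustration, not part of the depth package, and the package's exact algorithm may well work differently. The four-point square is made-up data.

```r
# Approximate Tukey (half-space) depth of a 2-D point u relative to the rows of x.
# Hypothetical helper for illustration only; not part of the depth package.
halfspace_depth_2d <- function(u, x, ndir = 3600, tol = 1e-12) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == 2, length(u) == 2)
  angles  <- seq(0, 2 * pi, length.out = ndir + 1)[-1]
  dirs    <- cbind(cos(angles), sin(angles))   # one unit direction per row
  centred <- sweep(x, 2, u)                    # shift the data so u sits at the origin
  proj    <- centred %*% t(dirs)               # n x ndir matrix of projections
  counts  <- colSums(proj >= -tol)             # points in each closed half-plane through u
  min(counts) / nrow(x)                        # smallest fraction over all scanned directions
}

# Made-up data: the four corners of a square.
square <- rbind(c(1, 1), c(1, -1), c(-1, 1), c(-1, -1))

halfspace_depth_2d(c(0, 0), square)   # 0.5: every half-plane through the centre holds at least 2 of 4 points
halfspace_depth_2d(c(5, 5), square)   # 0: a point outside the data can be cut off from all of it
```

Because only a finite set of directions is scanned, the result can overestimate the true depth slightly on real data; increasing ndir tightens it.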
So it's not clear what result you expect from the depth function, but you will need to give it a value of c(math_score, reading_score), where math_score and reading_score are test values for which you want to know the depth.
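For instance, a call along these lines should work. This is only a sketch: the simulated data frame is a stand-in for your real scores (the means, standard deviations and test point are assumptions), and the second part assumes that what you actually want is the depth of each of your 1000 observations.

```r
library(depth)

# Simulated stand-in for the real data: 1000 observations, 2 columns.
set.seed(1)
x <- data.frame(math_score    = rnorm(1000, mean = 66, sd = 15),
                reading_score = rnorm(1000, mean = 69, sd = 15))

# Depth of one test point: u has one component per column of x.
u <- c(70, 75)
depth(u, x, method = "Tukey")

# Depth of every observation relative to the whole data set, one call per row.
m <- as.matrix(x)
all_depths <- apply(m, 1, function(row) depth(as.numeric(row), m, method = "Tukey"))
```

Here all_depths holds one depth per row of x, so the deepest (most central) observation is the one at which.max(all_depths).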