Select entire row based on calculation done to column in data.table

82 Views Asked by At

I understand that data.table allows you to do computations based on groups within a column. For example.

Reproducible example

iris[,.SD[which.min(Petal.Width)], by=Species]

generating

 Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1:     setosa          4.9         3.1          1.5         0.1
2: versicolor          4.9         2.4          3.3         1.0
3:  virginica          6.1         2.6          5.6         1.4

I want every row where the minimum is met; not just the first, something that is easily achieved in a DF:

for example this:

 Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
10           4.9         3.1          1.5         0.1     setosa
13           4.8         3.0          1.4         0.1     setosa
14           4.3         3.0          1.1         0.1     setosa
33           5.2         4.1          1.5         0.1     setosa
38           4.9         3.6          1.4         0.1     setosa
58           4.9         2.4          3.3         1.0 versicolor
61           5.0         2.0          3.5         1.0 versicolor
63           6.0         2.2          4.0         1.0 versicolor
68           5.8         2.7          4.1         1.0 versicolor
80           5.7         2.6          3.5         1.0 versicolor
82           5.5         2.4          3.7         1.0 versicolor
94           5.0         2.3          3.3         1.0 versicolor
135          6.1         2.6          5.6         1.4  virginica

What I don't want is just the first instance of where the minima is met:

This would be equivalent to doing something like this using a data.frame

iris        
iris <- as.data.frame(iris) #in case reader does not start new R session    

f.min <- function(spec) {
spec.sub <- iris[iris$Species==spec,]
min.rows <- spec.sub[spec.sub$Petal.Width == min(spec.sub$Petal.Width),]
}             

do.call(rbind, lapply(levels(iris$Species), f.min ))

There are some powerful features in data.table which are worth learning. Hence why I would like to know the equivalent in data.table.

1

There are 1 best solutions below

0
On

Try:

iris[,.SD[which.min(Petal.Width)], by=Species]

This will give you the minimas but does not show ties.

      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1:     setosa          4.9         3.1          1.5         0.1
2: versicolor          4.9         2.4          3.3         1.0
3:  virginica          6.1         2.6          5.6         1.4

A dplyr solution showing the ties as well would be:

require(dplyr)
require(magrittr)
iris %>% 
  group_by(Species) %>% 
  filter(rank(Petal.Width, ties.method= "min") == 1)

Source: local data table [13 x 5] Groups: Species

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           4.9         3.1          1.5         0.1     setosa
2           4.8         3.0          1.4         0.1     setosa
3           4.3         3.0          1.1         0.1     setosa
4           5.2         4.1          1.5         0.1     setosa
5           4.9         3.6          1.4         0.1     setosa
6           4.9         2.4          3.3         1.0 versicolor
7           5.0         2.0          3.5         1.0 versicolor
8           6.0         2.2          4.0         1.0 versicolor
9           5.8         2.7          4.1         1.0 versicolor
10          5.7         2.6          3.5         1.0 versicolor
11          5.5         2.4          3.7         1.0 versicolor
12          5.0         2.3          3.3         1.0 versicolor
13          6.1         2.6          5.6         1.4  virginica

The 'ties.method' parameter is where you can select what should be displayed.

Hope this helps.