I'm currently using the code below to rank my data in R by the minimum values within each group.
Where multiple species share the same minimum value, the species with the lower second value (if it exists) is assigned a lower rank.
library(tidyverse)

species <- data.frame(
  species = c("dog", "dog", "cat", "cat", "fish", "fish", "lion"),
  overall.percentage = c(12, 13, 20, 12, 20, 50, 12)
)

find_min <- function(x, rank) {
  if (rank > length(x)) return(-Inf)
  x[row_number(x) == rank]
}

rank <- species |>
  summarise(
    min_1 = find_min(overall.percentage, 1L),
    min_2 = find_min(overall.percentage, 2L),
    .by = species
  ) |>
  mutate(rank = row_number(pick(min_1, min_2))) |>
  select(species, rank)

species |>
  left_join(rank, join_by(species))
#> species overall.percentage rank
#> <chr> <dbl> <int>
#> 1 dog 12 2
#> 2 dog 13 2
#> 3 cat 20 3
#> 4 cat 12 3
#> 5 fish 20 4
#> 6 fish 50 4
#> 7 lion 12 1
I would like to refactor the code so that when multiple species also share the second, third, ..., Nth minimum value, the species with the lower (N+1)th value is assigned a lower rank.
And when two species share all of their minimum values, the species with fewer values is assigned the lower rank.
So the output for the following data:
species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "cat", "lion", "lion", "Fish"),
  overall.percentage = c(11, 12, 14, 11, 12, 13, 11, 12, 20)
)
will be:
#> species overall.percentage rank
#> <chr> <dbl> <int>
#> 1 dog 11 3
#> 2 dog 12 3
#> 3 dog 14 3
#> 4 cat 11 2
#> 5 cat 12 2
#> 6 cat 13 2
#> 7 lion 11 1
#> 8 lion 12 1
#> 9 Fish 20 4
We can programmatically create the N minimum values for each species and then compute the rank from them.
Note that I am using find_min(rank, x) instead of find_min(x, rank).
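The code for this step is not shown above, so what follows is only a minimal sketch of how the idea could look, not necessarily the original solution. It assumes dplyr >= 1.1 (for `.by`, `pick()` and `row_number()` on a data frame) and tidyr. The names `n_max`, `mins` and `ranks` are introduced here for illustration. Putting `rank` first in `find_min()` lets it be mapped over 1..N, and the `-Inf` padding is what makes a species with fewer values win a tie.

```r
library(dplyr)
library(tidyr)

species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "cat", "lion", "lion", "Fish"),
  overall.percentage = c(11, 12, 14, 11, 12, 13, 11, 12, 20)
)

# rank is the first argument so the function can be applied over 1..N;
# groups with fewer than `rank` values are padded with -Inf
find_min <- function(rank, x) {
  if (rank > length(x)) return(-Inf)
  x[row_number(x) == rank]
}

# N = size of the largest group (hypothetical helper name)
n_max <- max(table(species$species))

ranks <- species |>
  summarise(
    # one list element per species holding its 1st..Nth minimum
    mins = list(vapply(seq_len(n_max), find_min, numeric(1), x = overall.percentage)),
    .by = species
  ) |>
  # spread the list column into mins_1, mins_2, ..., mins_N
  unnest_wider(mins, names_sep = "_") |>
  # rank lexicographically by mins_1, then mins_2, and so on
  mutate(rank = row_number(pick(starts_with("mins_")))) |>
  select(species, rank)

result <- species |>
  left_join(ranks, join_by(species))

print(result)
```

With the example data, lion's padded vector (11, 12, -Inf) sorts before cat's (11, 12, 13), which in turn sorts before dog's (11, 12, 14), so the ranks should match the desired output in the question.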