How to create a unique identifier based on another column in R

I have a data frame with five thousand rows. I need to create a new column holding a unique identifier made up of the value in column "gender", then the number 21, and then a four-digit sequential number starting at 0001 (gender + 21 + seq#). The sequential number must restart for each distinct value of "gender".

library(tibble)
# data_frame() is deprecated; tibble() is its replacement
df <- tibble(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  gender = c("F", "F", "F", "M","M","F","M","F","F")
)

df
  name  gender
  <chr> <chr> 
1 A     F     
2 B     F     
3 C     F     
4 D     M     
5 E     M     
6 F     F     
7 G     M     
8 H     F     
9 I     F

With unique identifier:

df
  name  gender id
1 A     F      F210001
2 B     F      F210002
3 C     F      F210003
4 D     M      M210001
5 E     M      M210002
6 F     F      F210004
7 G     M      M210003
8 H     F      F210005
9 I     F      F210006

Any help on how to achieve this would be greatly appreciated.

There are 2 best solutions below

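One option is to paste the gender onto rowid() from data.table, which numbers the rows within each gender. Adding 210000 supplies both the "21" and the zero padding, as long as no gender has more than 9,999 rows:

library(dplyr)
library(stringr)
library(data.table)  # for rowid()
df1 <- df %>%
  mutate(id = str_c(gender, rowid(gender) + 210000))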

Or use group_by() with row_number():

df1 <- df %>%
  group_by(gender) %>%
  # cur_group() returns a one-row tibble, so refer to the grouping column directly
  mutate(id = str_c(gender, row_number() + 210000)) %>%
  ungroup()
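
If the numeric trick of adding 210000 feels too implicit, the same grouped approach can build the string with explicit zero padding instead. The following is only a sketch under the assumptions of the question (a character column named gender, a fixed "21", and a four-digit counter); it swaps str_c() for base R's sprintf(), which is not what the answer above uses:

```r
library(dplyr)

# Example data from the question
df <- tibble(
  name   = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  gender = c("F", "F", "F", "M", "M", "F", "M", "F", "F")
)

df1 <- df %>%
  group_by(gender) %>%
  # row_number() restarts at 1 within each gender;
  # %04d left-pads the counter with zeros to four digits
  mutate(id = sprintf("%s21%04d", gender, row_number())) %>%
  ungroup()

df1
```

With this format string a counter above 9999 simply becomes five digits wide rather than spilling into the "21" part.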

In base R you could use ave:

transform(df, id = ave(gender, gender, FUN = function(x) sprintf("%s21%04d", x, seq_along(x))))

  name gender      id
1    A      F F210001
2    B      F F210002
3    C      F F210003
4    D      M M210001
5    E      M M210002
6    F      F F210004
7    G      M M210003
8    H      F F210005
9    I      F F210006
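
Whichever approach is used, it may be worth confirming that the identifiers really are unique and that the counter restarts for each gender. A minimal base R sketch, assuming the identifier is stored in a column named id and always has the layout letter + "21" + counter (so the counter starts at the fourth character):

```r
# Example data from the question, kept as character (not factor)
df <- data.frame(
  name   = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  gender = c("F", "F", "F", "M", "M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Build the id per gender, as in the ave() answer
df$id <- ave(df$gender, df$gender,
             FUN = function(x) sprintf("%s21%04d", x, seq_along(x)))

# Check 1: no id occurs more than once
stopifnot(anyDuplicated(df$id) == 0)

# Check 2: within each gender the highest counter equals the group size,
# which holds only if counting restarted at 1 and has no gaps
counter <- as.integer(substring(df$id, 4))
stopifnot(all(
  as.vector(tapply(counter, df$gender, max)) == as.vector(table(df$gender))
))
```

Both checks stop with an error if they fail, so they can be left in a script as a guard after the id column is created.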