How to create a column for unique IDs replacing the old unique IDs in a large dataset, as large as around 26000 observations?
I have a dataset with 26668 observations and need to create a unique ID for each individual within each year. For example, for 2000 I have about 2000 individual IDs, and in the new dataset each individual in 2000 should get a new number such as 20001. Similarly, for every year from 2000 to 2018, I need to create unique IDs made of the year followed by the already existing ID number. How can I do this in R?
I tried this
New2 <- df1 %>% mutate(NewID = 20000 + (year - min(year)) * 10000 + id)
but this is not generating a unique ID for 2002, 2004 etc. For example for year 2000 the already existing ID for an individual is 1. The new column should look like 20001. For 2002 it should look like 20021. There are about 2000-4000 observations for each year and the years range from 2000-2018. How to resolve this in R?
Simply use the newIDs you have just created.
Or do it in one step.
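The code for this step is not shown here, so the following is only a minimal sketch of what a one-step version could look like. It assumes the data frame is called `df1` with columns `year` and `id`, as in the question, and that the new ID is simply the year pasted in front of the existing ID. The toy values are hypothetical.

```r
# Hypothetical toy data standing in for df1 (the real data are not shown)
df1 <- data.frame(year = c(2000L, 2000L, 2002L, 2018L),
                  id   = c(1L, 2L, 1L, 1234L))

# Concatenate year and old ID as text, so 2000 and 1 become "20001"
df1$NewID <- paste0(df1$year, df1$id)

df1$NewID
# "20001" "20002" "20021" "20181234"
```

In the question's dplyr pipeline the same expression could be used as `mutate(NewID = paste0(year, id))`. The result is a character column; wrap it in `as.numeric()` if a number is needed.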
Edit
This actually works for any number of observations.
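One way to check this on simulated data is sketched below, assuming the IDs are built by pasting the year and the old ID together. Because every year has exactly four digits, the first four characters always identify the year and the rest identifies the individual, so no two year/ID pairs can collide. The simulated values are hypothetical and deliberately larger than the data in the question.

```r
# Simulated stand-in for the real data: every year 2000-2018 crossed with
# old IDs 1-4000 (hypothetical values, 76000 rows in total)
sim <- expand.grid(year = 2000:2018, id = 1:4000)
sim$NewID <- paste0(sim$year, sim$id)

# anyDuplicated() returns 0 when no value occurs more than once
anyDuplicated(sim$NewID)
```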
If you don't like different lengths of the IDs, you can try `sprintf`, where `%05d` defines the number of digits of the number part.
To avoid hard coding the `5`, you could do this hack.
Data:
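The original data and code are not reproduced here, so the block below uses a small hypothetical data frame to sketch the two `sprintf` variants described above. The column names follow the question. Building the format string from the longest existing ID is only one possible way to avoid the hard-coded width and is an assumption, not necessarily the approach originally posted.

```r
# Hypothetical toy data; column names follow the question
df1 <- data.frame(year = c(2000L, 2000L, 2002L, 2018L),
                  id   = c(1L, 25L, 1L, 1234L))

# Fixed width: pad the old ID with leading zeros to 5 digits
df1$NewID_fixed <- sprintf("%d%05d", df1$year, df1$id)

# Derived width: use the number of digits of the longest old ID
w   <- max(nchar(df1$id))        # 4 for the toy data
fmt <- paste0("%d%0", w, "d")    # becomes "%d%04d"
df1$NewID_auto <- sprintf(fmt, df1$year, df1$id)

df1$NewID_fixed
# "200000001" "200000025" "200200001" "201801234"
df1$NewID_auto
# "20000001" "20000025" "20020001" "20181234"
```

With zero padding every ID has the same length, which makes the IDs sort consistently as text.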