Add a new column based on last part of original column in R

29 Views Asked by At

I am trying to add a new column with the year information on an existing column. Currently, I have a column titled County_and_Quarter, which has values of both the County in Pennsylvania and the Year and the Quarter ("Adams_2018 Q1"). I am trying to make a new column based only on the year from that column.

My code currently reads

PA_State_County_Level_Sentencing_with_Year <- PA_State_County_Level_Sentencing %>%
  mutate(Year = if_else(str_detect(County_and_Quarter, "2013"), 
                        "2013",
                        County_and_Quarter))

If I then add 2014, however, the 2013 value is then removed from my new Year column. How do I keep all existing values in my new Year column?

2

There are 2 best solutions below

0
On

Rather than detecting each case one at a time, you should extract the year values from the string. A direct approach would use stringr::str_extract to look for a regex pattern, like 4 digits in a row:

PA_State_County_Level_Sentencing |>
  mutate(Year = str_extract(County_and_Quarter, "[0-9]{4}"))

However, for your format "county name_year quarter", a more complete solution might use tidyr::separate to first break off the county based on the _ separator, and then use it again to separate the year and the quarter based on the space separator. With this you should end up with 3 columns, County, Year, and Quarter, which should make further work convenient. (Note you can specify cols_remove = FALSE if you want to keep the old columns. The default will drop them.)

library(tidyr)
PA_State_County_Level_Sentencing |>
  separate_wider_delim(County_and_Quarter, delim = "_", names = c("County", "QY") |>
  separate_wider_delim(QY, delim = " ", names = c("Year", "Quarter"))
0
On

This is definitely a good candidate for the tidyr::separate_ family of functions. In complex separations with multiple output columns, I like separate_wider_regex, which can often obviate the need for multiple calls to separate(...) |>separate(...):

Example data

library(dplyr)

PA_State_County_Level_Sentencing_with_Year <-
    tibble(index = 1, County_and_Quarter = "Adams_2018 Q1")

# A tibble: 1 × 2
  index County_and_Quarter
  <dbl> <chr>             
1     1 Adams_2018 Q1  

Solution

library(tidyr)

separate_wider_regex(PA_State_County_Level_Sentencing_with_Year,
                     cols = County_and_Quarter,
                     patterns = c(County = "[[:alpha:]]+",
                                  "_",
                                  year = "\\d{4}",
                                  " *Q",
                                  Quarter = "[1-4]"))
# A tibble: 1 × 4
  index County year  Quarter
  <dbl> <chr>  <chr> <chr>  
1     1 Adams  2018  1