Separating joined first names and surnames with a space using gsub in R


I have a character vector in which some first names and surnames are separated by a space and some are not. I need to insert a space into the strings where the first name and surname run together. Each name begins with a capital letter.

For example, given

x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

I would like George and Ringo's names to be separated by a space while leaving John's as-is.

After searching Stack Overflow, I tried

gsub("([[:upper:]][[:lower:]])", "\\1 \\2", x)

but that yielded

"Jo hn Le nnon" "Ri ngoSt arr" 

To be honest, I don't have a clue what I'm doing when it comes to regular expressions (I just bought a book on them from Amazon a minute ago, but can't wait that long).

Any help is much appreciated.

Best answer:

You can use a Perl-compatible lookahead (which requires perl = TRUE):

gsub("([[:lower:]])(?=[[:upper:]])", "\\1 ", x, perl = TRUE)
# [1] "John Lennon"     "George Harrison" "Ringo Starr" 

You can explore this pattern on regex101 for more detail, and it is worth reading up on lookaround assertions in general.
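To see what the lookahead actually matches, you can pull out the matched text with base R's regmatches() and gregexpr(). In this sketch only the lowercase letter is consumed; the uppercase letter after it is tested but left in place, which is why the replacement needs only "\\1 ".

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# Find every lowercase letter that is immediately followed by an uppercase one.
# The (?=...) lookahead checks the next character without including it in the match.
hits <- regmatches(x, gregexpr("[[:lower:]](?=[[:upper:]])", x, perl = TRUE))
hits
# [[1]]
# character(0)
#
# [[2]]
# [1] "e"
#
# [[3]]
# [1] "o"
```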


Looking more closely at your attempt, you made two crucial mistakes:

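  • You switched the order of [:upper:] and [:lower:]: the spot where a space belongs is a lowercase letter followed by an uppercase letter, not the other way round.
  • You captured only one group but referred to a second one (\\2) in the replacement; you need two groups, one for each letter.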
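To see what two capture groups hold, base R's regexec() combined with regmatches() returns the full match followed by one element per group. This sketch applies the two-group pattern ([[:lower:]])([[:upper:]]) to the example vector; regexec() only reports the first match in each string, which is enough here.

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# regexec() records the first match and the position of each capture group;
# regmatches() turns that into c(full match, group 1, group 2).
groups <- regmatches(x, regexec("([[:lower:]])([[:upper:]])", x))
groups[[1]]
# character(0)
groups[[2]]
# [1] "eH" "e"  "H"
groups[[3]]
# [1] "oS" "o"  "S"
```

Because the uppercase letter lands in its own group, the replacement "\\1 \\2" can put it back after the inserted space.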

With those two fixes, your own approach works as well:

gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", x)
# [1] "John Lennon"     "George Harrison" "Ringo Starr"
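If you need this in more than one place, either pattern can be wrapped in a small function. The name split_names() below is hypothetical (it is not from any package); the sketch just checks that the two-group version and the lookahead version give the same result on the example vector.

```r
x <- c("John Lennon", "GeorgeHarrison", "RingoStarr")

# Hypothetical helper: insert a space wherever a lowercase letter is directly
# followed by an uppercase letter. Names that already contain a space are untouched.
split_names <- function(names, lookahead = FALSE) {
  if (lookahead) {
    # Lookahead version: only the lowercase letter is captured and consumed.
    gsub("([[:lower:]])(?=[[:upper:]])", "\\1 ", names, perl = TRUE)
  } else {
    # Two-group version: both letters are captured and written back.
    gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", names)
  }
}

split_names(x)
# [1] "John Lennon"     "George Harrison" "Ringo Starr"
identical(split_names(x), split_names(x, lookahead = TRUE))
# [1] TRUE
```

Note that both patterns split at every lowercase-to-uppercase boundary, so a surname with an internal capital, such as "PaulMcCartney", would come out as "Paul Mc Cartney".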