Why is my regex backreference in R being reversed when I use one backslash with gsub?


I do not understand why I am required to use two backslashes to prevent a reversal of my backreference. Below, I detail how I discovered my problem:

I wanted to transform a character string that looks like this:

x <- "53/100 000"

into one that looks like this:

53/100000

Here are a few ideas I had before I came to ask this question:

I thought that I could use the function gsub to remove all spaces that occur after the / character. However, I thought that a regex solution might be more elegant/efficient.

At first, I didn't know how to backreference in regex, so I tried this:

> gsub("/.+\\s",".+",x)
[1] "53.+000"

Then I read that you can backreference captured patterns using \1 from this website. So I began to use this:

> gsub("/.+\\s","\1",x)
[1] "53\001000"

Then I realized that the backreference only considers the wildcard match. But I wanted to keep the / character. So I added it back in:

> gsub("/.+\\s","/\1",x)
[1] "53/\001000"

I then tried a bunch of other things, but I fixed it by adding an extra backslash and enclosing my wildcard in parentheses:

> gsub("/(.+)\\s","/\\1",x)
[1] "53/100000"

Moreover, I was able to remove the / character from my replacement by inserting the left parenthesis at the beginning of the pattern:

> gsub("(/.+)\\s","\\1",x)
[1] "53/100000"

Hm, so it seemed two things were required: parentheses and an extra backslash. I think I understand the parentheses: they mark the part of the match that the backreference refers to.

What I do not understand is why two backslashes are required. The reference website says that only \1 is required. What's going on here? Why is my backreference being reversed?

Best answer:

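The extra backslash is required because R processes escape sequences in string literals before gsub ever sees them. In an R string, "\1" is an octal escape for the single character with code 1, which R prints as "\001"; that is the character that showed up in your output. Writing "\\1" produces a literal backslash followed by the digit 1, and gsub then reads those two characters as the backreference \1.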
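
To see the difference for yourself, here is a minimal sketch that inspects both strings before they reach gsub. The variable names (one_slash, two_slash, result, result_raw) are just illustrative, and the raw-string form at the end assumes R 4.0.0 or later:

```r
x <- "53/100 000"

# "\1" is parsed by R as an octal escape: a single character with code 1.
one_slash <- "\1"
nchar(one_slash)       # number of characters in the string
utf8ToInt(one_slash)   # code point of that character

# "\\1" is two characters: a backslash followed by the digit 1.
two_slash <- "\\1"
nchar(two_slash)
cat(two_slash, "\n")   # cat shows the string as gsub receives it

# With a capture group and the escaped backslash, the backreference works.
result <- gsub("/(.+)\\s", "/\\1", x)
print(result)

# Assumes R >= 4.0.0: raw strings avoid the doubled backslashes entirely.
result_raw <- gsub(r"[/(.+)\s]", r"[/\1]", x)
print(result_raw)
```

Using cat() rather than print() is handy here, because print() shows the escaped form of a string while cat() shows the characters actually passed to the regex engine.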