Which regex condition could I use to capture a math formula with units in R?


I am looking at code in ODF formulas that looks a bit like this: {500mgl} over {4.05grams}

I want to use a regex with gsub in R to enclose in brackets all of the elements with the pattern

([0-9]+)([A-Za-z]+)

to avoid some units not displaying in the denominator. However, if I do this, a number with a decimal comma gets split, and the unit ends up attached only to the digits after the comma: 4,{05g}. So what I want is to first enclose the numbers that contain a comma:

a<-"4,05g"
gsub("([0-9]+)(\\,)([0-9]+)([A-Za-z]+)","{\\1\\2\\3\\4}",a)

and then, enclose with brackets the pattern:

([0-9]+)([A-Za-z]+)

but only if there is no opening bracket before the pattern. I've tried searching the web for how lookbehind syntax works in regex, but I get pretty confused about how it works within R's gsub. I tried things like this:

gsub("([^\\.])([0-9]+)([A-Za-z]+)","{\\2\\3}",a)
gsub("(?[\\.])([0-9]+)([A-Za-z]+)","{\\2\\3}",a)
gsub("(!\\.?)([0-9]+)([A-Za-z]+)","{\\2\\3}",a)

but honestly I have no idea what I'm doing.

EDIT: I think the character to exclude before the pattern should be a comma rather than an opening bracket. That would avoid turning

"0,3g"

into

"0,{3g}"

while still turning

"30g"

into

"{30g}"

Answer by Wiktor Stribiżew:

You can use

x <- "4,05g"
gsub("(\\d+(?:,\\d+)?[[:alpha:]]*)", "{\\1}", x)

See the R demo and the regex demo.
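To see how this pattern behaves beyond the single test string, here is a small sketch that applies it to a few made-up inputs (the vector `x` and the name `pattern` are illustrative, not from the question). It uses gsub's default regex engine, as the call above does.

```r
# Same pattern as in the answer, applied to a few made-up inputs.
pattern <- "(\\d+(?:,\\d+)?[[:alpha:]]*)"

x <- c("4,05g", "30g", "500mgl over 4,05grams", "{30g}")
out <- gsub(pattern, "{\\1}", x)

print(out)
# "{4,05g}" "{30g}" "{500mgl} over {4,05grams}" "{{30g}}"
```

Note the last element: input that is already wrapped gets a second pair of braces, so this pattern is best applied to text that has not been bracketed yet. Because the letters are optional, a bare number such as "12" would also be wrapped.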

Details:

  • ( - Group 1 start (necessary as gsub does not support backreferences to the whole match):
    • \d+ - one or more digits
    • (?:,\d+)? - an optional sequence of a comma and one or more digits
    • [[:alpha:]]* - zero or more letters
  • ) - end of the group.

The \1 in the replacement is the value of Group 1.
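The question also asked how lookbehind works with gsub. In R, lookarounds are only available with perl = TRUE. The following is a minimal sketch of that variant, not part of the answer above: the helper name `wrap_units` is hypothetical, and it assumes that a unit is one or more ASCII letters and that anything directly preceded by a comma, an opening brace, or a digit should be left alone.

```r
# Hypothetical helper: wrap "number + unit" in braces unless the match
# would start right after a comma, an opening brace, or another digit.
wrap_units <- function(s) {
  # (?<![,{\\d])  negative lookbehind: previous char is not , { or a digit
  # \\d+(?:,\\d+)? digits with an optional decimal-comma part
  # [A-Za-z]+      at least one letter, so bare numbers are not wrapped
  gsub("(?<![,{\\d])(\\d+(?:,\\d+)?[A-Za-z]+)", "{\\1}", s, perl = TRUE)
}

print(wrap_units("4,05g"))                    # "{4,05g}"
print(wrap_units("30g"))                      # "{30g}"
print(wrap_units("{500mgl} over 4,05grams"))  # "{500mgl} over {4,05grams}"
print(wrap_units("{30g}"))                    # "{30g}" (left unchanged)
```

The digit in the lookbehind class matters: without it, the engine could start a match in the middle of a number (for example at the "5" in "{4,05g}") and produce nested braces. With it, text that is already wrapped is left as it is, so the function can be applied more than once.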