I want to keep a substring from inside a more complex string. I think I can use a regex to keep the part that I need. Basically, I want to keep only the information between the \" and \" in Function=\"SMAD5\". I also want to keep the empty strings: Function=\"\"
df=structure(1:6, .Label = c("ID=Gfo_R000001;Source=ENST00000513418;Function=\"SMAD5\";",
"ID=Gfo_R000002;Source=ENSTGUT00000017468;Function=\"CENPA\";",
"ID=Gfo_R000003;Source=ENSGALT00000028134;Function=\"C1QL4\";",
"ID=Gfo_R000004;Source=ENSTGUT00000015300;Function=\"\";", "ID=Gfo_R000005;Source=ENSTGUT00000019268;Function=\"\";",
"ID=Gfo_R000006;Source=ENSTGUT00000019035;Function=\"\";"), class = "factor")
The result should look like this:
"SMAD5"
"CENPA"
"C1QL4"
NA
NA
NA
So far, this is what I was able to do:
gsub('.*Function=\"',"",df)
[1] "SMAD5\";" "CENPA\";" "C1QL4\";" "\";" "\";" "\";"
But I'm stuck with a trailing \"; on every element. How can I remove them in one line?
I tried this:
gsub('.*Function=\"' & '.\"*',"",test)
But it's giving me this error:
Error in ".*Function=\"" & ".\"*" :
operations are possible only for numeric, logical or complex types
You may use
sub(".*Function=\"([^\"]*).*", "\\1", df)
Details:
.* - any 0+ chars, as many as possible, up to the last...
Function=\" - a Function=" substring
([^\"]*) - capturing group 1 matching 0+ chars other than a "
.* - and the rest of the string.
The \1 is the backreference restoring the contents of Group 1 in the result.
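As a minimal sketch, the pieces described above can be assembled into a single sub() call and run against the sample data from the question. The final step that turns empty matches into NA is an assumption based on the desired output shown in the question; drop it if you would rather keep the empty strings. The names df and res are only illustrative.

```r
# Sample data from the question, rebuilt with factor() for brevity.
df <- factor(c(
  "ID=Gfo_R000001;Source=ENST00000513418;Function=\"SMAD5\";",
  "ID=Gfo_R000002;Source=ENSTGUT00000017468;Function=\"CENPA\";",
  "ID=Gfo_R000003;Source=ENSGALT00000028134;Function=\"C1QL4\";",
  "ID=Gfo_R000004;Source=ENSTGUT00000015300;Function=\"\";",
  "ID=Gfo_R000005;Source=ENSTGUT00000019268;Function=\"\";",
  "ID=Gfo_R000006;Source=ENSTGUT00000019035;Function=\"\";"
))

# Match the whole string, capture what sits between the quotes after
# Function=, and replace the full match with that captured group.
# In an R string literal the backreference \1 is written as "\\1".
res <- sub(".*Function=\"([^\"]*).*", "\\1", as.character(df))

# Assumption: empty matches should become NA, as in the desired output.
res[!nzchar(res)] <- NA_character_

print(res)
```

Because the pattern consumes the entire string and the replacement keeps only group 1, there is no leftover \"; to strip afterwards. Note also that regex fragments are combined inside one pattern string rather than joined with &, which is a logical operator in R and is what triggers the error shown in the question.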