Is there an equivalent of VB.NET's InStr in R?

I am rewriting my vb.net code in R and have come to a roadblock. The code in vb.net essentially counts the number of characters in a string that do not occur in a string of allowed characters. The code in vb.net is:

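StringtoConvert = "ABC"
strAllowedChars = "AC"
disallowed = 0
' String indexing in VB.NET is zero-based, so loop from 0 to Len - 1
For i = 0 To Len(StringtoConvert) - 1
  ' InStr returns 0 when the character is not found in strAllowedChars
  If InStr(1, strAllowedChars, StringtoConvert(i)) = 0 Then
    disallowed = disallowed + 1
  End If
Next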

I can see how to do this in R using loops to search the string for each of the allowed characters but is there a way in R to do this using an aggregate like the strAllowedChars above?

The str_count function of the stringr package is the closest that I have found, but it looks for matches of the entire strAllowedChars string rather than treating each character independently. How can I test StringtoConvert to make sure it contains only the characters in strAllowedChars? In other words, in the example above, if a character in StringtoConvert does not match one of the characters in strAllowedChars, then I need to either identify it as such and use another call to replace it, or replace it directly.

The R code that I have tried is:

    library(stringr)
    testerstring<-"CYA"
    testpattern<-"CA"
    newtesterstring<-str_count(testerstring,testpattern)
    print(newtesterstring)

The desired output is the number of characters in StringtoConvert that are disallowed, based on the allowed characters in strAllowedChars. I will then use that count in a loop to change any disallowed character to a "G" with an if/then statement. It would be even better if I could skip the counting step and just replace any disallowed character with a "G" directly.

There are 3 best solutions below

Answer 1 (best answer)

Here's an approach with str_replace_all. We can build a regular expression that matches every character not in a given set. For example, [^AC] matches any single character other than A or C:

library(stringr)

StringtoConvert <- "ABC"
strAllowedChars <- "AC"

# Build the negated character class "[^AC]" and replace every match with "G"
str_replace_all(StringtoConvert, paste0("[^", strAllowedChars, "]"), "G")
#[1] "AGC"

# The same call on a random 50-letter string
set.seed(12345)
StringtoConvert2 <- paste(sample(LETTERS, 50, replace = TRUE), collapse = "")
str_replace_all(StringtoConvert2, paste0("[^", strAllowedChars, "]"), "G")
#[1] "GGGGGGGGGGGGGGGGGGAGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGG"
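The question also asks for the count of disallowed characters, and the same negated character class can give that. Below is a minimal base R sketch, using gregexpr and regmatches in place of stringr; the function name count_disallowed is made up for illustration. It assumes the allowed set is non-empty and contains no characters that are special inside a character class (such as ], ^, - or a backslash), which would need escaping first.

```r
# Hypothetical helper: counts the characters of each string in `x` that are
# NOT in `allowed`, using the negated character class "[^...]".
# Assumption: `allowed` is non-empty and has no class-special characters
# such as "]", "^", "-" or "\".
count_disallowed <- function(x, allowed) {
  pattern <- paste0("[^", allowed, "]")
  matches <- gregexpr(pattern, x)
  # regmatches() returns character(0) where nothing matched, so lengths() is 0
  lengths(regmatches(x, matches))
}

count_disallowed("ABC", "AC")
count_disallowed(c("ABC", "CYA", "AC"), "AC")
```

Because gregexpr is vectorised over its text argument, the helper can take a whole character vector at once instead of being called inside a loop.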
Answer 2

You could use strsplit to get each character in strAllowedChars, count how often each one occurs in StringtoConvert, and subtract that total from the number of characters in StringtoConvert.

That gives you the total number of disallowed characters in StringtoConvert, if that's what you are after.

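StringtoConvert <- "ABCrrrrr"
strAllowedChars <- "ACT"

# Split the allowed set into single characters and count each one separately
allowed <- strsplit(strAllowedChars, "")[[1]]
disallowed <- nchar(StringtoConvert) - sum(stringr::str_count(StringtoConvert, allowed))

disallowed
#[1] 6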

To replace all but the allowed characters with 'G' you can try this.

StringtoConvert <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
strAllowedChars <- "ACT"

stringr::str_replace_all(StringtoConvert, paste0("[^", strAllowedChars, "]"), "G")
[1] "AGCGGGGGGGGGGGGGGGGTGGGGGG" 
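One thing to watch with the counting approach above: each allowed character is treated as a regular expression, so an allowed "." would match every character, and a character repeated in strAllowedChars would be counted twice. A rough base R sketch that sidesteps both by matching literally (fixed = TRUE) and de-duplicating the allowed set might look like this; the function name count_disallowed_fixed is hypothetical, and it assumes a single input string.

```r
# Hypothetical helper: counts characters of one string `x` that are not in
# `allowed`, matching each allowed character literally rather than as a regex.
count_disallowed_fixed <- function(x, allowed) {
  # unique() prevents double counting when `allowed` repeats a character
  allowed_chars <- unique(strsplit(allowed, "")[[1]])
  hits <- vapply(allowed_chars, function(ch) {
    m <- gregexpr(ch, x, fixed = TRUE)[[1]]
    # gregexpr() returns -1 when there is no match, so count positive positions
    sum(m > 0)
  }, integer(1))
  nchar(x) - sum(hits)
}

count_disallowed_fixed("ABCrrrrr", "ACT")
count_disallowed_fixed("A.B", "A.")   # "." is matched as a literal dot
```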
Answer 3

Using base R only:

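StringtoConvert <- "ABC"
strAllowedChars <- "AC"

# Split both strings into single characters, then subtract the number of
# characters of StringtoConvert that appear in strAllowedChars
chars <- strsplit(StringtoConvert, "")[[1]]
allowed <- strsplit(strAllowedChars, "")[[1]]
Res <- nchar(StringtoConvert) - sum(chars %in% allowed)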
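The same split-and-compare idea can probably be extended to do the replacement the question asks for, still in base R and without any regular expressions, so special characters in the allowed set need no escaping. This is only a sketch: the function name replace_disallowed and its replacement argument are made up for illustration.

```r
# Hypothetical helper: replaces every character of each string in `x` that is
# not in `allowed` with `replacement`, using %in% instead of a regex.
replace_disallowed <- function(x, allowed, replacement = "G") {
  allowed_chars <- strsplit(allowed, "")[[1]]
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]
    # %in% binds tighter than !, so this selects characters not in the set
    chars[!chars %in% allowed_chars] <- replacement
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

replace_disallowed("ABC", "AC")
replace_disallowed(c("CYA", "A.C"), "AC.")
```

Wrapping the per-string work in vapply lets the function accept a character vector, which avoids the explicit loop over strings mentioned in the question.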