Compact way to filter a vector based on two vectors of positions in R


I have a vector of text, e.g.

library(stringi)
MWE <- stri_rand_strings(200, 10, pattern = "[A-Za-z0-9]")

My actual example is not random, so I can find recurring occurrences of a pattern I want to keep. I am therefore able to grep the start and the end of my pattern, and get two vectors:

sequence_start <- c(9,44,56,73,85,98,110,122,140,152,164,176,188)
sequence_end <- c(14,49,61,78,91,103,115,127,145,157,169,181,193)

This is the easy pattern, so nearly all of my sequences span the same distance (end minus start is 5), but (1) one spans 6, and (2) for more general reasons I would like to work from the two vectors mentioned.

My desired output is a sequence of extracts of my MWE based on the aforementioned start and end sequences, i.e. MWE[9:14], MWE[44:49] etc.

I can do that with a for loop (although I get a warning):

Desired_Output <- rep(NA,length(sequence_start))
for (i in (1:length(sequence_start))){
  Desired_Output[i] = MWE[sequence_start[i]:sequence_end[i]]
}

But I am trying to improve my coding skills a bit, and have understood that for loops should be avoided as much as possible, so I am wondering what better ways there might be to do this. I am open as to the format of the output. Ideally, code readability is a factor, as I work with people even less fluent in R than I am!

Best answer

One option without an explicit loop is using Map():

MWE[unlist(Map(seq, sequence_start, sequence_end))]
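To see what this one-liner does step by step, here is a minimal sketch on a small toy vector. The names `x`, `starts`, and `ends` are made up for illustration and are not from the question.

```r
# Toy data (hypothetical names, chosen only for illustration)
x <- letters[1:10]
starts <- c(2, 6)
ends <- c(4, 7)

# Map() calls seq(starts[i], ends[i]) for each pair and returns a list
idx_list <- Map(seq, starts, ends)

# unlist() flattens the list into a single vector of positions
idx <- unlist(idx_list)

# Index the vector once with all positions
result <- x[idx]
print(result)
# "b" "c" "d" "f" "g"
```

The result is one flat vector, so the boundaries between the individual sequences are lost. That is fine if only the kept elements matter.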

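Unless milliseconds matter, I think the loop is fine. However, I don't think the current loop does what you want: assigning a multi-element slice to the single position Desired_Output[i] keeps only the first element of the slice, which is also what triggers the warning. Here is a modification: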

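# Collect each slice in a list, since slices can differ in length
Desired_Output <- vector("list", length(sequence_start))
for (i in seq_along(sequence_start)) {
  Desired_Output[[i]] <- MWE[sequence_start[i]:sequence_end[i]]
}
# Flatten the list into a single character vector
Desired_Output <- unlist(Desired_Output)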
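Since the question leaves the output format open, another possibility is to skip the final unlist() and keep one list element per sequence. The following is a sketch on toy data, not the asker's actual vectors. The names `x`, `starts`, `ends`, and `pieces` are hypothetical.

```r
# Toy data (hypothetical names, chosen only for illustration)
x <- letters[1:10]
starts <- c(2, 6)
ends <- c(4, 7)

# Extract each slice separately; the result is a list with one
# character vector per (start, end) pair
pieces <- Map(function(s, e) x[s:e], starts, ends)

# Each extract can then be inspected on its own
print(pieces[[1]])
print(lengths(pieces))
```

Keeping the list makes it easy to check each extract individually, for example to spot the one sequence that is longer than the others.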