Extracting only some digits from a numeric string


I have a folder full of raster files. They come in groups of 12, each one being a band (there are 12 bands) of the Sentinel-2 satellite. I simply want to create a loop that goes through the folder and first identifies the two bands I am interested in (Bands 4 and 5). To process them in pairs from the same set, I am trying to extract the date of the image from the Band 4 file name as a string, which I will then use to retrieve the Band 5 file from the same date.

This is where the problem comes in. The names look like this: T31UER_20210722T105619_B12.jp2. I manage to keep only the digits and get rid of the 31, which for a Band 4 file gives me something like: 20190419105621042

The core of my question is then: how can I select only a small part of this string (the date, YYYYMMDD)?

Here is the piece of code. As you can see, my method is to select the parts I want deleted, but this fails at the last step: the part coming after the date changes for every file, except for the trailing 042. Thank you very much!

for (f in files){
  #Load band 4
  Bande4 <- list.files(path="C:/Users/Perrin/Desktop/INRA/Raster/BDA/Images en vrac", 
                       pattern ="B04.jp2$", full.names=TRUE)
  #Copy the date
  x <- gsub("[A-z //.//(//)]", "", Bande4)
  y <- gsub("31", "", x)
  z <- gsub("??? this part changes for every file!", "", y)

  #Load the matching Band 5
  Bande5 <- list.files(path="C:/Users/Perrin/Desktop/INRA/Raster/BDA/Images en vrac", 
                       pattern = z, full.names=TRUE)
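  #Read both bands as rasters (Bande4 and Bande5 are file paths)
  B4 <- raster::raster(Bande4)
  B5 <- raster::raster(Bande5)

  #Calculate NDVI = (B5 - B4) / (B5 + B4)
  NDVI <- (B5 - B4) / (B5 + B4)

  #Save the result (the NDVI_<date>.tif file name is only an example)
  r4 <- writeRaster(NDVI, paste0("C:/Users/Perrin/Desktop/INRA/Raster/BDA/Images en vrac/NDVI_", z, ".tif"), format="GTiff", overwrite=TRUE)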
  
}
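NDVI is a normalized difference, so its denominator is the sum of the two bands, not their difference: NDVI = (NIR - Red) / (NIR + Red). The helper below is a minimal sketch on plain numeric vectors with made-up toy values; assuming the raster package is used (as writeRaster() suggests), the same arithmetic applies cell by cell to RasterLayer objects. On Sentinel-2, band 4 is red and the near-infrared band usually used for NDVI is band 8; with band 5 (red edge) the same formula gives a red-edge variant of the index.

```r
# Normalized difference between two bands, computed element-wise:
# (nir - red) / (nir + red)
normalized_difference <- function(nir, red) {
  (nir - red) / (nir + red)
}

# Toy reflectance values (hypothetical), one pixel per element
nir <- c(0.5, 0.3, 0.2)
red <- c(0.1, 0.1, 0.2)
normalized_difference(nir, red)
```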

Answer 1 (best answer)

You can use substr to extract certain characters from a string, e.g.:

substr(z, 1, 8)
[1] "20210722"
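The question asks for the date in YYYY/MM/DD form. substr() only returns the eight characters, but base R's as.Date() can parse them and format() can print them with slashes. A minimal sketch, using the digits-only string from the question:

```r
# Digits-only string produced by the question's gsub() steps
digits <- "20190419105621042"

# The first 8 characters are the acquisition date (YYYYMMDD)
date_str <- substr(digits, 1, 8)

# Parse it as a Date, then print it in the YYYY/MM/DD form
acq_date <- as.Date(date_str, format = "%Y%m%d")
format(acq_date, "%Y/%m/%d")
#> [1] "2019/04/19"
```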

If your names are always in the same format, you can directly use substr without gsub first:

substr(basename(Bande4), 8, 15)  # basename() drops the folder part of the full path
# e.g. with
substr("T31UER_20210722T105619_B12.jp2", 8, 15)
[1] "20210722"
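Because list.files(..., full.names = TRUE) returns full paths, the fixed positions 8 to 15 only hold for the file name itself, hence basename(). The sketch below pairs every Band 4 file with the Band 5 file of the same date. The paths and the second date are hypothetical examples that follow the T31UER_YYYYMMDDTHHMMSS_Bxx.jp2 pattern from the question.

```r
# Hypothetical full paths, as list.files(..., full.names = TRUE) would return them
files <- c(
  "C:/images/T31UER_20210722T105619_B04.jp2",
  "C:/images/T31UER_20210722T105619_B05.jp2",
  "C:/images/T31UER_20210801T105031_B04.jp2",
  "C:/images/T31UER_20210801T105031_B05.jp2",
  "C:/images/T31UER_20210722T105619_B12.jp2"
)

# Keep the Band 4 and Band 5 files, identified by the end of the name
b4_files <- files[grepl("_B04\\.jp2$", files)]
b5_files <- files[grepl("_B05\\.jp2$", files)]

# Date = characters 8 to 15 of the file name (not of the full path)
b4_dates <- substr(basename(b4_files), 8, 15)
b5_dates <- substr(basename(b5_files), 8, 15)

# For each Band 4 file, the Band 5 file with the same date (NA if none)
pairs <- data.frame(
  date = b4_dates,
  b4 = b4_files,
  b5 = b5_files[match(b4_dates, b5_dates)],
  stringsAsFactors = FALSE
)
pairs
```

match() returns NA when a date has no Band 5 file, so missing partners are easy to spot with is.na(pairs$b5) before reading any raster.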

Answer 2

You can select the date because it is an 8-digit string between an underscore and a capital letter (here I assume it is always "T"):

str <- "T31UER_20210722T105619_B12.jp2"

sub("(.*_)([[:digit:]]{8})(T.*)", "\\2", str)
#> [1] "20210722"

I describe the whole string as a regex and keep only its second group (groups being delimited by parentheses).
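sub() is vectorised, so the same idea works on every file name at once, and split() can then group the files by date. A sketch with hypothetical file names; it uses a slightly simpler pattern with a single capture group. Since sub() returns non-matching names unchanged, they are filtered out first with grepl().

```r
# Hypothetical file names following the question's naming scheme
files <- c(
  "T31UER_20210722T105619_B04.jp2",
  "T31UER_20210722T105619_B05.jp2",
  "T31UER_20210801T105031_B04.jp2",
  "T31UER_20210801T105031_B05.jp2",
  "notes.txt"
)

# Keep only names containing _YYYYMMDDT (sub() would return others unchanged)
pattern <- ".*_([[:digit:]]{8})T.*"
files <- files[grepl(pattern, files)]

# Extract the date from every name at once (\\1 = the only group)
dates <- sub(pattern, "\\1", files)

# One list element per date, each holding that date's files
by_date <- split(files, dates)
by_date[["20210722"]]
#> [1] "T31UER_20210722T105619_B04.jp2" "T31UER_20210722T105619_B05.jp2"
```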

I hope it matches all your rasters!