A bit of an odd one here. I have a list of file roots and I want to extract the terminal file name from each root. An ugly composite of stringr functions does the job by locating the last "/" character in the string and then extracting everything after it.
Now the odd thing is that the expression works fine when applied to any one string individually, but doesn't seem to apply itself properly when passed down a data.table column:
require(data.table)
require(stringr)
file_list <- data.table(file_root = c("~/dat/stuff/thing.csv",
                                      "~/dat/stuff/thingy.csv",
                                      "~/dat/otherstuff/thinger.csv"))
file_root <- "~/dat/otherstuff/thinger.csv"
success <- str_sub(file_root,-(str_length(file_root) - max(str_locate_all(file_root,"/")[[1]])),-1)
#> success
#[1] "thinger.csv"
file_list[, extract := str_sub(file_root,-(str_length(file_root) - max(str_locate_all(file_root,"/")[[1]])),-1)]
#> head(file_list)
#                      file_root          extract
#1:        ~/dat/stuff/thing.csv        thing.csv
#2:       ~/dat/stuff/thingy.csv       thingy.csv
#3: ~/dat/otherstuff/thinger.csv tuff/thinger.csv  <- final result is incorrect
I can put together a strsplit-based function that does the job using sapply down the data.table; however, in practice file_list will be several hundred thousand rows long, and sapply will take an inordinately long time.
find_name <- function(X) {as.character(data.table(strsplit(X,"/")[[1]])[NROW(data.table(strsplit(X,"/")[[1]]))])}
file_list[,extract := sapply(file_root,find_name)]
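One possible way to avoid the per-row sapply call is a single vectorised regex substitution in base R. This is only a sketch: the `paths` vector is a stand-in for the file_root column, and it assumes "/" is the only path separator in the data.

```r
# Stand-in for the file_root column (same values as in the example above)
paths <- c("~/dat/stuff/thing.csv",
           "~/dat/stuff/thingy.csv",
           "~/dat/otherstuff/thinger.csv")

# The greedy ".*/" matches everything up to and including the last "/",
# so replacing it with "" leaves only the final path component.
# sub() is vectorised over its input, so no explicit loop is needed.
extract <- sub("^.*/", "", paths)
```

Inside the data.table this would presumably become `file_list[, extract := sub("^.*/", "", file_root)]`.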
So my questions are: any idea why the original expression isn't working, and how can I fix it? Alternatively, how can I get the find_name function to work faster?
Thanks in advance....
The basename suggestion by Arun works very nicely.
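For example, something along these lines, using base R's basename() on the whole column at once (a sketch that rebuilds the same file_list as above so it runs on its own):

```r
library(data.table)

file_list <- data.table(file_root = c("~/dat/stuff/thing.csv",
                                      "~/dat/stuff/thingy.csv",
                                      "~/dat/otherstuff/thinger.csv"))

# basename() is vectorised, so it handles the whole column in one call
# and returns the last component of each path.
file_list[, extract := basename(file_root)]
```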
It would still be interesting to find the reason for the strange stringr results, but this solution works for my immediate problem.
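For what it's worth, the output above is consistent with the `[[1]]` being the culprit: str_locate_all() returns one matrix of positions per input string, and `[[1]]` keeps only the first string's matrix, so the last "/" of the first path (position 12) gets reused for every row. A minimal sketch of that reading, with `paths` standing in for the column and assuming every path contains at least one "/":

```r
library(stringr)

paths <- c("~/dat/stuff/thing.csv",
           "~/dat/stuff/thingy.csv",
           "~/dat/otherstuff/thinger.csv")

# One matrix of "/" positions per input string
locs <- str_locate_all(paths, "/")

# What the original expression effectively uses: only the first string's slashes
first_only <- max(locs[[1]])

# Taking the maximum within each matrix gives the last "/" of each string
last_slash <- vapply(locs, function(m) as.integer(max(m)), integer(1))

# Extract everything after each string's own last "/"
fixed <- str_sub(paths, last_slash + 1, -1)
```

The first two rows only came out right because their last "/" happens to sit at the same position as in the first path.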
Cheers