I recently started using shinyTree for one of my applications, and I'm having trouble finding an efficient way to turn my directory into a list. My assumption is that the easiest way is to use something like Rcpp to take advantage of C++'s speed, but I'm not married to that idea. If that is the route to take, however, my skill set in that arena is virtually zero, so I'm hoping someone can provide a couple of code snippets to get me started in the right direction.
Here is the code I'm currently using to achieve what I'm trying to do:
create_directory_tree = function(root) {
  tree = list()
  file_lookup = data.frame(id=character(0), file_path=character(0), stringsAsFactors=FALSE)
  files = list.files(root, all.files=F, recursive=T, include.dirs=T)
  walk_directory = function(tree, path) {
    fp = file.path(root, path)
    is_dir = file.info(fp)$isdir
    if (is.null(is_dir) | is.na(is_dir)) {
      print(fp)
      return(NULL)
    }
    path = gsub("'|\"", "", path)
    folders = str_split(path, "/")[[1]]
    if (is.na(dir) | is.null(dir)) {
      print(paste("Failed:", fp))
      return(NULL)
    }
    if (is_dir) {
      txt = paste("tree", paste("$'", folders, "'", sep="", collapse=""), " = numeric(0)", sep="")
    } else {
      txt = paste("tree", paste("$'", folders, "'", sep="", collapse=""), " = structure('', sticon='file')", sep="")
    }
    eval(parse(text = txt))
    return(tree)
  }
  for (i in 1:length(files)) {
    tmp = data.frame(id=paste0("j1_", i), file_path=file.path(root, files[i]), stringsAsFactors=FALSE)
    file_lookup = rbind(file_lookup, tmp)
    tree = walk_directory(tree, files[i])
    save(tree, file_lookup, file="www/dir_tree.Rdata")
  }
}
This is taking an absurdly long time and I'm hoping there is something better. Thanks in advance.
The issue is that you are growing the data.frame with rbind inside the loop, in the line file_lookup = rbind(file_lookup, tmp). Chances are the directory at root has lots and lots of content, so the slowdown comes from constantly copying and recreating the data.frame. You already know the number of files (length(files)), so precreate the data.frame with that many rows before the loop and fill in row i on each iteration.
Also, you are saving the objects to disk on every pass through the for loop, which is an I/O bottleneck. I would move the save(tree, file_lookup, file="www/dir_tree.Rdata") call outside the loop.
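As a rough illustration of both suggestions, here is a minimal sketch that covers only the lookup table and leaves the tree-building step aside. The function name build_file_lookup and the out_file argument are hypothetical names made up for this example, not part of the original code. It assumes the same list.files call and "j1_" id scheme as in the question.

```r
# Sketch only: build_file_lookup() and out_file are hypothetical names.
build_file_lookup <- function(root, out_file = NULL) {
  files <- list.files(root, all.files = FALSE, recursive = TRUE, include.dirs = TRUE)
  n <- length(files)

  # Allocate all rows up front so the loop fills existing rows
  # instead of appending one row at a time with rbind().
  file_lookup <- data.frame(
    id = character(n),
    file_path = character(n),
    stringsAsFactors = FALSE
  )

  # seq_len(n) is empty when n == 0, unlike 1:length(files).
  for (i in seq_len(n)) {
    file_lookup$id[i] <- paste0("j1_", i)
    file_lookup$file_path[i] <- file.path(root, files[i])
  }

  # Write to disk once, after the loop, rather than on every iteration.
  if (!is.null(out_file)) {
    save(file_lookup, file = out_file)
  }

  file_lookup
}
```

Calling build_file_lookup("some/dir") would return the table without touching the disk, while passing out_file writes it a single time at the end. Since paste0 and file.path are vectorised, the loop could probably be dropped altogether, but the version above stays closer to the structure of the original function.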
Lastly, the Rcpp Gallery has several posts that would serve as ideal tutorials.