I have data stored in several fixed-width files (possibly too large to fit in memory). It would be great to be able to read them using arrow.
library(data.table) # provides fread()
p <- 'path_to_my_files'
# p contains: a.txt, b.txt, c.txt, ... all fixed-width files (fwf)
# Dictionary with two columns, varname and length, giving each variable's name and width (and hence its position) in the fwf file
dic <- fread('fwf_dictionary.csv')
Is there a read_fwf_arrow(path, dic) function? I imagine that combining read_delim_arrow (with a delimiter that never occurs in the data) with dplyr parsing for each column would do the job, but I don't know how to loop through the variables in dic to extract each one.
read_fwf_arrow <- function(path, dic) {
  path %>% read_delim_arrow(delim = '#') %>%
    loop_extracting_cols_that_can_run_on_arrow
}
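To make the "loop through dic" step concrete, here is a minimal sketch in base R, run on an in-memory character vector rather than on arrow. The sample dictionary, the sample lines, and the column name length are all made up for illustration. The idea is that cumulative widths give each variable's start and end position, which can then be used to cut substrings. Whether the same substring expressions can be pushed down to arrow is not verified here.

```r
# Hypothetical dictionary with the two columns described in the question.
dic <- data.frame(
  varname = c("id", "name", "score"),
  length  = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# Start and end character positions derived from the cumulative widths.
end   <- cumsum(dic$length)
start <- end - dic$length + 1

# Each element stands in for one raw fixed-width line (made-up sample data).
lines <- c("001alice90", "002bob  85")

# Loop over the dictionary rows and cut each variable out of every line,
# trimming the padding whitespace.
cols <- lapply(seq_len(nrow(dic)), function(i) {
  trimws(substr(lines, start[i], end[i]))
})
names(cols) <- dic$varname

parsed <- as.data.frame(cols, stringsAsFactors = FALSE)
print(parsed)
```

All columns come back as character here; type conversion (for example as.integer on score) would be a separate step.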
Right now there is no fixed-width reader in arrow. However, we can make use of some readr functionality and of how Datasets work. The process below creates an intermediate object, so it does not read directly from the fixed-width files. Sometimes that's OK, sometimes not. Hopefully it is useful to you, though: