The fst package (http://www.fstpackage.org/fst/) offers multithreaded compression and fast reading and writing of data frames.
I'm running Bayesian models with brms that are large and slow, and I want to save the results to disk for future re-use. Using saveRDS() with compress = "xz", they are ~200 MB on disk, take forever to compress, and take a long time to read back in and decompress.
fst implements fast, multithreaded zstd compression.
library(fst)

x <- list(
  a = mtcars,
  b = as.POSIXlt("2021-01-01 14:00:05"))

# Serialize to a raw vector, compress with zstd, then write to disk
saveRDS(compress_fst(serialize(x, NULL), compression = 100),
        file = "test_fst.RDS")

# Read back, decompress, and unserialize
x2 <- unserialize(decompress_fst(readRDS("test_fst.RDS")))

all.equal(x, x2)
This returns TRUE, and the other quick tests I have done suggest that this all works fine.
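For reference, here is a sketch of the same pattern wrapped into helper functions. Note that it uses writeBin()/readBin() instead of saveRDS()/readRDS(), on the assumption that wrapping the already-compressed raw vector in saveRDS() would gzip-compress it a second time; the function names are my own:

```r
library(fst)

# Serialize an arbitrary R object to a raw vector, compress it with
# zstd via fst, and write the compressed bytes directly to disk.
save_compressed <- function(object, path, compression = 100) {
  raw_data <- compress_fst(serialize(object, NULL), compression = compression)
  writeBin(raw_data, path)
}

# Read the compressed bytes back, decompress, and unserialize.
read_compressed <- function(path) {
  raw_data <- readBin(path, what = "raw", n = file.size(path))
  unserialize(decompress_fst(raw_data))
}
```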
Am I missing any downsides or drawbacks of taking an arbitrary R object, serializing it, passing it to compress_fst(), and then writing the compressed result to disk?