I want to use the fst_table function from the fsttable package on a large dataset. The package is found here: https://github.com/fstpackage/fsttable.
devtools::install_github("fstpackage/fsttable")
library(fsttable)
nr_of_rows <- 1e6
x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
fst::write_fst(x, "1.fst")
ft <- fst_table("1.fst")
I can extract rows and columns from the created file. However, is it possible to do operations like:
ft[X == 1,]
as in a standard data.table? Or can I create a key on this data.table for fast serialization? My goal is to extract data based on column values without loading the whole dataset into memory.
Original
Unfortunately, fsttable only works to load the dataset and select columns/rows. Although the package documentation suggests otherwise, the reality is that regular data.table operations such as the one you mentioned cannot be performed (at least with version 0.1.3). The main reason is that we are in fact not working with a data.table object, but rather with a data.table interface.
However, the data from the fsttable object can be "pulled" as a vector and then filtered.
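As a rough sketch of the pull-then-filter idea: the snippet below uses read_fst() from the fst package rather than the fsttable interface, so it is a substitute technique and not necessarily what fsttable does internally. The temporary file path and the variable names (x_col, hits, span, result) are only illustrative. It reads just the filter column, finds the matching row numbers in memory, and then reads only the row range that contains them.

```r
library(fst)
library(data.table)

# Recreate the example file from the question (written to a temp file here)
nr_of_rows <- 1e6
x <- data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
path <- tempfile(fileext = ".fst")
write_fst(x, path)

# 1. Pull only the column used in the filter; this is a plain integer vector
x_col <- read_fst(path, columns = "X")$X

# 2. Find the matching row numbers in memory
hits <- which(x_col == 1)
stopifnot(length(hits) > 0)  # this sketch assumes at least one match

# 3. Read only the row span containing the hits, then keep the matching rows
span <- read_fst(path, from = min(hits), to = max(hits), as.data.table = TRUE)
result <- span[hits - min(hits) + 1]
```

Note that the filter column itself is still read in full, so this avoids loading the other columns but not the one you filter on. If the matches are spread across the whole file, the row span in step 3 can also be large.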
I presume there should be an easy way to convert a fsttable object to a genuine data.table by pulling each variable and then binding them all together.
Edit
Actually, read_fst() of the fst package (available on CRAN, by the same author) has an argument to load datasets as a data.table, so there is no need for the fsttable package.
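For illustration, here is a minimal sketch of that route, assuming a small throwaway file rather than the one from the question. The argument in question is as.data.table; read_fst() also accepts columns, from and to, which limit what is read from disk.

```r
library(fst)
library(data.table)

# Small illustrative file (hypothetical data, same shape as in the question)
path <- tempfile(fileext = ".fst")
write_fst(data.table(X = 1:1000, Y = LETTERS[1 + (1:1000) %% 26]), path)

# Load the whole file directly as a data.table
dt <- read_fst(path, as.data.table = TRUE)

# Regular data.table syntax now works, including keys
setkey(dt, X)
first_row <- dt[X == 1]

# Load only one column and a range of rows, still as a data.table
part <- read_fst(path, columns = "Y", from = 10, to = 20, as.data.table = TRUE)
```

The first call loads the full dataset into memory, so for very large files the columns/from/to arguments are the part that keeps memory use down.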