I have a script that produces several intermediate data files that would significantly exceed the maximum number of rows R allows in a data.table or data frame (2^31-1). My system is large enough to hold the data (e.g. I can store matrices of that size, but cannot transform them to long format), but I don't know which file formats can handle data of this size. I want to achieve two things simultaneously: (1) store data with more than 2^31 rows and (2) continue using data.table (or similar) functionality while processing the data.
I know there are ways to achieve (1), such as the arrow package, but my understanding is that these file formats require an entirely different way of processing the data, which rules out (2). As far as I understand, the bit64 package cannot be used to 'cheat' R into indexing more rows.
Basically I have written a whole bunch of code already building on data.table functionality, and I would prefer to continue using that instead of rewriting everything. Is there a solution for that?
Sorry, there is no reproducible example (I'm not sure one is appropriate for this question).
Data formats for very large data while preserving data.table functionality
109 Views Asked by Nils R At