My tasks:

- load from the database (PostgreSQL) matrices whose dimensions are bigger than my RAM, using `pandas.read_sql(...)`
- operate on the `numpy` representation of such matrices (bigger than my RAM) using `numpy`
The problem: I get a memory error even when just loading the data from the database.
My temporary quick-and-dirty solution: loop over chunks of the aforementioned data (so importing parts of the data at a time), thus allowing the RAM to handle the workload, as in the sketch below. The issue at play here is speed: the run time is significantly higher.
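A minimal sketch of that chunked loop, assuming a SQLAlchemy connection string, an all-numeric table called `matrix_table`, and a sum as the operation (all hypothetical, for illustration):

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table name; adjust to your setup.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# With chunksize set, read_sql returns an iterator of DataFrames instead of
# materializing the whole result set in RAM at once.
partial_sums = []
for chunk in pd.read_sql("SELECT * FROM matrix_table", engine, chunksize=100_000):
    block = chunk.to_numpy()                # numpy array for this chunk only
    partial_sums.append(block.sum(axis=0))  # example reduce-style operation

# Combine the per-chunk results.
column_totals = np.sum(partial_sums, axis=0)
```

Note that this pattern only helps for operations that decompose across row chunks (sums, means, row-wise transforms), not for ones that need the whole matrix in memory at once.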
Before delving into Cython optimization and the like, I wanted to know whether there are some solutions, either in the form of data structures (like the `shelve` library) or the HDF5 format, that would solve the issue; an HDF5 sketch follows.
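For the HDF5 route, one way I imagine it could work is to stream the table into an on-disk HDF5 file once, then slice it from disk afterwards; a sketch assuming `h5py` and the same hypothetical connection/table as above (and all-numeric columns):

```python
import h5py
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# One-time pass: stream the table into a resizable on-disk HDF5 dataset.
with h5py.File("matrix.h5", "w") as f:
    dset = None
    for chunk in pd.read_sql("SELECT * FROM matrix_table", engine, chunksize=100_000):
        block = chunk.to_numpy()
        if dset is None:
            # Unlimited rows (maxshape None on axis 0), fixed column count.
            dset = f.create_dataset(
                "matrix",
                shape=block.shape,
                maxshape=(None, block.shape[1]),
                dtype=block.dtype,
                chunks=True,
            )
            dset[:] = block
        else:
            old_rows = dset.shape[0]
            dset.resize(old_rows + block.shape[0], axis=0)
            dset[old_rows:] = block

# Later runs can slice the matrix from disk without touching the database
# and without loading the whole thing into RAM.
with h5py.File("matrix.h5", "r") as f:
    first_rows = f["matrix"][:1000]  # reads only the requested slice
```

Whether this actually beats the straight chunked loop presumably depends on how often the data is re-read; the HDF5 file would pay off when the same matrix is processed many times.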