Load and operate on matrices bigger than RAM (Python, NumPy, pandas)

my tasks:

  1. load matrices larger than my RAM from the database (PostgreSQL) using pandas.read_sql(...)
  2. operate on the NumPy representation of those matrices (also larger than my RAM)
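To make task 2 concrete, here is a minimal sketch of operating on a matrix that lives on disk rather than in RAM, using `numpy.memmap` and processing it one block of rows at a time. The function name `column_means_blockwise`, the file name, and the tiny demo matrix are assumptions made up for illustration; column means stand in for whatever operation is actually needed.

```python
import os
import tempfile

import numpy as np


def column_means_blockwise(path, shape, dtype=np.float64, block_rows=1024):
    """Column means of an on-disk matrix, read one block of rows at a time."""
    # mode="r" maps the file read-only; data is paged in only when sliced.
    mat = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    total = np.zeros(shape[1], dtype=np.float64)
    for start in range(0, shape[0], block_rows):
        block = mat[start:start + block_rows]  # at most block_rows rows in RAM
        total += block.sum(axis=0)
    return total / shape[0]


# Hypothetical demo: a small 10 x 3 matrix written to a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "matrix.dat")
demo_shape = (10, 3)
writer = np.memmap(demo_path, dtype=np.float64, mode="w+", shape=demo_shape)
writer[:] = np.arange(30, dtype=np.float64).reshape(demo_shape)
writer.flush()
del writer  # release the write mapping before reading the file back

demo_means = column_means_blockwise(demo_path, demo_shape, block_rows=4)
```

This only works directly for operations that can be split by rows (sums, means, elementwise transforms); operations that need the whole matrix at once would need a blocked algorithm or an out-of-core library.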

the problem: I get a memory error as soon as I try to load the data from the database, before any computation happens.

my temporary quick-and-dirty solution: loop over chunks of the data, importing one part at a time so that each piece fits in RAM. The issue with this is speed: the run time is significantly higher. Before delving into Cython optimization and the like, I wanted to know whether there are other ways to solve this, for example storage-backed data structures such as the shelve library or the HDF5 format.
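A minimal sketch of such a chunked loop, which writes each chunk straight into a disk-backed `.npy` file instead of accumulating it in RAM. It relies on the `chunksize` argument of `pandas.read_sql` and on `numpy.lib.format.open_memmap`. The helper name `load_table_to_memmap`, the table and column names, and the in-memory SQLite database (used here only so the example runs without a PostgreSQL server) are all assumptions for illustration. The row and column counts are assumed to be known in advance, for example from a `SELECT COUNT(*)` query.

```python
import os
import sqlite3
import tempfile

import numpy as np
import pandas as pd


def load_table_to_memmap(conn, query, n_rows, n_cols, path, chunksize=1000):
    """Stream a query result into a disk-backed .npy file, chunk by chunk."""
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float64, shape=(n_rows, n_cols)
    )
    row = 0
    # With chunksize set, read_sql yields DataFrames of at most chunksize rows.
    for chunk in pd.read_sql(query, conn, chunksize=chunksize):
        values = chunk.to_numpy(dtype=np.float64)
        out[row:row + len(values)] = values
        row += len(values)
    out.flush()
    return row


# Hypothetical demo data: 7 rows, 3 numeric columns, in an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matrix (a REAL, b REAL, c REAL)")
conn.executemany(
    "INSERT INTO matrix VALUES (?, ?, ?)",
    [(i, i + 0.5, i * 2.0) for i in range(7)],
)
conn.commit()

npy_path = os.path.join(tempfile.mkdtemp(), "matrix.npy")
rows_written = load_table_to_memmap(
    conn, "SELECT a, b, c FROM matrix ORDER BY rowid", 7, 3, npy_path, chunksize=3
)

# Reopen lazily: nothing is read into RAM until a slice is accessed.
loaded = np.load(npy_path, mmap_mode="r")
```

One caveat for PostgreSQL: depending on the driver and cursor type, the client may still fetch the whole result set before pandas starts chunking it. With SQLAlchemy, a connection created with `execution_options(stream_results=True)` is the usual way to request a server-side cursor, but that setup is not shown here and would need checking against the driver in use.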
