Manage Python structures stored in a file as if they were in memory?


I want to manage many files in such a way that each file stays on disk and my app works with only part of the data at a time.

I have to manage two types of files: book-like text files and CSV files containing time series. For every file I may generate multiple dimensionally reduced copies, which I want to keep and cache so I don't have to regenerate them.

I can see two ways of doing this:

1. create my own lib that uses mem-mapping
2. use a tool such as Dask
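For option 1, here is a minimal sketch of memory mapping with Python's standard mmap module. It assumes a plain file of bytes; the helper name read_window and the demo file book.txt are made up for illustration and are not part of any library.

```python
import mmap
import os
import tempfile


def read_window(path, start, length):
    """Return `length` bytes starting at byte offset `start`."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file; the OS loads pages on demand,
        # so only the touched region has to be read from disk.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:start + length]


# Demo on a throwaway file (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "book.txt")
with open(demo_path, "wb") as f:
    f.write(b"chapter one\nchapter two\nchapter three\n")

window = read_window(demo_path, 12, 11)
print(window)  # b'chapter two'
```

Slicing the mapped object returns ordinary bytes, so the rest of the code can treat the window like in-memory data. Note that mmap cannot map an empty file.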

Dask seems like a good choice, but I cannot find a way to iterate over part of a Bag object in a loop and/or access a range of it, i.e.

for i in bag_obj[2:10]: ...

bag_obj[5:10]

I can only do .take()

Second, is there a way to map a list to a file and perform normal list operations on it as if it were in memory?
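One possible way to get list-like access to a file, sketched with the standard library only: a read-only sequence that remembers where each line starts and reads lines from disk on demand. The class name FileLines and the demo file series.csv are hypothetical. It assumes a UTF-8 text file that does not change while it is in use, and it keeps one integer offset per line in memory.

```python
import os
import tempfile
from collections.abc import Sequence


class FileLines(Sequence):
    """Read-only, list-like view over the lines of a text file on disk."""

    def __init__(self, path):
        self.path = path
        self.offsets = []  # byte offset at which each line starts
        pos = 0
        with open(path, "rb") as f:
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def _read(self, i):
        # Reopening the file per line is simple but slow; a real
        # implementation would probably keep the handle open.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return f.readline().decode("utf-8").rstrip("\r\n")

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self._read(i) for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("line index out of range")
        return self._read(index)


# Demo on a throwaway CSV file (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "series.csv")
with open(demo_path, "w", encoding="utf-8", newline="\n") as f:
    for n in range(20):
        f.write(f"{n},{n * n}\n")

lines = FileLines(demo_path)
for row in lines[2:4]:
    print(row)  # prints "2,4" then "3,9"
print(lines[-1])  # 19,361
```

Because the class derives from Sequence, iteration, `in`, `index()` and `count()` come for free, while only the requested lines are read from disk. Writing (append, item assignment) is left out of this sketch.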


I came up with this. Is it the best approach?

def slice(self, pfrom, pto):
    assert self.bag is not None
    # take() computes the first pto elements; keep those from pfrom onward
    return self.bag.take(pto)[pfrom:]

But this does not work the way I want, because take() returns an already computed value rather than a lazy Bag ;(


There are 1 best solutions below


This may be a solution:

from dask.bag.core import Bag

def slice(self, pfrom, pto):
    # take() computes eagerly, so this returns a plain tuple, not a Bag
    return self.take(pto)[pfrom:]

Bag.slice = slice
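One caveat with the helper above: by default, Bag.take() only looks at the first partition, so it can return fewer than pto elements when the data is spread over several partitions. A hedged variant is sketched below as a standalone function, which avoids monkey-patching Bag and shadowing the built-in slice. The name bag_slice is made up for illustration. The function relies only on the bag's take(k, npartitions=...) method, so the block itself does not import dask.

```python
def bag_slice(bag, start, stop):
    """Return elements start..stop-1 of a Dask Bag as a tuple (eagerly)."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    if stop == start:
        return ()
    # npartitions=-1 lets take() draw from every partition; the default
    # of 1 only reads the first partition and may come up short.
    return tuple(bag.take(stop, npartitions=-1)[start:])
```

With dask installed, it could be called as bag_slice(db.from_sequence(range(20), npartitions=4), 5, 10) after import dask.bag as db. The result is still computed eagerly, so this does not give a lazy slice; it only makes the range access reliable across partitions.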