Python bsddb doesn't flush when removing entries


I have a Python bsddb database stored on the hard drive. When I remove entries, the file on disk does not get any smaller, so it grows quite fast.

utDelList   = []
urlsDelList = []
for ut in iter(self.urls2task):
    # keys are colon-separated; the second field is the uid
    tmp = ut.split(":")
    uid = tmp[1]
    url = cPickle.loads(self.urls[int(uid)])
    urlsDelList.append(uid)
    utDelList.append(ut)
    del self.urlsDepth[uid]
    del self.urlsStatus[uid]
    del url

for ut in utDelList:
    del self.urls2task[ut]

for uid in urlsDelList:
    del self.urls[int(uid)]

(...)
#synchronize all files
self.sync() 

My last hope was to force the flush the crude way, by closing the files and opening them again:

# close all files and open them again, e.g.
self.tasks.close()
self.urls2task.close()
self.tasks = bsddb.rnopen(filepath)
self.urls2task = bsddb.hashopen

The crucial element here is the self.tasks file; it grows faster and larger than any of the others. Does storing pickled values change how entries are removed in any way? And, once again, why do the files still hold the entries after I remove them? I'd be grateful for any suggestions (first post here :))


There are 3 best solutions below

0
On

A Btree database will probably not give space back on its own. The best you can do is dump all the data to a text file with db_dump and create a new database from that file with db_load.
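The same rebuild idea can also be sketched in Python instead of with the command-line tools: copy every entry that is still present into a freshly created file and use that file from then on. This is only a sketch. `rebuild_into` is a hypothetical helper, not part of bsddb, and it assumes nothing more than that both handles behave like dictionaries, as the objects returned by the bsddb open functions do. Plain dicts stand in for the database handles so the snippet runs anywhere.

```python
def rebuild_into(old_db, new_db):
    """Copy every entry still present in old_db into new_db.

    Hypothetical helper: both arguments only need to act like dicts
    (keys() plus item access). Returns the number of entries copied.
    """
    copied = 0
    for key in list(old_db.keys()):
        new_db[key] = old_db[key]
        copied += 1
    return copied


# Stand-in demo with plain dicts. With bsddb you would pass the old handle
# and a handle opened on a new, empty file, then close both and replace
# the old file with the new one.
old = {"task:1": "a", "task:2": "b"}
new = {}
count = rebuild_into(old, new)
```

Because only live entries are written, the new file starts out without the free pages that deletions left behind in the old one.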

0
On

Did you try the db.compact() method?

According to the documentation:

compact(start=None, stop=None, flags=0, compact_fillpercent=0, compact_pages=0, compact_timeout=0)

Compacts Btree and Recno access method databases, and optionally returns unused Btree, Hash or Recno database pages to the underlying filesystem.

The method returns the number of pages returned to the filesystem.

It sounds like this should reduce the size of the database on disk.

0
On

You should compact your database as described at http://www.jcea.es/programacion/pybsddb_doc/db.html#db-methods