I am toying with certain caching algorithm, which is challenging somewhat.
Basically, it needs to allocate lots of small objects (double arrays, 1 to 256 elements), with objects accessible through mapped value, map[key] = array
. time to initialized array may be quite large, generally more than 10 thousand cpu cycles.
By lots I mean around gigabyte in total. objects may need to be popped/pushed as needed, generally in random places, one object at a time. lifetime of an object is generally long, minutes or more, however, object may be subject to allocation/deallocation several times during duration of program.
What would be good strategy to avoid memory fragmentation, while still maintaining reasonable allocate deallocate speed?
I am using C++, so I can use new and malloc. Thanks.
I know there a similar questions on website, Efficiently allocating many short-lived small objects, are somewhat different, thread safety is not immediate issue for me.
my development platform is Intel Xeon, linux operating system. Ideally I would like to work on PPC linux as well, but it is not the most important for me.
Create a slotted allocator:
Allocator is created with many pages of memory, each of equal size (512k, 256k, the size should be tuned for your use).
The first time an object asks this allocator for memory it allocates a page. Allocating a page consists of removing it from the free list (no search, all pages are the same size) and setting the size of objects that will be allocated on this page. Typically this size calculated by taking the requested size and rounding it up to the nearest power of 2. Subsequent allocations of the same size just require a little bit of pointer math and incrementing the number of objects on the page.
Fragmentation is prevented because slots are all the same size and can be refilled on subsequent allocations. Efficiency is maintained (in some cases improved) because there is no memheader per allocation (which makes a big difference when the allocations are small, once allocations become large this allocator starts to waste almost 50% of available memory).
Both allocations and deallocations can be performed in constant time (no searches of the free list for correct slots). The only tricky part about a deallocation is that you don't typically want a memheader preceding the allocation, so you have to figure out the page and index in the page yourself... It's saturday and I haven't had my coffee so I don't have any good advice about doing that, it's easy enough to figure out from the deallocated address, though.
Edit: This answer is a bit long winded. As usual boost has your back.