How to deal with object deletion in continuous allocation?

1k Views Asked by At

I've recently found benefits of Data Oriented Design. It looks very impressive. One of points is grouping data by type and access, not all together in objects, but in arrays, to prevent cache misses and for better processing.

So in game we still have instances and user is able to destroy any of them (not only last in array). I can't figure out how to effectively deal with object deletion in middle of array.

I have one idea: to have isAlive value, but that will will cause quite big impact on number of conditions, because every objects get checked many times in processing, drawing,...

Another idea is to shift whole array to fill space that has to be deleted, but this will consume much resources at deletion.

How can man deal with this in DOD?

So put up the requirements:

  • it has to be array(s) in order to reduce cache misses destribed in DOD
  • it has to have fast random position object deletion, max o(log n)
  • objects can't move since they were created, because they could be referenced at unknown places, so it will cause program misbehavior
2

There are 2 best solutions below

3
On

it depends on the constraints and workload, but one approach is to swap the deleted element with the last element in the array, then remove the deleted element from the end (pop_back). this assumes the order of the array isn't particularly important. another approach is a sparse array, which might work in environments where the memory budget is fixed.

EDIT: if you're maintaining external pointers into the array, they can be managed using smart pointers whose underlying addresses are updated (shared_ptr::reset) when an array element is moved. you would end up with 2 parallel arrays of the same length:

typedef std::vector<thing> thingVec;
thingVec things;

// smart pointers for each object
std::vector<boost::shared_ptr<thingVec::iterator>> references;

in your createThing function, you'd return a shared_ptr with a custom deleter (which would automagically do the array update once all references have been extinguished):

http://www.boost.org/doc/libs/1_51_0/libs/smart_ptr/sp_techniques.html#static

the smart pointers can point to structures with multiple fields that are ultimately stored in different arrays, as long as the custom deleter knows which arrays to compact when an element is deleted.

0
On

It's actually quite easy, don't use direct references. Use a layer of indirection such as IDs. For example:

Let's say you have a FooManager that manages all your Foo "objects" (not objects in the OOP sense, a collection of arrays for each Foo property). As I understand it, what you do right now is just return an index. Let's say Foo #42 is the Foo which data is located at index 42 of all the arrays. Now you want to delete Foo #42, which blows a hole in your array. You could move all other array entries but then Foo #43 no longer points at the actual index 43.

So we add an ID table and instead of returning the index, you return an ID. When you create a new Foo, you add its data to the first free index in the arrays (let's say 42). Then you generate an new unused ID (let's say 1337). Now you can update the ID table (a (hash)map is great for this) to say ID 1337 points to index 42. Now you can return the ID 1337 to the calling function. (Notice how the calling function never finds out where the data is actually stored, that's irrelevant information)

Next time the Foo needs to be updated from another piece of code, the ID is used. So the FooManager's setBar function is called with ID 1337 and the new Bar value as arguments. FooManager looks up 1337 in its ID table, finds out it's still located at index 42 and updates the Bar array index 42 with the new Bar value.

Now this is where this system gets it's value, let's remove Foo 1337. FooManager's removeFoo is called with ID 1337 as argument. It looks up 1337, it's located at index 42. However, in the mean time, 10 new Foos have been added. So what we can now do is just replace the values at index 42 with the values at 52, effectively moving 52 to 42. This would cause a big problem in the old system, but now, we only need to update the index table. So we look up which ID points to 52 (let's say it's 1401) and update it to 42. Next time someone tries to update the Foo with ID 1401, it looks it up in the index table and finds it is located at index 42.

So we've kept the data in continuous memory, a remove only costs a very low number of data copies operations (one copy for each property of Foo) and the code "outside" of FooManager never even realizes something happened. It even solves dead refence issues. Suppose some code still has the deleted 1337 ID (dangling refence, this is bad!), when it tries to access/change it, FooManager looks it up, sees 1337 doesn't exist (any more) and can generate a nice clean warning/error and callstack, which allows you to directly identify which code still has the dangling reference!

There is only one disadvantage, which is an extra lookup in the ID table. Now a hash table can be really fast, but it's still an extra operation for each time you modify a Foo object. However, in most cases, access from outside of a manager happens far less than access inside of a manager. When you have a BulletManager, it will update each bullet every frame, but accessing a Bullet to change/request something and calls to create/remove bullets are faaaar less likely to occur. If this is the other way around though, you should probably update your data structures to optimize for that situation. Then again, in such a situation, it doesn't matter that much any more where data is located in memory so you could live with either "holes" in your arrays or even using completely different data layouts.