I have a collection of TPersistent objects. I want to index them by their memory location (as main index) and by their properties (using rtti). A collection may have several property based indexes (based on different properties).
What is the strategy and data structure to solve this problem, so I can delete invalid objects (those that already been destroyed) from the collection without access violation when dealing with property based indexes?
Edit: to clarify things, I am hoping to implement this kind of interfaces (class and methods):
type
TMyItem=class(TPersistent)
public
property HasProp[Name: string]: Boolean read GetHasProp;
property PropValue[Name: string]: Variant read GetPropValue write SetPropValue;
end;
TMyCollection=class(TPersistent)
private
FMainList: TSortedList;
FIndexes : TIndexes;
public
function Add(AItem: TMyItem): Integer;
procedure Extract(AItem: TMyItem);
procedure Delete(AItem: TMyItem);
// and new sorted list based on the given property name to the FIndexes
procedure AddIndex(const APropName: string);
// remove sorted list correspoding to the given property name from FIndexes
procedure RemoveIndex(const APropName: string);
// just to see if the item is in the collection
function Find(AItem: TMyItem): Boolean;
// try to find first item which property specified by APropName has value specified by APropValue
function Find(const APropName: string; const APropValue: Variant; var AItem: TMyItem): Boolean;
end;
The FMainList contains list of pointers to TMyItem instances. Sorted by casting the pointers to NativeInt. So it will be no problem if I'am finding invalid objects. However in property based index, I am sorting TMyItems based on their property value. Therefore EAccessViolation will be raised if I am trying to find entry for invalid TMyItem (one that already been destroyed), since I need to fetch the property value.
My current idea to solve this is to store the position of a TMyItem in each property based index into main record that is stored in FMainList. But this approach also requires updating all positions each time new item is added or removed. This is what I want to avoid. So is there any other better mechanism?
These kind of questions generally lead to the choice between saving or calculating the index, which is synonymous to the choice between speed and memory. It also kind of depends on usage: are those
Find
routines called often?Like you say yourself already: saving each index in a separate array brings all kind of synchronization issues with it, along with a significant extra need for memory.
Personally, I would calculate/get each index on request. Sure, it will take some time to iterate over all the items, but when the count stays, say, under 100K or even a higher number, I am sure it will not suffer from any delay. And when you base the design on
TCollection
like menjaraz comments, you do not have to worry about deleted items: there will be none.Update:
Because you want to search for values in properties with arbitrary names for which the use of RTTI is needed, this iterative task may things slow down a bit. To eliminate that, I wrote this optimization for you. It is based on excluding the items in the search operation which do not have the property. For this I store the property names the collection contains in its items along with to which class they belong. The only limitation is that there can be no duplicate property names, but I suspect you will combine classes with the same property names in a common ancestor anyway. (It ís however possible to add classes with the same property names, as long as the second is inherited from the first.)