C1, C2, ... are callback classes.
They derive from a common interface CBase with the callback CBase::f().
All of them override CBase::f() with the final specifier.
I have to register ~50 instances of any class derived from C1, and ~50 instances of any class derived from C2.
(See @@ in the code below for an example.)
Main objective: when I call allF(), C1::f() / C2::f() of every registered instance has to be called.
Here is a simplified version; it works (Full demo):
#include <iostream>
#include <vector>
class CBase{
public: virtual void f(){std::cout<<"CBase"<<std::endl;}
};
class C1 : public CBase{
public: virtual void f() final{std::cout<<"C1"<<std::endl;}
};
class C2 : public CBase{
public: virtual void f() final{std::cout<<"C2"<<std::endl;}
};
This is the callback registration:
//-------- begin registering -----
std::vector<CBase*> cBase;
void regis(CBase* c){
cBase.push_back(c);
}
void allF(){ //must be super fast
for(auto ele:cBase){
ele->f(); //#
}
}
int main() {
C1 a;
C1 b;
C2 c; //@@
//or ... class C2Extend : public C2{}; C2Extend c;
regis(&a);
regis(&b);
regis(&c);
allF(); //print C1 C1 C2
}
Problem
According to the profiler results, avoiding the v-table lookup at # would give a significant performance gain.
How can I do this elegantly?
My poor solution
A possible workaround is to create a separate array for each CX (Full demo):
//-------- begin registering -----
std::vector<C1*> c1s;
std::vector<C2*> c2s;
void regis(C1* c){
c1s.push_back(c);
}
void regis(C2* c){
c2s.push_back(c);
}
void allF(){ //must be super fast
for(auto ele:c1s){
ele->f(); //#
}
for(auto ele:c2s){
ele->f(); //#
}
}
int main() {
C1 a;
C1 b;
C2 c;
regis(&a);
regis(&b);
regis(&c);
allF(); //print C1 C1 C2
}
It is much faster (presumably because f() is final, so calls through a C1* or C2* can be devirtualized).
However, it does not scale well.
After a few development cycles, C3, C4, etc. were born.
I had to create std::vector<C3*>, std::vector<C4*>, ... manually.
My approach leads to maintainability hell.
More information (edited)
In the worst case, there are at most 20 classes. (C1 to C20)
In the real case, C1, C2, ... are special types of data structures.
All of them require special initialization (f()) at a precisely correct time.
Their instances are constructed in various .cpp files.
Thus, a single std::vector<CBase*> cBase caching all of them would be useful.
For example, C1 is a 1:1 map, C2 is a 1:N map, and C3 is an N:N map.
Together with a custom allocator, I can achieve unearthly data locality.
One more note: I don't care about the order of the callbacks. (Thanks, Fire Lancer.)
Your "poor solution" starts looking much better when you automate it using templates. Our goal: store c1s, c2s, etc. in a single vector. To do this, we need to map derived types to consecutive integers. A simple way to do that is to use a global counter, and a function template that, for each of its instantiations, takes the next value of the counter once and stores it.
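A minimal sketch of that counter, assuming C++11 or later. The helper name typeCounter is hypothetical; wrapping the counter in an inline function lets it live in a header with a single instance across all .cpp files, without needing C++17 inline variables. The counter is not synchronized, so this assumes each type is first seen from a single thread.

```cpp
#include <cstddef>

namespace detail_callbacks {

// Hypothetical helper: the shared counter. As a function-local static inside
// an inline function, there is one instance for the whole program.
// Not synchronized: assumes types are first seen from a single thread.
inline std::size_t& typeCounter() {
    static std::size_t next = 0;
    return next;
}

// Each instantiation indexForType<T> has its own `index`, initialized exactly
// once, on the first call, by taking the next value of the shared counter.
template <class T>
std::size_t indexForType() {
    static const std::size_t index = typeCounter()++;
    return index;
}

}  // namespace detail_callbacks
```

For example, if indexForType<C1>() is called before indexForType<C2>(), they return 0 and 1 respectively, and every later call returns the same value again.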
The first call to indexForType<T>() will reserve a new index for T, and subsequent calls will return the same one.
Then, we need a way to erase enough information about our callback vectors so we can store them and call the correct f on them.
The call member will hold a function that iterates over the pointers, downcasts them and calls f. Just like your solution, this factors all of the calls for a single type into only one virtual call. CbVec could hold CBase * instead of void *, but I'll explain that choice later.
Now we need a function to populate groups upon requesting a Group for some type:
Here you can see that we use a lambda expression to generate the downcasting functions. The reason I've chosen to store void *'s instead of CBase *'s is that the performance-sensitive downcast in there becomes a no-op, while a base-to-derived cast might have required pointer adjustments (and further complications in the case of virtual inheritance).
Finally, the public API. All of the above has been defined inside namespace detail_callbacks, and we just need to put the pieces together:
And there you go! New derived callbacks are now automatically registered.
See it live on Coliru
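Put together, the pieces described above might look roughly like the following sketch. It is a reconstruction under assumptions, not the linked Coliru code: the helper names typeCounter() and group<T>() are hypothetical, groups is a function returning a function-local static vector (so registrations made from other .cpp files during static initialization find it already constructed), and call is a plain function pointer obtained from a captureless lambda.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Callback interface from the question.
class CBase {
public:
    virtual void f() { std::cout << "CBase" << std::endl; }
};

namespace detail_callbacks {

// Hypothetical helper: shared counter for indexForType (not thread-safe).
inline std::size_t& typeCounter() {
    static std::size_t next = 0;
    return next;
}

// One consecutive index per distinct T, assigned on the first call.
template <class T>
std::size_t indexForType() {
    static const std::size_t index = typeCounter()++;
    return index;
}

// All callbacks registered as T*, erased to void*, plus a function that
// restores the type and calls f() on each of them.
struct CbVec {
    std::vector<void*> ptrs;
    void (*call)(const std::vector<void*>&) = nullptr;
};

// groups()[indexForType<T>()] is the CbVec for T.
inline std::vector<CbVec>& groups() {
    static std::vector<CbVec> all;
    return all;
}

// Hypothetical helper: returns T's group, creating it on first request.
template <class T>
CbVec& group() {
    const std::size_t i = indexForType<T>();
    std::vector<CbVec>& all = groups();
    if (all.size() <= i) all.resize(i + 1);
    CbVec& g = all[i];
    if (!g.call) {
        // A captureless lambda converts to a plain function pointer.
        g.call = [](const std::vector<void*>& ptrs) {
            for (void* p : ptrs)
                // void* -> T* needs no adjustment; when f() is final in T
                // (or in a base of T), the compiler can devirtualize the call.
                static_cast<T*>(p)->f();
        };
    }
    return g;
}

}  // namespace detail_callbacks

// Public API with the same shape as in the question. Each distinct static
// type passed in (C1, C2, C2Extend, ...) gets its own group automatically.
template <class T>
void regis(T* c) {
    detail_callbacks::group<T>().ptrs.push_back(c);  // implicit T* -> void*
}

inline void allF() {
    for (const detail_callbacks::CbVec& g : detail_callbacks::groups())
        if (g.call) g.call(g.ptrs);  // one indirect call per type, not per object
}
```

Usage is unchanged from the question: regis(&a); regis(&b); regis(&c); allF(); calls f() once on every registered object, grouped by type, which is fine since callback order does not matter. Note that the fast path depends on the static type passed to regis: registering through a CBase* still works, but those objects land in a CBase group whose calls stay virtual.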