How can I get progress information from a TBB parallel_for?
tbb::parallel_for(tbb::blocked_range<size_t>(0, 1000), classA);
Rick's suggestion of using concurrent_unordered_map is a good one. Here is another way, which is essentially the same idea at a high level, but uses other TBB mechanisms in order to avoid dealing with explicit thread ids.
The zero_allocator is necessary here to close a timing hole between allocation and initialization of an element in concurrent_vector.
#include <tbb/tbb.h>

typedef size_t ProgressType;
typedef tbb::atomic<ProgressType> ProgressCounter;

// One progress counter per thread, plus a shared list of pointers to them.
tbb::enumerable_thread_specific<ProgressCounter> LocalCounters;
// zero_allocator is essential here.
tbb::concurrent_vector<ProgressCounter*, tbb::zero_allocator<ProgressCounter*> > LocalCounterPointers;

void AddToProgress(ProgressType delta) {
    bool exists;
    auto& i = LocalCounters.local(exists);
    i += delta;
    if( !exists )
        // First time this thread's local counter has been seen, so publish it.
        LocalCounterPointers.push_back(&i);
}

ProgressType GetProgress() {
    ProgressType sum = 0;
    size_t n = LocalCounterPointers.size();
    for( size_t i=0; i<n; ++i )
        // The "if" deals with the timing hole where a slot in LocalCounterPointers
        // was allocated but not yet initialized.
        if( auto* j = LocalCounterPointers[i] )
            sum += *j;
    return sum;
}

// Can be called asynchronously.
void ClearProgress() {
    size_t n = LocalCounterPointers.size();
    for( size_t i=0; i<n; ++i )
        // Same timing-hole check as in GetProgress.
        if( auto* j = LocalCounterPointers[i] )
            *j = 0;
}

// Demo code
#include <iostream>

int main() {
    ClearProgress();
    tbb::parallel_for( tbb::blocked_range<int>(0, 1000),
        [&]( const tbb::blocked_range<int>& r ) {
            for( int i=r.begin(); i!=r.end(); ++i ) {
                AddToProgress(1);
                std::cout << "progress = " << GetProgress() << std::endl;
            }
        }
    );
}
If you only need to count how many iterations have been executed so far, a simple solution could be to use a global atomic counter:
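For example, a minimal sketch (the counter name is just an illustration, and the per-iteration work is omitted):

#include <atomic>
#include <tbb/tbb.h>

std::atomic<size_t> IterationsDone(0);   // shared progress counter

int main() {
    tbb::parallel_for( tbb::blocked_range<size_t>(0, 1000),
        []( const tbb::blocked_range<size_t>& r ) {
            for( size_t i=r.begin(); i!=r.end(); ++i ) {
                // ... real work for iteration i goes here ...
                ++IterationsDone;   // one atomic increment per finished iteration
            }
        }
    );
    // Another thread can read IterationsDone at any time to see
    // how many iterations have completed so far.
}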
However, if the amount of work per iteration is small and the hardware concurrency is high, atomic increments of a shared variable can add noticeable overhead. For example, I would be careful with this method on Intel's Xeon Phi coprocessors.
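If that cost matters, one common mitigation (a minimal sketch, using the same placeholder counter as above) is to do a single atomic add per chunk instead of per iteration:

#include <atomic>
#include <tbb/tbb.h>

std::atomic<size_t> IterationsDone(0);

int main() {
    tbb::parallel_for( tbb::blocked_range<size_t>(0, 1000),
        []( const tbb::blocked_range<size_t>& r ) {
            for( size_t i=r.begin(); i!=r.end(); ++i ) {
                // ... real work for iteration i ...
            }
            IterationsDone += r.size();   // one atomic add per chunk
        }
    );
}

The trade-off is that progress only advances when a chunk finishes, but the number of atomic operations on the shared variable drops from one per iteration to one per task.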