Multithreaded performance drops after a few operations


I encountered a strange bug in a multithreaded C++ program on Linux. The multithreaded part essentially executes a loop. Each iteration first loads a SIFT file containing some features, then queries these features against a tree. Since I have a lot of images, I use multiple threads to do the querying. Here are the code snippets.

struct MultiMatchParam
{
    int thread_id;
    float *scores;
    double *scores_d;
    int *perm;
    size_t db_image_num;
    std::vector<std::string> *query_filenames;
    int start_id;
    int num_query;
    int dim;
    VocabTree *tree;
    FILE *file;
};

// multi-thread will do normalization anyway
void MultiMatch(MultiMatchParam &param)
{
    // Process each query image assigned to this thread
    for(size_t t = param.start_id; t < param.start_id + param.num_query; t++)
    {
        // Clear scores before each query
        for (size_t i = 0; i < param.db_image_num; i++)
            param.scores[i] = 0.0;

        DTYPE *keys;
        int num_keys;

        keys = ReadKeys_sfm((*param.query_filenames)[t].c_str(), param.dim, num_keys);

        int normalize = true;
        double mag = param.tree->MultiScoreQueryKeys(num_keys, normalize, keys, param.scores);

        delete [] keys;
    }
}

I run this on an 8-core CPU. At first it runs perfectly, with CPU usage near 100% on all 8 cores. After each thread has queried several images (about 20), the CPU usage suddenly drops drastically, to about 30% across all eight cores.

I suspect the cause of this bug is the following line of code.

double mag = param.tree->MultiScoreQueryKeys(num_keys, normalize, keys, param.scores);

If I replace it with another costly operation (e.g., a large for-loop containing sqrt), the CPU usage always stays near 100%. The MultiScoreQueryKeys function performs a complex operation on a tree. Since all eight cores may read the same tree, I wonder whether the read operation has some kind of blocking effect. It shouldn't, though, because this function performs no write operations on the tree. Also, the operations in each loop iteration are basically the same, so if something were blocking, I would expect the drop to appear in the first few iterations rather than after about 20. If you need to see the details of this function or other parts of this project, please let me know.


There is 1 best solution below


Use std::async() instead of zeta::SimpleLock lock
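
The answer does not show any code, and `zeta::SimpleLock` does not appear in the question, so what it guards is unknown. As a hedged illustration of the general idea, the sketch below runs chunks of queries with `std::async` and collects the results through futures, so no lock around shared result storage is needed. `RunQueriesAsync` and the `query` callable are hypothetical stand-ins for the real per-image work.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Run query(id) for id in [0, num_queries) using up to num_tasks async tasks.
// Each task fills a private vector and returns it via its future; the caller
// concatenates the parts in order, so the tasks share no mutable state here.
template <typename QueryFn>
std::vector<double> RunQueriesAsync(int num_queries, int num_tasks,
                                    QueryFn query) {
    std::vector<double> results;
    if (num_queries <= 0 || num_tasks <= 0) return results;

    std::vector<std::future<std::vector<double>>> futures;
    const int chunk = (num_queries + num_tasks - 1) / num_tasks;  // ceiling
    for (int start = 0; start < num_queries; start += chunk) {
        const int end = std::min(start + chunk, num_queries);
        // std::launch::async requests a separate thread rather than
        // deferring the work until get() is called.
        futures.push_back(std::async(std::launch::async, [start, end, query]() {
            std::vector<double> local;
            local.reserve(end - start);
            for (int id = start; id < end; ++id) local.push_back(query(id));
            return local;
        }));
    }

    results.reserve(num_queries);
    for (auto& f : futures) {
        std::vector<double> part = f.get();  // blocks until that task is done
        results.insert(results.end(), part.begin(), part.end());
    }
    return results;
}
```

This only removes locking around result collection. If `MultiScoreQueryKeys` itself takes a lock or writes to shared state inside the tree, the tasks would still contend there.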