Need help with MapViewOfFile performance


I'm very confused by a strange issue I'm having.

Consider the following code:

    testTimer.Start();

    u64 startPositionInBytes = startPosition * sizeof(DBTransaction);
    u64 chunkSizeInBytes = chunkSize * sizeof(DBTransaction);

    u64 viewOffset = (startPositionInBytes / allocGranularity) * allocGranularity;

    u64 mapViewSize = (startPositionInBytes % allocGranularity) + chunkSizeInBytes;
    if (mapViewSize > (queryThread->FileSize - viewOffset))
        mapViewSize = (queryThread->FileSize - viewOffset);

    u64 dataReadOffset = startPositionInBytes - viewOffset;

    DWORD high = (DWORD)(viewOffset >> 32);
    DWORD low = (DWORD)(viewOffset & 0xFFFFFFFF);
    u8* fileData = (u8*)MapViewOfFile(queryThread->FileMap, FILE_MAP_READ, high, low, mapViewSize);
    if (fileData == nullptr)
    {
        RALogError("Unable to create a file view into mapped file at position %llu (chunk size %llu). Error was: %s. Application can not proceed", startPosition, chunkSize, GetLastSystemError().c_str());
        // NOTE: execution should bail out here -- fileData is dereferenced below
    }
    elapsed = testTimer.Measure();
    queryThread->MapGlobalTime += elapsed;

    DBTransaction* transaction = (DBTransaction*)(fileData + dataReadOffset);

    for (u32 i = 0; i < chunkSize; ++i)
    {
        testTimer.Start();
        //DBTransaction* transaction = (DBTransaction*)pBytes;

        queryThread->TotalRead++;

        elapsed = testTimer.Measure();
        queryThread->CopyTime += elapsed;

        testTimer.Start();
        
        bool allConditionsPassed = true;

        for (u32 j = 0; j < QueryFilters.size(); ++j)
        {
            RAQueryFilter* filter = &QueryFilters[j];
            bool atLeastOneTrue = false;
            for (u32 k = 0; k < filter->Conditions.size(); ++k)
            {
                QueryCondition* condition = &filter->Conditions[k];

                if (condition->Compare(transaction, condition))
                    atLeastOneTrue = true;

                if (atLeastOneTrue)
                    break;
            }

            if (atLeastOneTrue == false)
            {
                allConditionsPassed = false;
                break;
            }
        }
        elapsed = testTimer.Measure();
        queryThread->MatchGlobalTime += elapsed;
        
        testTimer.Start();
        if (allConditionsPassed)
        {
            queryPointer->AddToResult(queryThread, queryPointer, transaction);
            queryThread->TotalMatches++;
        }
        elapsed = testTimer.Measure();
        queryThread->ComputeGlobalTime += elapsed;

        transaction++;
    }

This runs in a thread that reads a chunk of a memory-mapped file. There are 10 such threads reading different chunks of the same file. If the file is small (2 GB), this code can read the whole thing and find all matches in almost no time.

When the size of the file increases, the time required increases as well (which is expected), but it grows much faster than linearly, not linearly as I would expect. Working with a 12 GB file (20+ million transactions), the whole process takes 26 seconds, which is insane.

I decided to profile the code in the thread, and here is my confusion: MapViewOfFile takes a few milliseconds, so it's not the bottleneck. The time is almost entirely spent in the Compare function, but if I comment out that call (so all transactions match), the total time is exactly the same; it's just spent in AddToResult instead.

If I also comment out that function and instead copy the transaction from the buffer into a local variable (plus a counter used elsewhere, so the compiler doesn't optimize the whole thing away), the time is yet again the same, but is spent in the copy.

My guess is that the operation actually taking the time is the first dereference of the pointer to DBTransaction, wherever it happens.

I'm not sure what is going on. Can anyone explain, and suggest how to fix the issue?

Here is the code that spawns the threads:

    u32 requiredThreads = AvailableThreadsCount;
    u64 transactionsPerThread = (u64)ceil((f32)dataToRead / (f32)requiredThreads);

    // Make sure to not make thread load too small. If the number of transactions is less than 500K, reduce the number of used threads
    if (transactionsPerThread < RA_MINIMUM_THREAD_TRANSACTION_COUNT)
    {
        requiredThreads = (u32)ceil((f32)dataToRead / (f32)RA_MINIMUM_THREAD_TRANSACTION_COUNT);
        transactionsPerThread = RA_MINIMUM_THREAD_TRANSACTION_COUNT;
    }



    // Spawn the threads
    u32 i = 0;
    while (dataToRead > 0)
    {
        u64 chunkSize = dataToRead;
        if (chunkSize > transactionsPerThread)
            chunkSize = transactionsPerThread;

        RA_ASSERT(i < requiredThreads, "Too many threads created!");
        sQueryThread* queryThread = &QueryThread[i];

        #ifdef USE_FILE_MAPPING
        queryThread->Set(startPosition, chunkSize, testReadChunk, TransactionsHistory->FileSizeInBytes, TransactionsHistory->ReadFileMapHandle);
        #else 
        queryThread->Set(startPosition, chunkSize, 0, NULL);
        #endif

        //RALog("Spawning thread %i", i);
        queryThread->Handle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)DoLinearSearchQueryThread, queryThread, 0, &queryThread->ThreadID);

        i++;
        startPosition += chunkSize;
        dataToRead -= chunkSize;
    }

I basically split the file into chunks and assign one of them to each thread.

1 Answer

Answer by Pepijn Kramer:

MapViewOfFile uses virtual memory management, so the file is paged in on demand and can start thrashing. With enough threads accessing different parts of the file, the OS starts swapping pages in and out of memory very quickly. So when using multiple threads, you really need to make sure the data they work on can stay in physical memory, to keep thrashing from happening.