Windows 7 memory management - how to prevent concurrent threads from blocking


I'm working on a program consisting of two concurrent threads. One (here "Clock") performs some computation at a regular rate (10 Hz) and is quite memory-intensive. The other one (here "hugeList") uses even more RAM but is not as time-critical as the first one, so I reduced its priority to THREAD_PRIORITY_LOWEST. Yet when the low-priority thread frees most of the memory it has used, the critical one fails to keep its timing.

I was able to condense the problem down to this bit of code (make sure optimizations are turned off!): while Clock tries to keep a 10 Hz timing, the hugeList thread allocates and frees more and more memory that is not organized in any sort of chunks.

#include "stdafx.h"
#include <stdio.h>
#include <forward_list>
#include <time.h>
#include <windows.h>
#include <vector>

void wait_ms(double _ms)
{
    clock_t endwait;
    endwait = clock () + _ms * CLOCKS_PER_SEC/1000;
    while (clock () < endwait) {}   // active wait
}
void hugeList(void)
{
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST);
    unsigned int loglimit = 3;
    unsigned int limit = 1000;
    while(true)
    {
        for(signed int cnt=loglimit; cnt>0; cnt--)
        {
            printf(" Countdown %d...\n", cnt);
            wait_ms(1000.0);
        }
        printf(" Filling list...\n");
        std::forward_list<double> list;
        for(unsigned int cnt=0; cnt<limit; cnt++)
            list.push_front(42.0);
        loglimit++;
        limit *= 10;
        printf(" Clearing list...\n");
        while(!list.empty())
            list.pop_front();
    }
}
void Clock()
{
    clock_t start = clock()-CLOCKS_PER_SEC*100/1000;
    while(true)
    {
        std::vector<double> dummyData(100000, 42.0);    // just get some memory
        printf("delta: %ld ms\n", (clock()-start)*1000/CLOCKS_PER_SEC);   // clock_t is a long, so use %ld
        start = clock();
        wait_ms(100.0);
    }
}

int main()
{
    DWORD dwThreadId;

    if (CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&Clock, (LPVOID) NULL, 0, &dwThreadId) == NULL)
        printf("Thread could not be created");
    if (CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&hugeList, (LPVOID) NULL, 0, &dwThreadId) == NULL)
        printf("Thread could not be created");

    while(true) {;}
    return 0;
}

First of all, I noticed that allocating memory for the linked list is way faster than freeing it. On my machine (Windows 7), at around the 4th iteration of the hugeList function, the Clock thread gets significantly disturbed (deltas of up to 200 ms). The effect disappears when the Clock thread no longer requests memory for the dummyData vector.

So,

  1. Is there any way of increasing the priority of memory allocation for the Clock-Thread in Win7?
  2. Or do I have to split both operations onto two contexts (processes)?

Note that my original code uses some communication via shared variables, which would require some kind of IPC if I chose the second option.

Note that my original code gets stuck for about 1 s when the equivalent of the hugeList function clears a boost::unordered_map and enters ntdll.dll!RtlInitializeCriticalSection many, many times (observed with Sysinternals Process Explorer).

Note that the effects observed are not due to swapping: I'm using 1.4 GB of my 16 GB (64-bit Windows 7).

edit:

Just wanted to let you know that so far I haven't been able to solve my issue. Splitting the two parts of the code into two processes does not seem to be an option, since my time is rather limited and I've never worked with processes before. I'm afraid I wouldn't get to a running version in time.

However, I managed to reduce the effects by reducing the number of memory deallocations made by the non-critical thread. This was achieved by using a fast pooling memory allocator (like the one provided in the Boost library). There does not seem to be a way to explicitly create certain objects (e.g. the huge forward list in my example) on some sort of thread-private heap that would not require synchronisation.
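For illustration, here is a minimal sketch of the pooling idea. It is not the Boost-based change described above: it assumes a C++17 compiler and uses std::pmr::unsynchronized_pool_resource from the standard library instead. The function name fill_and_clear_with_private_pool is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <memory_resource>

// Hypothetical helper: fills and clears a list whose nodes come from a
// pool owned by the calling thread. Returns the number of nodes built.
std::size_t fill_and_clear_with_private_pool(std::size_t limit)
{
    // Unsynchronized: the pool does no locking of its own, so it must only
    // ever be used by the thread that created it.
    std::pmr::unsynchronized_pool_resource pool;

    std::size_t built = 0;
    {
        // All list nodes are carved out of 'pool'.
        std::pmr::forward_list<double> list(&pool);
        for (std::size_t cnt = 0; cnt < limit; ++cnt) {
            list.push_front(42.0);
            ++built;
        }
        // Popping a node hands it back to the pool, not to the process heap.
        while (!list.empty())
            list.pop_front();
        assert(list.empty());
    }
    // The pool returns its larger chunks to the upstream resource when it
    // is destroyed at the end of this function.
    return built;
}
```

Calling fill_and_clear_with_private_pool(limit) from the low-priority thread keeps the per-node frees inside the pool. The pool still obtains its chunks from the default upstream resource (normally the global heap), so heap traffic is reduced rather than eliminated, and whether this helps the Clock thread's timing would have to be measured.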

For further reading:

http://bmagic.sourceforge.net/memalloc.html

Do threads have a distinct heap?

Memory Allocation/Deallocation Bottleneck?

http://software.intel.com/en-us/articles/avoiding-heap-contention-among-threads

http://www.boost.org/doc/libs/1_55_0/libs/pool/doc/html/boost_pool/pool/introduction.html

There is 1 answer below.

Replacing std::forward_list with std::list, I ran your code on a Core i7 machine with 4 GB of RAM until 2 GB was consumed. No disturbances at all (in a debug build).

P.S.

Yes, the release build reproduces the issue. I replaced the forward list with an array:

double* p = new double[limit];
for(unsigned int cnt=0; cnt<limit; cnt++)
    p[cnt] = 42.0;

and

for(unsigned int cnt=0; cnt<limit; cnt++)
    p[cnt] = -1;
delete [] p;

With this change the issue no longer reproduces. It seems the thread scheduler is penalizing the thread for requesting a lot of small memory chunks.
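To make the difference between the two variants concrete, here is a small sketch that counts allocator calls for a node-based list versus one contiguous block. It says nothing about the scheduler; it only shows how many heap calls each variant issues. HeapCallCounts, CountingAllocator, count_for_list and count_for_block are hypothetical names introduced for this example, and the counters are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <memory>
#include <vector>

// Hypothetical counters for illustration; single-threaded use only.
struct HeapCallCounts {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
};

// Minimal allocator that forwards to std::allocator and counts each call.
template <class T>
struct CountingAllocator {
    using value_type = T;
    HeapCallCounts* counts;

    explicit CountingAllocator(HeapCallCounts* c) : counts(c) { assert(c != nullptr); }
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) : counts(other.counts) {}

    T* allocate(std::size_t n) {
        ++counts->allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        ++counts->deallocations;
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.counts == b.counts;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}

// Node-based container: every element is a separate heap allocation.
HeapCallCounts count_for_list(std::size_t limit) {
    HeapCallCounts counts;
    {
        CountingAllocator<double> alloc(&counts);
        std::forward_list<double, CountingAllocator<double>> list(alloc);
        for (std::size_t cnt = 0; cnt < limit; ++cnt)
            list.push_front(42.0);
    }   // list destroyed here, freeing node by node
    return counts;
}

// Contiguous block: storage is requested once and released once.
HeapCallCounts count_for_block(std::size_t limit) {
    HeapCallCounts counts;
    {
        CountingAllocator<double> alloc(&counts);
        std::vector<double, CountingAllocator<double>> block(limit, 42.0, alloc);
    }   // block destroyed here
    return counts;
}
```

For the same limit, the list variant issues roughly one deallocation per element, while the block variant issues only a handful at most (some standard library debug modes add a bookkeeping allocation). Each of those calls goes through the shared process heap, which fits the observation that the array version does not disturb the Clock thread.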