How could a Delphi 6 TWinControl descendant's WndProc() execute sometimes off the main VCL thread?


I have a Delphi 6 application that is heavily multithreaded. I have a component I created that descends from TWinControl. When I first built it, I used a hidden window, allocated with AllocateHwnd(), and its WndProc to handle messages. Recently I started cleaning up the WndProcs in my code and decided to remove the auxiliary WndProc(). I changed the component to override the base class WndProc() method instead and do its custom Windows message handling from there. In that WndProc() I call the inherited handler first and then process my custom messages (WM_USER offsets), setting the message Result field to 1 if the message was one of my custom messages and I handled it.

One important note. I put a line of code at the top of the WndProc() override that throws an Exception if the current thread id is not the VCL main thread. I wanted to make sure that the WndProc() only executed in the context of the main VCL thread.
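For readers who want to picture the override described above, here is a minimal sketch. It is not my actual component: the class name TMyControl and the message id WM_MYCUSTOM are hypothetical, and it assumes the RTL's MainThreadID variable is used as the reference for the main thread.

```delphi
unit MyControl;

interface

uses
  Windows, Messages, SysUtils, Classes, Controls;

const
  WM_MYCUSTOM = WM_USER + 100; // hypothetical custom message id

type
  TMyControl = class(TWinControl) // hypothetical component name
  protected
    procedure WndProc(var Message: TMessage); override;
  end;

implementation

procedure TMyControl.WndProc(var Message: TMessage);
begin
  // Guard: fail loudly if this window procedure runs off the main thread.
  // MainThreadID is initialized by the RTL at program startup.
  if GetCurrentThreadId <> MainThreadID then
    raise Exception.CreateFmt(
      'WndProc called from thread %d, expected main thread %d',
      [GetCurrentThreadId, MainThreadID]);

  // Let the base class process the message first.
  inherited WndProc(Message);

  // Then handle the custom message and flag it as handled.
  if Message.Msg = WM_MYCUSTOM then
  begin
    // ... custom handling would go here ...
    Message.Result := 1;
  end;
end;

end.
```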

After doing this and running my program I ran into something that seems truly bizarre. I ran my program as normal and did various tasks without error. Then I went to a TMemo control that resides on the same page as my TWinControl descendant. When I clicked inside that TMemo control, the main thread check in my WndProc() override triggered. I had a breakpoint set on it, and when I looked at the call stack, there was nothing on it above my WndProc() override.

As far as I can tell, and I've double checked, I do not make explicit calls to the WndProc() override. That's not something I'd ever do. But given that my TWinControl component would have been created on the main VCL thread like all the other components, I can't fathom how the WndProc() override would ever execute in the context of a background thread, especially only when a UI action like a mouse click would happen. I understand how my WndProc() is tied to the TMemo control since all child windows hang off the top level window WndProc(), at least that's my understanding. But since all the component windows would have been created on the main VCL thread, then all their message queues should be executing in that context too, right?

So what kind of a situation could I have created to make my WndProc() run, and only sometimes, in the context of a background thread?


There are 2 best solutions below


There are two ways a main thread component's WndProc() method could be called in the context of a worker thread:

  1. the worker thread directly calls into the component's WindowProc property, or its Perform() method.

  2. the worker thread has stolen ownership of the component's window through unsafe usage of the TWinControl.Handle property. The Handle property getter is not thread safe. If a worker thread reads from the Handle property at the exact same moment that the main thread is recreating the component's window (TWinControl windows are not persistent - various runtime conditions can dynamically recreate them without affecting the majority of your UI logic), then there exists a race condition that could allow the worker thread to allocate a new window within its own context (and cause the main thread to leak another window). That would cause the main thread to stop receiving and dispatching messages within its context. If the worker thread has its own message loop then it would receive and dispatch the messages instead, thus calling the WndProc() method in the wrong thread context.
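One way to keep worker threads away from the Handle property entirely is to give them a window handle that is never recreated. The following is only a sketch under stated assumptions: TAuditSink and WM_AUDIT_TEXT are hypothetical names, the object must be created and destroyed on the main thread, and it relies on AllocateHWnd()/DeallocateHWnd() living in the Classes unit, as they do in Delphi 6.

```delphi
unit AuditSink;

interface

uses
  Windows, Messages, SysUtils, Classes, StdCtrls;

const
  WM_AUDIT_TEXT = WM_USER + 200; // hypothetical message id

type
  // Create and destroy this object on the main VCL thread only.
  TAuditSink = class(TObject)
  private
    FWnd: HWND;   // hidden window owned by the main thread, never recreated
    FMemo: TMemo;
    procedure SinkWndProc(var Msg: TMessage);
  public
    constructor Create(AMemo: TMemo);
    destructor Destroy; override;
    procedure Post(const S: string); // may be called from any thread
  end;

implementation

constructor TAuditSink.Create(AMemo: TMemo);
begin
  inherited Create;
  FMemo := AMemo;
  FWnd := AllocateHWnd(SinkWndProc);
end;

destructor TAuditSink.Destroy;
begin
  DeallocateHWnd(FWnd);
  inherited Destroy;
end;

procedure TAuditSink.Post(const S: string);
var
  P: PChar;
begin
  // Copy the text to the heap; the receiving side frees it.
  P := StrNew(PChar(S));
  if not PostMessage(FWnd, WM_AUDIT_TEXT, 0, LPARAM(P)) then
    StrDispose(P); // posting failed, so nobody else will free the copy
end;

procedure TAuditSink.SinkWndProc(var Msg: TMessage);
var
  P: PChar;
begin
  if Msg.Msg = WM_AUDIT_TEXT then
  begin
    P := PChar(Msg.LParam);
    try
      FMemo.Lines.Add(P); // runs on the main thread that owns FWnd
    finally
      StrDispose(P);
    end;
  end
  else
    Msg.Result := DefWindowProc(FWnd, Msg.Msg, Msg.WParam, Msg.LParam);
end;

end.
```

A worker thread would then call something like Sink.Post('socket error') and never touch the TMemo or its Handle directly.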

I find it odd that no call stack is being produced, though. There should always be some sort of trace available.

Also, make sure the MainThreadId variable (or whatever you are using to track the main thread) is not simply getting corrupted by accident. Make sure its current value is consistent with its initial value from startup.

Another thing you should do is name all of your thread instances in the debugger (this feature was introduced in Delphi 6). That way, when your thread validation gets tripped, the debugger can show you the exact name of the thread context that is calling your WndProc() method (even without a call stack trace), then you can look for bugs in the code for that thread.
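A common way to name a thread for the debugger is the Win32 convention of raising exception code $406D1388 with a small info record. The sketch below assumes the debugger in use recognizes that convention; the unit and procedure names are hypothetical.

```delphi
unit ThreadNaming;

interface

// Call from inside the thread being named, e.g. at the top of Execute.
procedure NameCurrentThread(const AName: AnsiString);

implementation

uses
  Windows;

type
  TThreadNameInfo = record
    FType: LongWord;     // must be $1000
    FName: PAnsiChar;    // pointer to the thread name
    FThreadID: LongWord; // $FFFFFFFF means "the calling thread"
    FFlags: LongWord;    // reserved, must be zero
  end;

procedure NameCurrentThread(const AName: AnsiString);
var
  Info: TThreadNameInfo;
begin
  Info.FType := $1000;
  Info.FName := PAnsiChar(AName);
  Info.FThreadID := $FFFFFFFF;
  Info.FFlags := 0;
  try
    RaiseException($406D1388, 0, SizeOf(Info) div SizeOf(LongWord), @Info);
  except
    // Swallow it: the exception exists only to be observed by a debugger.
  end;
end;

end.
```

For example, a thread's Execute method could begin with NameCurrentThread('VideoReader') so the thread shows up under that name when the validation check trips.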


Remy LeBeau's reply contains the explanation of what I did wrong. I am including this update so you can see the tricky details of a concrete case that shows just how subtle an error can result from keeping a reference to a VCL UI control in a background thread. Hopefully this information will help you debug your own code.

Part of my application includes a VCL component I created that descends from TCustomControl, which in turn descends from TWinControl. It aggregates a socket, and that socket creates a background thread for receiving video from an external device.

When an error occurs, that background thread posts a message to a TMemo control for auditing purposes using PostMessage(). That is where I made my mistake because the window handle (HWND) I use with PostMessage() belongs to a TMemo control. The TMemo control resides on the same form as my component.

When a video connection is lost, the socket that services it is closed and destroyed, but it turns out the background thread servicing it has not exited yet. Now when the background thread tries to execute an operation on the defunct socket that it still has a reference to, the result is a #10038 socket error (operation on a non-socket). This is where the trouble starts.

When the background thread calls PostMessage() with the TMemo's handle, the TMemo is in a state where it has to recreate the handle on demand, which is the treacherous phenomenon that Remy describes. The window is therefore recreated by the background thread, and the WndProc() of the recreated TMemo window now executes in the context of that thread.

This fits all the evidence. Not only do I get the background thread warning in my overridden WndProc() as mentioned above, but anything done in the TMemo window with the mouse causes a stream of #10038 error messages to appear in the TMemo. This is happening because a loosely coupled cyclic condition now exists between the TMemo, the component's overridden WndProc(), and the background thread, since that thread has a GetMessage loop in its Execute() method.

Every time a windows message is posted to the TMemo control, like from mouse movements, etc., it ends up in the background thread's message queue since it currently owns the window behind the TMemo. Since the background thread is trying to exit and it tries to close the socket on the way out, each close attempt generates another #10038 message to be posted to the TMemo, persisting the loop because each PostMessage() is essentially a self-post now.

I have since added a notification method to the object that manages the background thread. The socket calls it in its destructor, letting the thread know the socket is going away and that the reference is invalid. I never thought to do that before because the socket shuts the background thread down during destruction; however, I don't wait for a termination event from the background thread. An alternative solution, of course, would be to wait for the background thread to terminate. Note that had I adopted that approach, this scenario would have ended in a deadlock instead of strange behavior with a TMemo control.
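The notification idea can be sketched roughly as follows. This is not my production code: TVideoReaderThread, SocketGoingAway and FSocketValid are hypothetical names, and the sketch assumes each socket operation inside the loop is short and non-blocking so the lock is never held for long.

```delphi
unit VideoReader;

interface

uses
  Windows, Classes, SyncObjs;

type
  TVideoReaderThread = class(TThread) // hypothetical thread class
  private
    FLock: TCriticalSection;
    FSocketValid: Boolean;
  protected
    procedure Execute; override;
  public
    constructor Create;
    destructor Destroy; override;
    // Called from the socket's destructor, on whatever thread destroys it.
    procedure SocketGoingAway;
  end;

implementation

constructor TVideoReaderThread.Create;
begin
  // Set up the fields before the thread can start running.
  FLock := TCriticalSection.Create;
  FSocketValid := True;
  inherited Create(False);
end;

destructor TVideoReaderThread.Destroy;
begin
  inherited Destroy; // terminates the thread and waits for it to finish
  FLock.Free;
end;

procedure TVideoReaderThread.SocketGoingAway;
begin
  FLock.Enter;
  try
    FSocketValid := False; // the thread must not touch the socket again
  finally
    FLock.Leave;
  end;
end;

procedure TVideoReaderThread.Execute;
begin
  while not Terminated do
  begin
    FLock.Enter;
    try
      if not FSocketValid then
        Break; // socket is gone: leave without touching it or logging
      // ... one short, non-blocking socket operation would go here ...
    finally
      FLock.Leave;
    end;
    Sleep(10);
  end;
end;

end.
```

Because the flag is checked and the socket used under the same lock, the destructor's call to SocketGoingAway cannot slip in between the check and the use.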

[NOTE to Stack Overflow editor - I am adding this detail as a reply instead of modifying the original message so I don't push Remy's answer that contains the solution far down the page.]