I've recently been profiling an application that uses Direct3D 9 for rendering with PIX for Windows. What I've noticed is that the first operations of a given frame on render targets, or on textures that wrap them, seem to take a very long time. The system is running Windows 7 and is not out of graphics memory, so no thrashing should be happening. What I find interesting is that operations on 16-bit floating-point surfaces take about twice as long as on 8-bit integer surfaces.
Does anyone have an explanation for this phenomenon?
-Timo
In D3D9, the driver architecture is such that resources have to be validated every time they are used. This adds overhead to many API calls, and is part of the reason you should optimize to do more work with fewer API calls.
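As a rough sketch of what "fewer API calls" means in practice (the quad layout and the `device` pointer here are illustrative, not from the original post): instead of issuing one draw call per object, which pays the per-call validation cost each time, you draw as much as possible in a single call.

```cpp
#include <d3d9.h>

// Slow path: every quad pays the per-call overhead separately.
void DrawQuadsNaive(IDirect3DDevice9* device, UINT quadCount)
{
    for (UINT i = 0; i < quadCount; ++i)
    {
        // Each call re-validates the currently bound resources.
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                     0,        // BaseVertexIndex
                                     i * 4,    // MinVertexIndex
                                     4,        // NumVertices
                                     i * 6,    // StartIndex
                                     2);       // PrimitiveCount (2 triangles per quad)
    }
}

// Faster path: one call draws all quads, so the overhead is paid once.
void DrawQuadsBatched(IDirect3DDevice9* device, UINT quadCount)
{
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                 0,
                                 0,
                                 quadCount * 4,
                                 0,
                                 quadCount * 2);
}
```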
In addition, on older Windows platforms (e.g. Windows XP) the D3D driver was entirely kernel-mode, so API calls would incur a user-mode to kernel-mode transition (this is not the case in Windows Vista, 7, or 8, which have a user-mode front-end like OpenGL).
In D3D10, resources are validated only when they are created. This is likely because D3D10 is layered on top of WDDM, which moved the driver stack from fully kernel-mode to largely user-mode. Under WDDM, if the user-mode portion crashes it will not take down the kernel (no BSOD), so per-call validation is not as critical; you don't have to be nearly as paranoid about these things when you're running in user-mode.
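To illustrate what "validated at creation" looks like from the API side, here's a hedged sketch in D3D10; the description values are made up for the example, and `device` is assumed to be a valid pointer:

```cpp
#include <d3d10.h>

HRESULT CreateHdrRenderTargetTexture(ID3D10Device* device, ID3D10Texture2D** outTex)
{
    D3D10_TEXTURE2D_DESC desc = {};
    desc.Width            = 1920;
    desc.Height           = 1080;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R16G16B16A16_FLOAT;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D10_USAGE_DEFAULT;
    desc.BindFlags        = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;

    // Any illegal combination in `desc` is rejected here, once,
    // rather than being re-checked every time the texture is used.
    return device->CreateTexture2D(&desc, NULL, outTex);
}
```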
Now, as for the performance difference between 8-bit integer and 16-bit FP surfaces, this is actually to be expected. It's not so much because one is integer and the other is floating point (GPUs are great with FP), but because one is twice the size of the other. GPUs have a lot of memory bandwidth, but you can still improve performance simply by using the smallest data type that works.
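To make the size difference concrete, a back-of-the-envelope sketch (the helper function is purely illustrative):

```cpp
#include <d3d9.h>

UINT BytesPerPixel(D3DFORMAT fmt)
{
    switch (fmt)
    {
    case D3DFMT_A8R8G8B8:      return 4;  // 4 x 8-bit integer channels
    case D3DFMT_A16B16G16R16F: return 8;  // 4 x 16-bit float channels
    default:                   return 0;  // other formats not handled in this sketch
    }
}

// For a 1920x1080 render target:
//   D3DFMT_A8R8G8B8      : 1920 * 1080 * 4 =  ~7.9 MB per full read or write
//   D3DFMT_A16B16G16R16F : 1920 * 1080 * 8 = ~15.8 MB, i.e. double the traffic
```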