I'm writing a 3D graphics library as part of a project of mine, and I'm at the point where everything works, but not well enough.
In particular, my main headache is that my pixel fill-rate is horribly slow -- I can't even manage 30 FPS when drawing a triangle that spans half of an 800x600 window on my target machine (which is admittedly an older computer, but it should be able to manage this . . .)
I ran gprof on my executable, and I end up with the following interesting lines:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
43.51 9.50 9.50 vSwap
34.86 17.11 7.61 179944 0.04 0.04 grInterpolateHLine
13.99 20.17 3.06 grClearDepthBuffer
<snip>
0.76 21.78 0.17 624 0.27 12.46 grScanlineFill
The function vSwap
is my double-buffer swapping function, and it also performs vsyching, so it makes sense to me that the test program will spend much of its time waiting in there. grScanlineFill
is my triangle-drawing function, which creates an edge list and then calls grInterpolateHLine
to actually fill in the triangle.
My engine is currently using a Z-buffer to perform hidden surface removal. If we discount the (presumed) vsynch overhead, then it turns out that the test program is spending something like 85% of its execution time either clearing the depth buffer, or writing pixels according to the values in the depth buffer. My depth buffer clearing function is simplicity itself: copy the maximum value of a float into each element. The function grInterpolateHLine
is:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
for(; x1 <= x2; x1 ++, z += zstep) {
if(z < grDepthBuffer[x1 + y*VIDEO_WIDTH]) {
vSetPixel(x1, y, colour);
grDepthBuffer[x1 + y*VIDEO_WIDTH] = z;
}
}
}
I really don't see how I can improve that, especially considering that vSetPixel
is a macro.
My entire stock of ideas for optimization has been whittled down to precisely one:
- Use an integer/fixed-point depth buffer.
The problem that I have with integer/fixed-point depth buffers is that interpolation can be very annoying, and I don't actually have a fixed-point number library yet. Any further thoughts out there? Any advice would be most appreciated.
You should have a look at the source code to something like Quake - considering what it could achieve on a Pentium, 15 years ago. Its z-buffer implementation used spans rather than per-pixel (or fragment) depth. Otherwise, you could look at the rasterization code in Mesa.