I'm working on an OpenGL implementation of the Oculus Rift distortion shader. The shader works by taking the input texture coordinate (of a texture containing a previously rendered scene), transforming it using the distortion coefficients, and then using the transformed coordinate to sample the scene and determine the fragment color.
I'd hoped to improve performance by pre-computing the distortion and storing it in a second texture, but the result is actually slower than the direct computation.
The direct calculation version looks basically like this:
float distortionFactor(vec2 point) {
    float rSq = lengthSquared(point);
    float factor = (K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq);
    return factor;
}

void main() {
    vec2 distorted = vRiftTexCoord * distortionFactor(vRiftTexCoord);
    vec2 screenCentered = lensToScreen(distorted);
    vec2 texCoord = screenToTexture(screenCentered);
    vec2 clamped = clamp(texCoord, ZERO, ONE);
    if (!all(equal(texCoord, clamped))) {
        vFragColor = vec4(0.5, 0.0, 0.0, 1.0);
        return;
    }
    vFragColor = texture(Scene, texCoord);
}
where K is a vec4 that's passed in as a uniform.
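For reference, the surrounding declarations look roughly like this (a sketch rather than the exact code; ZERO and ONE are just clamping constants, and lengthSquared is a trivial dot product):

uniform vec4 K;            // distortion coefficients
uniform sampler2D Scene;   // previously rendered scene

const vec2 ZERO = vec2(0.0);
const vec2 ONE = vec2(1.0);

in vec2 vRiftTexCoord;
out vec4 vFragColor;

float lengthSquared(vec2 point) {
    return dot(point, point); // r^2, so no sqrt is involved
}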
On the other hand, the displacement map lookup looks like this:
void main() {
    vec2 texCoord = vTexCoord;
    if (Mirror) {
        texCoord.x = 1.0 - texCoord.x;
    }
    texCoord = texture(OffsetMap, texCoord).rg;
    vec2 clamped = clamp(texCoord, ZERO, ONE);
    if (!all(equal(texCoord, clamped))) {
        discard;
    }
    if (Mirror) {
        texCoord.x = 1.0 - texCoord.x;
    }
    FragColor = texture(Scene, texCoord);
}
There are a couple of other operations for correcting the aspect ratio and accounting for the lens offset, but they're pretty simple; a rough sketch of them is included below. Is it really reasonable to expect this to outperform a simple texture lookup?
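Something along these lines (the uniform names here are placeholders, not my exact code):

uniform vec2 LensCenter;   // placeholder: lens offset from the screen center
uniform float AspectRatio; // placeholder: viewport width / height

// Undo the aspect-ratio correction and re-apply the lens offset.
vec2 lensToScreen(vec2 point) {
    return vec2(point.x / AspectRatio, point.y) + LensCenter;
}

// Map screen-centered [-1, 1] coordinates back into [0, 1] texture space.
vec2 screenToTexture(vec2 point) {
    return point * 0.5 + 0.5;
}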
GDDR memory is pretty high-latency, and modern GPU architectures have plenty of number-crunching capability. It used to be the other way around: GPUs were so ill-equipped to do calculations that normalization was cheaper to do by fetching from a cube map.
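To illustrate how much that has flipped, the old trick was to fetch pre-computed unit vectors from a normalization cube map rather than spend ALU cycles on them; today the arithmetic version is the cheap one. A rough sketch (the sampler name is made up):

uniform samplerCube NormalizationCubeMap; // hypothetical cube map storing encoded unit vectors

// The old way: trade ALU work for a texture fetch.
vec3 normalizeViaCubeMap(vec3 v) {
    return texture(NormalizationCubeMap, v).xyz * 2.0 - 1.0; // decode [0, 1] -> [-1, 1]
}

// The modern way: a handful of ALU instructions, far cheaper than a memory round-trip.
vec3 normalizeViaMath(vec3 v) {
    return v * inversesqrt(dot(v, v)); // equivalent to normalize(v)
}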
Throw in the fact that you are not doing a regular texture lookup here, but rather a dependent lookup, and it comes as no surprise. Since the location you are fetching from depends on the result of another fetch, it is impossible to pre-fetch or efficiently cache the memory needed by your shader, which are effective latency-hiding strategies. That is no "simple texture lookup."
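The difference is easy to see in shader form, reusing the names from your own code: in the independent case the coordinate is known up front, while in the dependent case the second fetch cannot even be issued until the first one returns.

// Independent lookup: the coordinate comes straight from an interpolated input,
// so the fetch can be scheduled early and its latency hidden.
vec4 independent = texture(Scene, vTexCoord);

// Dependent lookup: the coordinate is itself the result of a fetch,
// so the second fetch stalls until the first one completes.
vec2 offset = texture(OffsetMap, vTexCoord).rg;
vec4 dependent = texture(Scene, offset);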
What is more, in addition to doing a dependent texture lookup, your second shader also includes the discard keyword. This will effectively eliminate the possibility of early depth testing on a lot of hardware.

Honestly, I do not see why you want to "optimize" the distortionFactor (...) function into a lookup. It uses squared length, so you are not even dealing with a sqrt, just a bunch of multiplication and addition.
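If you do keep the lookup variant for some reason, one thing worth trying is replacing discard with a solid fill color, the way your first shader already does, so that early depth testing is not disabled on hardware that turns it off when discard is present. A sketch of the middle of that shader:

texCoord = texture(OffsetMap, texCoord).rg;
vec2 clamped = clamp(texCoord, ZERO, ONE);
if (!all(equal(texCoord, clamped))) {
    // Write a solid color instead of discarding, so early-Z stays enabled.
    FragColor = vec4(0.0, 0.0, 0.0, 1.0);
    return;
}
FragColor = texture(Scene, texCoord);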