I'm constructing a shader program & GL host environment that would do a relatively simple task:
- Run the VS -> TCS -> TES -> GS set of shader stages on a number of triangles (patches). The TCS/TES produce additional tessellated triangles.
- Create a simple SSBO consisting of a large enough number of vec4 elements. The SSBO will hold the result of tessellation, i.e. the coordinates of the tessellated vertices.
- During the GS run, the GS writes triangle vertices into the SSBO in such a way that each triangle's vertices occupy three consecutive elements, i.e.
  (triangle1, vert1),
  (triangle1, vert2),
  (triangle1, vert3),
  (triangle2, vert1),
  (triangle2, vert2),
  (triangle2, vert3),
  etc.
In order to index my SSBO I need to somehow infer the "triangleID". I read the GS/Tess specs carefully and did some experiments. It seems the GS built-in inputs just don't provide it. gl_PrimitiveIDIn seems to refer to the index of the original input triangle (patch), and it does not get incremented for the triangles produced by tessellation.
Finally I came up with the idea/workaround of having another SSBO holding a "primitive counter" that gets incremented with atomicAdd(primCount, 1) each time the GS is executed:
    layout ( triangles ) in;
    layout ( triangle_strip, max_vertices = 3 ) out;

    // Auxiliary buffer holding the global primitive counter.
    layout(std140, binding = 1) buffer AuxSSBO {
        int primCount;
        int lock;
        int pad2;
        int pad3;
    };

    // Output buffer: three consecutive vec4 elements per triangle.
    layout(std140, binding = 2) writeonly buffer TrianglesSSBO {
        writeonly vec4 pos[];
    };

    void main() {
        // atomicAdd returns the value before the increment,
        // so each GS invocation gets its own idx.
        int idx = atomicAdd(primCount, 1);
        for (int i = 0; i < 3; ++i) {
            gl_Position = gl_in[i].gl_Position;
            gl_PrimitiveID = gl_PrimitiveIDIn;
            pos[3 * idx + i] = vec4(gl_Position.xyz, 1.0);
            EmitVertex();
        }
        EndPrimitive();
    }
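To make the intent of the counter clearer, here is a minimal CPU-side model of the same slot-reservation pattern in C++. It is only a sketch and not GL code: `simulateInvocations`, `Vec4`, and `nextInput` are hypothetical names I made up for illustration, `std::atomic<int>::fetch_add` stands in for `atomicAdd`, and worker threads stand in for GS invocations. Each "invocation" reserves one index and then writes its three vertices to `3 * idx + i`.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Hypothetical helper: models GS invocations as worker threads.
// Each invocation reserves a triangle slot with a fetch-and-add and
// writes its three vertices to pos[3 * idx + i].
std::vector<Vec4> simulateInvocations(int triangleCount, int threadCount) {
    std::vector<Vec4> pos(static_cast<std::size_t>(triangleCount) * 3,
                          Vec4{0.0f, 0.0f, 0.0f, 0.0f});
    std::atomic<int> primCount{0};  // stands in for AuxSSBO.primCount
    std::atomic<int> nextInput{0};  // hands out input triangles to the workers

    auto worker = [&]() {
        for (;;) {
            int tri = nextInput.fetch_add(1);
            if (tri >= triangleCount) break;

            // Same role as atomicAdd(primCount, 1): returns the old value,
            // so no two invocations ever see the same idx.
            int idx = primCount.fetch_add(1);

            for (int i = 0; i < 3; ++i) {
                // x = source triangle id, y = vertex number, w = 1 marks "written".
                pos[3 * idx + i] = Vec4{float(tri), float(i), 0.0f, 1.0f};
            }
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return pos;
}
```

In this model every slot is written exactly once, but the order in which source triangles land in the buffer can differ from run to run. The model says nothing about GPU memory visibility on the host side or about resetting the counter between frames.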
Yesterday I spent tons of time perfecting the code above, as I was getting all sorts of race conditions along the way. Now it seems to do the trick on my GPU/driver, but I'm not entirely sure it's 100% correct.
So my first question is:
- What's the best way to infer the triangle index in my case, and is the code above free of race conditions?
My second question is regarding performance:
- I have another implementation of my task, based on transform feedback objects & buffers, and I must say it runs about 4 times faster (0.5 ms per frame for transform feedback vs 2.0 ms for the SSBO version). I wonder why the SSBO backend is so much slower than its TF counterpart, and is there something I can do to make it run faster?
P.S. full code is here: https://pastebin.com/SagKZEyi
P.P.S. Updated code with DrawArraysIndirect() is here: https://gist.github.com/lhog/432af74ba41259e062f18910b5904684
Thanks!