Can someone recommend a fast way to do a saturating add of 32-bit signed integers using Intel intrinsics (AVX, SSE4, ...)?
I looked at the intrinsics guide and found _mm256_adds_epi16,
but this seems to only handle 16-bit ints. I don't see anything similar for 32 bits; the other add intrinsics seem to wrap around.
This link answers this very question:
https://software.intel.com/en-us/forums/topic/285219
Here's an example implementation:
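What follows is a minimal sketch of one common way to do this, not necessarily the code from the linked thread. It assumes only SSE2 (128-bit vectors, four 32-bit lanes), and the names adds_epi32_sse2 and adds4_i32 are made up for illustration. The idea: do the wrapping add, detect signed overflow from the sign bits (both inputs differ in sign from the sum), then replace the overflowed lanes with INT32_MAX or INT32_MIN depending on the sign of the first operand.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: saturating add of four signed 32-bit lanes (SSE2 only). */
static inline __m128i adds_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i int_max = _mm_set1_epi32(0x7FFFFFFF);

    /* Plain wrapping add. */
    __m128i sum = _mm_add_epi32(a, b);

    /* Signed overflow happened iff a and b both differ in sign from sum,
       i.e. the sign bit of (a ^ sum) & (b ^ sum) is set. */
    __m128i overflow = _mm_and_si128(_mm_xor_si128(a, sum),
                                     _mm_xor_si128(b, sum));

    /* Broadcast the sign bit: all ones in lanes that overflowed, else zero. */
    __m128i mask = _mm_srai_epi32(overflow, 31);

    /* Saturation value per lane: INT32_MAX if a >= 0, INT32_MIN if a < 0.
       (a >> 31) is 0 or 0xFFFFFFFF; XOR with 0x7FFFFFFF gives the bound. */
    __m128i sat = _mm_xor_si128(_mm_srai_epi32(a, 31), int_max);

    /* Select sat where mask is set, sum elsewhere. */
    return _mm_or_si128(_mm_and_si128(mask, sat),
                        _mm_andnot_si128(mask, sum));
}

/* Hypothetical convenience wrapper for plain arrays of four int32_t. */
static void adds4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, adds_epi32_sse2(va, vb));
}
```

The same sequence should carry over to AVX2 by swapping in the corresponding _mm256_* integer intrinsics, and with SSE4.1 or AVX2 the final and/andnot/or select can probably be replaced by a blend, but I haven't benchmarked either variant.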