If a GPU can do N1 single-precision operations per second and N2 double-precision operations per second, is it possible, by mixing (independent) single- and double-precision operations, to achieve N1+N2 total operations per second, or at least something larger than both N1 and N2?
On Intel/AMD CPUs, I am fairly sure this is not possible, since single- and double-precision operations share at least some execution resources. But I have no idea whether this is true for modern NVIDIA or AMD GPUs.
This question has been partly touched upon in a SuperUser question, whose accepted answer links to a fair number of external sources, including two talks on using mixed-precision arithmetic (this and this). Both investigate mixed precision from a correctness standpoint and do not seem to be primarily motivated by performance.
Extending on that, parametric code that can conditionally switch parts of its calculation to reduced precision (as opposed to the classic "doing everything in double") where applicable can yield benefits on both modern AMD and NVIDIA GPUs (Intel has yet to reveal such details about its upcoming GPUs). Data dependency between subsequent operations plays an important role in whether operations can be co-issued.
In both cases, writing the code in such a fashion is a necessity, but ultimately one is at the mercy of the compiler to emit ISA that the hardware (or the driver, in NVIDIA's case) can then process so that co-issue of the proper operations actually happens. Profilers are invaluable for finding out whether the magic really did happen under the hood.
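As a rough illustration of the data-independence point, here is a minimal CUDA kernel sketch (names and constants are my own, not from any of the linked talks) in which an FP32 accumulation chain and an FP64 accumulation chain share no data, leaving the scheduler free in principle to interleave them across the single- and double-precision pipelines. Whether co-issue actually happens depends on the architecture and the compiler's scheduling, which is exactly why profiling the generated code matters:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: two independent FMA chains, one FP32, one FP64.
// Because accf and accd never feed into each other, nothing in the
// data-flow graph forces the hardware to serialize the two precisions.
__global__ void mixedPrecisionSketch(const float*  __restrict__ xs,
                                     const double* __restrict__ ys,
                                     float* out_f, double* out_d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float  accf = 0.0f;  // FP32 chain
    double accd = 0.0;   // FP64 chain, independent of the FP32 one
    for (int k = 0; k < 16; ++k) {
        accf = fmaf(accf, 1.0009765625f, xs[i]);  // FP32 fused multiply-add
        accd = fma (accd, 1.0009765625,  ys[i]);  // FP64 fused multiply-add
    }
    out_f[i] = accf;
    out_d[i] = accd;
}
```

If the loop instead computed `accd = fma(accd, 1.0, (double)accf)`, each FP64 step would depend on the latest FP32 result, and the two chains could no longer overlap; inspecting issue-slot utilization in a profiler is the only reliable way to see which case you actually got.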
That said, even if co-issue doesn't happen, FP32 units consume less energy while operating (fewer bits means less work) and therefore generate less heat, allowing the GPU to sustain boost clocks for longer. Mild performance increases may still be observed, regardless of architectural subtleties, simply by not using more resources than strictly necessary.