To summarise, my question is: is it possible to decode and play 15 lossily-compressed audio tracks on-the-fly at the same time with under 50ms latency and with no stuttering?


I'm writing a sound library in plain C for a game I'm creating. I'm hoping to have up to 15 audio tracks playing at once with less than 50ms latency.

As of now, the library is able to play raw PCM files (48000Hz packed 16-bit samples), and can easily play 15 sounds at once at 45ms latency without stuttering and with minimal CPU usage. This is on my relatively old Intel Q9300 + SSD machine.

Since raw audio files are huge, though, I extended my library to support playing back Opus files using opusfile. I was hoping that I'd still be able to play 15 sounds at once without the audio files taking up 200MB+. How wrong I was: I was only able to play 3 or 4 Opus tracks at once before I could hear stuttering and other buffer underrun symptoms. CPU usage was also massively increased compared to raw PCM playback.

I also tried adding Vorbis support using vorbisfile, thinking that decoding Vorbis on-the-fly might not be as CPU intensive. Vorbis is a little better than Opus: I can play 5 or 6 sounds at once before stuttering becomes audible (I guess Vorbis is indeed easier to decode), but this is still nowhere near as good as playing back raw PCM files.

Before I delve into the low-level libvorbis/libopus APIs and investigate other audio compression formats, is it actually feasible to decode and play 15 lossily-compressed audio tracks on-the-fly at the same time with under 50ms latency and with no stuttering on a medium-to-low end desktop computer?

If it helps, my sound library currently calls a function approximately every 15ms which basically does the following (error-handling and post-processing omitted for clarity):

void onBufferUpdateNeeded(int numSounds, struct Sound *sounds,
    uint16_t *bufferToUpdate, int numSamplesNeeded, uint16_t *tmpBuffer) {
    int i, j;
    memset(bufferToUpdate, 0, numSamplesNeeded * sizeof(uint16_t));
    for (i = 0; i < numSounds; ++i) {
        /* Seek to the specified sample number in the already-opened
        file handle. The implementation of this depends on the file
        type (vorbis, opus, raw PCM). */
        seekToSample(sounds[i].fileHandle, sounds[i].currentSample);

        /* Read numSamplesNeeded samples from the file handle into
        tmpBuffer. */
        readSamples(tmpBuffer, sounds[i].fileHandle, numSamplesNeeded);

        /* Add the samples into the buffer. */
        for (j = 0; j < numSamplesNeeded; ++j) {
            bufferToUpdate[j] += tmpBuffer[j];
        }
    }
}

Thanks in advance for any help!


It sounds like you already know the answer to your own question: NO. Normally, the only advice I would have to questions like these (especially performance-related queries) is to try it and find out if it's possible. But you have already collected that data.

It's true that perceptual/lossy audio codecs tend to be computationally intensive to decode. It sounds like you want to avoid the storage overhead of raw PCM on disk. In that case, if you can safely assume you'll have enough memory reserved for your application, you can decode the audio streams in advance, or employ some caching mechanism to deal with memory constraints. The decoding could also be offloaded to a different thread, since the Q9300 CPU mentioned in your question is quad core.
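
One way to hand decoded audio from a decoder thread to the audio callback is a single-producer/single-consumer ring buffer, so the callback only ever copies already-decoded PCM. The following is a minimal sketch under assumptions, not a tested design for your library: `PcmRing`, `ringWrite`, `ringRead` and `RING_CAPACITY` are hypothetical names, it assumes a C11 compiler with `<stdatomic.h>`, one ring per sound, and exactly one writer thread and one reader thread per ring.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Capacity in samples; must be a power of two so the index mask works.
   8192 mono samples at 48000 Hz is roughly 170 ms of audio. */
#define RING_CAPACITY 8192u

/* Hypothetical single-producer/single-consumer ring of decoded samples.
   Not part of opusfile or vorbisfile. A zero-initialised ring is empty. */
struct PcmRing {
    int16_t data[RING_CAPACITY];
    atomic_size_t head; /* total samples written; stored only by the decoder thread */
    atomic_size_t tail; /* total samples read; stored only by the audio callback */
};

/* Decoder thread: copy in up to n decoded samples, return how many fitted. */
size_t ringWrite(struct PcmRing *r, const int16_t *src, size_t n) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_CAPACITY - (head - tail);
    size_t i;
    if (n > space) n = space;
    for (i = 0; i < n; ++i)
        r->data[(head + i) & (RING_CAPACITY - 1)] = src[i];
    /* Publish the samples only after they have been written. */
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}

/* Audio callback: copy out up to n samples, return how many were available.
   A short count means the decoder fell behind (an underrun). */
size_t ringRead(struct PcmRing *r, int16_t *dst, size_t n) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t i;
    if (n > avail) n = avail;
    for (i = 0; i < n; ++i)
        dst[i] = r->data[(tail + i) & (RING_CAPACITY - 1)];
    /* Release the slots only after they have been copied out. */
    atomic_store_explicit(&r->tail, tail + n, memory_order_release);
    return n;
}
```

With something like this, the decoder thread would loop over the active sounds, decode sequentially, and call `ringWrite` whenever a ring has space, while the function from your question would call `ringRead` in place of the seek-and-read step. Note that the buffered audio does not add output latency for a sound that is already playing; it only means a newly started sound needs its first samples decoded before the next callback.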

Otherwise, you will need to seek out a compressor that has lower computational requirements. You might be interested in FLAC, sponsored by the same organization as Vorbis and Opus. It's lossless, so it won't compress quite as well as the lossy algorithms, but it should be much, much faster to decode.

And if that's still not suitable, browse around on this big list of ~150 audio codecs until you find one that meets your standards. Since you control the client software, you have a lot of choices (vs, e.g., streaming to a web browser).