Delayed audio when capturing process-specific audio loopback interface

I'm experiencing a 30-50 ms delay when playing back audio captured with the Windows-classic-samples ApplicationLoopback sample. I noticed this when I played the captured audio over the original process's audio, through the same audio endpoint, at the same time.

I would like to know whether the delay I'm experiencing is caused by the loopback capture (meaning there is a significant delay between the original process publishing an audio packet and the capture interface receiving it), and whether there is a way to lower it to an unnoticeable amount.

I forked and modified the original Application Loopback sample to be able to play back the captured audio. The code for this modified sample is here: https://github.com/naguileraleal/Windows-classic-samples/tree/main/applicationloopbackaudio

I'll now present my modifications to the original sample and the tests I did to determine what's causing this issue.

Modifications to Application Loopback

Playing back captured audio

To determine whether a delay exists between a process's audio and the captured audio, the latter can be played back over the former.
To achieve this, I initialized a new Audio Client against the same Audio Endpoint the captured process is using.
https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/ApplicationLoopback.cpp#L372C19-L372C19
Then, in the CLoopbackCapture::OnAudioSampleRequested() method, after calling IAudioCaptureClient::GetBuffer, I added calls to IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer. See https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/LoopbackCapture.cpp#L460
This plays each captured packet back through the output endpoint right after it is captured.
This callback is called every 10 ms. The whole CLoopbackCapture::OnAudioSampleRequested() execution takes 1.5 ms at worst and 1 ms on average, including the resampling step I'll describe next.
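For reference, a minimal sketch of that capture-to-render copy (member names such as m_AudioCaptureClient, m_AudioRenderClient and m_BytesPerFrame are illustrative, not necessarily the repository's exact identifiers):

```cpp
// Inside the capture callback: read one packet and immediately hand it to
// the render client on the same endpoint.
BYTE* captureData = nullptr;
UINT32 framesAvailable = 0;
DWORD flags = 0;
UINT64 devicePosition = 0;
UINT64 qpcPosition = 0;

RETURN_IF_FAILED(m_AudioCaptureClient->GetBuffer(
    &captureData, &framesAvailable, &flags, &devicePosition, &qpcPosition));

BYTE* renderData = nullptr;
RETURN_IF_FAILED(m_AudioRenderClient->GetBuffer(framesAvailable, &renderData));

// In the real code a resampling step sits between the two buffers; here the
// packet is copied through unchanged.
memcpy(renderData, captureData, framesAvailable * m_BytesPerFrame);

RETURN_IF_FAILED(m_AudioRenderClient->ReleaseBuffer(framesAvailable, 0));
RETURN_IF_FAILED(m_AudioCaptureClient->ReleaseBuffer(framesAvailable));
```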

Resampling captured samples

Because the captured sample format is not always compatible with the output Audio Client's supported formats, a resampling step is needed between the captured samples and the output samples. This was implemented using Media Foundation. The resampling is performed sequentially, after capturing a packet and before pushing it to the output client's buffer, in CLoopbackCapture::OnAudioSampleRequested().
See https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/LoopbackCaptureBase.cpp#L70 for the implementation of the resampling function.
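For context, the general shape of a Media Foundation resampler setup looks like this (a sketch assuming inFormat and outFormat are WAVEFORMATEX pointers for the capture and render formats; this illustrates the common pattern, not the repository's exact code):

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>   // CLSID_CResamplerMediaObject
#include <wil/com.h>

// Create the resampler MFT and configure its input/output media types from
// the two wave formats.
wil::com_ptr_nothrow<IMFTransform> resampler;
RETURN_IF_FAILED(CoCreateInstance(CLSID_CResamplerMediaObject, nullptr,
    CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&resampler)));

wil::com_ptr_nothrow<IMFMediaType> inType;
RETURN_IF_FAILED(MFCreateMediaType(&inType));
RETURN_IF_FAILED(MFInitMediaTypeFromWaveFormatEx(
    inType.get(), inFormat, sizeof(WAVEFORMATEX) + inFormat->cbSize));
RETURN_IF_FAILED(resampler->SetInputType(0, inType.get(), 0));

wil::com_ptr_nothrow<IMFMediaType> outType;
RETURN_IF_FAILED(MFCreateMediaType(&outType));
RETURN_IF_FAILED(MFInitMediaTypeFromWaveFormatEx(
    outType.get(), outFormat, sizeof(WAVEFORMATEX) + outFormat->cbSize));
RETURN_IF_FAILED(resampler->SetOutputType(0, outType.get(), 0));

// Each captured packet is then pushed through ProcessInput/ProcessOutput
// before being written to the render buffer.
```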

Size of the IAudioClient buffer

Originally, the buffer of the IAudioClient that performs the capture (the capture client) was 2 seconds long, as specified in its initialization in the original sample. I changed this value to zero, since the documentation for IAudioClient::Initialize states that the method ensures the audio buffer is big enough to meet the audio engine's requirements.
When I call IAudioClient::GetBufferSize on this client, it returns 0. Why?
I also encountered some undocumented behaviour when calling methods of the capture client: both IAudioClient::GetStreamLatency and IAudioClient::GetDevicePeriod return "not implemented".

I also did this for the buffer of the IAudioClient that outputs the captured audio to the audio engine (the output client). In this case, after initialization, IAudioClient::GetBufferSize returns a buffer size of 1056 audio frames, and GetStreamLatency and GetDevicePeriod return valid values.
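For reference, the relevant calls look roughly like this (captureClient and outputClient stand for the two IAudioClient instances; the flags follow the original sample's shared-mode, event-driven setup):

```cpp
// Passing 0 as hnsBufferDuration asks the engine to size the buffer itself.
RETURN_IF_FAILED(captureClient->Initialize(
    AUDCLNT_SHAREMODE_SHARED,
    AUDCLNT_STREAMFLAGS_LOOPBACK | AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
    0,              // hnsBufferDuration: engine picks a sufficient size
    0,              // hnsPeriodicity: must be 0 in shared mode
    &captureFormat,
    nullptr));

UINT32 bufferFrames = 0;
RETURN_IF_FAILED(outputClient->GetBufferSize(&bufferFrames));    // 1056 here

REFERENCE_TIME defaultPeriod = 0, minimumPeriod = 0;             // 100 ns units
RETURN_IF_FAILED(outputClient->GetDevicePeriod(&defaultPeriod, &minimumPeriod));
```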

Checking the production timestamp of the audio frames

In CLoopbackCapture::OnAudioSampleRequested(), when calling IAudioCaptureClient::GetBuffer on the capture client, the pu64DevicePosition parameter should, as I understand from the documentation, return the position of the audio packet relative to the beginning of the stream. In my tests, this value is always 0. Why?
On the other hand, the pu64QPCPosition parameter returns a valid value. Comparing the values between successive calls to CLoopbackCapture::OnAudioSampleRequested() shows a ~10 ms difference between packets, even when I hear a noticeable delay between the original and the captured audio!
Meanwhile, IAudioClient::GetCurrentPadding returns 0 for both the output and the capture client, meaning the capture client has no packets to give me and the output client has no packets queued for the audio engine. If both of these things are true, shouldn't I be hearing the latest captured audio? Does this mean the capture client is slow?
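As a sanity check on those timestamps: the documentation states that pu64QPCPosition is expressed in 100-nanosecond units, so the delta between two callbacks converts to milliseconds directly, with no QueryPerformanceFrequency scaling (a small illustrative helper):

```cpp
// qpcPrev/qpcNow are the pu64QPCPosition values from two successive
// IAudioCaptureClient::GetBuffer calls.
double QpcDeltaMilliseconds(UINT64 qpcPrev, UINT64 qpcNow)
{
    return (qpcNow - qpcPrev) / 10000.0;  // 10,000 * 100 ns = 1 ms
}
```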

Furthermore, I can deliberately make the delay increase by pausing the terminal the sample is running on. When I pause it (by clicking on the terminal window), the captured audio playback stops. When I hit Enter, playback resumes, this time with a greater delay. But there's more: the delay is not equal to the time the process was paused. It varies, and there seems to be a maximum possible delay. Sometimes, when a delay already exists, pausing and resuming makes it decrease.
I believe this has to do with the buffer sizes, but I cannot fully explain this behaviour.
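One hypothesis I considered (a sketch of mine, not something the original sample does differently): if the callback consumes exactly one packet per "Sample Ready" event while several queued up during the pause, the backlog would never drain and would persist as a constant delay. A draining loop to rule that out would look like this:

```cpp
// Read every pending packet per wakeup instead of exactly one.
UINT32 packetFrames = 0;
RETURN_IF_FAILED(m_AudioCaptureClient->GetNextPacketSize(&packetFrames));
while (packetFrames > 0)
{
    // GetBuffer / resample / render / ReleaseBuffer, as sketched earlier.
    RETURN_IF_FAILED(m_AudioCaptureClient->GetNextPacketSize(&packetFrames));
}
```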

Implementing a "Synchronous" Loopback Capture

The LoopbackCaptureSync class implemented in my sample code does the same thing as the CLoopbackCapture class, but it does so without using Media Foundation's work queues, and without waiting for "Sample Ready" events. I was trying to simplify the capture process as much as I could to see the cause of the delay more clearly. Sadly, it did not change a thing.
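In outline, LoopbackCaptureSync reduces to a plain polling loop, roughly like this (names illustrative, clients assumed initialized and started):

```cpp
// Event-free capture loop: poll for packets and push them straight through
// to the render client.
while (m_Capturing)
{
    UINT32 packetFrames = 0;
    RETURN_IF_FAILED(m_AudioCaptureClient->GetNextPacketSize(&packetFrames));
    if (packetFrames == 0)
    {
        Sleep(5);   // well under the ~10 ms engine period
        continue;
    }
    // GetBuffer -> resample -> IAudioRenderClient::GetBuffer/ReleaseBuffer
    // -> IAudioCaptureClient::ReleaseBuffer, exactly as in the callback path.
}
```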


Any help is much appreciated!

1 Answer

This turned out to be too long for a comment, so I'll turn it into an answer. But please note this is not a to-the-point answer to your question, just some hints and suggestions.

If memory serves me right, the key to low-latency shared mode audio is IAudioClient3::InitializeSharedAudioStream. In your sample code there's a comment that says "// Cant request IAudioClient2 or IAudioClient3 bc it returns with error".

I don't know much about WIL or the new async stuff, but it looks to me like you try to instantiate IAudioClient3 from the same base object as IAudioClient. That's not how it works: you get your hands on an IAudioClient first, and from that you QueryInterface for IAudioClient3.
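Something along these lines (an untested sketch; whether the process-loopback activation path supports IAudioClient3 at all is exactly what that comment in your code hints it may not):

```cpp
// QI the already-activated IAudioClient for IAudioClient3, then initialize
// with the engine's minimum shared-mode period (given in frames, not in
// 100-nanosecond units).
wil::com_ptr_nothrow<IAudioClient3> audioClient3;
RETURN_IF_FAILED(audioClient->QueryInterface(IID_PPV_ARGS(&audioClient3)));

UINT32 defaultPeriod = 0, fundamentalPeriod = 0, minPeriod = 0, maxPeriod = 0;
RETURN_IF_FAILED(audioClient3->GetSharedModeEnginePeriod(
    format, &defaultPeriod, &fundamentalPeriod, &minPeriod, &maxPeriod));

RETURN_IF_FAILED(audioClient3->InitializeSharedAudioStream(
    AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
    minPeriod,          // periodicity in frames
    format,
    nullptr));
```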

You also mention resampling, which by its very nature is itself a source of extra latency (however small it may be). You might want to try without resampling first.

And finally, be aware that the WASAPI shared-mode mixer adds at least another 3 ms of latency itself. My current understanding of WASAPI is that, in the absolute ideal case, with no resampling, you can get loopback down to about 6 ms of latency.