Can't restart Windows Core Audio exclusive mode capture: I am getting too many/too few samples, and they are probably from previous sessions

Question

118 Views Asked by damix911 At 20 October 2024 at 03:57

This is a weird one; see the updates at the end of the post. Basically, the application works once, but if you run it again with the same sample rate, it fails. If you run it again with another sample rate, it works again. If you then run it again with that same rate, it fails. And so on: changing the sample rate fixes the app for one run.

Background

I am trying to capture an exclusive mode 48 kHz audio stream using Windows Core Audio. For now I am not even trying to save the data; I made a console app that records 20 seconds and I want to use it to verify that the numbers that I am getting are reasonable, especially the number of audio frames read. When I use the internal mic of my laptop, the numbers look good.

Capture device: Microphone Array (Realtek(R) Audio)
Capture rate: 47989.2

However I am pretty sure that my app is broken because when I plug in my external USB interface, I get this.

Capture device: Microphone (USB Audio CODEC )
Capture rate: 65654.5
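
As a quick sanity check of these two numbers, the measured rate can be compared with the nominal 48000 Hz. This is a minimal sketch: the function names and the 1% tolerance are hypothetical choices, not part of the original app; measuredRate() simply mirrors the app's own formula (frames counted after the start time, divided by the time spent counting).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Mirrors the app's formula: frames counted after startSec, divided by the
// time spent counting them.
double measuredRate(std::uint64_t frames, double elapsedSec, double startSec) {
    return static_cast<double>(frames) / (elapsedSec - startSec);
}

// Relative deviation of a measured rate from the nominal sample rate.
double rateDeviation(double measured, std::uint32_t nominal) {
    return std::fabs(measured - nominal) / nominal;
}

// Hypothetical acceptance rule: within 1% of nominal over a ~20 s run.
bool rateLooksPlausible(double measured, std::uint32_t nominal, double tolerance = 0.01) {
    return rateDeviation(measured, nominal) <= tolerance;
}
```

Against the two runs above, the internal mic passes (about 0.02% low), while the USB result is roughly 37% too high.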

The question

Why is the app attempting to read way more samples than it should? There is no way that's valid data. Is it because the stream was not reset properly since its last use? I tried calling IAudioClient->Reset(), to no avail. How do I completely restart a capture session?

Here is the full source code. I am going to go over it step by step at the end of the post, so that you can verify/disprove that my understanding of the API is correct.

Full source code

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <mmdeviceapi.h>
#include <Functiondiscoverykeys_devpkey.h>
#include <Audioclient.h>
#include <string>
#include <cassert>
#include <iostream>

int main()
{
    HRESULT hr;

    hr = CoInitialize(NULL);

    IMMDeviceEnumerator* pEnumerator;
    hr = CoCreateInstance(
        __uuidof(MMDeviceEnumerator),
        NULL,
        CLSCTX_INPROC_SERVER,
        __uuidof(IMMDeviceEnumerator),
        (void**)&pEnumerator
    );
    assert(hr == S_OK);
    IMMDevice* pDevice;
    hr = pEnumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &pDevice);
    assert(hr == S_OK);
    pEnumerator->Release();

    IAudioClient* pAudioClient;
    hr = pDevice->Activate(
        __uuidof(IAudioClient),
        CLSCTX_INPROC_SERVER,
        NULL,
        (void**)&pAudioClient
    );
    assert(hr == S_OK);

    IPropertyStore* pPropertyStore;
    hr = pDevice->OpenPropertyStore(STGM_READ, &pPropertyStore);
    assert(hr == S_OK);
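    PROPVARIANT friendlyName;
    PropVariantInit(&friendlyName);
    hr = pPropertyStore->GetValue(PKEY_Device_FriendlyName, &friendlyName);
    assert(hr == S_OK);
    // PKEY_Device_FriendlyName is a VT_LPWSTR, so read pwszVal rather than bstrVal.
    std::wstring name = std::wstring(friendlyName.pwszVal);
    PropVariantClear(&friendlyName);
    pPropertyStore->Release();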

    std::wcout << "Capture device: " << name << std::endl;

    WAVEFORMATEXTENSIBLE wfex;
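    // cbSize counts only the extra bytes that follow the WAVEFORMATEX header (22 here).
    wfex.Format.cbSize = sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX);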
    wfex.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    wfex.Format.nChannels = 2;
    wfex.Format.nSamplesPerSec = 48000;
    wfex.Format.wBitsPerSample = 16;
    wfex.Format.nBlockAlign = wfex.Format.nChannels * wfex.Format.wBitsPerSample / 8;
    wfex.Format.nAvgBytesPerSec = wfex.Format.nSamplesPerSec * wfex.Format.nBlockAlign;
    wfex.Samples.wValidBitsPerSample = wfex.Format.wBitsPerSample;
    wfex.SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
    wfex.dwChannelMask = 0;
    WAVEFORMATEX* pwfx = reinterpret_cast<WAVEFORMATEX*>(&wfex);
    WAVEFORMATEX* pWfxout = nullptr;
    hr = pAudioClient->IsFormatSupported(
        AUDCLNT_SHAREMODE_EXCLUSIVE,
        pwfx,
        &pWfxout
    );
    assert(hr == S_OK);

    REFERENCE_TIME rtDefaultPeriod, rtMinPeriod;
    hr = pAudioClient->GetDevicePeriod(&rtDefaultPeriod, &rtMinPeriod);
    assert(hr == S_OK);
    double defaultPeriod = (rtDefaultPeriod * 1E2) / 1E9;
    double minPeriod = (rtMinPeriod * 1E2) / 1E9;
    REFERENCE_TIME hnsRequestedDuration = rtMinPeriod * 2;

    hr = pAudioClient->Initialize(
        AUDCLNT_SHAREMODE_EXCLUSIVE,
        0,
        hnsRequestedDuration,
        rtMinPeriod,
        pwfx,
        nullptr
    );
    assert(hr == S_OK);

    IAudioCaptureClient* pCaptureClient;
    hr = pAudioClient->GetService(
        __uuidof(IAudioCaptureClient),
        (void**)&pCaptureClient);
    assert(hr == S_OK);

    LARGE_INTEGER liPerfFreq;
    QueryPerformanceFrequency(&liPerfFreq);
    LARGE_INTEGER liStart;
    QueryPerformanceCounter(&liStart);

    hr = pAudioClient->Start();
    assert(hr == S_OK);

    double startTimeSeconds = 1.0;
    size_t totalFramesCaptured = 0;

    while (true)
    {
        Sleep(int(minPeriod * 1000));

        LARGE_INTEGER liNow;
        QueryPerformanceCounter(&liNow);
        auto elapsedSeconds = double(liNow.QuadPart - liStart.QuadPart) / double(liPerfFreq.QuadPart);

        if (elapsedSeconds > 20.0) {
            auto captureRate = totalFramesCaptured / (elapsedSeconds - startTimeSeconds);
            std::wcout << "Capture rate: " << captureRate;
            break;
        }

        if (elapsedSeconds > startTimeSeconds) {
            BYTE* pData;
            DWORD flags;

            hr = S_OK;

            while (hr == S_OK)
            {
                UINT32 numFramesAvailable;
                hr = pCaptureClient->GetBuffer(
                    &pData,
                    &numFramesAvailable,
                    &flags, NULL, NULL);

                if (hr != S_OK) {
                    hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
                    break;
                }

                totalFramesCaptured += numFramesAvailable;

                hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
                assert(hr == S_OK);
            }
        }
    }

    hr = pAudioClient->Stop();
    assert(hr == S_OK);

    CoUninitialize();

    return 0;
}

Code walkthrough

Import some standard headers and open the main() function.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <mmdeviceapi.h>
#include <Functiondiscoverykeys_devpkey.h>
#include <Audioclient.h>
#include <string>
#include <cassert>
#include <iostream>

int main()
{

Declare the hr variable that I use throughout the code for storing error codes.

    HRESULT hr;

Initialize COM.

    hr = CoInitialize(NULL);

Create an enumerator and get the default capture device.

    IMMDeviceEnumerator* pEnumerator;
    hr = CoCreateInstance(
        __uuidof(MMDeviceEnumerator),
        NULL,
        CLSCTX_INPROC_SERVER,
        __uuidof(IMMDeviceEnumerator),
        (void**)&pEnumerator
    );
    assert(hr == S_OK);
    IMMDevice* pDevice;
    hr = pEnumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &pDevice);
    assert(hr == S_OK);
    pEnumerator->Release();

Print the name of the device.

    IPropertyStore* pPropertyStore;
    hr = pDevice->OpenPropertyStore(STGM_READ, &pPropertyStore);
    assert(hr == S_OK);
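    PROPVARIANT friendlyName;
    PropVariantInit(&friendlyName);
    hr = pPropertyStore->GetValue(PKEY_Device_FriendlyName, &friendlyName);
    assert(hr == S_OK);
    // PKEY_Device_FriendlyName is a VT_LPWSTR, so read pwszVal rather than bstrVal.
    std::wstring name = std::wstring(friendlyName.pwszVal);
    PropVariantClear(&friendlyName);
    pPropertyStore->Release();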
    std::wcout << "Capture device: " << name << std::endl;

Create the audio client.

    IAudioClient* pAudioClient;
    hr = pDevice->Activate(
        __uuidof(IAudioClient),
        CLSCTX_INPROC_SERVER,
        NULL,
        (void**)&pAudioClient
    );
    assert(hr == S_OK);

Create the format. On my system I often need to use WAVEFORMATEXTENSIBLE even where the simpler WAVEFORMATEX should do (as in this case of plain stereo PCM), because otherwise device initialization fails, even when the WAVEFORMATEX is supposedly "supported". That is odd, but it may be a sign of driver problems.

    WAVEFORMATEXTENSIBLE wfex;
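    // cbSize counts only the extra bytes that follow the WAVEFORMATEX header (22 here).
    wfex.Format.cbSize = sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX);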
    wfex.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    wfex.Format.nChannels = 2;
    wfex.Format.nSamplesPerSec = 48000;
    wfex.Format.wBitsPerSample = 16;
    wfex.Format.nBlockAlign = wfex.Format.nChannels * wfex.Format.wBitsPerSample / 8;
    wfex.Format.nAvgBytesPerSec = wfex.Format.nSamplesPerSec * wfex.Format.nBlockAlign;
    wfex.Samples.wValidBitsPerSample = wfex.Format.wBitsPerSample;
    wfex.SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
    wfex.dwChannelMask = 0;
    WAVEFORMATEX* pwfx = reinterpret_cast<WAVEFORMATEX*>(&wfex);
    WAVEFORMATEX* pWfxout = nullptr;
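
For reference, here is a portable mirror of the two structures' documented byte layouts (the struct names are hypothetical; real code uses the SDK types). It shows the derived fields for 16-bit stereo PCM and that cbSize counts only the bytes after the 18-byte WAVEFORMATEX header, i.e. sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX) = 22, not the size of the whole 40-byte structure.

```cpp
#include <cassert>
#include <cstdint>

// Byte-packed mirrors of the documented WAVEFORMATEX / WAVEFORMATEXTENSIBLE
// layouts (hypothetical names; use the SDK types in real code).
#pragma pack(push, 1)
struct WaveFormatExMirror {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;               // bytes of extra data AFTER this header
};
struct WaveFormatExtensibleMirror {
    WaveFormatExMirror Format;
    std::uint16_t wValidBitsPerSample;  // the Samples union in the SDK
    std::uint32_t dwChannelMask;
    std::uint8_t SubFormat[16];         // a GUID
};
#pragma pack(pop)

static_assert(sizeof(WaveFormatExMirror) == 18, "WAVEFORMATEX is 18 bytes");
static_assert(sizeof(WaveFormatExtensibleMirror) == 40, "WAVEFORMATEXTENSIBLE is 40 bytes");

// Fill the size-related fields for interleaved integer PCM.
WaveFormatExtensibleMirror makePcmFormat(std::uint16_t channels, std::uint32_t rate,
                                         std::uint16_t bits) {
    WaveFormatExtensibleMirror f{};
    f.Format.wFormatTag = 0xFFFE;  // WAVE_FORMAT_EXTENSIBLE
    f.Format.nChannels = channels;
    f.Format.nSamplesPerSec = rate;
    f.Format.wBitsPerSample = bits;
    f.Format.nBlockAlign = static_cast<std::uint16_t>(channels * bits / 8);  // bytes per frame
    f.Format.nAvgBytesPerSec = rate * f.Format.nBlockAlign;
    f.Format.cbSize = static_cast<std::uint16_t>(
        sizeof(WaveFormatExtensibleMirror) - sizeof(WaveFormatExMirror));
    f.wValidBitsPerSample = bits;
    return f;
}
```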

Verify that the format is supported in exclusive mode.

    hr = pAudioClient->IsFormatSupported(
        AUDCLNT_SHAREMODE_EXCLUSIVE,
        pwfx,
        &pWfxout
    );
    assert(hr == S_OK);

Get the device period. Since I want to use exclusive mode, it is my understanding that I mainly care about the "minimum" period.

    REFERENCE_TIME rtDefaultPeriod, rtMinPeriod;
    hr = pAudioClient->GetDevicePeriod(&rtDefaultPeriod, &rtMinPeriod);
    assert(hr == S_OK);
    double defaultPeriod = (rtDefaultPeriod * 1E2) / 1E9;
    double minPeriod = (rtMinPeriod * 1E2) / 1E9;

Initialize the audio client. The buffer is sized to hold 2 full periods of data; if I understand correctly, this should let the application and the audio engine drift out of sync a bit without glitching, because on average the buffer will be about half full.

    REFERENCE_TIME hnsRequestedDuration = rtMinPeriod * 2;

    hr = pAudioClient->Initialize(
        AUDCLNT_SHAREMODE_EXCLUSIVE,
        0,
        hnsRequestedDuration,
        rtMinPeriod,
        pwfx,
        nullptr
    );
    assert(hr == S_OK);
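
The period and buffer arithmetic can be checked in isolation. REFERENCE_TIME values count 100-nanosecond units, so 10,000,000 units make one second. The sketch below uses an illustrative minimum period of 30000 units (3 ms); the real value comes from GetDevicePeriod() and varies by device. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// REFERENCE_TIME values count 100-nanosecond units: 10,000,000 per second.
constexpr std::int64_t kHnsPerSecond = 10'000'000;

double hnsToSeconds(std::int64_t hns) {
    return static_cast<double>(hns) / kHnsPerSecond;
}

// Whole frames covered by a duration at the given rate (truncated toward zero).
std::int64_t hnsToFrames(std::int64_t hns, std::uint32_t sampleRate) {
    return hns * sampleRate / kHnsPerSecond;
}
```

With a 3 ms period at 48 kHz, one period is 144 frames and the two-period buffer is 288 frames; at 44.1 kHz a 3 ms period is not a whole number of frames (132.3). The original `(rtMinPeriod * 1E2) / 1E9` is the same conversion to seconds, since 100 / 1e9 = 1 / 1e7.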

Get the capture client.

    IAudioCaptureClient* pCaptureClient;
    hr = pAudioClient->GetService(
        __uuidof(IAudioCaptureClient),
        (void**)&pCaptureClient);
    assert(hr == S_OK);

Record the performance frequency (this is a standard pattern for high-precision time measurement in Windows).

    LARGE_INTEGER liPerfFreq;
    QueryPerformanceFrequency(&liPerfFreq);

Start the audio client and record the start time.

    LARGE_INTEGER liStart;
    QueryPerformanceCounter(&liStart);
    hr = pAudioClient->Start();
    assert(hr == S_OK);

Time to loop. At startTimeSeconds we are going to start capturing frames. The number of frames captured every iteration is added to totalFramesCaptured.

    double startTimeSeconds = 1.0;
    size_t totalFramesCaptured = 0;

    while (true)
    {

Nap for a bit, to let the audio engine do its work. It should be safe to sleep for a full period, because of the chosen buffer size.

        Sleep(int(minPeriod * 1000));
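
Whether sleeping one period is safe with a two-period buffer can be checked with a rough model. It assumes, as a simplification, that each wake-up drains the buffer completely, so data is lost only when more frames arrive during one sleep than the buffer can hold. Note that Sleep() does not guarantee the requested duration: the actual wait depends on the system timer resolution, often around 15.6 ms by default, so a requested 3 ms can become much longer. SleepBudget and checkSleep are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Rough model (assumption): the buffer is fully drained at each wake-up, so
// data is lost whenever more frames arrive during one sleep than it can hold.
struct SleepBudget {
    std::uint64_t framesArriving;  // frames produced while the app sleeps
    std::uint64_t bufferFrames;    // frames the endpoint buffer can hold
    bool overruns;                 // true if the buffer would overflow
};

SleepBudget checkSleep(double actualSleepMs, double bufferMs, std::uint32_t sampleRate) {
    SleepBudget b;
    b.framesArriving = static_cast<std::uint64_t>(actualSleepMs * sampleRate / 1000.0);
    b.bufferFrames = static_cast<std::uint64_t>(bufferMs * sampleRate / 1000.0);
    b.overruns = b.framesArriving > b.bufferFrames;
    return b;
}
```

With a 6 ms buffer at 48 kHz, a true 3 ms sleep leaves a full period of slack, but a 15.6 ms sleep would overrun the buffer on every wake-up.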

Get the current elapsed time in seconds.

        LARGE_INTEGER liNow;
        QueryPerformanceCounter(&liNow);
        auto elapsedSeconds = double(liNow.QuadPart - liStart.QuadPart) / double(liPerfFreq.QuadPart);

If 20 seconds have passed, we compute the actual measured capture rate, print it and break.

        if (elapsedSeconds > 20.0) {
            auto captureRate = totalFramesCaptured / (elapsedSeconds - startTimeSeconds);
            std::wcout << "Capture rate: " << captureRate;
            break;
        }

Otherwise, if at least startTimeSeconds have passed, we capture some samples. I wait one second before starting, to give the audio engine ample time to start up and begin filling the buffer. The code keeps calling GetBuffer() as long as it returns S_OK and adds each packet's frame count to totalFramesCaptured.

        if (elapsedSeconds > startTimeSeconds) {
            BYTE* pData;
            DWORD flags;

            hr = S_OK;

            while (hr == S_OK)
            {
                UINT32 numFramesAvailable;
                hr = pCaptureClient->GetBuffer(
                    &pData,
                    &numFramesAvailable,
                    &flags, NULL, NULL);

                if (hr != S_OK) {
                    hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
                    break;
                }

                totalFramesCaptured += numFramesAvailable;

                hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
                assert(hr == S_OK);
            }
        }
    }
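
The loop above calls ReleaseBuffer() even when GetBuffer() did not hand out a packet (for example when it returns AUDCLNT_S_BUFFER_EMPTY), and it ignores the flags. A common alternative is to call IAudioCaptureClient::GetNextPacketSize() first, call GetBuffer()/ReleaseBuffer() only while a packet is pending, and check AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY, which reports that data was lost before the packet. The sketch below simulates only that control flow with a hypothetical mock, not the real COM interface, so it runs anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Stands in for AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY in this simulation.
constexpr std::uint32_t kDiscontinuity = 0x1;

struct MockPacket {
    std::uint32_t frames;
    std::uint32_t flags;
};

// Hypothetical mock of the three IAudioCaptureClient methods the loop needs.
struct MockCaptureClient {
    std::deque<MockPacket> pending;
    int outstanding = 0;  // GetBuffer() calls not yet matched by ReleaseBuffer()

    std::uint32_t nextPacketSize() const {  // like GetNextPacketSize()
        return pending.empty() ? 0 : pending.front().frames;
    }
    MockPacket getBuffer() {  // like GetBuffer(); only called when a packet exists
        ++outstanding;
        return pending.front();
    }
    void releaseBuffer(std::uint32_t /*framesRead*/) {  // like ReleaseBuffer()
        --outstanding;
        pending.pop_front();
    }
};

struct DrainResult {
    std::uint64_t frames = 0;
    int discontinuities = 0;
};

// Drain every packet currently queued, pairing each GetBuffer() with exactly
// one ReleaseBuffer() and never releasing when nothing was obtained.
DrainResult drainPackets(MockCaptureClient& client) {
    DrainResult result;
    while (client.nextPacketSize() != 0) {
        MockPacket packet = client.getBuffer();
        if (packet.flags & kDiscontinuity) {
            ++result.discontinuities;  // data was lost before this packet
        }
        result.frames += packet.frames;
        client.releaseBuffer(packet.frames);
    }
    return result;
}
```

Counting discontinuities alongside frames makes it visible whether a low capture rate comes from the app dropping data or from the device delivering less.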

Done. Stop the client, shut down COM, and return 0.

    hr = pAudioClient->Stop();
    assert(hr == S_OK);

    CoUninitialize();

    return 0;
}

Update 1

Oh wow, this is weird. I made some changes to the app to make it a bit more stable.

I increased the device period and buffer size.

    REFERENCE_TIME rtMinPeriod;
    hr = pAudioClient->GetDevicePeriod(NULL, &rtMinPeriod);
    assert(hr == S_OK);
    REFERENCE_TIME rtPeriod = rtMinPeriod * 10;
    REFERENCE_TIME hnsRequestedDuration = rtPeriod * 2;
    double period = (rtPeriod * 1E2) / 1E9;

    hr = pAudioClient->Initialize(
        AUDCLNT_SHAREMODE_EXCLUSIVE,
        0,
        hnsRequestedDuration,
        rtPeriod,
        pwfx,
        nullptr
    );
    assert(hr == S_OK);

And decreased the app period:

    while (true)
    {
        Sleep(int(period * 1000) / 2);

        ...
    }

I also added Release() calls at the end.

    pCaptureClient->Release();
    pAudioClient->Release();
    pDevice->Release();
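
Manual Release() calls are easy to miss on early exits. A minimal RAII sketch, assuming only that the wrapped type has a COM-style Release() method; ReleaseOnExit and FakeComObject are hypothetical names, and in real code Microsoft::WRL::ComPtr (from <wrl/client.h>) provides the same guarantee.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical RAII holder: calls Release() once when it goes out of scope,
// for any type with a COM-style Release() method.
template <typename T>
class ReleaseOnExit {
public:
    explicit ReleaseOnExit(T* p = nullptr) : p_(p) {}
    ~ReleaseOnExit() {
        if (p_) p_->Release();
    }
    ReleaseOnExit(const ReleaseOnExit&) = delete;  // copying would double-Release()
    ReleaseOnExit& operator=(const ReleaseOnExit&) = delete;
    ReleaseOnExit(ReleaseOnExit&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}
    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_;
};

// Hypothetical stand-in for a COM interface that records release order.
struct FakeComObject {
    std::vector<int>* log;
    int id;
    unsigned long Release() {
        log->push_back(id);
        return 0;
    }
};
```

Holders declared in the order device, audio client, capture client are destroyed in reverse, so the capture client is released first, matching the order above. Keep them in an inner scope that closes before CoUninitialize() is called.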

Finally, I made the sample rate configurable from user input. This audio interface supports 44100 and 48000.

    std::cout << "Sample rate: " << std::flush;
    UINT32 sampleRate;
    std::cin >> sampleRate;

...

    wfex.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    wfex.Format.nChannels = 2;
    wfex.Format.nSamplesPerSec = sampleRate;
    wfex.Format.wBitsPerSample = 16;
    wfex.Format.nBlockAlign = wfex.Format.nChannels * wfex.Format.wBitsPerSample / 8;
    wfex.Format.nAvgBytesPerSec = wfex.For

With these changes, a clear pattern is starting to emerge. For starters, I am now getting fewer samples than expected instead of too many: usually about 6000 per second, far below the 44100 or 48000 I should be getting. But the real revelation is that every time the user selects the "other" sample rate, the bug is fixed for a single run of the application. After a run that produces the expected result, running the app again with the same sample rate gives a bad result, whereas running it with the "other" sample rate gives a good one.

See the following console output; each C:\ prompt is a separate invocation of the app. It almost looks as if the stream stays corrupted across application restarts, and changing the sample rate forces some sort of corrupted cache to be cleared, so that the app works once.

C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 48042.7
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6144.3
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6233.4
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6319.25
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 44168.7
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6145.48
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6237.21
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6149.16
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 48098.2
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6898.08
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6566.24
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 48000
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6647.94
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 44159.4
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>CaptureRate
Sample rate: 44100
Capture device: Microphone (USB Audio CODEC )
Capture rate: 6079.4
C:\Users\wowaudioishard\source\repos\Audio\x64\Debug>

Update 2

I discovered that if I disconnect and reconnect the audio interface's USB cable, I am sometimes able to re-run the app successfully with the same sample rate (this seems to work in the 44100 case).

Update 3

I modified the app to loop over an array of predefined sample rates, and recreate the client and capture session from scratch every time. The same behavior can be reproduced. This is the output of a single run of the app.

Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 48046.9
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6146.26
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6146.21
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6145.34
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 44130.8
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6147.93
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6149.65
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6147.79
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 48060.2
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6143.94
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6144.81
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6146.78
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 44094.8
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6148.57
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6144.14
Capture device Microphone (USB Audio CODEC ) at 44100
  Capture rate: 6147.52
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 48048.4
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6147.68
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6146.22
Capture device Microphone (USB Audio CODEC ) at 48000
  Capture rate: 6143.05

As before, every time that we change the sample rate, the app behaves correctly for one iteration; then it messes up the count again until the next sample rate change.
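
The pattern can be stated precisely and checked against the log: a run is within tolerance exactly when its sample rate differs from the previous run's. A small sketch of that check, with hypothetical names and an assumed 1% tolerance:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Run {
    std::uint32_t nominal;  // requested sample rate
    double measured;        // reported capture rate
};

// A run counts as "good" if it is within 1% of its nominal rate (assumed tolerance).
bool isGood(const Run& r) {
    return std::fabs(r.measured - r.nominal) / r.nominal <= 0.01;
}

// True if every run is good exactly when its nominal rate differs from the
// previous run's (the first run counts as a change).
bool goodExactlyAfterRateChange(const std::vector<Run>& runs) {
    for (std::size_t i = 0; i < runs.size(); ++i) {
        bool changed = (i == 0) || runs[i].nominal != runs[i - 1].nominal;
        if (isGood(runs[i]) != changed) return false;
    }
    return true;
}
```

Applied to the 20 runs in the log above, the check holds: the good runs are those at indices 0, 4, 8, 12 and 16, each the first run at a new rate, and the bad runs land at about 6145 frames per second whether 48000 or 44100 was requested.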

This is so weird! What is going on?

Original Q&A

**ChipMeister** · Answer 1

I've run your first version of the code with a Samson Meteor microphone on USB and noticed that the 3 ms buffer combined with the Sleep() makes me miss incoming data (the count drops to around 35000 samples per second). In my case, if I only remove the Sleep(), I get ~48000 samples per second. I can run your program multiple times with about the same results regarding the samples captured, for both the Debug and Release builds (from VS2022 and from the command line).

You can try to see what happens if you make it event-driven. In my case that gives a stable flow, for both microphones.
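
In WASAPI, event-driven capture means initializing with AUDCLNT_STREAMFLAGS_EVENTCALLBACK, registering an event with IAudioClient::SetEventHandle(), and waiting on it with WaitForSingleObject() instead of calling Sleep(); in exclusive mode the buffer duration and periodicity passed to Initialize() must then be equal. The portable sketch below only mimics that control flow: a hypothetical producer thread stands in for the audio engine and a std::condition_variable stands in for the event handle. It is an analogy, not WASAPI code.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

// Shared state between the simulated audio engine and the capture loop.
struct PacketQueue {
    std::mutex m;
    std::condition_variable cv;         // plays the role of the event handle
    std::queue<std::uint32_t> packets;  // frame count of each queued packet
    bool done = false;
};

// Simulates numPackets periods; returns the frames the consumer collected.
std::uint64_t runEventDrivenCapture(int numPackets, std::uint32_t framesPerPacket) {
    PacketQueue q;
    std::thread engine([&] {
        for (int i = 0; i < numPackets; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one "period"
            {
                std::lock_guard<std::mutex> lock(q.m);
                q.packets.push(framesPerPacket);
            }
            q.cv.notify_one();  // like the engine signalling the event
        }
        {
            std::lock_guard<std::mutex> lock(q.m);
            q.done = true;
        }
        q.cv.notify_one();
    });

    std::uint64_t total = 0;
    {
        std::unique_lock<std::mutex> lock(q.m);
        while (true) {
            // Block until signalled instead of sleeping for a guessed interval.
            q.cv.wait(lock, [&] { return !q.packets.empty() || q.done; });
            while (!q.packets.empty()) {  // drain everything that is ready
                total += q.packets.front();
                q.packets.pop();
            }
            if (q.done) break;
        }
    }
    engine.join();
    return total;
}
```

The design point is that the consumer's wake-ups are driven by the producer's signal rather than by the timer resolution behind Sleep(), so its timing no longer depends on how long a requested sleep actually lasts.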
