Asynchronous/Overlapped File Read without kernel overhead?

I'm not really satisfied with Win32 ReadFile() overlapped I/O: it seems unreliable and has an unpleasant kernel overhead of 8-10 milliseconds in the DEBUG configuration and 25-30 milliseconds in the RELEASE configuration (Visual Studio) for a 400 MB read. The kernel overhead is proportional to the number of bytes read. I have already reported this: https://developercommunity.visualstudio.com/t/Overlapping-ReadFile-takes-significantly/10475365

Steps to reproduce:

  1. Create a new project in Visual Studio and select Console App.
  2. Copy and paste the code below into the project.
  3. Replace “Here could be your file!!!” with the path to a file whose size is at least READ_SIZE bytes.
  4. Run in DEBUG mode a few times and note the average time for the overlapped read request.
  5. Run in RELEASE mode a few times and note the average time for the overlapped read request.
  6. Compare the average times…

#include <iostream>
#include <chrono>
#include <Windows.h>

#define READ_SIZE 400000000

int main()
{
    // init: allocate the destination buffer
    unsigned char* data = new unsigned char[READ_SIZE];

    HANDLE file = CreateFileW(L"Here could be your file!!!", // set this to the path of a file of at least READ_SIZE bytes
        GENERIC_READ,
        0,    // dwShareMode: no sharing
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
        NULL);

    if (file == INVALID_HANDLE_VALUE)
    {
        std::cout << "failed to CreateFile!\n";
        std::cin.ignore();
        return 1;
    }

    // request overlapped read
    OVERLAPPED overlapped{};

    std::chrono::high_resolution_clock::time_point begin = std::chrono::high_resolution_clock::now();
    // ReadFile returns FALSE with ERROR_IO_PENDING when the read has been queued;
    // only check GetLastError() when the call itself reports failure
    if (!ReadFile(file, data, READ_SIZE, NULL, &overlapped) && GetLastError() != ERROR_IO_PENDING)
    {
        std::cout << "failed to request overlapped file read\n";
        std::cin.ignore();
        return 1;
    }
    std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now();

    int readFile_duration = static_cast<int>(std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count());

    // wait for overlapped read to complete
    begin = std::chrono::high_resolution_clock::now();
    DWORD bytesRead = 0;
    GetOverlappedResult(file, &overlapped, &bytesRead, TRUE);
    end = std::chrono::high_resolution_clock::now();

    int getOverlappedResult_duration = static_cast<int>(std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count());

    // print results
    std::cout << readFile_duration + getOverlappedResult_duration << "(ms) for " << bytesRead / 1000000 << "(mb)\n";
    std::cout << readFile_duration << "(ms) to request overlapped file read\n";
    std::cout << getOverlappedResult_duration << "(ms) for overlapped read to complete\n";

    // cleanup
    CloseHandle(file);
    delete[] data;
    std::cin.ignore();
}

However, I don't think this is going to be fixed very soon, and 8 milliseconds also seems like a lot of kernel overhead to me. What I am looking for is some way to tell the driver of an SSD, HDD, M.2 drive, etc. directly what to do, without the kernel slowing it down.

I found that there seems to be a way using IRPs (I/O request packets):
https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/handling-irps
However, in order to use those, my code must run in kernel mode, which as far as I can tell is a very bad idea.

If you know whether there is a way to do an asynchronous/overlapped file read without too much kernel overhead in Win32, or in some platform-agnostic way, or if you can point me in a new direction, please do!

Paul Sanders

From the link that @Stuntman gave you:

File access buffer addresses for read and write operations should be physical sector-aligned, which means aligned on addresses in memory that are integer multiples of the volume's physical sector size.

I don't think you're doing that (and the alignment you are getting might vary between Debug and Release builds).

Depending on the disk, this requirement may not be enforced.

So again, given the above, there might be a secondary effect (i.e. buffer alignment) which is affecting your timings.

But see @rbmm's comment below. You still have an alignment issue, just not the one I thought you had.