I've looked into using NtQueryDirectoryFile instead of FindFirstFile/FindNextFile to speed up directory iteration.
Using NtQueryDirectoryFile for this is a common tactic, so I implemented it mostly following how Git does it (my implementation).
It worked perfectly on Wine, and I saw a speedup from 17 to 14 seconds for an operation that iterates through everything (for comparison, the same operation takes 2 seconds on Linux, so optimization was urgent). On Windows, however, it failed with little indication of why. It always faults inside the NtQueryDirectoryFile routine, so I know the problem is with the routine (or my usage of it) and not something else.
Here are the GDB and Visual Studio backtraces (linked rather than embedded, since posting images requires more reputation than I have): gdb backtrace, visual studio backtrace
As you can see, R10 is the problematic register (it is 0 for some reason), and it is loaded from gs:[0x188]. As far as I can tell, that offset falls in a reserved area of the TEB, and I have no way of knowing what it holds.
I have tried fixes such as aligning the buffer to a LONGLONG (8-byte) boundary, to no avail.
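For reference, here is a portable sketch of what the LONGLONG alignment is for and how one filled result buffer is walked. It assumes the FileDirectoryInformation class. `dir_info`, `dir_buffer` and `walk_dir_entries` are hypothetical names, and `dir_info` mirrors FILE_DIRECTORY_INFORMATION with fixed-width types so it compiles outside Windows. The documentation asks for the structure to sit on an 8-byte boundary, and every NextEntryOffset except the last lands on one, because the entries contain 64-bit fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of FILE_DIRECTORY_INFORMATION:
 * ULONG -> uint32_t, LARGE_INTEGER -> int64_t, WCHAR -> uint16_t.
 * On Windows you would use the real definition instead. */
typedef struct dir_info {
    uint32_t NextEntryOffset;   /* bytes to the next entry, 0 for the last one */
    uint32_t FileIndex;
    int64_t  CreationTime;
    int64_t  LastAccessTime;
    int64_t  LastWriteTime;
    int64_t  ChangeTime;
    int64_t  EndOfFile;
    int64_t  AllocationSize;
    uint32_t FileAttributes;
    uint32_t FileNameLength;    /* length of FileName in BYTES, not characters */
    uint16_t FileName[1];       /* UTF-16, NOT NUL-terminated */
} dir_info;

/* The int64_t member forces 8-byte alignment of the whole buffer,
 * which is what "aligning to LONGLONG" achieves. */
typedef union dir_buffer {
    int64_t align;
    unsigned char bytes[64 * 1024];
} dir_buffer;

/* Walks one batch of entries as filled in by a successful query.
 * Precondition: buf holds at least one entry.
 * Returns the number of entries; if name_chars is not NULL, stores the
 * total number of UTF-16 code units across all file names. */
size_t walk_dir_entries(const unsigned char *buf, size_t *name_chars)
{
    size_t count = 0;
    size_t chars = 0;
    const unsigned char *p = buf;

    for (;;) {
        const dir_info *entry = (const dir_info *)p;
        count++;
        chars += entry->FileNameLength / sizeof(uint16_t);
        if (entry->NextEntryOffset == 0)
            break;                      /* last entry in this batch */
        p += entry->NextEntryOffset;    /* stays 8-byte aligned */
    }

    if (name_chars != NULL)
        *name_chars = chars;
    return count;
}
```

On Windows, the buffer would be refilled by calling NtQueryDirectoryFile again with RestartScan set to FALSE after the first call, walking each batch this way until the call returns STATUS_NO_MORE_FILES.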
Can anyone shed some light on this? Am I using the routine incorrectly? I'm completely stuck. Please don't suggest FindFirstFileEx with FIND_FIRST_EX_LARGE_FETCH; I've tried it and it didn't help.
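Thanks to Raymond Chen, I realized I had to link against NTDLL.DLL instead of ntoskrnl.exe to get the user-mode version of the routine. The ntoskrnl.exe export is the kernel-mode implementation: in x64 kernel code, gs points to the processor control region, where gs:[0x188] holds the current thread pointer, while in user mode gs points to the TEB instead, which likely explains why R10 came back as 0. (I'm posting this as an answer so the question is resolved for anyone who finds it later.)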
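A minimal, Windows-only sketch of one way to guarantee the user-mode export is used: resolve NtQueryDirectoryFile from the already-loaded ntdll.dll at runtime with GetModuleHandleW and GetProcAddress, instead of relying on whichever import library the linker picked. `NtQueryDirectoryFile_fn`, `load_nt_query_directory_file` and `count_dir_entries` are hypothetical names. The prototype follows the documented ZwQueryDirectoryFile signature, the types come from winternl.h, and the `#ifdef` only keeps the code out of non-Windows builds. Linking against an ntdll import library (e.g. ntdll.lib) and declaring the prototype yourself should work equally well.

```c
#ifdef _WIN32
#include <windows.h>
#include <winternl.h>

/* Prototype of the user-mode export in ntdll.dll. */
typedef NTSTATUS (NTAPI *NtQueryDirectoryFile_fn)(
    HANDLE FileHandle, HANDLE Event, PIO_APC_ROUTINE ApcRoutine,
    PVOID ApcContext, PIO_STATUS_BLOCK IoStatusBlock,
    PVOID FileInformation, ULONG Length,
    FILE_INFORMATION_CLASS FileInformationClass,
    BOOLEAN ReturnSingleEntry, PUNICODE_STRING FileName,
    BOOLEAN RestartScan);

/* ntdll.dll is mapped into every Win32 process, so no LoadLibrary is needed. */
NtQueryDirectoryFile_fn load_nt_query_directory_file(void)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (ntdll == NULL)
        return NULL;
    return (NtQueryDirectoryFile_fn)GetProcAddress(ntdll, "NtQueryDirectoryFile");
}

/* Counts directory entries (including . and .. when present); -1 on error. */
long count_dir_entries(const wchar_t *path)
{
    NtQueryDirectoryFile_fn query = load_nt_query_directory_file();
    if (query == NULL)
        return -1;

    /* No FILE_FLAG_OVERLAPPED, so the handle is synchronous and the call
     * does not return STATUS_PENDING. */
    HANDLE dir = CreateFileW(path, FILE_LIST_DIRECTORY,
                             FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                             NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (dir == INVALID_HANDLE_VALUE)
        return -1;

    union { LONGLONG align; BYTE bytes[64 * 1024]; } buf; /* 8-byte aligned */
    IO_STATUS_BLOCK iosb;
    BOOLEAN restart = TRUE;
    long count = 0;

    for (;;) {
        NTSTATUS status = query(dir, NULL, NULL, NULL, &iosb,
                                buf.bytes, sizeof buf.bytes,
                                FileDirectoryInformation, FALSE, NULL, restart);
        restart = FALSE;
        if (status == (NTSTATUS)0x80000006L)   /* STATUS_NO_MORE_FILES */
            break;
        if (status < 0) {                       /* any other error */
            count = -1;
            break;
        }
        if (iosb.Information == 0)              /* nothing was written */
            break;

        BYTE *p = buf.bytes;
        for (;;) {
            ULONG next = *(ULONG *)p;   /* NextEntryOffset is the first field */
            count++;
            if (next == 0)
                break;
            p += next;
        }
    }

    CloseHandle(dir);
    return count;
}
#endif /* _WIN32 */
```

For example, `count_dir_entries(L".")` lists the current directory. Resolving at runtime costs one GetProcAddress call and removes any dependence on which library the linker resolved the import against.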