Windows must do something to parse the PE header, load the executable in memory, and pass command line arguments to main().
Using OllyDbg I have set the debugger to break on main() so I could view the call stack:

It seems the symbols are missing, so we can't get the function name, just its memory address as seen. However, we can see that the caller of main is kernel32.767262C4, which is in turn called from ntdll.77A90FD9. Towards the bottom of the stack we see RETURN to ntdll.77A90FA4, which I assume to be the first function ever called to run an executable. The notable arguments passed to that function seem to be the address of the Windows Structured Exception Handler and the entry point of the executable.
So how exactly do these functions end up in loading the program into memory and getting it ready for the entry point to execute? Is what the debugger shows the entire process executed by the OS before main()?
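For reference, the entry point value that appears among those arguments is recorded in the PE optional header as AddressOfEntryPoint (an RVA relative to the image base). The following is a minimal Python sketch of reading it from raw file bytes; the function name `read_entry_point` is hypothetical, and the header offsets used are the standard PE32 / PE32+ ones. With ASLR the actual load address can differ from the preferred ImageBase stored in the file.

```python
import struct
from typing import Tuple


def read_entry_point(image: bytes) -> Tuple[int, int]:
    """Return (preferred_image_base, entry_point_rva) from PE header bytes.

    Hypothetical helper for illustration; it only reads the few fields
    needed and does no further validation.
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", image, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", image, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return image_base, entry_rva
```

Called on the contents of an EXE file, the sum of the two returned values gives the preferred virtual address of the entry point.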
If you call CreateProcess, the system internally calls ZwCreateThread[Ex] to create the first thread in the process.
When you create a thread, you (if you call ZwCreateThread directly) or the system initialize the CONTEXT record for the new thread; here Eip (i386) or Rip (amd64) is the entry point of the thread. If you do this yourself, you can specify any address. But when you call, say, Create[Remote]Thread[Ex], the system fills in the CONTEXT, as I said, and sets its own routine as the thread entry point. Your original entry point is saved in the Eax (i386) or Rcx (amd64) register.
The name of this routine depends on the Windows version.
Earlier it was BaseThreadStartThunk or BaseProcessStartThunk (when called from CreateProcess) from kernel32.dll.
Now the system specifies RtlUserThreadStart from ntdll.dll. RtlUserThreadStart usually calls BaseThreadInitThunk from kernel32.dll (except in native (boot execute) applications, like smss.exe and chkdsk.exe, which do not have kernel32.dll in their address space at all). BaseThreadInitThunk then calls your original thread entry point, and after (if) it returns, RtlExitUserThread is called.
The main purpose of this common startup wrapper is to set the top-level SEH filter. Only because of this can we call the SetUnhandledExceptionFilter function. If a thread starts directly from your entry point, without the wrapper, the top-level exception filter functionality becomes unavailable.
But whatever the thread entry point is, a thread in user space NEVER begins executing from this point!
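The wrapper chain described above can be pictured with a toy model. This is a plain Python sketch, not real Windows code: `run_thread`, `crash` and the log strings are hypothetical names, a Python exception stands in for an SEH exception, and what Windows does after the top-level filter returns is deliberately left out.

```python
def run_thread(entry, arg, top_level_filter=None):
    """Toy model of RtlUserThreadStart -> BaseThreadInitThunk -> entry."""
    log = ["RtlUserThreadStart"]

    def base_thread_init_thunk():
        log.append("BaseThreadInitThunk")
        return entry(arg)  # the original entry point finally runs here

    try:
        exit_code = base_thread_init_thunk()
    except Exception as exc:
        # Only the wrapper gives an unhandled exception a place to land.
        if top_level_filter is None:
            raise
        log.append("top-level filter")
        return top_level_filter(exc), log
    # Reached only if the entry point returns normally.
    log.append("RtlExitUserThread")
    return exit_code, log


def crash(_):
    raise RuntimeError("stand-in for an unhandled SEH exception")


print(run_thread(lambda n: n + 1, 41))
print(run_thread(crash, None, top_level_filter=lambda exc: -1))
```

Calling `crash(None)` directly, without `run_thread`, corresponds to starting a thread at the raw entry point: there is no frame in which a top-level filter could be invoked.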
Earlier, when a user-mode thread began to execute, the system inserted an APC into the thread with LdrInitializeThunk as the APC routine. This was done by copying (saving) the thread CONTEXT to the user stack and then calling KiUserApcDispatcher, which called LdrInitializeThunk. When LdrInitializeThunk finished, control returned to KiUserApcDispatcher, which called NtContinue with the saved thread CONTEXT; only after this did the thread entry point begin executing.
Now the system has optimized this process: it copies (saves) the thread CONTEXT to the user stack and calls LdrInitializeThunk directly. At the end of this function NtContinue is called, and the thread entry point begins executing.
So EVERY thread begins executing in user mode from LdrInitializeThunk. (A function with exactly this name exists and is called in all Windows versions from NT4 to Win10.)
You may know the DLL_THREAD_ATTACH notification. When a new thread in a process begins executing (with the exception of special system worker threads, like LdrpWorkCallback), it walks the loaded DLL list and calls the DLL entry points with the DLL_THREAD_ATTACH notification (of course, only if the DLL has an entry point and DisableThreadLibraryCalls was not called for this DLL). But how is this implemented? Thanks to LdrInitializeThunk, which calls LdrpInitialize -> LdrpInitializeThread -> LdrpCallInitRoutine (for the DLLs' EPs).
When the first thread in a process starts, the process itself still has to be initialized; at this point only two modules are loaded, the EXE and ntdll.dll. LdrInitializeThunk calls LdrpInitializeProcess for this job. Very briefly:
Different process structures are initialized.
All DLLs that the EXE statically links to (and their dependencies) are loaded, but their EPs are not called yet!
LdrpDoDebuggerBreak is called. This function checks whether a debugger is attached to the process and, if so, executes int 3, so the debugger receives an exception message, STATUS_BREAKPOINT. Most debuggers can begin UI debugging only from this point. However, there are debugger(s) which let us debug the process from LdrInitializeThunk; all my screenshots are from this kind of debugger.
Important point: up to here, only code from ntdll.dll (and maybe from kernel32.dll) has executed in the process; code from other DLLs, or any third-party code, has not executed in the process yet.
Optionally, a shim DLL is loaded into the process and the Shim Engine is initialized. But this is OPTIONAL.
The loaded DLL list is walked and each DLL's EP is called with DLL_PROCESS_ATTACH.
TLS initialization is done and TLS callbacks are called (if they exist).
ZwTestAlert is called. This call checks whether there are APCs in the thread queue and executes them. This point exists in all versions from NT4 to Win10. It lets us, for example, create a process in a suspended state and then insert an APC call (QueueUserAPC) into its thread (PROCESS_INFORMATION.hThread); as a result, this call is executed after the process is fully initialized and all DLL_PROCESS_ATTACH notifications have been delivered, but before the EXE entry point, in the context of the first process thread.
Finally, NtContinue is called; this restores the saved thread context, and we finally jump to the thread EP.
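The ordering of these steps, and in particular where a queued APC lands, can be sketched as a toy model. This is plain Python that only records the order of events; it calls no Windows API, the optional shim step is omitted, and `ToyThread`, `ldr_initialize_process` and the log strings are hypothetical names chosen for illustration.

```python
from collections import deque


class ToyThread:
    """Stand-in for the first thread of a process created suspended."""

    def __init__(self, entry_point):
        self.entry_point = entry_point
        self.apc_queue = deque()

    def queue_user_apc(self, name):
        # Models QueueUserAPC on PROCESS_INFORMATION.hThread before resuming.
        self.apc_queue.append(name)


def ldr_initialize_process(thread, static_dlls, debugger_attached=False):
    """Return the order of events for the steps listed above."""
    log = ["initialize process structures"]
    for dll in static_dlls:
        log.append("load " + dll)  # mapped, entry point not called yet
    if debugger_attached:
        log.append("LdrpDoDebuggerBreak: int 3")
    for dll in static_dlls:
        log.append(dll + ": DLL_PROCESS_ATTACH")
    log.append("TLS callbacks")
    while thread.apc_queue:  # ZwTestAlert drains the pending APCs
        log.append("APC " + thread.apc_queue.popleft())
    log.append("NtContinue -> " + thread.entry_point)
    return log


thread = ToyThread("exe entry point")
thread.queue_user_apc("my_apc")
for event in ldr_initialize_process(thread, ["kernel32.dll", "user32.dll"]):
    print(event)
```

In this model the APC entry always appears after every DLL_PROCESS_ATTACH line and immediately before the NtContinue line, which is the property the suspended-process trick relies on.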