Short version: I get WSA_IO_PENDING when using blocking socket API calls. How should I handle it? The socket has the overlapped I/O attribute and is configured with a timeout.
Long version:
Platform: Windows 10. Visual Studio 2015
A socket is created in a very traditional simple way.
s = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
By default the socket has the overlapped I/O attribute enabled. This can be verified with getsockopt / SO_OPENTYPE.
- I do need the overlapped attribute because I want to use the timeout feature, e.g. SO_SNDTIMEO.
- I use the socket only in a blocking (i.e., synchronous) manner.
- Socket reads run only within a single thread.
- Socket writes can be performed from different threads, synchronized with a mutex.
Timeouts and keep-alive are enabled on the socket with...
::setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, ...);
::setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, ...);
::WSAIoctl(s, SIO_KEEPALIVE_VALS, ...);
The socket operations are done with
::send(s, sbuffer, ssize, 0);
and
::recv(s, rbuffer, rsize, 0);
I also tried WSARecv and WSASend with both lpOverlapped and lpCompletionRoutine set to NULL.
[MSDN] ... If both lpOverlapped and lpCompletionRoutine are NULL, the socket in this function will be treated as a non-overlapped socket.
::WSARecv(s, &dataBuf, 1, &nBytesReceived, &flags, NULL/*lpOverlapped*/, NULL/*lpCompletionRoutine*/)
::WSASend(s, &dataBuf, 1, &nBytesSent, 0, NULL/*lpOverlapped*/, NULL/*lpCompletionRoutine*/)
The Problem:
Those blocking send / recv / WSARecv / WSASend calls fail with the WSA_IO_PENDING error code!
Questions:
Q0: Is there any reference on using the overlapped attribute with blocking calls and a timeout?
How does a socket behave when it has the overlapped "attribute" and the timeout feature enabled, but is only used through the blocking socket API with "non-overlapped I/O semantics"?
I could not find any reference about this yet (e.g. on MSDN).
Q1: Is this expected behavior?
I observed this issue (getting WSA_IO_PENDING) after migrating the code from Win XP / Win 7 to Win 10.
Here is the client code part (note: assert is not used in the real code; here it just indicates that the corresponding error would be handled and that a faulty socket stops the procedure):
auto s = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
assert(s != INVALID_SOCKET);
// Receive and send timeouts.
timeval timeout;
timeout.tv_sec = (long)(1500);
timeout.tv_usec = 0;
assert(::setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, (const char*)&timeout, sizeof(timeout)) != SOCKET_ERROR);
assert(::setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, (const char*)&timeout, sizeof(timeout)) != SOCKET_ERROR);
// Keep-alive settings.
struct tcp_keepalive
{
    unsigned long onoff;
    unsigned long keepalivetime;
    unsigned long keepaliveinterval;
} heartbeat;
heartbeat.onoff = (unsigned long)true;
heartbeat.keepalivetime = (unsigned long)3000;
heartbeat.keepaliveinterval = (unsigned long)3000;
DWORD nob = 0;
assert(0 == ::WSAIoctl(s, SIO_KEEPALIVE_VALS, &heartbeat, sizeof(heartbeat), 0, 0, &nob, 0, 0));
// Connect to the peer (ip and port are defined elsewhere).
SOCKADDR_IN connection;
connection.sin_family = AF_INET;
connection.sin_port = ::htons(port);
connection.sin_addr.s_addr = ip;
assert(::connect(s, (SOCKADDR*)&connection, sizeof(connection)) != SOCKET_ERROR);
// Blocking receive.
char buffer[100];
int receivedBytes = ::recv(s, buffer, 100, 0);
if (receivedBytes > 0)
{
    // process buffer
}
else if (receivedBytes == 0)
{
    // peer shutdown
    // we will close socket s
}
else if (receivedBytes == SOCKET_ERROR)
{
    const int lastError = ::WSAGetLastError();
    switch (lastError)
    {
    case WSA_IO_PENDING:
        //.... I get the error!
        break;
    default:
        // any other error
        break;
    }
}
Q2: How should I handle it?
Ignore it? Or just close the socket, as in a usual error case?
From what I observe, once I get WSA_IO_PENDING and just ignore it, the socket eventually becomes unresponsive.
Q3: How about WSAGetOverlappedResult?
Does it make any sense?
Which WSAOVERLAPPED object should I give it? There is none, since I do not use one for any of those blocking socket calls.
I have tried creating a new, empty WSAOVERLAPPED and calling WSAGetOverlappedResult with it. It eventually returns success with 0 bytes transferred.
In [WSA]GetOverlappedResult we can only use the pointer to the WSAOVERLAPPED that was passed to the I/O request. Using any other pointer is senseless. All the information about the I/O operation that WSAGetOverlappedResult returns comes from lpOverlapped (the final status, the number of bytes transferred and, if it needs to wait, the event to wait on). In general, every I/O request must pass an OVERLAPPED (really an IO_STATUS_BLOCK) pointer to the kernel. The kernel modifies this memory directly (final status and information, usually the bytes transferred). Because of this, the OVERLAPPED must stay valid until the I/O completes, and it must be unique for every I/O request. [WSA]GetOverlappedResult checks this OVERLAPPED (really IO_STATUS_BLOCK) memory: first of all it looks at the status. If it is anything other than STATUS_PENDING, the operation has completed; the API takes the number of bytes transferred and returns. If it is still STATUS_PENDING, the I/O has not completed yet. If we want to wait, the API waits on hEvent from the overlapped. This event handle is passed to the kernel with the I/O request and is set to the signaled state when the I/O finishes. Waiting on any other event is senseless: how would it be related to the concrete I/O request? It should now be clear why we can call [WSA]GetOverlappedResult only with exactly the overlapped pointer that was passed to the I/O request.
If we do not pass a pointer to an OVERLAPPED ourselves (for example when we use recv or send), the low-level socket API allocates an OVERLAPPED itself, as a local variable on the stack, and passes a pointer to it to the I/O. As a result, the API cannot return in this case until the I/O has finished: the overlapped memory must be valid until the I/O completes (on completion the kernel writes data to this memory), but a local variable becomes invalid once we leave the function. So the function must wait in place.
Because of all this we cannot call [WSA]GetOverlappedResult after send or recv. First, we simply have no pointer to the overlapped. Second, the overlapped used in the I/O request is already "destroyed" (more exactly, it lies in the stack below the top, in the trash zone). If the I/O has not completed yet, the kernel will still write to what is now a random place on the stack when it finally completes, and the effect is unpredictable: anything from nothing happening to a crash or very unusual side effects. If send or recv returned before the I/O completed, the effect on the process would be fatal. This must never happen (unless there is a bug in Windows).
As I tried to explain, if WSA_IO_PENDING is really returned by send or recv, this is a system bug. The good case is that the device completed the I/O with this result (although it must not): then it is simply an unknown (for this situation) error code. Handle it like any general error; it does not require special processing (as it would with asynchronous I/O). If the I/O really has not completed yet (after send or recv returned), your stack can be corrupted at a random time (maybe it already has been). The effect of this is unpredictable, and nothing can be done about it. This is a critical system error.
No, this is absolutely not expected.
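The status check described above for [WSA]GetOverlappedResult can be sketched as a small, portable model. This is only an illustration and not the real Winsock implementation: OverlappedModel, ResultModel and get_overlapped_result_model are hypothetical names, and the wait on hEvent is left out. Under these assumptions it also suggests why a freshly zeroed WSAOVERLAPPED, as tried in the question, appears to "succeed" with 0 bytes: its status field is 0 (the value of STATUS_SUCCESS) and its byte count is 0, so it looks like a request that already completed.

```cpp
#include <cassert>
#include <cstdint>

// NTSTATUS values referenced in the text.
constexpr std::uint32_t kStatusSuccess = 0x00000000u;  // STATUS_SUCCESS
constexpr std::uint32_t kStatusPending = 0x00000103u;  // STATUS_PENDING

// Hypothetical stand-in for the part of an OVERLAPPED that the kernel
// fills in (the real structure keeps these in Internal / InternalHigh).
struct OverlappedModel {
    std::uint32_t status;       // final status of the request
    std::uint64_t information;  // usually the number of bytes transferred
};

// Hypothetical result type for the model.
struct ResultModel {
    bool completed;       // false while the status is still STATUS_PENDING
    bool success;         // true only if the request completed successfully
    std::uint64_t bytes;  // bytes transferred, valid only when completed
};

// Model of the check: everything is read from the overlapped block itself,
// which is why only the block passed to the I/O request is meaningful.
ResultModel get_overlapped_result_model(const OverlappedModel& ov) {
    if (ov.status == kStatusPending) {
        return {false, false, 0};  // the real API would now wait on hEvent
    }
    return {true, ov.status == kStatusSuccess, ov.information};
}

// A zeroed block that was never passed to any request looks "completed".
void demo_zeroed_overlapped() {
    OverlappedModel fresh{};  // all fields zero
    ResultModel r = get_overlapped_result_model(fresh);
    assert(r.completed && r.success && r.bytes == 0);
}
```

Calling demo_zeroed_overlapped() exercises the zeroed-block case; the result says nothing about any real socket operation, which is the point of the paragraph above.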
First of all, when we create a file handle we either set or do not set the asynchronous attribute on it: for CreateFileW this is FILE_FLAG_OVERLAPPED, for WSASocket it is WSA_FLAG_OVERLAPPED, and for NtOpenFile or NtCreateFile it is FILE_SYNCHRONOUS_IO_[NO]NALERT (with the reverse meaning compared to FILE_FLAG_OVERLAPPED). All this information is stored in FILE_OBJECT.Flags: FO_SYNCHRONOUS_IO (the file object is opened for synchronous I/O) is either set or clear.
The effect of the FO_SYNCHRONOUS_IO flag is as follows: the I/O subsystem calls some driver via IofCallDriver, and if the driver returns STATUS_PENDING while the FO_SYNCHRONOUS_IO flag is set in the FILE_OBJECT, the I/O subsystem waits in place (so in the kernel) until the I/O completes. Otherwise it returns this status, STATUS_PENDING, to the caller, which can wait in place itself or receive a callback via APC or IOCP.
When we use socket, it internally calls WSASocket; the resulting socket has the overlapped attribute, which means that the file will not have the FO_SYNCHRONOUS_IO attribute and low-level I/O calls can return STATUS_PENDING from the kernel. But let us look at how recv works:
Internally, WSPRecv is called with lpOverlapped = 0. Because of this, WSPRecv itself allocates an OVERLAPPED on the stack, as a local variable, before doing the actual I/O request via ZwDeviceIoControlFile. Because the file (socket) was created without the FO_SYNCHRONOUS_IO flag, STATUS_PENDING is returned from the kernel. In this case WSPRecv checks whether lpOverlapped == 0. If so, it cannot return until the operation completes. It begins to wait on an event (maintained internally in user mode for this socket) via SockWaitForSingleObject - ZwWaitForSingleObject. As the Timeout it uses the value you associated with the socket via SO_RCVTIMEO, or 0 (infinite wait) if you did not set SO_RCVTIMEO. If ZwWaitForSingleObject returns STATUS_TIMEOUT (this can happen only if you set a timeout via SO_RCVTIMEO), the I/O operation did not finish in the expected time. In this case WSPRecv calls SockCancelIo (same effect as CancelIo). CancelIo must not return (it waits) until all I/O requests on the file (from the current thread) have completed. After this, WSPRecv reads the final status from the overlapped. It should be STATUS_CANCELLED (but really the concrete driver decides with which status it completes a canceled IRP). WSPRecv converts STATUS_CANCELLED to STATUS_IO_TIMEOUT and then calls NtStatusToSocketError to convert the NTSTATUS code to a Win32 error; STATUS_IO_TIMEOUT, say, is converted to WSAETIMEDOUT. But if the status in the overlapped is still STATUS_PENDING after CancelIo, you get WSA_IO_PENDING. Only in this case. It looks like a device bug, but I cannot reproduce it on my own Win 10 (maybe the version plays a role).
What can be done here (if you are sure that you really got WSA_IO_PENDING)? First, try using WSASocket without WSA_FLAG_OVERLAPPED: in this case ZwDeviceIoControlFile never returns STATUS_PENDING and you must never get WSA_IO_PENDING. Check this: is the error gone? If yes, restore the overlapped attribute and remove the SO_RCVTIMEO call (all of this is for testing only, not a solution for a release product) and check whether the error is gone after that. If yes, it looks like the device cancels the IRP invalidly (with STATUS_PENDING?!?). The point of all this is to locate more concretely where the error is. In any case, it would be interesting to build a minimal demo exe that can stably reproduce this situation and test it on other systems: does it persist? Does it happen only on specific versions? If it cannot be reproduced on other computers, you need to debug on your concrete
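The blocking receive path described earlier (issue the request, wait with the SO_RCVTIMEO timeout, cancel on timeout, then translate the final status) can be summarized as a small, portable model. It is a sketch of the description in this answer only, not the actual WSPRecv code: SimulatedRequest, status_to_socket_error_model, blocking_recv_model and kOtherError are hypothetical names, and the real wait and cancel calls are replaced by fields that say what they would have produced.

```cpp
#include <cassert>
#include <cstdint>

// NTSTATUS values and Winsock error codes referenced in the text.
constexpr std::uint32_t kStatusSuccess   = 0x00000000u;  // STATUS_SUCCESS
constexpr std::uint32_t kStatusPending   = 0x00000103u;  // STATUS_PENDING
constexpr std::uint32_t kStatusIoTimeout = 0xC00000B5u;  // STATUS_IO_TIMEOUT
constexpr std::uint32_t kStatusCancelled = 0xC0000120u;  // STATUS_CANCELLED
constexpr int kWsaIoPending = 997;    // WSA_IO_PENDING
constexpr int kWsaETimedOut = 10060;  // WSAETIMEDOUT
constexpr int kOtherError   = -1;     // hypothetical placeholder for any other code

// Hypothetical description of what one simulated request does.
struct SimulatedRequest {
    std::uint32_t issue_status;         // status returned when the request is issued
    bool completes_before_timeout;      // true if the wait ends before the timeout
    std::uint32_t final_status;         // status in the overlapped after normal completion
    std::uint32_t status_after_cancel;  // status in the overlapped after the cancel
};

// Model of the status translation; only the codes named in the text are mapped.
int status_to_socket_error_model(std::uint32_t status) {
    switch (status) {
        case kStatusSuccess:   return 0;
        case kStatusPending:   return kWsaIoPending;
        case kStatusIoTimeout: return kWsaETimedOut;
        default:               return kOtherError;
    }
}

// Model of the blocking receive path for a socket with the overlapped attribute.
int blocking_recv_model(const SimulatedRequest& req) {
    std::uint32_t status = req.issue_status;
    if (status == kStatusPending) {
        if (req.completes_before_timeout) {
            status = req.final_status;  // the wait succeeded, read the final status
        } else {
            // The wait timed out: cancel, then read whatever status is left.
            status = req.status_after_cancel;
            if (status == kStatusCancelled) {
                status = kStatusIoTimeout;  // a canceled request is reported as a timeout
            }
        }
    }
    return status_to_socket_error_model(status);
}

// The three outcomes discussed in the text.
void check_blocking_recv_model() {
    assert(blocking_recv_model({kStatusPending, true, kStatusSuccess, kStatusCancelled}) == 0);
    assert(blocking_recv_model({kStatusPending, false, kStatusSuccess, kStatusCancelled}) == kWsaETimedOut);
    assert(blocking_recv_model({kStatusPending, false, kStatusSuccess, kStatusPending}) == kWsaIoPending);
}
```

In this model WSA_IO_PENDING can come out only when the status is still STATUS_PENDING after the cancel, which matches the single case singled out above.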