CentOS 6: write to a socket silently failing?

I have two processes on a CentOS 6 (Linux 2.6.32) system that talk to each other over an AF_INET/SOCK_STREAM socket. When I stress test the link by blasting enough small packets to fill the socket and then exit the sending process, the receiving process loses roughly the final three-quarters of the packets.

As soon as the sending process exits, the receiver's poll() starts returning revents of POLLIN | POLLRDHUP | POLLERR | POLLHUP. At some point, it fails to read the full packet it expects (read() returns a smaller number than the passed-in length), and the following read() returns -1 with errno set to ECONNRESET. It certainly looks to me like it has read all the data in the pipe and there is no more to come.

If I do not exit the sending process after filling the pipe (just go into an endless loop until I kill it by hand), then the receiver gets all the data.

My guess is that the sender's write()s end up buffered somewhere, and that the buffer is tossed when the process exits instead of the write() returning a failure. Disabling Nagle (turning on TCP_NODELAY) doesn't change this behavior.

The code that does the write is:

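/* Frame layout: 32-bit length header, then the payload. */
iov[0].iov_base = &len;
iov[0].iov_len = sizeof(uint32_t);
iov[1].iov_base = buf;
iov[1].iov_len = len;
/* Anything other than the full frame length (including -1) is treated as an error. */
if ((wlen = writev(fd, iov, NELEM(iov))) != (ssize_t)(iov[0].iov_len + iov[1].iov_len)) {
    ...  // error handling
}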

(it sends a 32-bit length followed by the data).

Can anyone lend me a clue about what is going on, and how I can reliably know whether my write()s have succeeded?

There is 1 best solution below

Before the writer process exits, you should close the socket cleanly with a shutdown call:

shutdown(fd, SHUT_WR);

This acts a bit like flushing the socket: the data already queued is sent, followed by an end-of-stream indication for the receiver.
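A fuller version of that shutdown sequence might look like the sketch below. The ECONNRESET seen by the receiver suggests the connection was reset rather than closed in an orderly way. One possible cause (an assumption, not confirmed by the question) is that the sender exits while unread incoming data is still sitting in its own receive buffer, which makes Linux send a reset on close. The helper name graceful_close is hypothetical. It half-closes the sending direction, reads and discards whatever the peer still sends until the peer closes its side, and only then closes the descriptor.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * graceful_close (hypothetical helper, not part of any library):
 * half-close the sending direction, wait for the peer to finish,
 * then release the descriptor.
 * Returns 0 on an orderly shutdown, -1 if any step failed.
 */
int graceful_close(int fd)
{
    char scratch[256];
    ssize_t n;
    int rc = 0;

    /* Signal end-of-stream; data already written is sent first. */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    /* Discard incoming data until the peer closes (read returns 0). */
    for (;;) {
        n = read(fd, scratch, sizeof scratch);
        if (n > 0)
            continue;
        if (n == 0)
            break;
        if (errno == EINTR)
            continue;
        rc = -1;            /* e.g. ECONNRESET */
        break;
    }

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Note that this blocks until the peer closes its end, so it assumes the receiver closes the socket once it sees end-of-stream. If that is not guaranteed, a timeout (for example via poll()) would be needed around the read loop.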

You can also close the socket; see the question "close vs shutdown socket?" for detailed information.
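On the question of knowing whether the writes succeeded: a successful write() or writev() only means the kernel accepted the data into its send buffer, not that the peer has received it. A stream socket may also accept fewer bytes than requested, which is not necessarily an error. The sketch below shows one way to send the same length-prefixed frame while retrying on short writes. The name send_frame is hypothetical, and the length is sent in host byte order as in the question, which assumes both processes run on the same machine.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * send_frame (hypothetical helper): write a 32-bit length followed by
 * len bytes of payload, retrying until everything has been handed to
 * the kernel. Returns 0 on success, -1 on error (errno is set).
 */
int send_frame(int fd, const void *buf, uint32_t len)
{
    struct iovec iov[2];
    int idx = 0;                      /* first iovec not yet fully written */

    iov[0].iov_base = &len;
    iov[0].iov_len = sizeof len;
    iov[1].iov_base = (void *)buf;    /* writev does not modify the data */
    iov[1].iov_len = len;

    while (idx < 2) {
        ssize_t n = writev(fd, &iov[idx], 2 - idx);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted before writing: retry */
            return -1;
        }

        /* Skip the iovecs that were written completely. */
        size_t done = (size_t)n;
        while (idx < 2 && done >= iov[idx].iov_len) {
            done -= iov[idx].iov_len;
            idx++;
        }
        /* Adjust a partially written iovec so the next call resumes there. */
        if (idx < 2) {
            iov[idx].iov_base = (char *)iov[idx].iov_base + done;
            iov[idx].iov_len -= done;
        }
    }
    return 0;
}
```

Even with this loop, confirmation that the receiver actually processed the data has to come from the application protocol, for example an acknowledgement message or the orderly end-of-stream that follows shutdown(fd, SHUT_WR).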