I've run into a very strange situation. In my program, sendto() fails with the error code ENETDOWN (Network is down) even though the network is up and ping succeeds. It happens only when the UDP stream goes to another network through several gateways, and even then only intermittently. If I run the same code within the same subnet, ENETDOWN never occurs.
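To pin down when the failure occurs, the send path can be wrapped so that every error is logged with a timestamp. The following is only a minimal sketch, assuming a plain UDP socket on Linux; sendto_logged() and is_reachability_error() are hypothetical helper names, not part of the program described above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: returns 1 if err belongs to the
 * "network or host not reachable" family of errno values. */
int is_reachability_error(int err)
{
    return err == ENETDOWN || err == ENETUNREACH || err == EHOSTUNREACH;
}

/* Hypothetical wrapper: behaves like sendto() with flags 0, but logs
 * every failure to stderr together with the current Unix time. */
ssize_t sendto_logged(int fd, const void *buf, size_t len,
                      const struct sockaddr *dst, socklen_t dst_len)
{
    ssize_t n = sendto(fd, buf, len, 0, dst, dst_len);
    if (n < 0) {
        int err = errno;  /* save it: fprintf() may overwrite errno */
        fprintf(stderr, "%ld sendto: %s (errno %d)%s\n",
                (long)time(NULL), strerror(err), err,
                is_reachability_error(err) ? " [reachability]" : "");
        errno = err;      /* restore for the caller */
    }
    return n;
}
```

The logged timestamps can then be lined up against a packet capture or kernel log taken at the same time.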
So I traced sendto() into the kernel. The neigh_hh_output() call in ip_finish_output2() (ip_output.c) invokes hh->hh_output(), and that call returns the ENETDOWN error code.
Under normal operation, hh->hh_output points to dev_queue_xmit() in dev.c, and the packet is sent to the network.
When the issue occurs, the pointer seems to have been reassigned to neigh_blackhole() by neigh_destroy() in neighbour.c. neigh_blackhole() returns -ENETDOWN.
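The pointer swap described above can be modelled in user space. This is a toy sketch, not kernel code: struct toy_neigh and the toy_* functions are hypothetical stand-ins for the neighbour entry, dev_queue_xmit(), neigh_blackhole() and neigh_destroy(), and they only illustrate why a sender sees -ENETDOWN once the output hook has been replaced.

```c
#include <errno.h>

struct toy_neigh;
typedef int (*toy_output_fn)(struct toy_neigh *n, const char *pkt);

/* Toy stand-in for a neighbour entry; NOT the kernel's struct neighbour. */
struct toy_neigh {
    toy_output_fn output;  /* plays the role of hh->hh_output */
    int sent;              /* number of packets actually "transmitted" */
};

/* Stand-in for dev_queue_xmit(): pretend the packet went out. */
int toy_xmit(struct toy_neigh *n, const char *pkt)
{
    (void)pkt;
    n->sent++;
    return 0;
}

/* Stand-in for neigh_blackhole(): drop the packet, report -ENETDOWN. */
int toy_blackhole(struct toy_neigh *n, const char *pkt)
{
    (void)n;
    (void)pkt;
    return -ENETDOWN;
}

/* Stand-in for the teardown step: redirect output to the blackhole. */
void toy_destroy(struct toy_neigh *n)
{
    n->output = toy_blackhole;
}

/* Stand-in for the send path: it just calls whatever hook is installed. */
int toy_send(struct toy_neigh *n, const char *pkt)
{
    return n->output(n, pkt);
}
```

In this model, toy_send() returns 0 until toy_destroy() runs and -ENETDOWN afterwards, even though nothing about the "network" itself has changed.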
But I don't know when or why neigh_destroy() is called. I have been struggling with this problem for several weeks.
Reportedly, a neighbor entry can be deleted for a variety of reasons, for example because the host changed its layer 2 address while retaining its layer 3 address, or because it is no longer reachable. See this. It can also be deleted if the gateway for the neighbor sends an ICMP redirect and redirect processing is enabled in the kernel.
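Whether redirect processing is enabled can be read from procfs on Linux. The sketch below assumes the usual /proc/sys/net/ipv4/conf/all/accept_redirects entry; per-interface entries also exist under conf/<ifname>/, and how the "all" and per-interface values combine is not covered here. read_int_file() and print_accept_redirects() are hypothetical helper names.

```c
#include <stdio.h>

/* Read a single integer from a file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the "all" accept_redirects setting and return its value,
 * or -1 if it could not be read (for example on a non-Linux system). */
int print_accept_redirects(void)
{
    const char *path = "/proc/sys/net/ipv4/conf/all/accept_redirects";
    int value;

    if (read_int_file(path, &value) != 0) {
        fprintf(stderr, "could not read %s\n", path);
        return -1;
    }
    printf("net.ipv4.conf.all.accept_redirects = %d\n", value);
    return value;
}
```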
If the neighbor is in the process of being deleted, the packet is dispatched to neigh_blackhole(), which unconditionally returns -ENETDOWN. See the code here. The man page for sendto() would lead you to believe that you shouldn't get ENETDOWN under such circumstances, but this appears to be incorrect.

I would try to get a network capture when this occurs and look for ICMP messages indicating that your destination is unreachable, or for a change in the destination's MAC address (or possibly a duplicate IP address), visible either in ARP packets or in the source MAC addresses of packets arriving from the destination.
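Alongside a capture, the kernel's own view of the next hop can be polled from /proc/net/arp to see whether its MAC address changes around the time of the failure. The following is a rough sketch assuming the usual Linux column layout (IP address, HW type, Flags, HW address, Mask, Device); struct arp_entry and parse_arp_line() are hypothetical names, and the caller is expected to feed it lines read from that file.

```c
#include <stdio.h>
#include <string.h>

/* One parsed row of /proc/net/arp (only the fields of interest). */
struct arp_entry {
    char ip[64];         /* IPv4 address as text */
    unsigned int flags;  /* ARP flags, e.g. 0x2 for a completed entry */
    char mac[18];        /* "xx:xx:xx:xx:xx:xx" plus terminating NUL */
    char dev[32];        /* interface name */
};

/* Parse one data line of /proc/net/arp.
 * Returns 0 on success, -1 for the header line or malformed input. */
int parse_arp_line(const char *line, struct arp_entry *e)
{
    unsigned int hw_type;  /* parsed but not stored */
    int n = sscanf(line, "%63s 0x%x 0x%x %17s %*s %31s",
                   e->ip, &hw_type, &e->flags, e->mac, e->dev);
    return n == 5 ? 0 : -1;
}

/* Returns 1 if two entries describe the same IP with different MACs. */
int mac_changed(const struct arp_entry *before, const struct arp_entry *after)
{
    return strcmp(before->ip, after->ip) == 0 &&
           strcmp(before->mac, after->mac) != 0;
}
```

Polling once per second and logging whenever mac_changed() reports a difference for the gateway's address would give timestamps to compare against the sendto() failures.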