Kubernetes nodes have unreachable routes


I maintain a Kubernetes cluster. The nodes are in an intranet with 10.0.0.0/8 IPs, and the pod network range is 192.168.0.0/16.

The problem is, some of the worker nodes have unreachable routes to pod networks on other nodes, like:

0.0.0.0         10.a.b.65       0.0.0.0         UG    0      0        0 eth0
10.a.b.64       0.0.0.0         255.255.255.192 U     0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.20.0    -               255.255.255.192 !     0      -        0 -
192.168.21.128  -               255.255.255.192 !     0      -        0 -
192.168.22.64   0.0.0.0         255.255.255.192 U     0      0        0 *
192.168.22.66   0.0.0.0         255.255.255.255 UH    0      0        0 cali3859982c59e
192.168.24.128  -               255.255.255.192 !     0      -        0 -
192.168.39.192  -               255.255.255.192 !     0      -        0 -
192.168.49.192  -               255.255.255.192 !     0      -        0 -
...
192.168.208.128 -               255.255.255.192 !     0      -        0 -
192.168.228.128 10.14.170.104   255.255.255.192 UG    0      0        0 tunl0
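In `route -n` output, a `!` in the flags column marks a rejected (unreachable) route, so every pod-network /26 above with `!` is a blackhole. As a quick illustration, here is a small Python sketch that picks those entries out of output like the above (the sample table is a shortened, anonymized copy of the one shown; the gateway 10.0.0.65 is a placeholder):

```python
import ipaddress

# Shortened, anonymized sample of the `route -n` table from this question.
ROUTE_OUTPUT = """\
0.0.0.0         10.0.0.65       0.0.0.0         UG    0      0        0 eth0
192.168.20.0    -               255.255.255.192 !     0      -        0 -
192.168.22.66   0.0.0.0         255.255.255.255 UH    0      0        0 cali3859982c59e
192.168.228.128 10.14.170.104   255.255.255.192 UG    0      0        0 tunl0
"""

POD_NET = ipaddress.ip_network("192.168.0.0/16")  # pod network range from the question

def unreachable_pod_routes(text):
    """Return (destination, netmask) pairs flagged '!' that fall in the pod range."""
    broken = []
    for line in text.splitlines():
        fields = line.split()
        dest, mask, flags = fields[0], fields[2], fields[3]
        if "!" in flags and ipaddress.ip_address(dest) in POD_NET:
            broken.append((dest, mask))
    return broken

print(unreachable_pod_routes(ROUTE_OUTPUT))
# → [('192.168.20.0', '255.255.255.192')]
```

Each flagged /26 corresponds to one remote node's pod block that this node cannot deliver traffic to.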

When I docker exec into the Calico container, the routes to other nodes are reported as unreachable by bird:

192.168.108.64/26  unreachable [Mesh_10_15_39_59 08:04:59 from 10.a.a.a] * (100/-) [i]
192.168.112.128/26 unreachable [Mesh_10_204_89_220 08:04:58 from 10.b.b.b] * (100/-) [i]
192.168.95.192/26  unreachable [Mesh_10_204_30_35 08:04:59 from 10.c.c.c] * (100/-) [i]
192.168.39.192/26  unreachable [Mesh_10_204_89_152 08:04:59 from 10.d.d.d] * (100/-) [i]
...

As a result, the pods on the broken nodes can barely reach anything in the cluster.

I've tried restarting a broken node, removing it from the cluster, running kubeadm reset, and re-joining it, but everything stayed the same.

What's the possible cause, and how should I fix this? Many thanks in advance.

There are 2 best solutions below

Answer 1 (score 2):

The default IPs for cluster services such as CoreDNS live in kubeadm's default service CIDR, 10.96.0.0/12 (the API server, for example, sits at 10.96.0.1). That range lies inside your intranet range 10.0.0.0/8, so node addresses and service addresses can collide.

You could change the node IPs so they no longer overlap with the service CIDR, then rejoin the nodes to the cluster.

Or change the default route from eth0 to tunl0; whether that applies depends on your CNI network.

If you use Calico, hand the network rules and routing over to Calico itself.
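The overlap this answer points at is easy to confirm with Python's `ipaddress` module. The node and pod ranges below are taken from the question; 10.96.0.0/12 is kubeadm's default `--service-cidr`:

```python
import ipaddress

node_net = ipaddress.ip_network("10.0.0.0/8")       # intranet / node range (from the question)
service_net = ipaddress.ip_network("10.96.0.0/12")  # kubeadm default --service-cidr
pod_net = ipaddress.ip_network("192.168.0.0/16")    # pod network range (from the question)

print(service_net.overlaps(node_net))   # True  — service CIDR collides with the node range
print(service_net.subnet_of(node_net))  # True  — it sits entirely inside 10.0.0.0/8
print(pod_net.overlaps(node_net))       # False — the pod range is safely separate
```

So the pod CIDR itself is fine here; it is the service CIDR that overlaps the intranet, which is why renumbering the nodes (or choosing a non-overlapping `--service-cidr` at `kubeadm init` time) is one way out.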

Answer 2 (score 0):

Well, I upgraded Docker (to 19.03.14), Kubernetes (to 1.19.4), and Calico (to 3.17.0).

Then I re-created the cluster.

Now it works well.