I am attempting to validate ECMP functionality on a Linux host with unnumbered interfaces and network namespaces.
The following example can be used to reproduce the setup:
# add address to loopback for unnumbered veth interfaces
ip addr add 198.51.100.0/32 dev lo
# namespace 1
ip netns add ns1
ip link add veth100 type veth peer name veth101
ip link set veth100 up
ip link set veth101 netns ns1
ip netns exec ns1 ip link set veth101 name eth0
ip netns exec ns1 ip addr add 192.0.2.1/32 dev eth0
ip netns exec ns1 ip link set eth0 up
ip netns exec ns1 ip route add 198.51.100.0/32 dev eth0
ip netns exec ns1 ip route add 0.0.0.0/0 via 198.51.100.0
ip route add 192.0.2.1/32 dev veth100
# namespace 2
ip netns add ns2
ip link add veth200 type veth peer name veth201
ip link set veth200 up
ip link set veth201 netns ns2
ip netns exec ns2 ip link set veth201 name eth0
ip netns exec ns2 ip addr add 192.0.2.2/32 dev eth0
ip netns exec ns2 ip link set eth0 up
ip netns exec ns2 ip route add 198.51.100.0/32 dev eth0
ip netns exec ns2 ip route add 203.0.113.0/32 dev eth0
ip netns exec ns2 ip route add 0.0.0.0/0 via 198.51.100.0
ip route add 192.0.2.2/32 dev veth200
# anycast / ecmp setup
ip netns exec ns1 ip addr add 203.0.113.0/32 dev lo
ip netns exec ns1 ip link set dev lo up
ip netns exec ns2 ip addr add 203.0.113.0/32 dev lo
ip netns exec ns2 ip link set dev lo up
ip route append 203.0.113.0/32 nexthop via 192.0.2.1 weight 100
ip route append 203.0.113.0/32 nexthop via 192.0.2.2 weight 100
I can see that I have two routes in my routing table:
$ ip route show
...
203.0.113.0 via 192.0.2.1 dev veth100 onlink
203.0.113.0 via 192.0.2.2 dev veth200 onlink
...
Ping to 203.0.113.0 works (as expected):
$ ping 203.0.113.0 -c 2
PING 203.0.113.0 (203.0.113.0) 56(84) bytes of data.
64 bytes from 203.0.113.0: icmp_seq=1 ttl=64 time=0.096 ms
64 bytes from 203.0.113.0: icmp_seq=2 ttl=64 time=0.079 ms
--- 203.0.113.0 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.079/0.087/0.096/0.008 ms
I can set either veth100 or veth200 down and achieve failover. However, the load does not appear to be shared across veth100 and veth200 simultaneously: all traffic takes a single path. I verified this by running tcpdump on both veth100 and veth200 at the same time.
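For reference, the check looked roughly like this (captures in separate terminals):
# terminal 1: watch ICMP leaving via veth100
tcpdump -ni veth100 icmp
# terminal 2: watch ICMP leaving via veth200
tcpdump -ni veth200 icmp
# terminal 3: generate traffic
ping 203.0.113.0 -c 10
Only one of the two captures ever showed the echo requests.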
Experimenting, I've tried adding the ECMP route this way instead:
ip route add 203.0.113.0/32 nexthop via 192.0.2.2 weight 10 nexthop via 192.0.2.1 weight 10
The route is installed differently: instead of two separate routes for the prefix, it shows up as a single multipath route with two nexthops. I'm not sure what that difference means in practice.
$ ip route show
...
203.0.113.0
nexthop via 192.0.2.2 dev veth200 weight 10
nexthop via 192.0.2.1 dev veth100 weight 10
...
But this still exhibits the same problem as above.
I'm not sure what steps to take next. What am I doing wrong? Is there any way to achieve ECMP load sharing in this scenario?
If you're only testing with ICMP pings, the behaviour is expected. Linux ECMP balances per flow, not per packet, and the 5-tuple hash (source IP + source port + destination IP + destination port + protocol) can't distinguish ICMP flows since ICMP doesn't use port numbers, so pings from a single source will always hit the same host. (Note that Linux only uses the 5-tuple hash with net.ipv4.fib_multipath_hash_policy=1; the default policy hashes on source and destination addresses only, which is even coarser.)
Experiment with multiple UDP and TCP flows and you should see the load-balancing effect, since at least the source ports will be ephemeral (unlike the well-known destination service ports).
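If you want to see which path a given flow would take without capturing traffic, ip route get accepts flow selectors on reasonably recent kernels/iproute2 (the hash-policy sysctl and the ipproto/sport/dport selectors below are assumptions about your kernel being new enough, roughly 4.12+ for the sysctl and 4.17+ for the selectors):
# switch IPv4 multipath hashing from the default L3 (addresses only) to L4 (5-tuple)
sysctl -w net.ipv4.fib_multipath_hash_policy=1
# ask the kernel which nexthop a given flow would use; vary the source port
ip route get 203.0.113.0 ipproto udp sport 40000 dport 5000
ip route get 203.0.113.0 ipproto udp sport 40001 dport 5000
# or generate real UDP flows from varying ephemeral ports (OpenBSD-style nc)
# and watch both veths with tcpdump as before
for i in $(seq 1 20); do echo probe | nc -u -w1 203.0.113.0 5000; done
With L4 hashing enabled, different source ports should hash to different nexthops often enough that both veth interfaces see traffic.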
BTW - thanks for spelling out the steps you took, since I'm currently experimenting with the same concepts in order to replace the K8S networking mess with a simple load-balanced, routed, IPv6-only setup.