TL;DR - After my IPsec tunnel has been up for some time, the iptables rule that MASQUERADEs traffic coming from the other end of the tunnel intermittently stops working for a while.
I have a private subnet in AWS, where all traffic needs to be routed through a gateway VM running CentOS 7.3 in our data centre. The gateway VM is behind the data centre firewall and NAT'ed behind a public IP. The setup is:
AWS Subnet (10.10.101.0/24) <- ipsec tunnel -> Gateway VM (10.10.110.245)
I exported the site-to-site VPN configuration from AWS for OpenSwan, and installed ipsec on the gateway VM. The ipsec.conf looks like this:
conn Tunnel1
type=tunnel
authby=secret
auto=start
left=%defaultroute
leftid=<Gateway VM public IP>
leftsubnets={10.10.110.245/32, 0.0.0.0/0,}
right=<AWS IP>
rightsubnet=10.10.101.0/24
ikelifetime=8h
keylife=1h
phase2alg=aes128-sha1;modp1024
ike=aes128-sha1;modp1024
keyingtries=%forever
keyexchange=ike
dpddelay=10
dpdtimeout=30
dpdaction=restart_by_peer
I configured sysctl per the AWS instructions:
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 0
Then I configured the firewall to masquerade all packets coming from the AWS subnet:
iptables -t nat -A POSTROUTING -s 10.10.101.0/24 -j MASQUERADE
Finally I start the IPsec tunnel. The tunnel comes up, and initially the machines in the AWS subnet can reach the internet (ping 8.8.8.8). Tcpdump on the gateway VM (10.10.110.245) shows packets arriving from the AWS side and being correctly masqueraded with the VM's IP address.
However, after some time (usually around 1 hour), the gateway VM no longer honours the masquerade rule in iptables. Tcpdump on the gateway VM (10.10.110.245) shows ICMP packets arriving from the AWS subnet, destined for 8.8.8.8, but they are not masqueraded, so no response comes back from the remote side.
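For anyone trying to reproduce this, here is a minimal sketch of how the NAT state could be inspected while the problem is occurring. It assumes root access on the gateway VM and that the `conntrack` tool (from conntrack-tools) is installed; the function names are made up for illustration. The nat table is only consulted for the first packet of a tracked connection, so both the rule counters and any existing conntrack entry for the flow seem worth looking at.

```shell
#!/bin/sh
# Hypothetical helper functions; run as root on the gateway VM.

# Print the POSTROUTING chain of the nat table with packet/byte counters,
# to see whether the MASQUERADE rule is still being matched.
show_nat_counters() {
    iptables -t nat -L POSTROUTING -n -v
}

# List connection-tracking entries whose original destination is the
# given address (for example 8.8.8.8).
show_conntrack_for() {
    conntrack -L -d "$1"
}

# Show the IPsec policies currently installed in the kernel.
show_xfrm_policies() {
    ip xfrm policy list
}
```

Calling `show_nat_counters`, `show_conntrack_for 8.8.8.8` and `show_xfrm_policies` once while pings work and once after they stop would allow a side-by-side comparison.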
When this happens, the log in /var/log/pluto.log shows the following:
Sep 1 18:51:29.013427: "Tunnel1/2x0" #4: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+IKEV1_ALLOW+IKEV2_ALLOW+SAREF_TRACK+IKE_FRAG_ALLOW+ESN_NO to replace #3 {using isakmp#1 msgid:6a19c08b proposal=AES_CBC_128-HMAC_SHA1_96-MODP1024 pfsgroup=MODP1024}
Sep 1 18:51:29.091604: "Tunnel1/2x0" #4: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP/NAT=>0x694e0f34 <0xb094bfa2 xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=<AWS_IP>:4500 DPD=active}
Sep 1 18:54:58.974795: "Tunnel1/1x0" #5: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP+IKEV1_ALLOW+IKEV2_ALLOW+SAREF_TRACK+IKE_FRAG_ALLOW+ESN_NO to replace #2 {using isakmp#1 msgid:53ff8204 proposal=AES_CBC_128-HMAC_SHA1_96-MODP1024 pfsgroup=MODP1024}
Sep 1 18:54:59.050329: "Tunnel1/1x0" #5: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP/NAT=>0x0e9d2c4a <0xe1c78e2c xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=<AWS_IP>:4500 DPD=active}
At 18:54:59, the ping to 8.8.8.8 from the AWS subnet stopped getting responses.
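Since the failure seems to coincide with an IPsec SA being replaced (keylife is 1h), the rekey times can be pulled out of the log and compared against the time the pings stop. This is only a rough sketch that assumes the pluto log line layout shown above (month, day, time, then the quoted connection name); `list_sa_events` is a name I made up.

```shell
#!/bin/sh
# Print the time and connection instance for every line that reports an
# established IPsec SA. Assumes fields: month, day, time, "connection".
list_sa_events() {
    awk '/IPsec SA established/ {
        t = $3
        sub(/\.[0-9]+:$/, "", t)   # drop fractional seconds and trailing colon
        print t, $4
    }' "$1"
}
```

Usage would be `list_sa_events /var/log/pluto.log`. Against the four log lines quoted above it prints `18:51:29 "Tunnel1/2x0"` and `18:54:59 "Tunnel1/1x0"`, the second of which matches the moment the ping stopped.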
What could cause this to happen? How do I fix this?
I also noticed that if I create the iptables masquerade rule AFTER starting the IPsec tunnel, the rule is never honoured and packets are not masqueraded. Could this be related?