We have been experiencing frequent Redis connection drops and timeouts since 13 December, 2 PM GMT+8, with error messages like:
- RedisException: Redis server 10.X.X.X:6379 went away
- RedisException: Connection timed out
- RedisException: read error on connection to 10.X.X.X:6379
- ErrorException: Redis::get(): send of 43 bytes failed with errno=32 Broken pipe
- ErrorException: Redis::lPush(): send of 6076 bytes failed with errno=32 Broken pipe
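To see which failure mode dominates, the messages above can be grouped by type. Below is a minimal sketch in Python (used only for illustration; the application itself is PHP). The category names and the `classify` and `tally` helpers are hypothetical, and the patterns are copied from the messages listed above. For context, errno 32 is EPIPE: the write was attempted on a connection whose other end had already closed.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns copied from the error messages above; category names are illustrative.
PATTERNS = [
    ("server_went_away", re.compile(r"went away")),
    ("timed_out", re.compile(r"Connection timed out")),
    ("read_error", re.compile(r"read error on connection")),
    ("broken_pipe", re.compile(r"errno=32")),
]


def classify(message: str) -> str:
    """Return the first matching category for a log message, or 'other'."""
    for name, pattern in PATTERNS:
        if pattern.search(message):
            return name
    return "other"


def tally(messages: Iterable[str]) -> Counter:
    """Count how many messages fall into each category."""
    return Counter(classify(m) for m in messages)
```

Feeding `tally` the exception messages from the application log gives a per-category count, which helps separate failures on new connections (timeouts) from failures on already-established ones (broken pipe, read error).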
Steps to reproduce: It happens intermittently on certain connections to Redis. From the stack traces, it doesn't look like an application bug.
Other information (workarounds you have tried, documentation consulted, etc):
The PHP Laravel application runs on GKE Autopilot pods and connects to Redis using the php-redis driver. There were no issues connecting to Redis before this started, and there have been no new deployments or code changes in the past 4 days.
I checked that the Redis servers are all healthy, with >60% headroom between actual usage and max CPU and memory. The GKE workloads also have reasonable CPU and memory headroom.
I tried redeploying the application and restarting the Pods in GKE, but the same problem persists.
I occasionally see high latency when connecting to Redis manually with redis-cli from the GKE pods. It took 4-5s just to establish the connection, which is abnormal.
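One way to put a number on this connection delay is to time the TCP handshake by itself, independent of redis-cli or the PHP driver. This is a minimal sketch in Python; the `time_connect` helper and its default timeout are assumptions for illustration, not part of any Redis tooling.

```python
import socket
import time
from typing import Optional


def time_connect(host: str, port: int = 6379, timeout: float = 10.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
```

Calling `time_connect` with the instance's private IP from inside a pod returns the handshake time in seconds. If that alone takes several seconds, the delay is on the network path rather than in Redis command handling.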
Suspect it could be either:
- GKE cluster problem
- GKE network connectivity problem to Redis
- Memorystore for Redis problem
Connectivity issues indicate that access to the instance is blocked (egress firewall rules, wrong VPC network, networking outage/partition, etc.).
Check the possible causes below:
1) Connecting from GKE: You cannot connect to a Memorystore for Redis instance from a GKE cluster without VPC-native/IP aliasing enabled. It is easiest to enable VPC-native/IP aliasing during cluster creation. When creating your cluster, select VPC Native under advanced options. Please see Creating VPC-native clusters using Alias IPs for more information.
2) Connecting from a different VPC network: The instance is only reachable from within its own network (the authorized network, or the default network if none was specified). Verify that you are connecting from the same VPC network the instance was provisioned in.
A common misconception is that the instance resides in the user project. It does not: the instance resides in a Google tenant project and is made available to the user project via VPC peering. You can't connect to the instance from another VPC, even one peered to the provisioned VPC network, because transitive peering is not supported.
3) Network peering deleted: Internally, Memorystore for Redis runs Redis in a VM created in a tenant project (owned by Memorystore for Redis), and uses VPC network peering to allow customers to connect.
Check whether you have deleted the VPC network peering for the network. If you have, the simplest solution is to create another instance using the same authorized_network, which will re-establish the peering. Once that's done, you can delete the new instance.
4) Egress firewall rules: Creating firewall rules in your project is not necessary. Verify that you have not created any egress firewall rules in your project that block traffic to the instance's private IP endpoint.
5) Connecting from GCE: No special configuration should be required to connect to Redis from a GCE VM, provided it is created in the same VPC network and region as the instance.
6) Connecting from on-premises: Accessing a Redis instance from on-premises networks over VPN is supported only with the PRIVATE_SERVICE_ACCESS connect mode.
7) VPC Service Controls: Check whether the service project and the host project are not in the same VPC Service Controls perimeter.
For intermittent connectivity issues, check the following:
1) Verify that no instances are down.
2) Check for known Memorystore for Redis issues.
3) Search the tenant project logs for anything suspicious (dropped packets, timeouts, etc.).
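Because the failures are intermittent, a single manual test can easily miss them. A repeated probe run from an affected pod makes the failure rate visible. The sketch below is in Python and relies on the fact that Redis accepts inline commands, so a plain `PING` line gets a `+PONG` reply. The helper names (`ping_once`, `summarize`, `probe`) and the thresholds are hypothetical. It also assumes the instance does not require AUTH; if it does, the reply is an error and the attempt counts as a failure.

```python
import socket
import time
from typing import List, Optional


def ping_once(host: str, port: int = 6379, timeout: float = 5.0) -> Optional[float]:
    """Open a new connection, send an inline PING, return round-trip seconds.

    Returns None if the connection fails, times out, or the reply is not +PONG.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        return None
    if not reply.startswith(b"+PONG"):
        return None
    return time.monotonic() - start


def summarize(samples: List[Optional[float]], slow_after: float = 1.0) -> dict:
    """Count failed and slow attempts and report the worst successful time."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "slow": sum(1 for s in ok if s > slow_after),
        "max_seconds": max(ok) if ok else None,
    }


def probe(host: str, port: int = 6379, attempts: int = 20, pause: float = 0.5) -> dict:
    """Run ping_once repeatedly with a pause between attempts and summarize."""
    samples = []
    for _ in range(attempts):
        samples.append(ping_once(host, port))
        time.sleep(pause)
    return summarize(samples)
```

Running `probe` with the instance's private IP from several pods, and from a GCE VM in the same VPC network, helps show whether the problem follows the GKE cluster, the network path, or the instance itself.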