I might just be missing something very basic, but please bear with me.
I need my Azure Function App to connect to an Azure VM running Redis. The following is a simplified version of my code with the irrelevant parts removed:
# main.py
import logging
import azure.functions as func
from . import redis_db

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Load secrets, etc here
    # Then prepare the DB connection
    redis_instance = redis_db.Db()
    # Other checks here (client auth, etc)
    # Do stuff with redis_instance
    found = redis_instance.check_exists(mykey)
# redis_db.py
import logging
import os
import redis
from . import constants

# Module-level cache so that warm invocations can reuse the client
rd = None

class Db:
    def __init__(self):
        global rd
        if rd is None:
            self.connect()
        else:
            logging.info("[RD] Reusing connection...")
            self.rd = rd

    def connect(self):
        global rd  # without this, the assignment below only creates a local
        hostname = os.environ.get("REDIS_HOST")
        port = os.environ.get("REDIS_PORT")
        password = os.environ.get("REDIS_ACCESS_KEY")
        logging.info("[Redis] Connecting...")
        self.rd = redis.Redis(
            host=hostname, port=port, password=password,
            socket_connect_timeout=constants.timeout_seconds, socket_timeout=constants.timeout_seconds,
        )
        rd = self.rd
        return self.rd

    def check_exists(self, key):
        logging.info(f"[Redis] Checking if code exists for {key} ({type(key)})")
        result = None  # stays None if the lookup raises
        try:
            result = self.rd.get(name=key)
            logging.info(f"[Redis] Check result: {result}")
        except Exception as e:
            logging.exception(f"[Redis] Exception: {type(e)} {e}")
        return result

    # Other redis methods
This works when I run it locally: I have our office VPN enabled, and the Redis VM has that IP in its whitelist. All is good. After I deployed the function to Azure (on a Premium App Service plan), our sysadmin made sure the function app was added to the VNet, which should also let it communicate with the Redis VM. The host, port, etc. are stored in the function app's Application Settings as Key Vault references (all with green check marks).
But all sorts of problems come up here. When I trigger the function, it throws a redis.exceptions.TimeoutError about 90% of the time, but not always. Occasionally it responds with a 200 and tells me whether or not my key exists, without issue. I added a "retry" behavior that clears the rd variable so that a new connection is used after a timeout, but the function still times out. Changing socket_timeout and socket_connect_timeout makes no difference, anywhere from 5s to 30s. (When it does work, it returns in 3s or less.)
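A retry of the kind described above could be sketched roughly as follows. This is an illustration only, not the actual code: call_with_reconnect, make_client, and operation are hypothetical names, and the built-in TimeoutError stands in for redis.exceptions.TimeoutError (which is a separate class and would have to be passed in via retry_on).

```python
import logging
import time


def call_with_reconnect(make_client, operation, retries=2, delay_seconds=0.5,
                        retry_on=(TimeoutError,)):
    """Run operation(client); on a timeout, discard the client and build a new one.

    make_client and operation are hypothetical callables supplied by the caller.
    With redis-py, retry_on would be (redis.exceptions.TimeoutError,).
    """
    client = make_client()
    for attempt in range(retries + 1):
        try:
            return operation(client)
        except retry_on as exc:
            logging.warning("Attempt %d timed out: %s", attempt + 1, exc)
            if attempt == retries:
                raise  # out of retries, surface the timeout to the caller
            time.sleep(delay_seconds)
            client = make_client()  # never reuse the client that just timed out
```

If every attempt fails even with a fresh client each time, that points at the network path rather than at a stale cached connection.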
I tried to SSH into the function app using the Portal's Web SSH feature, but it has no ping or telnet commands, so I couldn't test connectivity from there. As far as I can tell, the VM and the Function App are in the same VNet.
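Since ping and telnet are missing, a plain TCP probe from Python's standard library can stand in for them, either from the SSH session or from a temporary function endpoint. The sketch below is a minimal example; check_tcp is a hypothetical helper name, and it only tests whether a TCP connection can be opened, not whether Redis authentication succeeds.

```python
import socket
import time


def check_tcp(host, port, timeout_seconds=5.0):
    """Try to open a TCP connection; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, int(port)), timeout=timeout_seconds):
            return True, time.monotonic() - start, None
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
```

Calling it with the values of REDIS_HOST and REDIS_PORT several times in a row, and logging the results, would show whether the failures happen at the TCP level and how often.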
I added a self.rd.ping() at the end of __init__(), wrapped in a try/except, after learning that a connection is not actually opened until you send a command. The ping timing out confirms that there is a connection issue, but the VNet should have solved this, right? And why is it inconsistent?
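The ping check described above might look something like the sketch below. It is an approximation for illustration, not the code from the function app: ping_or_none is a hypothetical helper, and client is any object with a ping() method, such as a redis.Redis instance.

```python
import logging
import time


def ping_or_none(client, label="Redis"):
    """Return the client if ping() succeeds, else None. Logs how long it took.

    client is any object with a ping() method, such as redis.Redis.
    """
    start = time.monotonic()
    try:
        client.ping()
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        logging.exception("[%s] ping failed after %.2fs: %s",
                          label, time.monotonic() - start, exc)
        return None
    logging.info("[%s] ping ok in %.2fs", label, time.monotonic() - start)
    return client
```

Logging the elapsed time makes it easier to tell a fast failure (refused or unresolved host) from a slow one (packets silently dropped until the timeout).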
What might be causing this sporadic behavior? Any ideas?