I keep getting this error:
ERROR:__main__:Exception occured <_InactiveRpcError of RPC that terminated with:status = StatusCode.UNAVAILABLE details = "Socket closed" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14}"
I am running a batch job in which I have a gRPC server and a client.
server.py
import os
from concurrent import futures

import grpc

# some_pb2_grpc, Runner and DEFAULT_PORT are defined elsewhere in the project.


def create_grpc_server(dir):
    max_size = 1024 * 1024 * 1024  # 1 GiB limit for sent and received messages
    ping_interval = int(os.environ.get("GRPC_KEEPALIVE_TIME_MS", "300000"))
    options = [
        ("grpc.max_send_message_length", max_size),
        ("grpc.max_receive_message_length", max_size),
        ("grpc.keepalive_time_ms", ping_interval),
    ]
    grpc_max_workers_env: int = int(os.environ.get("GRPC_MAX_WORKERS", "1"))
    pool_shutdown_timer: int = int(os.environ.get("POOL_SHUTDOWN_TIMEOUT", 30))
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=grpc_max_workers_env), options=options
    )
    runner = Runner()
    some_pb2_grpc.add_someservicer_to_server(runner, server)
    print(f"Server listening on internal port {DEFAULT_PORT}", flush=True)
    print(f"Number of initialized gRPC workers {grpc_max_workers_env}", flush=True)
    server.add_insecure_port(f"[::]:{DEFAULT_PORT}")
    server.start()
I have an AWS Batch array job with, let's say, an array size of 4. gRPC works fine for array indexes 0, 1, and 2, but it always fails with the above error on the last index of the array job. On that last job it also fails midway: I can see a few of them getting processed properly, and then all of a sudden the error comes up.
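For reference, here is a minimal sketch of what the client side could look like, assuming it mirrors the server's message-size and keepalive options and retries calls that fail with `StatusCode.UNAVAILABLE`. The names `client_channel_options`, `make_channel`, `call_with_retry` and `is_unavailable` are hypothetical and not part of the actual client, and the timeout and backoff values are assumptions. Retrying is only reasonable if the RPC is safe to repeat.

```python
import time


def client_channel_options(max_size=1024 * 1024 * 1024, keepalive_ms=300000):
    """Channel options mirroring the server; the values are assumptions."""
    return [
        ("grpc.max_send_message_length", max_size),
        ("grpc.max_receive_message_length", max_size),
        ("grpc.keepalive_time_ms", keepalive_ms),
        ("grpc.keepalive_timeout_ms", 20000),
        ("grpc.keepalive_permit_without_calls", 1),
    ]


def make_channel(target):
    """Open an insecure channel to `target`, e.g. "localhost:50051"."""
    import grpc  # imported lazily so the helpers below work without grpcio

    return grpc.insecure_channel(target, options=client_channel_options())


def is_unavailable(exc):
    """True if `exc` is a gRPC error with status UNAVAILABLE."""
    import grpc

    return isinstance(exc, grpc.RpcError) and exc.code() == grpc.StatusCode.UNAVAILABLE


def call_with_retry(rpc, is_retryable, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `rpc()` and retry with exponential backoff on retryable errors."""
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except Exception as exc:
            # Give up on the last attempt or on errors we should not retry.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A call would then look like `call_with_retry(lambda: stub.SomeMethod(request), is_unavailable)`, where `stub` and `SomeMethod` stand in for the generated client stub and its RPC method.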
- I tried this solution, but I still get the same error.