Should i engineer my microservice to handle involuntary disruptions like hardware failure?
Are these disruptions frequent enough to be handled in a service running on AWS managed EKS cluster.
Should i consider some design change in the service to handle the unexpected SIGKILL with methods like persisting the data at each step or will that be considered as over-engineering?
What standard way would you suggest for handling these involuntary disruptions if it is
a) a restful service that responds typically in 1s(follows saga pattern).
b) a service that process a big 1GB file in 1 hour.
There are couple of ways to handle those disruptions. As mentioned here here:
So:
SIGKILL
the load is automatically directed to another Pod. You can read more about this here.I hope I cleared the water a little bit for you, feel free to ask more questions.