I have recently acquired the ability to stand up servers for my application environment from scratch. I've been evaluating Service Fabric as an application orchestrator because it can run executables rather than only containers, which makes it an attractive short-term option.
I can now create a working Service Fabric cluster in AWS through Terraform. I can create one from nothing, or scale out through the auto-scaling policy and the new servers will join the existing cluster. If I destroy a server, a replacement commissioned by the auto-scaling group will initialize and rejoin the cluster.
However, I was disappointed to find out that Service Fabric doesn't automatically manage the replacement of seed nodes, even if there are many other servers in the cluster. Destroying all of the seed nodes causes a failure of the entire cluster. I want to be able to destroy any server at any time.
Are there any recommended practices for maintaining a minimum number of seed nodes? I was really hoping Service Fabric would maintain or vote to replace missing seed nodes without my intervention, but I can't find any documentation to that effect. The auto-scaling group's scale-in policy could destroy a seed node at any time, and seed nodes don't replace themselves automatically.
I can manage a minimum number of seed nodes myself by generating cluster manifests and triggering cluster config upgrades. I was really hoping for an automatically managed alternative, though.
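For reference, this is roughly what that manual workaround could look like against the cluster's HTTP management gateway (typically port 19080) in Python. This is only a sketch for a standalone (non-Azure-hosted) cluster like mine: the cluster address is a placeholder, and the `ConfigurationApiVersion` value and version-bumping scheme are assumptions that would need to match your cluster configuration.

```python
"""Sketch: fetch the standalone cluster config, bump its version, and
trigger a cluster configuration upgrade via the Service Fabric REST API."""
import json
import time

import requests

CLUSTER = "http://my-sf-cluster.example.com:19080"  # hypothetical address
API = {"api-version": "6.0"}


def get_cluster_config() -> dict:
    # GetClusterConfiguration is only available on standalone clusters.
    # ConfigurationApiVersion must match the apiVersion of your config JSON.
    params = dict(API, ConfigurationApiVersion="10-2017")
    r = requests.get(f"{CLUSTER}/$/GetClusterConfiguration", params=params)
    r.raise_for_status()
    # The configuration itself comes back as a JSON string.
    return json.loads(r.json()["ClusterConfiguration"])


def bump_version(version: str) -> str:
    # Naive "1.0.0" -> "1.0.1" increment; adjust to your own scheme.
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def start_config_upgrade(config: dict) -> None:
    # A new ClusterConfigurationVersion makes Service Fabric treat this
    # as a configuration upgrade rather than a no-op.
    config["ClusterConfigurationVersion"] = bump_version(
        config["ClusterConfigurationVersion"]
    )
    body = {"ClusterConfig": json.dumps(config)}
    r = requests.post(
        f"{CLUSTER}/$/StartClusterConfigurationUpgrade", params=API, json=body
    )
    r.raise_for_status()


def wait_for_upgrade(poll_seconds: int = 30) -> str:
    while True:
        r = requests.get(
            f"{CLUSTER}/$/GetClusterConfigurationUpgradeStatus", params=API
        )
        r.raise_for_status()
        state = r.json().get("UpgradeState")
        print("upgrade state:", state)
        if state in ("RollingForwardCompleted", "RollingBackCompleted", "Failed"):
            return state
        time.sleep(poll_seconds)


if __name__ == "__main__":
    cfg = get_cluster_config()
    # Here you would reconcile cfg["Nodes"] with the servers currently in the
    # auto-scaling group (e.g. from the AWS API) before resubmitting.
    start_config_upgrade(cfg)
    wait_for_upgrade()
```

The point is that every seed-node replacement becomes a config upgrade I have to generate and drive myself, rather than something the cluster handles on its own.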
Can anyone provide any insight?
The reason for this behavior is that seed nodes, aka primary nodes, are used by the Service Fabric system services. By this I mean that services like ClusterManagerService, ImageStoreService, NamingService, etc. run only on primary nodes, so removing the primary nodes causes the cluster's infrastructure services to fail.
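If it helps, you can see which nodes the cluster currently treats as seed nodes through the same REST gateway and, for example, exclude them from scale-in. A rough sketch in Python (the cluster address is a placeholder, and I'm assuming the paged `/Nodes` endpoint and its `IsSeedNode` field at api-version 6.0):

```python
import requests

CLUSTER = "http://my-sf-cluster.example.com:19080"  # hypothetical address


def list_seed_nodes() -> list[str]:
    """Return the names of nodes currently acting as seed nodes."""
    seed_nodes, token = [], None
    while True:
        params = {"api-version": "6.0"}
        if token:
            params["ContinuationToken"] = token
        r = requests.get(f"{CLUSTER}/Nodes", params=params)
        r.raise_for_status()
        page = r.json()
        seed_nodes += [n["Name"] for n in page["Items"] if n.get("IsSeedNode")]
        token = page.get("ContinuationToken")
        if not token:
            return seed_nodes


print(list_seed_nodes())
```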
Please check this and this for more details about primary nodes, SKUs, and how this can be managed.