When should you run more than one container instance of the same Docker image with a Kubernetes Replication Controller?

When using Kubernetes to manage your Docker containers, particularly with a replication controller, when should you increase an image's running container instances to more than 1? I understand that Kubernetes can spawn as many container replicas as the replication controller configuration file specifies, but why spawn multiple running containers (for the same image) when you can just increase the compute VM size? I would think that when you need more compute power, you increase the machine's CPU / RAM, and only when you reach the maximum available compute power (approx. 32 cores currently at Google) would you need to spawn multiple containers.

However, it would seem that spawning multiple containers regardless of VM size would provide a more highly available service, but Kubernetes will respawn failed containers even in a 1-container replication controller environment. So what I can't figure out is: for what reason would I want more than 1 running container (for the same image), other than running out of VM instance compute size?

There are 2 best solutions below

0
On BEST ANSWER

I think you laid out the issues pretty well. The two kinds of scaling you described are called "vertical scaling" (increasing memory or CPU of a single instance) and "horizontal scaling" (increasing number of instances).

On availability: As you observed, you can achieve pretty good availability even with a single container, thanks to auto-restart (at the node level or replication controller level). But it can never be 100% because you will always have the downtime associated with restarting the process, either on the same machine or (if the machine failed) on a new machine. In contrast, horizontal scaling (running multiple replicas of the container) allows effectively "zero downtime" from the end-user's perspective, assuming you have some kind of load balancing or failover mechanism in place among the replicas, and your application is written in a way that allows replication.
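To put rough numbers on the availability argument, here is a minimal sketch. It assumes replica failures are independent and that the load balancer only routes to healthy replicas; the 0.1% unavailability figure and the function name are hypothetical, chosen purely for illustration.

```python
def availability(per_replica_unavailability: float, replicas: int) -> float:
    """Probability that at least one replica is serving.

    Assumes failures are independent, which real deployments only
    approximate (replicas on the same node can fail together).
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at once.
    return 1.0 - per_replica_unavailability ** replicas


# Hypothetical figure: one container spends 0.1% of its time restarting.
for n in (1, 2, 3):
    print(n, availability(0.001, n))
```

Under these assumptions, each extra replica multiplies the chance of a full outage by the per-replica unavailability, which is why a single auto-restarted container cannot match a replicated one.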

On scalability: This is highly application-dependent. For example, vertically scaling CPU for a single-threaded application will not increase the workload it can handle, but running multiple replicas of it behind a load balancer (horizontal scaling) will. On the other hand, some applications aren't written in a way that allows them to be replicated, so for those vertical scaling is your only choice. Many applications (especially "cloud native" applications) are amenable to both horizontal and vertical scaling, but the details are application-dependent. Note that once you need to scale beyond the workload that a single node can handle (due to CPU or memory), you have no choice but to replicate (horizontal scaling).
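The single-threaded case can be sketched with a toy throughput model. This is an illustrative simplification, not a benchmark: it assumes the workload is CPU-bound, that each thread can saturate at most one core, and that replicas scale linearly behind a load balancer. All names and numbers are hypothetical.

```python
def max_throughput(cores_per_instance: int,
                   threads_per_process: int,
                   replicas: int,
                   requests_per_core_per_sec: int) -> int:
    """Rough upper bound on requests/sec for a CPU-bound service."""
    # A process cannot use more cores than it has threads to run on them.
    usable_cores = min(cores_per_instance, threads_per_process)
    return usable_cores * replicas * requests_per_core_per_sec


# Single-threaded app, one replica, 1 core vs. 32 cores: no gain.
print(max_throughput(1, 1, 1, 100))
print(max_throughput(32, 1, 1, 100))

# Same app, four single-core replicas: throughput scales with replicas.
print(max_throughput(1, 1, 4, 100))
```

In this model, vertical scaling helps only up to the number of threads the process can keep busy, while horizontal scaling keeps helping as long as the application tolerates replication.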

So the short answer to your question is that people replicate for both availability and scalability.

0
On

There are a variety of reasons why you would scale an application up or down.

The Kubernetes project is looking to provide auto-scaling in the future as a feature to dynamically size up and size down (potentially to 0) a replication controller in response to observed traffic. For a good discussion on auto-scaling, see the following write-up:

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
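As a rough illustration of the idea of sizing a replication controller from observed traffic, here is a minimal sketch. It is not the algorithm from the proposal and not a Kubernetes API: the function name, the per-replica target, and the min/max bounds are all assumptions made for this example.

```python
import math


def desired_replicas(observed_rps: float,
                     target_rps_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Pick a replica count so each replica handles about the target load.

    Hypothetical policy for illustration only; a real autoscaler would
    also smooth the signal and limit how fast it scales up or down.
    """
    if target_rps_per_replica <= 0:
        raise ValueError("target_rps_per_replica must be positive")
    needed = math.ceil(observed_rps / target_rps_per_replica)
    # Clamp to the configured bounds (min_replicas=0 allows scaling to zero).
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(0, 50))      # no traffic
print(desired_replicas(120, 50))    # moderate traffic
print(desired_replicas(10000, 50))  # traffic beyond the configured maximum
```

The resulting number would then be written to the replication controller's replica count, by hand or by whatever automation you run.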