Cluster, Kubernetes, Docker. What to choose for my app?

My app is a bash script that runs tesseract under GNU parallel. The data I need to process is around 50 GB, and it's too slow on a single VM. I need the power of cluster computing, but I don't want to set up multiple VMs myself. Instead, I just want to launch my app (along with the data files) on a Google cluster (Kubernetes?). I'm not clear on these concepts. If someone could guide me, that would be great.

There is 1 answer below.

It might be a challenge to learn all the container orchestration details from scratch when you are only concerned with this one use case.

While GNU Parallel is nice on a single machine, there don't seem to be many starter kits for using it in distributed mode in the cloud.

I would consider Google Dataflow rather than spinning up a Kubernetes (K8s) cluster. It allocates workers for a job and cleans them up afterwards, and it lets you avoid managing VMs and learning an orchestration framework.
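
Whichever service you pick, the job has the same shape: each image is an independent unit of work, so the file list can be split across workers. Here is a minimal Python sketch of that idea, not Dataflow code. The helper names (`tesseract_command`, `shard`, `run_shard`) and the paths are hypothetical, and it assumes the `tesseract` binary is on the `PATH` of whatever machine runs `run_shard`.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def tesseract_command(image_path: str, output_dir: str) -> list[str]:
    """Build the CLI call for one image: `tesseract IMAGE OUTPUTBASE`.

    Tesseract appends the extension itself, so OUTPUTBASE has no suffix.
    """
    output_base = Path(output_dir) / Path(image_path).stem
    return ["tesseract", image_path, str(output_base)]


def shard(items: list[str], num_workers: int) -> list[list[str]]:
    """Split items round-robin into num_workers lists (hypothetical helper)."""
    return [items[i::num_workers] for i in range(num_workers)]


def run_shard(image_paths: list[str], output_dir: str) -> None:
    """Process one worker's share of the images, one tesseract call per file.

    Assumption: `tesseract` is installed and on PATH on this machine.
    """
    for path in image_paths:
        subprocess.run(tesseract_command(path, output_dir), check=True)
```

A managed service does the sharding and worker startup for you. What you would supply is roughly the body of `run_shard`, that is, the per-file step.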