I'm following the official AWS EKS tutorial on setting up a distributed GPU cluster for TensorFlow model training and am hitting a bit of a snag.
After creating a new cluster using eksctl and verifying that the corresponding ~/.kube/config file exists on my gateway node, the tutorial instructs me to download ksonnet on the gateway node and use it to initialize a new application:
$ ks init <app-name>
When I try running this, however, I receive the following error:
INFO Using context "arn:aws:eks:us-west-2:131397771409:cluster/<cluster name>" from kubeconfig file "/home/ubuntu/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.18.9" cluster at address <cluster address>
ERROR No Major.Minor.Patch elements found
I've done some searching around on GitHub/SO but have not been able to find a resolution to this issue. I suspect the true answer is to move away from ksonnet, as it is no longer being maintained (and apparently hasn't been for the last 2 years), but for the time being I'd just like to be able to complete the tutorial :)
Any insight is appreciated!
Contents of my ~/.kube/config:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <certificate>
    server: <server>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:131397771409:cluster/<name>
    user: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
current-context: arn:aws:eks:us-west-2:131397771409:cluster/<name>
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-west-2
      - eks
      - get-token
      - --cluster-name
      - <name>
      command: aws
When running ks init, you can override the API spec version with the --api-spec flag (that got me past this particular step, although I ran into other issues later on):
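A minimal sketch of such an override, assuming the value format version:vX.Y.Z that ksonnet accepts for --api-spec. The application name and the pinned version v1.7.0 are illustrative assumptions, not values confirmed for this cluster, so you may need to try a different version:

```shell
#!/usr/bin/env bash
# Hypothetical application name; substitute your own.
APP_NAME="my-kubeflow-app"
# Assumed API spec version to pin instead of the one ksonnet detects
# from the cluster; v1.7.0 is an illustrative choice, not a verified fix.
API_SPEC="version:v1.7.0"

if command -v ks >/dev/null 2>&1; then
  # Initialize the app with the overridden API spec.
  ks init "${APP_NAME}" --api-spec="${API_SPEC}"
else
  # ksonnet is not installed: only show the command that would be run.
  echo "ks not found on PATH; would run: ks init ${APP_NAME} --api-spec=${API_SPEC}"
fi
```

Pinning the spec this way makes ks init use the given version rather than the version string reported by the cluster, which is the step at which the error above is raised.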
In the end, I made it work with ks init ${APP_NAME} (without --api-spec) in GCP using ksonnet v0.13.1 on old Kubeflow (v0.2.0-rc.1) and GKE cluster (1.14.10) versions. BTW, I was in the "Kubeflow: End to End" qwiklab from this course.