Failed to set up Anaconda Cluster on bare metal


My providers.yaml file

bare_metal:
  cloud_provider: none
  private_key: ~/.ssh/my-private-key

profile.yaml file

name: my_baremetal_profile
machines:
  compute:
    - ip:port
  head:
    - ip:port
node_id: bare_metal
node_type: bare_metal
num_nodes: 1
provider: bare_metal

user: user

Then I create the cluster:

acluster create xx --profile my_baremetal_profile

The command aborted while installing Salt, even though SSH itself works (acluster ssh succeeds). The Salt installation failed with:

FabricException: Needed to prompt for a connection or sudo password (host: ip:port), but input would be ambiguous in parallel mode
[ip:port] out: sudo password:
Fatal error: One or more hosts failed while executing task 'parallel_sudo'

Underlying exception:
Needed to prompt for a connection or sudo password (host: ip:port), but input would be ambiguous in parallel mode

Aborting.

I have read the FAQ at http://docs.continuum.io/anaconda-cluster/faq and have retried several times.

======== Update ===================================================

The cause of the error above was the lack of passwordless sudo. After fixing that, I still cannot install the notebook with the command

acluster install notebook

I get the following error:

FabricException: One or more hosts failed while executing task 'parallel_sudo'

Underlying exception:
sudo() received nonzero return code 2 while executing!

Requested: /opt/anaconda/envs/salt/bin/salt -G "roles:ipython.notebook" state.sls ipython.notebook.status test=True --timeout=60 --out=yaml --state_output=mixed
Executed: sudo -S -p 'sudo password:'  /bin/bash -c  "/opt/anaconda/envs/salt/bin/salt -G \"roles:ipython.notebook\" state.sls ipython.notebook.status test=True --timeout=60 --out=yaml --state_output=mixed"

============================================================================================== Standard output ==============================================================================================

No minions matched the target. No command was sent, no jid was assigned.
{}
ERROR: No return received

=============================================================================================================================================================================================================

There are 1 best solutions below


From the error message, you presumably forgot to set up a passwordless SSH connection and passwordless sudo. Generate an SSH key on your management box with ssh-keygen and add the public part of the key to the authorized_keys file on each cluster node. As for sudo, you will have to edit the sudoers file on the nodes. Both of these requirements are described in the Anaconda Cluster documentation.
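
The steps above could look roughly like the sketch below. It is an illustration, not the documented Anaconda Cluster procedure: the host address, the port, the drop-in file name 90-acluster and all function names are placeholders I made up, the user and key path are taken from the question's profile.yaml and providers.yaml, and it assumes the node's sudoers configuration includes /etc/sudoers.d (the default on most Linux distributions).

```shell
# Placeholders (assumptions): adjust to your own node before running anything.
NODE_USER="${NODE_USER:-user}"
NODE_HOST="${NODE_HOST:-192.0.2.10}"
NODE_PORT="${NODE_PORT:-22}"
KEY="${KEY:-$HOME/.ssh/my-private-key}"

# Print the sudoers rule that lets one user run sudo without a password.
sudoers_line() {
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# 1. Create a key pair without a passphrase, unless the key already exists.
make_key() {
    [ -f "$KEY" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$KEY"
}

# 2. Append the public key to ~/.ssh/authorized_keys on the node.
push_key() {
    ssh-copy-id -i "$KEY.pub" -p "$NODE_PORT" "$NODE_USER@$NODE_HOST"
}

# 3. Write the rule to a temporary file, syntax-check it with visudo, and only
#    then install it. -t allocates a terminal so sudo can still ask for the
#    password this one last time.
enable_passwordless_sudo() {
    rule=$(sudoers_line "$NODE_USER")
    ssh -t -i "$KEY" -p "$NODE_PORT" "$NODE_USER@$NODE_HOST" "
        printf '%s\n' '$rule' > /tmp/90-acluster &&
        sudo visudo -cf /tmp/90-acluster &&
        sudo install -m 0440 -o root -g root /tmp/90-acluster /etc/sudoers.d/90-acluster"
}

# 4. Verify: sudo -n fails instead of prompting if a password is still needed.
check_sudo() {
    ssh -i "$KEY" -p "$NODE_PORT" "$NODE_USER@$NODE_HOST" 'sudo -n true' &&
        echo "passwordless sudo OK"
}

# Run the steps in order, once per node:
#   make_key && push_key && enable_passwordless_sudo && check_sudo
```

The block only defines the functions, so nothing touches a remote host until you call them yourself. If check_sudo still prompts or fails, Fabric will hit the same "sudo password:" prompt shown in the question.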