AWS Parallel Cluster software installation

I am very new to HPC concepts, and I recently needed to use AWS ParallelCluster to run some large-scale parallel computation.

I went through this tutorial and successfully built a cluster with the Slurm scheduler, and I can log in to it over ssh. But I am stuck at the next step: I need to install some software and can't work out how. Should I run sudo apt-get install xxx and expect the package to be present on every new node that is launched when a job is scheduled? On one hand, that sounds like magic. On the other hand, do the head node and the newly launched nodes share the same storage? If so, apt-get install might work, since they would be using the same file system. I have found very little material about this online.

To summarize, my question is: if I want to install packages on the cluster I created on AWS, can I use sudo apt-get install xxx to do it? Do newly launched nodes share the same storage as the head node? If so, is that good practice? If not, what is the right way?

Thank you very much!

There is 1 best solution below

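On a cluster deployed with ParallelCluster, the head node's /home directory is shared by default over NFS with all compute nodes. So if you install your application in your user's home folder (for example /home/ec2-user on Amazon Linux, or /home/ubuntu on Ubuntu), it will be available on every compute node. Packages installed with sudo apt-get install, by contrast, go into system directories on the head node's own root volume, which is not shared, so they will not appear on the compute nodes. Once your application is installed in the shared location, you can run it through the scheduler.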
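As a rough illustration of that workflow, the sketch below installs a tool under the shared home directory and writes a Slurm batch script that calls it. The tool name mytool, the prefix $HOME/apps/mytool, and the job file name are all hypothetical, and the generated stub stands in for whatever real build or install step your software needs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical install prefix under /home, which is NFS-shared with compute nodes.
PREFIX="${PREFIX:-$HOME/apps/mytool}"
mkdir -p "$PREFIX/bin"

# Stand-in for a real install step, e.g. ./configure --prefix="$PREFIX" && make install
cat > "$PREFIX/bin/mytool" <<'EOF'
#!/usr/bin/env bash
echo "mytool running on $(hostname)"
EOF
chmod +x "$PREFIX/bin/mytool"

# Slurm batch script that puts the shared prefix on PATH and runs the tool.
JOB="${JOB:-$HOME/run_mytool.sbatch}"
cat > "$JOB" <<EOF
#!/usr/bin/env bash
#SBATCH --job-name=mytool-test
#SBATCH --nodes=1
#SBATCH --output=%x-%j.out
export PATH="$PREFIX/bin:\$PATH"
mytool
EOF

# Submit only when Slurm is present (i.e. when run on the head node).
if command -v sbatch >/dev/null 2>&1; then
  sbatch "$JOB"
else
  echo "sbatch not found; job script written to $JOB"
fi
```

If the job's output file shows a compute node's hostname, the compute node was able to see the installation through the shared /home.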

Your next question may be that /home is limited in space. That is why it is recommended to add a separate shared storage volume, attached to the head node during cluster creation. This lets you control the attributes of the shared storage, such as its size and type. For more details, see the SharedStorage section of the ParallelCluster documentation: https://docs.aws.amazon.com/parallelcluster/latest/ug/SharedStorage-v3.html
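For a sense of what such a volume looks like in a ParallelCluster 3 configuration, the sketch below writes an example SharedStorage fragment to a scratch file for review. The mount point /shared, the name apps-volume, the volume type, and the size are assumptions chosen for illustration; check the linked documentation for the settings that fit your cluster.

```shell
# Write an example SharedStorage fragment to a scratch file (file name is hypothetical).
FRAGMENT="${FRAGMENT:-shared-storage-fragment.yaml}"
cat > "$FRAGMENT" <<'EOF'
SharedStorage:
  - MountDir: /shared        # hypothetical mount point, visible on head and compute nodes
    Name: apps-volume        # hypothetical volume name
    StorageType: Ebs
    EbsSettings:
      VolumeType: gp3        # example value
      Size: 200              # GiB, example value
EOF
echo "Review $FRAGMENT, then merge it into your cluster configuration file."
```

The fragment goes at the top level of the cluster configuration file that you pass to pcluster create-cluster. Software installed under the mount directory is then visible to compute nodes in the same way as software under /home.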

Using additional shared storage is the recommended way to run production workloads, because you have better control over the storage volume's attributes. To get started, however, you can simply run from your home folder first.

Thanks