I want to run a SLURM job on a remote cluster to analyze sensitive data stored on my local machine (/data/mydata.csv). The job is defined in job.sh, which executes an R analysis script (r_script.R). To avoid uploading the sensitive data to the remote server, I tried loading it directly from my local machine by pasting together an SSH command inside r_script.R,
ssh user@local-machine 'cat /data/mydata.csv'
which essentially opens a connection and streams the data straight into memory, without writing it to the remote disk.
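In shell terms, the idea boils down to something like this (a simplified sketch; my actual script assembles the ssh command inside R, and user@local-machine stands in for my real address):

    # Stream the CSV over SSH and feed it to the analysis via stdin,
    # so the sensitive data never touches the remote filesystem.
    ssh user@local-machine 'cat /data/mydata.csv' | Rscript r_script.R /dev/stdin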
I set up an SSH key pair for passwordless access and wrote job.sh, a Bash script that takes a -d flag specifying the data file location.
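Slightly simplified, job.sh looks like this (the SBATCH directive and the final Rscript call are illustrative placeholders, not my exact code):

    #!/bin/bash
    #SBATCH --time=01:00:00        # illustrative directive

    # Parse the -d flag carrying the user@host:/path data location
    while getopts "d:" opt; do
      case "$opt" in
        d) DATA_LOC="$OPTARG" ;;
      esac
    done

    # Split user@host:/path into the SSH target and the remote path
    SSH_TARGET="${DATA_LOC%%:*}"
    DATA_PATH="${DATA_LOC#*:}"

    # r_script.R opens the SSH connection itself using these two pieces
    Rscript r_script.R "$SSH_TARGET" "$DATA_PATH"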
This setup works fine when I run the script directly:
$ ./job.sh -d user@local-machine:/data/mydata.csv
but fails under SLURM:
$ sbatch ./job.sh -d user@local-machine:/data/mydata.csv
apparently because the SSH keys (or the SSH agent) are not available in the job's environment on the compute node: ssh exits with status 255, and running ssh-add -L from inside the job fails as well.
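To confirm, I added a few diagnostic lines to the job (an illustrative sketch; user@local-machine is again a placeholder):

    # Diagnostics run inside the SLURM job
    echo "agent socket: ${SSH_AUTH_SOCK:-<unset>}"  # unset, or points to a socket
                                                    # that does not exist on the node
    ssh-add -L; echo "ssh-add exit status: $?"      # fails: no agent reachable
    ssh -o BatchMode=yes user@local-machine true; echo "ssh exit status: $?"  # 255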
As an alternative, I considered hosting the data as a password-protected .zip, fetching it with wget, and passing the password through a -p flag rather than interactive input, but I realized this exposes the password in plain text (on the command line and in the SLURM logs), which is not secure.
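Concretely, the rejected idea was along these lines (the URL and the variable name are hypothetical):

    # Rejected approach: the password ends up in the script, in `ps` output,
    # and in the SLURM logs, which is unacceptable for sensitive data.
    wget -q https://example.org/mydata.zip       # hypothetical URL
    unzip -P "$ZIP_PASSWORD" mydata.zip          # -P passes the password on the command line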
I am looking for a method, such as an sbatch option or a secure file transfer mechanism that works from within a SLURM job, to get my sensitive local data to the job securely while avoiding a direct upload to the remote server. How can this be achieved?
I can share the full scripts for more context if needed. Any insights or suggestions are welcome.