I am building a SLURM multi-cluster setup, with a slurmdbd hosted on-premises and a slurmctld node in Oracle Cloud. The slurmctld is able to connect to the slurmdbd, but receives this error message when I try to connect to the database in any way:
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: Unspecified error
Looking at the /var/log/slurm/slurmdbd.log file on my slurmdbd cluster, it records this error:
[2022-03-11T08:29:47.541] error: Munge decode failed: Invalid credential
[2022-03-11T08:29:47.541] auth/munge: _print_cred: ENCODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] auth/munge: _print_cred: DECODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: auth_g_verify: REQUEST_PERSIST_INIT has authentication error: Unspecified error
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: Protocol authentication error
[2022-03-11T08:29:47.551] error: CONN:10 Failed to unpack SLURM_PERSIST_INIT message
To ensure that my credentials are valid, I have copied the slurmdbd's MUNGE key to the slurmctld via SCP, ensured that the UID and GID of the slurm and munge users on all nodes are identical, and made sure that the clocks are all in sync. When I munge and unmunge on either server, it successfully decodes the encrypted message. However, when I try to authenticate the credential from one server to the other using the echo foo | ssh user@server munge | unmunge
command, it gives me a response of unmunge: error: invalid credential
. What could I be doing to still receive this response? What should I do to make sure that my credential is valid?