HTCondor - Partitionable slot not working


I am following the tutorial from the Center for High Throughput Computing and the Introduction to Configuration on the HTCondor website to set up a partitionable slot. Before any configuration, I run

condor_status

and get the following output.

I update the file 00-minicondor in /etc/condor/config.d by adding the following lines at the end of the file.

NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=4
SLOT_TYPE_1_PARTITIONABLE = TRUE

and reconfigure

sudo condor_reconfig

Now with

condor_status

I get this output as expected. Now, I run the following command to check everything is fine

condor_status -af Name Slotype Cpus

and find [email protected] undefined 1 instead of [email protected] Partitionable 4 61295, which is what I would expect. Moreover, when I try to submit a job that asks for more than 1 CPU, it does not allocate space for it as it should (the job stays waiting forever).

I don't know if I made some mistake during the installation process or what could be happening. I would really appreciate any help!

EXTRA INFO: In case it helps, I installed HTCondor with the command

curl -fsSL https://get.htcondor.org | sudo /bin/bash -s -- --no-dry-run

on Ubuntu 18.04 running on an old p2.xlarge instance (it has 4 cores).

UPDATE: After rebooting the whole machine, it seems to be working. I can now submit jobs with different CPU requests and they start properly.
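A possible explanation for why the reboot helped: as far as I understand, slot layout settings such as NUM_SLOTS_TYPE_1 and SLOT_TYPE_1 are read when the condor_startd starts, so condor_reconfig alone may not apply them. If that is the case, restarting the HTCondor daemons should be enough and a full reboot is not needed. A minimal sketch, assuming the HTCondor tools are on PATH and that sudo is required on this machine (the variable name restart_tool is only for illustration):

```shell
# Tool that asks condor_master to restart the daemons it manages,
# including condor_startd, which owns the slot layout.
restart_tool="condor_restart"

# Only attempt the restart if HTCondor is actually installed here.
if command -v "$restart_tool" >/dev/null 2>&1; then
    sudo "$restart_tool"
fi
```

After the daemons come back, it can take a short while before condor_status shows the new slot layout.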

The only issue I would say persists is that Memory allocation is not showing properly, for example:

in this case

But in reality it is allocating enough memory for the job (in this case around 12 GB).

If I run condor_status -af Name Slotype Cpus again, I still get something I am not supposed to:

undefined problem

But at least it is showing the correct number of CPUs (even if it just says undefined).
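One thing worth checking about the undefined value: the slot attribute is spelled SlotType (with two t's), while the commands above use Slotype. condor_status -af prints undefined for any attribute it cannot find in the slot ad, so the misspelling alone could explain the output. A minimal sketch of the corrected query, assuming the HTCondor tools are on PATH; Memory is added here only as a guess at what the fourth expected column was, and the variable name attrs is illustrative:

```shell
# Attribute names are matched case-insensitively, but every letter
# must be present: the slot attribute is SlotType, not Slotype.
attrs="Name SlotType Cpus Memory"

# Only query the pool if the HTCondor tools are installed here.
if command -v condor_status >/dev/null 2>&1; then
    # $attrs is left unquoted on purpose so each name is a separate argument.
    condor_status -af $attrs
fi
```

If the partitionable slot is configured correctly, the second column should read Partitionable for the parent slot and Dynamic for any slots carved out of it for running jobs.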
